There is still significant sharing that can be achieved inside a VM. Plus, a lot of the sharing comes from zero pages (full of 0s), which is still performed across VMs.
Another benefit of the salting mechanism is that it allows the administrator to define groups of mutually trusted VMs within which sharing will be performed.
disclaimer: I work at VMware and wrote the salting code.
A lot of those optimizations would no longer yield any benefits[0]. CPU architecture has evolved a lot in 16 years, especially in branch/code prediction, to the point where a correctly predicted branch (even without branch_likely) has almost no cost.
As a CPU architect, I can confirm that all those except possibly 2) will not yield significant benefits. Prefetching hints will only be useful when the particular code fragment is highly memory-bound because most wide superscalar microarchitectures will easily hide L1/L2 miss latencies.
My qp trie code <http://dotat.at/prog/qp/> got a performance boost of about 20% by adding prefetch hints in the obvious places. The inner loop is fetch / compute / fetch / compute, chaining down into the trie. The next fetch will (usually) be at some small offset from a pointer we can get immediately after the preceding fetch, so prefetch the base pointer, then compute to work out the exact offset.
If a DDR stall is 50+ CPU cycles (probably a lot more with today's 2–3 GHz CPUs), I am not sure superscalar microarchitectures would help that much.
At least in my case of a networking packet-forwarding app, I had the profiling data to prove that it was an issue.
The app code is not that long, ~2000 lines after cleanup, but it has a lot of table lookups (DDR stalls) and branches for error-condition checks.
A MIPS is probably the exact opposite to modern (which actually means anything P6 and above) x86 CPUs in terms of performance characteristics. If I were to guess what member of the x86 family might actually benefit from such optimisation, it would be NetBurst (which itself has very different performance characteristics from every other x86 family that came before or after it.)
I was trying to optimize a network app, with the goal of getting to 1 million pps. With the 200MHz CPUs of that time, one cache miss is 50+ cycles, or 25% of the per-packet CPU budget; prefetch helped a lot in that case.
I've occasionally wondered how long it takes highly optimized C/C++ to be surpassed by optimizing compilers as CPUs advance, whether because the hand optimizations make compiler optimization harder, or because they target assumptions about CPU architecture that are no longer valid.
That is, what is the shelf life of a very low level CPU optimization for Intel hardware.
While it seemed to cover more of the compiler optimizations and how to do some low-level benchmarking and optimizing, and wasn't really addressing when those might become obsolete, it was really interesting and informative. Thanks!
It's more a matter of what you consider to be obvious in this case. In my experience, especially regex matching can become very not-so-obvious if you're coming back to code you wrote some time ago. Or if someone else is reading it.