By heap-allocating stack frames I mean that each time you call a function, SML/NJ allocates space for that individual frame on the heap. It then uses the garbage collector to clean up frames after the function call has returned.
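To make that concrete, here's a toy TypeScript sketch of the idea (not SML/NJ's actual machinery): each call "frame" is an ordinary heap object, and returning simply drops the last reference so the collector can reclaim it.

```typescript
// Toy illustration only; SML/NJ's real frames are compiler-managed.
// Each call allocates its "frame" as a heap object instead of pushing
// onto a contiguous stack; once the call returns, the frame is garbage.
interface Frame {
  caller: Frame | null; // link back to the calling frame
  n: number;            // this call's argument
}

function factorial(n: number, caller: Frame | null = null): number {
  const frame: Frame = { caller, n }; // one heap allocation per call
  if (frame.n <= 1) return 1;
  return frame.n * factorial(frame.n - 1, frame); // callee links to us
}

console.log(factorial(5)); // 120; all five frames are now collectable
```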
- Recent versions of https://github.com/google/pprof include a flamegraph viewer in the web UI (launched with the -http flag). This is handy when you want a line-level flamegraph instead of a function-level one.
Curious: What's your strategy for measuring application performance? Would love to hear more details on how you're tracking the effect of your efforts.
At a high level, we try to collect real user metrics for everything we can. Interana helps us analyze that data in near real time. For more detail, I hope we can write a blog post soon to answer your question.
I googled a bit and found some good info; I guess I had forgotten some of the concepts of mutex fairness/unfairness. I found a very nice explanation on cs.stackexchange:
"My understanding is that most popular implementations of a mutex (e.g. std::mutex in C++) do not guarantee fairness -- that is, they do not guarantee that in instances of contention, the lock will be acquired by threads in the order that they called lock(). In fact, it is even possible (although hopefully uncommon) that in cases of high contention, some of the threads waiting to acquire the mutex might never acquire it."
With that computer science clarification, I think the comment "Mutex is now more fair" and the detailed description "Unfair wait time is now limited to 1ms" make it a lot clearer.
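To make "fair" concrete, here's a minimal TypeScript sketch of a strictly fair (FIFO) async mutex, where waiters acquire the lock in the order they asked for it; the names are mine, not from any real library. A description like "unfair wait time is now limited to 1ms" suggests a hybrid: run the fast, unfair path by default, and fall back to FIFO handoff like this once a waiter has been blocked for too long.

```typescript
// Sketch of a strictly fair (FIFO) async mutex: waiters acquire the
// lock in the order they called lock(). An unfair implementation would
// let a late arrival "barge" past the queue when the lock is free.
class FairMutex {
  private locked = false;
  private waiters: Array<() => void> = []; // FIFO queue of waiters

  lock(): Promise<void> {
    if (!this.locked) {
      this.locked = true;
      return Promise.resolve(); // uncontended fast path
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  unlock(): void {
    const next = this.waiters.shift(); // oldest waiter first
    if (next) next();                  // ownership transfers directly
    else this.locked = false;
  }
}

// usage: await m.lock(); try { /* critical section */ } finally { m.unlock(); }
```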
Great improvement I think! It's one of those things that you don't notice until you have a bug, but it's really nice to never get that bug in the first place. =)
This context is key here. High-concurrency workloads have not suffered this badly, and in most cases performance has improved; that's specifically part of why. For many (possibly even a good majority of) users, that matters more than low-concurrency performance.
Blindly suggesting MariaDB or Percona is all well and good, but naive. Percona generally follows MySQL upstream but makes its own little tweaks, which may (but do not always) get fixed in MySQL upstream, based on differing specific priorities.
MariaDB is highly diverged at this point and likely has a whole different set of problems and benefits.
The author of these posts is well known for a long (10+ year) history of working with MySQL at a deep level, rivalling that of many of the developers, and generally writes good stuff (though not infallible :)
Yes, context is everything when looking at benchmark results. I am showing the worst case for this performance regression. Thanks for mentioning that. Note: that is my blog.
I also showed the impact for IO-bound workloads, and even there the regression is larger than I'd like, though not as bad as for the in-memory workloads.
Raise your hand if you have not only written accidentally quadratic code, but managed to compose said code in such a way that you ended up with something even worse!
*raises hand*
Imagine you have a React SPA where users can select a number of items for which they want more information (I can't give more details at this moment). Said information has to be fetched from the server upon selection.
I recently discovered that the website would fetch all selected items whenever a new item was selected, including previously selected items, regardless of whether they had already been fetched. So if I start with one selection and build up to n, that is 1 + 2 + ... + (n-1) + n = (n²+n)/2 fetches [0]. Whoops.
To make it worse, I had also somehow managed to set up my components in such a way that for each fetch response received, React would remount (that's right, re-mount) all of the selected items. Don't ask me how. I'll just say I've only been doing web stuff since last year and leave it at that.
If we assume a user clicks fast enough to select every item before the fetches are received, that would be the previous equation multiplied by n selections, so... (n³+n²)/2. Yikes!
So yeah... that was stupidly slow. Here's what I did to fix it (obvious, but worth spelling out; there's a sketch after the list):
- only fetch what hasn't been/isn't being fetched
- only re-render the component that was updated
- don't immediately do the above when the user selects an item; let them make a selection first, then click a "show information" button, then fetch all the needed information in a *single* fetch and render it in a *single* update to the components.
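Here's a minimal sketch of the dedup-and-batch part of the fix, with a hypothetical endpoint and made-up names (the real app's state handling isn't shown): track which ids are already fetched or in flight, and only request the difference, in one batched call.

```typescript
// Hypothetical sketch of the fix; endpoint and types are made up.
interface ItemInfo { id: string; details: string }

const fetched = new Map<string, ItemInfo>(); // completed fetches
const inFlight = new Set<string>();          // requests still pending

async function fetchSelection(ids: string[]): Promise<ItemInfo[]> {
  // Only request ids we haven't fetched and aren't currently fetching.
  const missing = ids.filter((id) => !fetched.has(id) && !inFlight.has(id));
  if (missing.length > 0) {
    missing.forEach((id) => inFlight.add(id));
    try {
      // One batched request instead of one request per item per click.
      const res = await fetch(`/api/items?ids=${missing.join(",")}`);
      const items: ItemInfo[] = await res.json();
      items.forEach((item) => fetched.set(item.id, item));
    } finally {
      missing.forEach((id) => inFlight.delete(id));
    }
  }
  // A fuller version would also await ids that were already in flight.
  return ids.flatMap((id) => {
    const item = fetched.get(id);
    return item ? [item] : [];
  });
}
```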
These are such great comments, thanks for sharing your insights. For folks looking for other options, I'd also mention https://honeycomb.io, perhaps the most promising newcomer in this space. It's essentially Facebook's Scuba for the rest of us.
Looks cool. Instrumenting at the network layer is certainly a promising approach. Are you recording latency distributions, and not just averages? The screenshots only show mean and median latency, which isn't enough to spot many anomalies.
Right now we maintain a few select percentiles from the latency distribution over a 1-minute time period. We plan to maintain latency histograms, which will allow you to look at the latency distribution over arbitrary time intervals.
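For anyone wondering why histograms enable that: fixed-bucket histograms from different intervals can be merged by summing counts, and any percentile can then be estimated from the merged result, whereas pre-computed percentiles can't be combined after the fact. A rough TypeScript sketch (bucket bounds made up):

```typescript
// Sketch: fixed-bucket latency histograms can be merged across time
// intervals by summing counts; percentiles then come from the merged
// histogram. Pre-computed percentiles can't be merged like this.
const BOUNDS_MS = [1, 2, 5, 10, 25, 50, 100, 250, 500, 1000, Infinity];
type Histogram = number[]; // counts[i] = samples falling in bucket i

function merge(a: Histogram, b: Histogram): Histogram {
  return a.map((count, i) => count + b[i]);
}

function percentile(h: Histogram, p: number): number {
  const total = h.reduce((sum, c) => sum + c, 0);
  if (total === 0) return 0;
  const target = (p / 100) * total;
  let seen = 0;
  for (let i = 0; i < h.length; i++) {
    seen += h[i];
    if (seen >= target) return BOUNDS_MS[i]; // bucket's upper bound
  }
  return BOUNDS_MS[BOUNDS_MS.length - 1];
}

// p99 over an arbitrary window = percentile of the merged 1-min histograms:
// const window = minuteHistograms.reduce(merge);
// console.log(percentile(window, 99));
```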
Netsil AOC is priced by the number of vCPUs or cores that you would be monitoring. You can reach out to us at hello@netsil.com for the exact price quote based on your needs.