
Goroutine stacks are in fact allocated on the heap. All the details are in here: https://github.com/golang/go/blob/master/src/runtime/stack.g...


By heap allocating stack frames I mean that each time you call a function, SML/NJ allocates space for that individual frame on the heap. It then uses the garbage collector to clean up frames after the function call has returned.


Smalltalk-80 did this too. Later implementations try to avoid it, but sometimes still do:

http://www.wirfs-brock.com/allen/things/smalltalk-things/eff...


A nice writeup, thanks. There are a few variations on this workflow that I've found useful in practice; perhaps they'll be helpful to some folks:

- Linux perf can profile unmodified Go programs. This is handy when your application doesn't expose the /debug/pprof endpoint. (http://brendangregg.com/FlameGraphs/cpuflamegraphs.html#perf has detailed instructions)

- Recent versions of https://github.com/google/pprof include a flamegraph viewer in the web UI. This is handy when you want a line-level flamegraph instead of a function-level one (a minimal sketch of exposing the /debug/pprof endpoint it can read from follows below).
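
For reference, exposing that endpoint in your own program is nearly a one-liner. A minimal sketch (the address and port are arbitrary):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    )

    func main() {
        // Serve the profiling endpoints on a side port; pprof can then fetch
        // profiles from http://localhost:6060/debug/pprof/profile and friends.
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        // ... the rest of the application ...
        select {}
    }

With a recent standalone pprof you can then run something like pprof -http=:8080 http://localhost:6060/debug/pprof/profile to get the web UI, flamegraph included.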


Use cases for ProxySQL: many.

- failover

- query routing (e.g., for sharded deployments)

- caching

- workload stats/metrics

- query rewriting

etc.


Curious: What's your strategy for measuring application performance? Would love to hear more details on how you're tracking the effect of your efforts.


At a high level, we try to collect real user metrics for everything we can. Interana helps us analyze that data in near real time. For more detail, I hope we can write a blog post soon to answer your question.


You piqued my curiosity :) A comment in the source for the release notes (https://github.com/golang/go/blob/master/doc/go1.9.html#L922) points to the relevant change: https://go-review.googlesource.com/c/go/+/34310, which in turn links to this issue: https://github.com/golang/go/issues/13086


Ah, thanks for the links.

I googled a bit and found some good info; I guess I had forgotten some of the concepts behind mutex fairness/unfairness. I found a very nice explanation on cs.stackexchange:

  "My understanding is that most popular implementations of a mutex (e.g. std::mutex in C++) do not guarantee fairness -- that is, they do not guarantee that in instances of contention, the lock will be acquired by threads in the order that they called lock(). In fact, it is even possible (although hopefully uncommon) that in cases of high contention, some of the threads waiting to acquire the mutex might never acquire it."
Source: https://cs.stackexchange.com/questions/70125/why-are-most-mu...

With that computer-science clarification, I think the comment "Mutex is now more fair" and the detailed description "Unfair wait time is now limited to 1ms" are a lot clearer.

Great improvement I think! It's one of those things that you don't notice until you have a bug, but it's really nice to never get that bug in the first place. =)


Check the comment here on Mutex fairness: https://go-review.googlesource.com/c/go/+/34310/8/src/sync/m...

> Mutex fairness.

> Mutex can be in 2 modes of operations: normal and starvation. In normal mode waiters are queued in FIFO order, but a woken up waiter does not own the mutex and competes with new arriving goroutines over the ownership. New arriving goroutines have an advantage -- they are already running on CPU and there can be lots of them, so a woken up waiter has good chances of losing. In such case it is queued at front of the wait queue. If a waiter fails to acquire the mutex for more than 1ms, it switches mutex to the starvation mode.

> In starvation mode ownership of the mutex is directly handed off from the unlocking goroutine to the waiter at the front of the queue. New arriving goroutines don't try to acquire the mutex even if it appears to be unlocked, and don't try to spin. Instead they queue themselves at the tail of the wait queue.

> If a waiter receives ownership of the mutex and sees that either (1) it is the last waiter in the queue, or (2) it waited for less than 1 ms, it switches mutex back to normal operation mode.

> Normal mode has considerably better performance as a goroutine can acquire a mutex several times in a row even if there are blocked waiters. Starvation mode is important to prevent pathological cases of tail latency.
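
To get a feel for the effect, here's a rough sketch (mine, not from the CL): hammer a single sync.Mutex from many goroutines and record the worst-case Lock() wait. Comparing the output under Go 1.8 and 1.9 should show a much better-behaved tail under 1.9:

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        var (
            mu      sync.Mutex    // the contended mutex
            statsMu sync.Mutex    // protects maxWait
            maxWait time.Duration // worst-case Lock() wait seen by any goroutine
            wg      sync.WaitGroup
        )

        for g := 0; g < 50; g++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := 0; i < 200; i++ {
                    start := time.Now()
                    mu.Lock()
                    waited := time.Since(start)
                    time.Sleep(10 * time.Microsecond) // short critical section
                    mu.Unlock()

                    statsMu.Lock()
                    if waited > maxWait {
                        maxWait = waited
                    }
                    statsMu.Unlock()
                }
            }()
        }
        wg.Wait()
        fmt.Println("worst-case Lock() wait:", maxWait)
    }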


I think the author is specifically evaluating low-concurrency in-memory workloads here. The previous post describes why regressions for those workloads might be "not a surprise": https://smalldatum.blogspot.com/2017/05/the-history-of-low-c...


This context is key. High-concurrency workloads have not suffered this badly -- in most cases performance has actually improved -- and that trade-off is specifically part of why the regression exists. For many (possibly even a good majority of) users, that matters more than low-concurrency performance.

Blindly suggesting MariaDB or Percona is all well and good, but naive. Percona generally follows MySQL upstream but makes its own small tweaks, which could be (but are not always) fixed in upstream MySQL, given their differing priorities.

MariaDB is highly diverged at this point and likely has a whole different set of problems and benefits.

The author of these posts is well known for a long (10+ year) history of working with MySQL at a depth rivalling that of many of its developers, and generally writes good stuff (though he is not infallible :)


Yes, it seems that Oracle does care about performance in high-concurrency applications: https://www.mysql.com/why-mysql/benchmarks/

Even their own benchmarks show MySQL 5.7 dipping below previous versions at the low end of the concurrency scale.


Yes, context is everything when looking at benchmark results. I am showing the worst case for this performance regression. Thanks for mentioning that. Note: that is my blog.

I also showed the impact for IO-bound workloads, and even there the regression is larger than I'd like, though not as bad as for the in-memory workloads.


One of the older posts eloquently discusses why "accidentally quadratic" behavior is both so recurring and so insidious: http://accidentallyquadratic.tumblr.com/post/113840433022/wh...


Raise your hand if you have not only written accidentally quadratic code, but managed to compose said code in such a way that you ended up with something even worse!

raises hand

Imagine you have a React SPA where users can select a number of items for which they want more information (I can't give more details at the moment). Said information has to be fetched from the server upon selection.

I recently discovered that the website would fetch all selected items whenever a new item was selected, including previously selected items, regardless of whether they had already been fetched. So if I start with one selection and build up to n, that is 1 + 2 + ... + (n-1) + n = (n²+n)/2 fetches[0]. Whoops.

To make it worse, I had also somehow managed to set up my components in such a way that for each fetch received, React would remount (that's right, re-mount) all of the selected items. Don't ask me how. I'll just say I have only been doing web stuff since last year and leave it at that.

If we assume a user clicks fast enough to select every item before any fetch completes, that would be the previous expression multiplied by n selections, so... (n³+n²)/2. Yikes!

So yeah... that was stupidly slow. Here's what I did to fix it; obvious but worth spelling out (a sketch of the first point follows below):

    - only fetch what hasn't been/isn't being fetched
    - only re-render the component that was updated
    - don't immediately do the above when the user 
      selects an item; let them make a selection first,
      then click a "show information" button, then
      fetch all the needed information in a *single* fetch
      and render it in a *single* update to the components.
[0] https://www.wolframalpha.com/input/?i=sum+one+to+n
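
For the first point, the gist is just remembering which items have already been requested. Sketched in Go rather than the actual React/JS code (names made up):

    package main

    import (
        "fmt"
        "sync"
    )

    // fetcher remembers which items have already been requested, so selecting
    // a new item never re-fetches previously selected ones.
    type fetcher struct {
        mu      sync.Mutex
        started map[string]bool // fetched or currently in flight
    }

    func newFetcher() *fetcher {
        return &fetcher{started: make(map[string]bool)}
    }

    // fetchMissing returns only the items we have not seen before; the caller
    // then issues a single request for just those.
    func (f *fetcher) fetchMissing(selected []string) []string {
        f.mu.Lock()
        defer f.mu.Unlock()
        var toFetch []string
        for _, item := range selected {
            if !f.started[item] {
                f.started[item] = true
                toFetch = append(toFetch, item)
            }
        }
        return toFetch
    }

    func main() {
        f := newFetcher()
        fmt.Println(f.fetchMissing([]string{"a"}))           // [a]
        fmt.Println(f.fetchMissing([]string{"a", "b"}))      // [b]
        fmt.Println(f.fetchMissing([]string{"a", "b", "c"})) // [c]
    }

With that, the total fetch work is linear in the number of items instead of quadratic.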


I'll second the post above -- if you miss Scuba, honeycomb.io is for you. https://honeycomb.io/blog/2016/11/honeycomb-faq-in-140-chars...


There's also Snorkel, mostly written by Okay Zed (one of the authors listed on that Scuba paper) http://snorkel.logv.org/


These are such great comments, thanks for sharing your insights. For folks looking for other options, I'd also mention https://honeycomb.io, perhaps the most promising newcomer in this space. It's essentially Facebook's Scuba for the rest of us.


Looks cool. Instrumenting at the network layer is certainly a promising approach. Are you recording latency distributions, and not just averages? The screenshots only show mean and median latency, which isn't enough to spot many anomalies.


Right now we maintain a few select percentiles from the latency distribution over a 1-minute time period. We plan to maintain latency histograms, which will allow you to look at the latency distribution over arbitrary time intervals.
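
The nice property of histograms is that they merge: with fixed bucket bounds, per-minute histograms can simply be summed over any window and an approximate percentile read off the cumulative counts. A toy sketch of the idea in Go (bucket bounds and counts entirely made up, not our actual implementation):

    package main

    import "fmt"

    // Made-up fixed bucket upper bounds, in milliseconds.
    var bounds = []float64{1, 2, 5, 10, 20, 50, 100, 200, 500, 1000}

    // hist holds per-bucket counts (last slot is the overflow bucket).
    // Histograms for adjacent minutes merge by summing counts, which is
    // what makes percentiles over arbitrary windows possible.
    type hist [11]uint64

    func merge(a, b hist) hist {
        var out hist
        for i := range out {
            out[i] = a[i] + b[i]
        }
        return out
    }

    // percentile returns the upper bound of the bucket containing the p-th
    // percentile -- an approximation limited by the bucket resolution.
    func percentile(h hist, p float64) float64 {
        var total uint64
        for _, c := range h {
            total += c
        }
        if total == 0 {
            return 0
        }
        target := uint64(p / 100 * float64(total))
        var cum uint64
        for i, c := range h {
            cum += c
            if cum >= target && i < len(bounds) {
                return bounds[i]
            }
        }
        return bounds[len(bounds)-1] // fell into the overflow bucket
    }

    func main() {
        minute1 := hist{0, 5, 20, 40, 20, 10, 4, 1, 0, 0, 0}
        minute2 := hist{0, 2, 10, 30, 35, 15, 6, 2, 0, 0, 0}
        window := merge(minute1, minute2)
        fmt.Println("p99 over the two-minute window ~", percentile(window, 99), "ms")
    }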


Any information on pricing?


Netsil AOC is priced by the number of vCPUs or cores that you would be monitoring. You can reach out to us at hello@netsil.com for the exact price quote based on your needs.

