It's very true in my experience, and we've looked pretty hard at all the options. Thanks for the offer of the slides; I have first-hand experience with this at scale.
It certainly depends a lot on how sophisticated your autoscaling is, how closely you're able to follow demand to limit waste, how well you can manage per-host utilization, whether you are CPU- or memory-bound, and lots of other factors. But even at truly massive scale, the cloud hosting option can be very competitive without nearly as much management overhead.
Interesting! I have the opposite experience and I also operated at quite a decent scale.
I wonder why our experiences are so different.
For context, I was running 2,500+ instances at peak with 40 vCPUs and 256 GiB of memory each, the most expensive of those of course being in regions with a low density of players, like South America and Australia.
We also had a predictive autoscaler for our cloud components. I would estimate that our waste was 15% at any given time, but running bare metal only would still have been cheaper (except for the operational cost and the lengthy commitments we needed for hardware).
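To make the waste figures concrete, here's a back-of-envelope sketch of the arithmetic. The hourly rate is a made-up placeholder, not a real quote, and the function name is mine:

```python
# If `waste` is the idle fraction of provisioned capacity, serving one
# instance-hour of real demand requires 1 / (1 - waste) provisioned
# instance-hours, so each useful hour effectively costs this much.
def cost_per_useful_hour(hourly_rate: float, waste: float) -> float:
    return hourly_rate / (1.0 - waste)

rate = 2.00  # hypothetical on-demand $/hr for a 40 vCPU / 256 GiB instance

print(round(cost_per_useful_hour(rate, 0.15), 2))  # 15% waste -> 2.35
print(round(cost_per_useful_hour(rate, 0.06), 2))  # ~6% waste -> 2.13
```

At these assumed numbers, dropping waste from 15% to 6% shaves roughly 10% off the effective cost of serving demand, before any discounts or commitment pricing.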
We run rather a lot more than 2,500 instances at peak, across a variety of instance types depending on game mode. Our waste is on the order of 5-7% depending on the time of day -- we can run less waste on the downward slope of the demand curve, for instance.
Aha, Fortnite. You guys will get discounts that we will never get; please keep that in mind.
Our waste was the inverse of yours: less waste on the upswing and more on the downswing, due to the long-lived nature of our instances.
One instance handles roughly 1,000 players, but if you as a player decide to just keep playing, we won't kick you out of the game for some hours. Some percentage of players will continue playing as long as possible without matchmaking or map transfers.
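A minimal sketch of that scale-down behavior as described above: a draining instance stops accepting matchmaking but keeps running until its last player leaves. The class and method names are illustrative, not a real API:

```python
from dataclasses import dataclass, field


@dataclass
class GameInstance:
    capacity: int = 1000
    players: set = field(default_factory=set)
    draining: bool = False

    def can_accept(self, player_id: str) -> bool:
        # Matchmaking skips instances that are draining or full.
        return not self.draining and len(self.players) < self.capacity

    def join(self, player_id: str) -> bool:
        if not self.can_accept(player_id):
            return False
        self.players.add(player_id)
        return True

    def leave(self, player_id: str) -> None:
        self.players.discard(player_id)

    def start_drain(self) -> None:
        # Stop routing new players here; existing players keep playing.
        self.draining = True

    def can_terminate(self) -> bool:
        # Safe to reclaim only once drained and empty.
        return self.draining and not self.players
```

This is why waste skews toward the downswing: an instance marked for drain can sit mostly empty for hours while the last few long sessions finish.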
> Tooling/Perf/etc is not needed when running (do you really want to debug in production) but tooling can be used in the process of development.
Whether or not you want to debug in production, reality often means that you will see things in a live environment that you will not see in other environments.
Unikernels are very interesting and have a number of compelling attributes, but let's not pretend that the current state of available tooling for troubleshooting, instrumentation, and general debugging isn't a challenge.
You're moving the goalposts and invoking a straw man. No one is advocating for "pretending", and the anti-unikernel argument is that the inherent cost of unikernels in general is the loss of kernel debugging tools, not simply that "right now the unikernel debugging experience is subpar".
In general right now, running on a full-featured monolithic kernel, the debugging experience is really pretty bad, especially in the target environment of horizontally clustered services or lots of little microservices.
So I actually believe there is an opportunity here to focus on the important pieces (network messages, control-flow tracing, memory footprints, etc.) after ejecting a huge amount of irrelevant stuff.
Totally agree, there's definitely an opportunity there if the surface area of what the system is doing gets smaller. The big difference today is that it's a lot easier to compose a debugging suite from additional tools on top of a traditional host-based runtime.
Definitely excited to see how the technology evolves over the next few years. It hasn't moved as fast as I'd have expected over the last 2-3 years but I'd love to see that accelerate.
No, I'm not moving any goal posts. I was reacting to the statement "Tooling/Perf/etc is not needed when running". I'm not sure where you'd get the idea that I'm anti-unikernel, I just wanted to not disregard that challenge since it does matter.
I took care not to say that you were anti-unikernel; I was articulating the anti-unikernel position. I'm not sure what you actually meant, so I'll take you at your word that you weren't moving goalposts. In any case, the OP is correct that unikernels aren't inherently less debuggable, even though mature debug tooling may not exist for them today.
This is an asinine argument that seems to be driven by the belief that there is such a thing as efficient allocation of capital in the real world. Or perhaps it's just sour grapes that some of the unsophisticated investors they like to take advantage of are instead putting their money in index funds.
It actually is unreasonable for you to ask that. It's also a pretty bad take to bring up the statistic that 150,000 people die every day, as if the death of a single person whom many people had a connection to is not significant.
If you truly don't care just move on. If you want to know more, it doesn't take a great deal of skill to use Twitter, GitHub, Google, and more to find out all that you want to know.
As I said in response to another comment in this thread, I don't often share my slides without the context of the recording of the talk that went with them. That recording isn't available yet, but these slides have more words on them than the ones I usually present, so I thought I'd share them.
In my talk, each of the main points was accompanied by an anecdote to give some background on the experiences that led me to believe them, as well as to expand on the content of the slide.
The talk was specifically built for an audience of attendees of the DevOps Enterprise Summit. It's a fantastic event with one of the most thoughtful and engaged groups of attendees I've experienced, and I've been to a lot of conferences. These folks are, generally speaking, development or operations leaders at enterprise companies who are in the midst of a transition to a style of work that's a significant departure from traditional enterprise IT. Since much of what we call DevOps now is built on the way that many of us in the world of startups and Internet companies have been doing things for a long time, Gene Kim asked me to share some ideas that I think are universal.
It was one of the most fun talks I can remember since I was given the license to tell stories. ;)
I generally don't like to post decks without the recording of the talk and I was kind of surprised this one got shared as widely as it seems to have been. I'm glad you enjoyed it.
- Very elastic demand curve
- Generally ephemeral with any necessary persistent state stored off-host
- Benefit from geographic distribution of cloud regions because they're generally quite latency sensitive