
I’m so close to the switch myself for silly reasons. I don’t like Windows due to their creepy business practices and negative design patterns in their OS, so I’m very biased against it. Forcing Copilot is just the latest in their creepy practices…

For more details on why I came close to switching: I use my Windows desktop as a host for AI services such as ComfyUI for Stable Diffusion generation, since it is a beefy platform; for example, I generate reference material for Krita (digital painting software) illustrations on my drawing tablet. I remember the process of configuring Windows as being strange, GUI-bound (NOT Windows’ strong suit), and just annoying due to my aforementioned bias. Valve has done great work with running games on Linux, which is the only reason I keep that OS, and I’d rather set up services on Linux.

This comment serves as a reminder to myself that I should just go ahead and grab my Windows license keys for archival purposes and flash a better OS onto that system.


Don't forget that Krita has home-turf advantage on Linux :)

Krita is among the main reasons why I am so impressed with the KDE project. Not only do they deliver a very good desktop environment, but they also deliver some genuine flagship apps for it

As does Wacom. My drawing tablet from 2002 is plug-and-play with zero driver installation.

There's nothing silly about those reasons.

I call it silly because essentially I’m complaining that I don’t like setting up a service on an OS I don’t normally set up services for. My quibbles with that process can in part be addressed by crafting a terminal based workflow to configure the host and enable the service on my desktop, skipping the GUI completely. E.g. I’m sure I can do Task Scheduler shenanigans through powershell. More experience would help sand down the rough parts I experienced.

Now the product decisions behind the OS giving me the icks… The terminal can’t (completely) help with that ^_^


Eh those applications can both be run in linux without issue.

They are likely to have more issues related to getting the drawing tablet configured correctly.

The rest is just having to start from scratch and lose decades of Windows experience and intuition, which can make things painful, as that kind of thing can’t be replaced without time.


Do you have a reference for the Krita + ComfyUI setup? I have a drawing tablet and always wanted to augment drawings using AI but never got around to deploying a stack for that. I have a 3090 that should be enough for it, I just need a reference for the setup.

Plugin for Krita I use: https://github.com/Acly/krita-ai-diffusion

App I recommend to download models and manage UIs: https://github.com/LykosAI/StabilityMatrix

(1) Download Krita.
(2) Download and install the Krita AI Diffusion plugin.
(3) Run ComfyUI using StabilityMatrix.

Docs for using the Krita AI plugin: https://docs.interstice.cloud/basics/ It's a really fun plugin to use!


Running ComfyUI or _any_ AI stuff on Linux is a night-and-day difference in ease of use and performance compared to doing the same on Windows. Python on Windows is suffering.

The pain I’ve experienced running ComfyUI on windows is from (1) pytorch and the complexities of managing it through pip when python’s platform concept doesn't encompass CUDA versions, (2) dependency conflicts between custom nodes (some of which also involve #1 because they pin a specific pytorch version as a dependency), and (3) gratuitous breakage in ComfyUI updates.

None of which Linux makes any better.


Are you just making stuff up? ComfyUI just works on Windows.

I fail to see how Python or ComfyUI would be easier to set up and use on Linux, unless we're talking about torch.compile or Triton.

I recently started using Lutris for gaming in Linux, and so far so good.

Having just yesterday installed a fresh Mint distro on a newly received PC that came with Windows pre-installed, I can tell you that this is merely an hour of work, and most of that time will be spent downloading the .iso and writing it to a USB key.

You should just give it a go tonight.


Copilot is already great and it's only getting better. I can get so much more work done in the same amount of time, or get the same amount of work done in a lot less time. It's funny how many people were afraid of AI. Technophobes abound in this world.

Funny, I disagree. I think copilot truly sucks compared to the other options. But you can uninstall copilot, so I don’t see why it bothers people at all.

I always thought the companies I worked for would implement chaos testing shortly after this talk/blog was released. However, only last year did we do anything even approaching chaos testing. I think this goes to show that the adage “the future is already here, it’s just not evenly distributed” carries some truth in some contexts!

I think the companies I worked for were prioritizing no-issue deployments (built from a series of documented and undocumented manual processes!) rather than making services resilient through chaos testing. As a younger dev this priority struck me as heresy (come on guys, follow the herd!); as a more mature dev I understand time & effort are scarce resources and the daily toil tax needs to be paid to make forward progress… it’s tough living in a non-ideal world!


Chaos testing rarely uncovers anything significant or actionable beyond things you can suss out yourself with a thorough review, but it has the added potential for customer harm if you don't have all your ducks in a row. It also neatly requires, as a prerequisite, that you have your ducks in a row.

I think that's why most companies don't do it. A lot of tedium and the main benefit was actually getting your ducks in a row.


I think it is more of a social technology for keeping your ducks in a row. Developers won’t be able to gamble that something “never happens” if we induce it weekly.

Much of the value of chaos testing can be gotten much more simply with good rolling CI. Many of the problems that chaos engineering solved are now considered table stakes, directly implemented into our frameworks and tested well by that same CI.

A significant problem with early 'Web Scale' deployments was out-of-date or stale configuration values. You would specify that your application connects to backend1.example.com for payments and backend2.example.com for search. A common bug in early libraries was that the connection was established once at startup, and then never again. When the backend1 service was long-lived, this just worked for months or years at a time - TCP is very reliable, especially if you have sane values for keepalives and retries. Chaos Monkey helped find this class of bug.

A more advanced but quite similar class of bug: you configured a DNS name, which was evaluated once at startup and, again, never updated. Your server for backend1 had a stable address for years at a time, but suddenly you needed to fail over to your backup or move it to new hardware. At the time of Chaos Monkey, I had people fight me on this - they believed that doing a DNS lookup every five minutes for your important backends was unacceptable overhead.
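To make that class of bug concrete, here is a minimal Go sketch of the fix, using the example hostname above and an assumed five-minute interval; a real client would rebuild its connection pool whenever the resolved addresses change:

    package main

    import (
        "log"
        "net"
        "sort"
        "strings"
        "time"
    )

    func main() {
        const backend = "backend1.example.com" // the example hostname from above
        last := ""
        for {
            addrs, err := net.LookupHost(backend)
            if err != nil {
                log.Printf("lookup %s failed: %v", backend, err)
            } else {
                sort.Strings(addrs)
                if cur := strings.Join(addrs, ","); cur != last {
                    log.Printf("%s now resolves to %v; rebuild connections here", backend, addrs)
                    last = cur
                }
            }
            time.Sleep(5 * time.Minute) // re-resolve periodically instead of once at startup
        }
    }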

The other part is - Modern deployment strategies make these old problems untenable to begin with. If you're deploying on kubernetes, you don't have an option here - Your pods are getting rebuilt with new IP addresses regularly. If you're connecting to a service IP, then that IP is explicitly a LB - It is defined as stable. These concepts are not complex, but they are edge boundaries, and we have better and more explicit contracts because we've realized the need and you "just do" deploy this way now.

Those are just Chaos Monkey problems, though - Latency Monkey is huge, but solves a much less common problem. Conformity Monkey is mostly solved by compliance tools; You don't build, you buy it. Doctor Monkey is just healthchecks - K8s (and other deployment frameworks) has those built in.

In short, Chaos Monkey isn't necessary because we've injected the chaos and learned to control most of what that was doing, and people have adopted the other tools - They're just not standalone, they're built in.


It's a great way of thinking about resiliency and fault tolerance, but it's also definitely on the very mature end of the systems engineering spectrum.

If you know things will break when you start making non-deterministic configuration changes, you aren't ready for chaos engineering. Most companies never get out of this state.


Having a few fault injection scenarios is baby steps. Next would be Jepsen-style testing, and most mature would be formal verification.

What’s the difference between scraping and malicious scraping? Does google engage in scraping or malicious scraping? Do the AI companies engage in scraping or malicious scraping?


Note that I am not defending the merits of Google's lawsuit, but they did describe in this very post what they believe distinguishes their scraping versus SerpApi.

> Stealthy scrapers like SerpApi override those directives and give sites no choice at all. SerpApi uses shady back doors — like cloaking themselves, bombarding websites with massive networks of bots and giving their crawlers fake and constantly changing names — circumventing our security measures to take websites’ content wholesale. [...] SerpApi deceptively takes content that Google licenses from others (like images that appear in Knowledge Panels, real-time data in Search features and much more), and then resells it for a fee. In doing so, it willfully disregards the rights and directives of websites and providers whose content appears in Search.

To me this seems... interesting, for sure. I think that Google already set a bad precedent by pulling content from the web directly into its results, and an even worse one by paying websites with user-generated content for said content (while those sites didn't pay the users that actually made the user-generated content, as an additional bitchslap.)

But it seems like at the very least Google is suggesting that SerpApi is effectively trying to "steal" the work Google did, rather than do the same work themselves. Though I wonder if this is really Google pulling up the ladder behind them a bit, given how privileged of a position they are in with regards to web scraping.

It's a tough case. I think that something does need to ultimately be done about "malicious" web scraping that ignores robots.txt, but traditionally that sort of thing did not violate any laws, and I feel somewhat skeptical that it will be found to violate the law today. I mean, didn't LinkedIn try this same thing?


>bombarding websites with massive networks of bots

Like GoogleBot?

And yeah, robots.txt is not enforced by any law.

I think this is just about dragging SerpApi through a lengthy legal procedure and fees.


The size of your legal team.


Whether you obey robots.txt (Google does, SerpApi doesn't) seems like an important distinction.


Permission


Malicious scraping is when people other than them do it. When they scrape the internet to train their AI, it's "lawful" because they said so.


Hobbyist game dev here with random systemd thoughts. I’ve recently started to lean on systemd more as my ‘local game server process manager’. At first I thought I’d have to write this myself as a whole slew of custom code, but then I realized the Linux distros I use have systemd. That, plus cgroups and profiling my game server’s performance, lets me dynamically pack a host with as many game servers as it can handle (I target 80% resource utilization; funny things happen past that, things I don’t quite understand).
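Concretely, the kind of unit I lean on is roughly this template (names, paths, and limits are made up for illustration); CPUQuota= and MemoryMax= are the systemd resource-control directives backed by cgroups:

    # /etc/systemd/system/gameserver@.service -- hypothetical template unit
    [Unit]
    Description=Game server instance on port %i

    [Service]
    ExecStart=/opt/game/server --port=%i
    Restart=on-failure
    # cgroup-backed limits so instances can be packed toward ~80% of the host
    CPUQuota=50%
    MemoryMax=512M

    [Install]
    WantedBy=multi-user.target

Each instance is then just systemctl start gameserver@7001.service, and the limits apply per instance.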

In this way I’m able to set up AWS EC2 instances or DigitalOcean droplets, have a bunch of game servers spin up, and have them report their existence back to a backend game-services API. So far it’s working, but this part of my project is still in development.

I used to target containerizing my apps, which adds complexity, but often in AWS I have to care about VMs as resources anyways (e.g. AWS gamelift requires me to spin up VMs, same with AWS EKS). I’m still going back and forth between containerizing and using systemd; having a local stack easily spun up via docker compose is nice, but with systemd what I write locally is basically what runs in prod environment, and there’s less waiting for container builds and such.

I share all of this in case there’s a gray beard wizard out there who can offer opinions. I have a tendency to explore and research (it’s fuuun!) so I’m not sure if I’m on a “this is cool and a great idea” path or on a “nobody does this because <reasons>” path.


> (target 80% resource utilization, funny things happen after that — things I don’t quite understand).

The closer you get to 100% resource utilization the more regular your workload has to become. If you can queue requests and latency isn't a problem, no problem, but then you have a batch process and not a live one (obviously not for games).

The reason is that live work doesn't come in regular beats; it comes in clusters that scale in a fractal way. If your long-term mean is one request per second, what actually happens is you get five requests in one second, three seconds with one request each, one second with two requests, and five seconds with 0 requests (you get my point). "Fractal burstiness."

You have to have free resources to handle the spikes at all scales.

Also very many systems suffer from the processing time for a single request increasing as overall system loads increase. "queuing latency blowup"

So what happens? You get a spike, get behind, and never ever catch up.

https://en.wikipedia.org/wiki/Network_congestion#Congestive_...


Yea. I realize I ought to dig into things more to understand how to push past into 90%-95% utilization territory. Thanks for the resource to read through.


You absolutely do not want 90-95% utilization. At that level of utilization, random variability alone is enough to cause massive whiplash in average queue lengths.

The cycle time impact of variability of a single-server/single-queue system at 95% load is nearly 25x the impact on the same system at 75% load, and there are similar measures for other process queues.

As the other comment notes, you should really work from an assumption that 80% is max loading, just as you'd never aim to have a swap file or swap partition of exactly the amount of memory overcommit you expect.


Man, if there's one idea I wish I could jam into the head of anyone running an organization, it would be queuing theory. So many people can't understand that slack is necessary to have quick turnaround.


Mmmm, I remember reading this in Systems Performance Brendan Gregg. I should revisit what was written…


I target 80% utilization because I’ve seen that figure multiple times. I suppose I should rephrase: I’d like to understand the constraints and systems involved that make 80% considered full utilization. There’s obviously something that limits an OS; is it tunable?

These are the questions I imagine a thorough multiplayer solutions engineer would be curious about, the kind of person who’s trying to squeeze as much juice out of the hardware as possible.


It might not be the OS, but just statistical inevitability. If you're talking about CPU utilization on Linux, for example, it's not all that unlikely that the number you're staring at isn't "time spent by CPU doing things" but "average CPU run queue length". "100%" then doesn't only mean the CPU gets no rest, but "there's always someone waiting for a CPU to become free". It likely pays off to understand where the load numbers in your tooling actually come from.

Even if that weren't the case, lead times for tasks will always increase with more utilization; see e.g. [1]: If you push a system from 80% to 95% utilization, you have to expect a ~4.75x increase in lead time for each task _on average_: (0.95/0.05) / (0.8/0.2)

Note that all except the term containing ρ in the formula are defined by your system/software/clientele, so you can drop them for a purely relative comparison.

[1]: https://en.wikipedia.org/wiki/Kingman%27s_formula

Edit: Or, to try to picture the issue more intuitively: If you're on a highway nearing 100% utilization, you're likely standing in a traffic jam. And if that's not (yet) strictly the case, the probability of a small hiccup creating one increases exponentially.
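For reference, a rough statement of Kingman's approximation for the mean wait in a single queue, where ρ is utilization, c_a and c_s are the coefficients of variation of arrival and service times, and τ is the mean service time:

    \mathbb{E}[W_q] \;\approx\; \frac{\rho}{1-\rho} \cdot \frac{c_a^2 + c_s^2}{2} \cdot \tau

Only the ρ/(1-ρ) factor depends on utilization, which is why it dominates as load approaches 100%.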


> I’d like to understand the constraints and systems involved that make 80% considered full utilization. There’s obviously something that limits a OS; is it tunable?

There are OS tunables, and these tunables will have some measure of impact on the overall system performance.

But the things that make high-utilization systems so bad for cycle time are inherent aspects of a queue-based system that you cannot escape through better tuning, because the issues these cause to cycle time were not due to a lack of tuning.

If you can tune a system so that what previously would have been 95% loading is instead 82% loading that will show significant performance improvements, but you'd erase all those improvements if you just allowed the system to go back up to 95% loaded.


Hmmm makes sense. Sounds like I may have a misunderstood mental model of resource consumption. I ought to reread https://technology.riotgames.com/news/valorants-128-tick-ser... (specifically the section on “Real World Performance” where the engineer describes tuning) now that I have a better appreciation that they’re not trying to make resource utilization % higher, but instead making available more resources through tuning efforts.


Yeah, a big thing is latency vs. throughput.

That's a great article you link and it basically notes up front what the throughput requirements are in terms of cores per player, which then sets the budget for what the latency can be for a single player's game.

Now, if you imagine for a second that they managed to get it so that the average game will just barely meet their frame time threshold, and try to optimize it so that they are running right at 99% capacity, they have put themselves in an extremely dangerous position in terms of meeting latency requirements.

Any variability in hitting that frame time would cause a player to bleed over into the next player's game, reducing the amount of time the server had to process that other player's game ticks. That would percolate down the line, impacting a great many players' games just because of one tiny little delay in handling one player's game.

In fact it's reasons like this that they started off with a flat 10% fudge adjustment to account for OS/scheduling/software overhead. By doing so they've in principle already baked-in a 5-8% reduction in capacity usage compared to theoretical.

But you'll notice in the chart that they show from recent game sessions in 2020 that the aggregate server frame time didn't hang out at 2.34 ms (their adjusted per-server target), it actually tended to average at 2.0 ms, or about 85% of the already-lowered target.

And that same chart makes clear why that is important, as there was some pretty significant variability in each day's aggregate frame times, with some play sessions even going above 2.34 ms on average. Had they been operating at exactly 2.34 ms they would definitely have needed to add more server capacity.

But because they were in practice aiming at 85% usage (of a 95% usage figure), they had enough slack to absorb the variability they were seeing, and stay within their overall server expectations within ±1%.

Statistical variability is a fact of life, especially when humans and/or networks are involved, and systems don't respond well to variability when they are loaded to maximum capacity, even if it seems like that would be the most cost-effective.

Typically this only works where it's OK to ignore variability of time, such as in batch processing (where cost-effective throughput is more valuable than low-latency).


One way to think about it is 80% IS full utilization.

The engineering time, the risks of decreased performance, and the fragility of pushing the limit at some point become not worth the benefits of reaching some higher utilization metric. If 80% isn't the exact number for your system, there is still an optimum trade-off point somewhere.


If you use podman quadlets, you get containers and systemd together as a first class citizen, in a config that is easily portable to kubernetes if you need more complex features.
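A minimal quadlet sketch, with made-up names, would look something like this dropped into ~/.config/containers/systemd/:

    # gameserver.container -- hypothetical example
    [Unit]
    Description=Game server container

    [Container]
    Image=registry.example.com/gameserver:latest
    PublishPort=7001:7001/udp
    Environment=SERVER_REGION=local

    [Service]
    Restart=always

    [Install]
    WantedBy=default.target

After a systemctl --user daemon-reload it behaves like any other unit (systemctl --user start gameserver), and podman kube generate can produce Kubernetes YAML from the running container if you outgrow it.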


O.O this may be the feature that gets me into podman over docker.


They're very cool. I actually combine them with Nix. Because why not.

https://github.com/SEIAROTg/quadlet-nix


The shift from docker to podman was quite painful at first, but it's much better now: very usable and quite stable.

Still, I can see the draw for independent devs to use docker compose. For teams and orgs, though, it makes sense to use podman and systemd for the smaller stuff or for dev, and then literally export the config as Kubernetes YAML.


How is podman managed in larger environments? It's designed around running rootless, but it seemed like the nature of that is there wasn't a proper way to centrally manage the containers even on just a single machine. Like even seeing the logs required using machinectl to get a shell as the user who owns the service (sudo/su do not retain the necessary environment variables for journalctl to work). Trying to get the logs as root seems to let you filter down to the user (rather than service) level at best.

Meanwhile, with Docker (or the not recommended rootful podman), you can have centralized management of multiple machines with a tool like Portainer.

I like the idea of podman, but this has been a head-scratcher for me.


The way you manage podman in large environments is called Kubernetes :)

podman is really only suitable for a single node, but they may have added things I have missed.


Definitely don't recommend going down this path if you're not already familiar with Nix, but if you are, a strategy that I find works really well is to package your software with Nix, then you can run it easily via systemd but also create super lightweight containers using nix-snapshotter[0] so you don't have to "build" container images if you still want the flexibility of containers. You can then run the containers on Docker or Kubernetes without having to build heavy images.

[0] https://github.com/pdtpartners/nix-snapshotter


I don't recommend getting familiar with Nix because your chances of getting nerd sniped by random HN comments increase exponentially.


Funny. I probably will dive into Nix some day but I've been content letting it sit waiting for me to check it out.


This is sort of how I designed AccelByte's managed game server system (previously called: Armada).

You provide us a docker image, and we unpack it, turn it into a VM image and run as many instances as you want side-by-side with CPU affinity and NUMA awareness. Obviating the docker network stack for latency/throughput reasons - since you can

They had tried nomad, agones and raw k8s before that.


Checking out the website now. Looks enticing. Would a user of accelbyte multiplayer services still be in the business of knowing about underlying VMs? I caught some copy on the website that led me to question.

As a hobbyist part of me wants the VM abstracted completely (which may not be realistic). I want to say “here’s my game server process, it needs this much cpu/mem/network per unit, and I need 100 processes” and not really care about the underlying VM(s), at least until later. The closest thing I’ve found to this is AWS fargate.

Also holy smokes if you were a part of the team that architected this solution I’d love to pick your brain.


There’s a couple of providers that give you that kind of abstraction. Playfab is _pretty close_ but it’s fairly slow to ramp up and down. There is/was multiplay - they’ve had some changes recently and I’m not sure what their situation is right now. There’s also stuff like Hathora (they’re great but expensive).

At a previous job, we used azure container apps - it’s what you _want_ fargate to be. AIUI, Google Cloud Run is pretty much the same deal but I’ve no experience with it. I’ve considered deploying them as lambdas in the past depending on session length too…


Cloud Run tries to be this but every service like this has quirks. For example, GCR doesn’t let you deploy to high-CPU/MEM instances, has lower performance due to multi-tenant hosts, etc


But that’s not what OP asked for. They asked for

> As a hobbyist part of me wants the VM abstracted completely (which may not be realistic). I want to say “here’s my game server process, it needs this much cpu/mem/network per unit, and I need 100 processes” and not really care about the underlying VM(s), at least until later. The closest thing I’ve found to this is AWS fargate.

You can’t have on demand usage with no noisy neighbours without managing the underlying VMs.

I used hathora [0] at my previous job, (they’ve expanded since and I’m not sure how much this applies anymore) - they had a CLI tool which took a dockerfile and a folder and built a container and you could run it anywhere globally after that. Their client SDK contained a “get lowest latency location” that you could call on startup to use. It was super neat, but quite expensive!

[0] https://hathora.dev/gaming


That was actually the original intent. If we scale to bare-metal providers we can get much more performance.

By making it an “us” problem to run the infrastructure at a good cost, it becomes cheaper for us to run than AWS, meaning we could take no profit on cloud VMs, making us cost-competitive as hell.


If I understand correctly, you're saying you manage hardware yourself (colocate in a data center? Dedicated hosting?) and that gives you an edge in pricing. That's pretty cool, and I think I can see how it could be less expensive to purchase and maintain hardware rather than renting that compute from a third party. There is obviously the tradeoff of then being responsible for capacity planning for the supported workloads and for the hardware lifecycle, among other downsides, but I wouldn't be surprised to hear those downsides are overstated compared to the benefits reaped.


Now I'm at a PC and can reply properly instead of typing drunkenly from my phone.

Ok, the idea was that what we really want is "ease of use" and "cost effective".

In game development (at least the gamedev I did) we didn't really want to care about managing fleets or versions, we just wanted to publish a build and then request that same build to run in a region.

So, the way I originally designed Accelbytes managed gameservers was that we treat docker containers as the distribution platform (if it runs in Docker it'll have all the bundled dependencies after all) and then you just submit that to us. We reconcile the docker image into a baked VM image on the popular cloud providers and you pay per minute that they're actively used. The reason to do it this way is that cloud providers are really flexible with the size of their machines.

So, the next issue, cost!

If we're using extremely expensive cloud VMs, then the cloud providers can undercut us by offering managed gameservers; worse, people don't compare the devex of those things (though it was important to me when I was at AB); so we need to offer things at basically a neutral cost. It has to be the same price (or, ideally, cheaper) to use AccelByte's managed gameservers over trying to do it yourself on a cloud provider. That way we guarantee the cloud providers don't undercut us: they wouldn't cannibalise their own margins on a base product like VMs to offer them below market rate.

So, we turn to bare-metal. We can build a fleet of globally managed servers, we can even rent them to begin with. By making it our problem to get good costs (because that pays for development), we are forced to make good decisions about CapEx vs OpEx, and it becomes "our DNA" to actually run servers, something most companies don't want to think about- but cloud costs are exorbitant and you need specialists to run them (I am one of those).

The bursty nature of games seems like it fits best in a cloud, but you'll often find that games naturally don't like to ship next to each other, and the first weeks are the "worst" weeks in terms of capacity. If you have a live service game that sustains its own numbers, that's a rarity, and in those cases it's even easier to plan capacity.

But if you build for a single burst, and you're a neutral third-party: you have basically built for every burst, and the more people who use you, the more profit you can make on the same hardware. -- and the more we buy, the better volume discounts we get, and the better we get at running things, etc;etc;etc.

Anyway, in order to make effective use of bare metal, I wrote a gvisor clone that had some supervisor functionality. The basic gist of it was that the supervisor could export statistics about the gameserver, such as the number of connections to the designated GS port (which is a reasonable default for player count) and whether it had completed loading (I only had two ways of being able to know this, one was the Agones RPC, the other was looking for a flag on disk... I was going to implement more), as well as ensuring the process is alive and lifecycling the process on death (collect up logs, crash dumps, any persistence, and send it to the central backend to be processed). It was also responsible for telling the kernel to jail the process to the resources that the game developer had requested. (So, if they asked for 2 vCPU and 12G of RAM, then that's what they get.)

It was also looking at NUMA awareness and CPU affinity, so some cores would have been wasted (Unreal Engine's server, for example, ideally likes to have 2 CPU cores, where 1 is doing basically everything and the other is about 20% used - theoretically you can binpack that second thread onto a CPU core, but my experience on The Division made me think that I really hate when my computer lies to me, and plenty of IaaS providers have abstractions that lie).

I wrote the supervisor in Rust and it had about a 130KiB (static) memory footprint and I set myself a personal budget of 2 ticks per interval, but I left before achieving that one.

I could go into a lot of detail about it, I was really happy when I discovered that they continued my design. It didn't make me popular when I came up with it I think, they wanted something simple, and despite being developers, developing something instead of taking something off the shelf is never simple.

Also, they were really in bed with AWS. So anything that took away from AWS dominance was looked at with suspicion.


Updates: meditating on the description of your Rust supervisor helped clarify my own approach.

- A Go-based supervisor daemon runs as a systemd service on the host. I configure it to know about my particular game server and expected utilization target.
- The supervisor is responsible for reconciling my desired expectations (a count, or % of cpu/mem/etc so far) with spinning up game servers managed by systemd, since systemd doesn’t natively support this sort of dynamism and Go code is super lean (rough sketch below).
- If I want more than one type of game server, I imagine I could extend this technique by spinning up more than one supervisor, but I’m keeping that in my back pocket for now.
- I haven’t thought up a reason to yet, but my Go supervisor might want to read the logs of my game servers through journald.
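A rough sketch of that reconcile loop, assuming a gameserver@.service template unit is already installed on the host (the desired count and ceiling here are placeholders):

    package main

    import (
        "fmt"
        "os/exec"
        "strings"
        "time"
    )

    const unitFmt = "gameserver@%d.service" // assumed template unit installed on the host

    func isActive(unit string) bool {
        out, _ := exec.Command("systemctl", "is-active", unit).Output()
        return strings.TrimSpace(string(out)) == "active"
    }

    // reconcile keeps instances 0..desired-1 running and stops anything above that.
    func reconcile(desired, ceiling int) {
        for i := 0; i < ceiling; i++ {
            unit := fmt.Sprintf(unitFmt, i)
            switch {
            case i < desired && !isActive(unit):
                exec.Command("systemctl", "start", unit).Run()
            case i >= desired && isActive(unit):
                exec.Command("systemctl", "stop", unit).Run()
            }
        }
    }

    func main() {
        for {
            reconcile(4, 16) // desired count would come from config or the backend API
            time.Sleep(30 * time.Second)
        }
    }

Resource limits then live in the template unit itself rather than in the supervisor.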

For my purposes I’m not making a generic solution for unknown workloads like your Rust supervisor, which probably helps reduce complexity.

My workstation uses systemd so I can see my supervisor working easily. Real heckin’ neat.


nice, I really think you’ll go far with that approach on a single server.

My only advice is to capture stdout of the child process (gameserver) from the supervisor instead of adding your own dependency on journald: everyone speaks stdout, you can later enrich your local metrics with it if it's structured well, and you can forward it centrally.
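For what it's worth, a minimal Go sketch of what I mean (the binary path is a placeholder); the supervisor owns the child's stdout and can enrich or forward each line:

    package main

    import (
        "bufio"
        "log"
        "os/exec"
    )

    func main() {
        cmd := exec.Command("./gameserver") // placeholder path to the game server binary
        stdout, err := cmd.StdoutPipe()
        if err != nil {
            log.Fatal(err)
        }
        if err := cmd.Start(); err != nil {
            log.Fatal(err)
        }
        scanner := bufio.NewScanner(stdout)
        for scanner.Scan() {
            log.Printf("gameserver: %s", scanner.Text()) // enrich/forward structured lines here
        }
        if err := cmd.Wait(); err != nil {
            log.Printf("gameserver exited: %v", err)
        }
    }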


I really appreciate your perspective! Pretty clear you’re someone who digs deep, doesn’t accept constraints given (eg ‘everyone’ desired an AWS based solution but that didn’t narrow your exploration), and tries to be pragmatic. At least that’s my impression.

What you’ve stated so far is interesting. I’ve been reading through some of your blog content too. I sent a LinkedIn connection request (look for Andrew).

Probing question: knowing what you know do you have any opinions (strong or otherwise) on what a solo dev might want to pursue? Just curious how you respond.


> I’m still going back and forth between containerizing and using systemd

Why not both? systemd allows you to make containers via nspawn, which are defined in just about the same way as a regular systemd service. Best of both worlds.
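For example (paths and names invented), an .nspawn file sitting next to a rootfs under /var/lib/machines reads a lot like a unit file:

    # /etc/systemd/nspawn/gameserver.nspawn -- hypothetical container config
    [Exec]
    Boot=yes

    [Files]
    Bind=/srv/gameserver-data

    [Network]
    VirtualEthernet=yes
    Port=udp:7001

machinectl start gameserver then boots it, and it shows up under systemd-nspawn@gameserver.service like any other unit.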


> Best of both worlds.

That would be portable[1] services.

[1]: https://systemd.io/PORTABLE_SERVICES/


Did you try systemd's containers (nspawn)?


…no. TIL.


I wrote a blog post about using nspawn from an Arch Linux host. The Arch Wiki shows more information about how to get a Debian base if you want that instead. Link to the wiki is at the bottom of the blog post along with more references.

https://adamgradzki.com/lightweight-development-sandboxes-wi...


Portable services are another option.



Wow, systemd can do more than I ever imagined it could.


Technically that's part of podman, not systemd. But it's the same architecture that was used to support sysvinit scripts.

(In fact, nothing prevents anyone from extracting and repackaging the sysvinit generator, now that I think of it).


This actually works really well with custom user scripts to do the initial setup. It’s also trivial to do this with docker/podman if you don’t want it to take over the machine. Batching/Matchmaking is the hard part of this, setting up a fleet is the fun part of this.

I’ve also done Microsoft Orleans clusters and still recommend the single pid, multiple containers/processes approach. If you can avoid Orleans and kubernetes and all that, the better. It just adds complexity to this setup.


> If you can avoid Orleans and kubernetes and all that, the better. It just adds complexity to this setup.

I’m starting to appreciate simplicity away from containers that’s why I’m even exploring systemd. I bet big on containers and developed plenty of skills, especially with k8s. I never stopped to appreciate that I’m partly in the business of making processes run on OSes, and it kinda doesn’t matter if the pid is a container or running ‘directly’ on the hardware. I’ll probably layer it back in but for now I’m kinda avoiding it as an exercise.

E.g. if I’m testing a debug ready build locally and want to attach my debugger, I can do that in k8s but there’s a ceremony of opening relevant ports and properly pointing to the file system of the container. Not a show stopper since I mostly debug while writing testing/production code in dev… But occasionally the built artifact demands inspection.


You sound like you've explored at least a few options in this space. Have you looked at https://agones.dev/ ?


Yes! It’s a great project. I’m super happy they have a coherent local development story. I kinda abandoned using it, though, when I said “keeeep it simple” and stopped using containers/k8s. I think I needed to journey through understanding why multiplayer game services like Agones/GameLift/Photon were set up the way they were. Reading through Multiplayer Game Programming: Architecting Networked Games by Joshua Glazer and Sanjay Madhav really helped (not to mention it allowed me to understand GDC talks on multiplayer topics much better).

This all probably speaks to my odd prioritization: I want to understand and use. I’ve had to step back and realize part of the fun I have in pursuing these projects is the research.


I’ve also found docker / k8s to mostly just get in the way. Even VMs are often a problem, depending on the details of the game.

Bare metal is the only actually good option, but you often have to do a lot yourself. Multiplay did offer it last time I looked, but I don’t know what’s going on with them now.


I created an app using a similar concept as a hackathon project, in meteorJS. It was fun! We won 2nd place.


To those who have worked with autonomous background agents techniques, can you describe the stack and the workflow?

Has anyone set up a local only autonomous agent, using an open source model from somewhere like huggingface?

Still a bit confused on the details of implementing the technique. Would appreciate any explanations (thanks in advance).


Sci-fi readers who’ve read Ender’s Game will recognize this style of software as similar in concept to the Mind Game Ender Wiggin plays. In the book, the Mind Game renders a game world based on the subject’s mind (conscious, subconscious), in a way mechanically similar to how dreams work for us IRL.

I’m excited for AI rendered games.


Thank goodness I went full tilt into Godot (and became a donor) after Unity’s last controversy.

Godot is much more hacker friendly than Unity, IMO. Ymmv


I always hear that, and I wonder how you used Unity before, because Godot does absolutely nothing that Unity doesn't do as well or better.

The only exception is editor speed, but given how few features Godot has, that's not surprising.

I wish someone just took what Unity does and made it a copy with better editor performance and open source.


What is everyone’s favorite parallel agent stack?

I’ve just become comfortable using GH Copilot in agent mode, but I haven’t started letting it work in an isolated way in parallel to me. Any advice on getting started?


Enterprise software engineers have to be good with AI to know when deploying it will save time.

Understanding where it’s effective and how to use AI tooling is a sought after skill.

I would advise programmers to pick an AI tooling stack and get to know it very well.


Give me an example of this "tooling stack" that isn't just prompting the LLM, or using a Cursor-like or Claude-CLI-like tool?

