
THANK YOU. People look at me like I’m insane when I tell them that their overly-complicated pipeline could be easily handled by a couple of beefy servers. Or at best, they’ll argue that “this way, they don’t have to manage infrastructure.” Except you do - you absolutely do. It’s just been partially abstracted away, and some parts like OS maintenance are handled (not that that was ever the difficult part of managing servers), but you absolutely still need to configure and monitor whichever XaaS you’re renting.




To anyone who says "they don’t have to manage infrastructure": I would invite them to deal with a multi-environment Terraform setup and then tell me again what they don't have to manage.

While Terraform is not ideal, it is much, much easier to deal with managed services in AWS than with on-premises bare-metal servers.

Most people are biased because they like dealing with the kinds of issues that come with on-premises setups.

They like dealing with the performance regressions, heat maps, kernel issues, etc. Because why not? You are a developer and you need some way to exercise your skills. AWS takes that away and makes you focus on the product. Issues arising from AWS only require talking to support. Most developers get into this industry for the love of solving these problems, not for solving product requirements.

AWS takes away what devs like and brings in more "actual" work.


> AWS takes that away and makes you focus on the product. Issues arising from AWS only requires you talking to support.

Not my experience at all. e.g. NLBs don't support ICMP, which has broken some clients of the application I work on. When we tried to turn on preserve-client-ip so we could get past the ephemeral port limit, it started causing issues with MSS negotiation, breaking some small fraction of clients. This stuff is insanely hard to debug because you can't get onto the load balancer to do packet captures (nor can AWS support). Load balancing for long-lived connections works poorly.

Lambda runs into performance issues immediately for a web application server because it's just an entirely broken architecture for that use-case (it's basically the exact opposite of scaling with user-mode threads: let's use an entire VM per request!). For some reason they encourage people to do it anyway. Lord help you if you have someone with some political capital in your org who wants to push for that.

RDS also runs into performance issues the moment you actually have some traffic. A baremetal server is orders of magnitude more capable.

IPv6 support is still randomly full of gaps (or has only very recently been fixed, except you might have to do things like recreate your production EKS cluster, oops), which leads to random problems that you have to architect around. Taken with NAT gateway being absurdly expensive, you end up having to invert sensible architectures or go through extra proxy layers that just complicate things.

AWS takes basic skills around how to build/maintain backend systems and makes half of your knowledge useless/impossible to apply, instead upgrading all of your simple tuning tasks into architectural design problems. The summary of my last few years has basically been working around problems that almost entirely originate from trying to move software into EKS, and dealing with random constraints that would take minutes to fix on bare metal.


I agree that building your backend on Lambda is terrible for many reasons: slow starts, request / response size restrictions, limitations in "layer" sizes, etc.

RDS, however, I have found to be rock solid. What have you run into?


The parent compares RDS to baremetal, which I think isn't a fair comparison at all. Especially since we don't know the specs of either of these.

I found RDS to be rock solid too, although performance issues are often resolved by developers submitting a PR that bumps the instance size x2, because "why not". On bare metal it's often impossible to upgrade the CPU just like that, so people have to fix performance issues elsewhere, which leads to better outcomes in the end.


RDS works great, but it's far easier to scale a bare metal setup to an extent that makes RDS look like an expensive toy because you have far more hardware options

RDS is a good option if you want convenience and simplicity, though.


Managing database backups myself is something that gives me nightmares. I would refuse to use bare-metal dbs unless I have a dedicated team just to manage the database (or data that is okay to lose, like caching layers).

Managing database backups is fairly straightforward. Postgres + a base backup + long-term WAL archiving in a blob store is very easy to set up and monitor. It could be easier, and if you don't want to manage that, using RDS is certainly a valid choice, but it's a tradeoff - I often have customers asking for help with performance issues on RDS that they simply wouldn't have if they'd sized a bare metal setup with enough RAM and NVMe and configured it even halfway decently instead. The end result is often that they end up paying more for devops help to figure out performance bottlenecks than they'd spend putting the same devops consultant on retainer to ensure they have a solid backup setup.
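For what it's worth, the core of that setup is only a few lines. A rough sketch (the bucket name and schedule are made up, and in practice tools like wal-g or pgBackRest wrap this up with compression, encryption, and retention handled for you):

    # postgresql.conf: archive every completed WAL segment to object storage
    archive_mode = on
    archive_command = 'aws s3 cp %p s3://example-backups/wal/%f'

    # periodically (e.g. weekly, from cron): take a fresh base backup
    pg_basebackup -D - -Ft -X none | gzip | \
        aws s3 cp - s3://example-backups/base/$(date +%F).tar.gz

Recovery is then the latest base backup plus replaying WAL up to whatever point in time you need.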

I dunno, it does sound like significant work and way outside my (and most devs') area of expertise. I can definitely supervise a managed RDBMS (like RDS) by myself without help on the side, even though I am no DBA.

A mismanaged VPS is downtime and churn, a mismanaged DB will insta-kill your business if you have unrecoverable data loss. I would definitely use a managed solution until I can get a dedicated person to babysit the DB, but I would consider managing a VPS myself.


There's no need for a dedicated person. A single operator can easily manage dozens of DB instances unless your needs are extremely complex. Managing these kinds of things is a service trivially available on retainer.

I don't know too much about the performance side of RDS, but the backup model is absolutely a headache. It's at the point where I'd rather pg_dump into gz and upload to s3.
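I mean something along these lines (names are placeholders) - with the obvious caveat that it's a point-in-time logical dump rather than continuous WAL archiving, so anything written since the last dump is gone:

    # nightly cron: logical dump, compressed, streamed straight to object storage
    pg_dump mydb | gzip | aws s3 cp - s3://example-backups/mydb-$(date +%F).sql.gz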

I never had problems with RDS backup, but I never used it a ton or had really large DBs in it. What problems did you run into?

> performance regressions, heat maps, kernel issues etc.

> AWS takes that away and makes you focus on the product.

ha ha ha no. Have been dealing with kernel issues on my AWS machines for a long time. They lock up under certain kinds of high load. AWS support is useless. Experimenting with kernel versions leads to performance regressions.

AWS is great if your IT/purchasing department is inefficient. Getting a new AWS machine is instant, compared to getting purchasing to approve new machine and IT allocating it to you. But all the low-level stuff is still there.


>AWS is great if your IT/purchasing department is inefficient

Fwiw, I think a lot of companies have this problem.


My wife is literally consulting for a big bank where it takes them 6+ months to get an on-premises VM set up and configured.

I think the conversation has turned from "Can we spend more?" to "Can you please try and spend less?"

"Can you please try and spend less?"

"Sure, we can get some on-prem machines. They'll pay for themselves in 6 months. I just need permissions from Finance to spend some CAPEX, and get IT and Facilities to cooperate"

"Ugh, actually please keep using AWS. But try and spend less.. if you can and this does not compromise deadlines"


We have both AWS and colocated servers. The server workload mostly scales with the machine count not the user count. And you can get a lot done with very few servers these days.

I have literally never met a dev who wanted to deal with any kind of infrastructure.

That was barely true a decade ago. It's total nonsense today, when it's trivial to ensure all your servers have IPMI or similar and it can all be automated apart from the couple-of-times-a-year component swap-outs.

But it's also the wrong comparison: there's rarely a reason to go on-premises and take responsibility for the hardware yourself - renting bare metal servers is usually the sweet spot and means someone else does the annoying bits for you, but you still have the simplicity and lower cost.

As someone contracted to manage systems for people, I consistently make more money from people who overengineer their cloud setups than from people with bare metal servers. It tends to require far more maintenance to keep an AWS setup sane, secure, and not bankrupting you.


> While Terraform is not ideal, it is much, much easier to deal with managed services in AWS than with on-premises bare-metal servers.

If you have one or two bare-metal servers it is not, but yes, once you start having a lot of infra it is way better.

But you can get really, really far with one or two baremetal servers and some SaaS solutions...


Uh, no I don't in fact like dealing with any of that. And I've never had to in 20 years of managing some mid scale service on bare metal. (Though of course there have been issues of various sorts.)

I think you may have it backwards: people like tinkering with complex cloud stuff, even if they don't need it.


Those are the ones that also usually tell you you can just stitch together a few SaaS products and it's magic.

It's much the same mindset as: "Vibe-coding can do it for you so you don't have to program"

Yep. Low-effort, shallow knowledge, risk-taking guys.

You are an outdated Boomer!!! I have 37 agents doing that for me!!!!!11^

LOL


I've certainly done some things where outsourcing hosting meant I didn't have to manage infrastructure. For services running on vm instances in gcp vs services running on bare metal managed hosts, there's not a whole lot of difference in terms of management IMHO.

But any infrastructure that the product I support use is infrastructure I need to manage; having it outside my control just makes it that much harder to manage. If it's outside my control, the people who control it better do a much better job than I would at managing it, otherwise it's going to be a much bigger pain.


I do consulting in this space, and I'm torn: I make much more money managing infrastructure for clients who insist on AWS. But it's much more enjoyable to work with people who know how to keep it simple.

I worked on a project for my company (a low volume basic web app) and I suggested we could just start the whole thing on one server. They brought on some Azure consultants and the project ballooned out to months of work and all kinds of services. I’m convinced most of the consultants were just piling on services so they could make more money.

If you hire hammer experts then you're going to end up using a lot of hammers in your construction. The Azure experts aren't pitching Azure because they're trying to sell more Azure products. They do it because that's all they know and most likely because you don't know it so you'll be likely to come back to them for support when things inevitably need to evolve.

Also, the more you use your cloud vendor's various services in your code, the more subject you are to vendor lock-in.

I won't name any names, but I'm pretty sure this is a big part of the reason why a specific cloud vendor pushed so very hard for us to push a bunch of data into their highly advanced NoSQL big data solution, when the data in question was perfectly happy continuing indefinitely to exist as a few tens of megabytes of CSV files that were growing at a rate of a couple kilobytes per day.


CosmosDB?

It’s okay, this is the Internet, you can name names.


Naming names would not be helpful. The problem isn't any one company's business practices. The problem isn't necessarily even vendor lock-in. The problem is familiarity bias. When all you have is a hammer, everything looks like a nail. And, as a corollary, when you ask a hammer vendor what you should do their answer will always be to treat your problems like nails.

It's probably true. The biggest challenge with doing the right thing in this space is that the sales job is hard, time consuming and so expensive that it's a lot easier to make it profitable if you make projects balloon like that. The sales effort is much the same.

I've been offering to help people cut costs for a while, and it's a shockingly hard sell even with offers of guarantees, so we're deemphasizing it to focus more on selling more complex DevOps assistance and AI advice instead... Got to eat (well, I do much better than that, but anyway), but I refuse to over engineer things just to make more money.


I don't necessarily even blame the contractors. When the bosses look askance at simple solutions what can you do? It's weirdly harder to sell people on something simple than on something complex. They assume the simple solution must be missing something important.

I joke with my boss that all our shit ends up running on a single server in some Amazon data center. It's probably not true but if you add up everything we do it's pretty close to one big server.

It's probably not true, but the end result would be the same even if it was true.

What I’ve always found concerning about managed setups is that the “platform” teams could never explain, in simple terms, how the application was actually deployed.

It was so complex I gave up after a while. That’s never a good sign.


I'll play devil's advocate a little bit here. But to be clear, I hate AWS and all of their crazy concepts and exorbitant pricing, so ultimately I think I'm on your side.

OS maintenance honestly is a bit hard for me. I need to know what to install for monitoring, I need to maintain scripts or Ansible playbooks. I need to update these and make sure they don't break my setup.

And the big kicker is compliance. I always work under SOC2, ISO27001, PCI-DSS, HIPAA, you name it. These require even more things like intrusion detection, antivirus, very detailed logging, backups, backup testing, web application firewall. When you just use AWS Lambda with DynamoDB, the compliance burden goes down a lot.


Yes, you need to write Ansible initially. But honestly, it’s not that much for your average application server. Turn on unattended-upgrades with anything critical to your application blacklisted, and you won’t have to touch it other than to bump version pins whenever you make a new golden image.
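Roughly this kind of thing, for what it's worth (Debian/Ubuntu; the blacklisted package patterns are just examples of things you might not want upgraded out from under you):

    // /etc/apt/apt.conf.d/50unattended-upgrades
    Unattended-Upgrade::Package-Blacklist {
        "postgresql-";
        "nginx";
    };

    // /etc/apt/apt.conf.d/20auto-upgrades
    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";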

Re: compliance, other than SOC2 being a giant theater of bullshit, agreed that it adds additional work. My point is that the claim of “not having to manage infrastructure” is highly misleading. You get to skip some stuff, yes, but you are paying through the nose in order to avoid writing some additional config files.


Working on various teams operating on infrastructure that ranged from a rack in the back of the office, a few beefy servers in a colo, a fleet of Chef-managed VMs, GKE, ECS, and various PaaSes, what I've liked the most about the cloud and containerized workflows is that they wind up being a forcing function for reproducibility, at least to a degree.

While it's absolutely 100% possible to have a "big beefy server architecture" that's reasonably portable, reproducible, and documented, it takes discipline and policy to avoid the "there's a small issue preventing {something important}, I can fix it over SSH with this one-liner and totally document it/add it to the config management tooling later once we've finished with {something else important}" pattern, and once people have been doing that for a while it's a total nightmare to unwind down the line.

Sometimes I want to smash my face into my monitor the 37th time I push an update to some CI code and wait 5 minutes for it to error out, wishing I could just make that band-aid fix, but at the end of the day I can't forget to write down what I did, since it's in my Dockerfile or deploy.yaml or entrypoint.sh or Terraform or whatever.


You have to remove admin rights from your admins then, because scrappy enough DevOps/platform engineers/whatever will totally hand-edit your AWS infra or Kubernetes deployments. I suffered that firsthand. And it's even worse than in the old days, because at least back then it was expected.

Or at least you have to automatically destroy and recreate all nodes / VMs / similar every N days, so that nobody can pretend that any truly unavoidable hand-edits during emergency situations will persist. Possibly also control access to the ability to do hand edits behind a break-glass feature that also notifies executives or schedules a postmortem meeting about why it was necessary to do that.

I know of at least one organisation that'd automatically wipe every instance on (ssh-)user logout, so you could log in to debug, but nothing you did would persist at all. I quite like that idea, though sometimes being able to e.g. delay the wipe for up to X hours might be slightly easier to deal with for genuinely critical emergency fixes.

But, yes, gating it behind notifications would also be great.


That sounds like the kind of thing that’s amazing, until it isn’t and you know exactly why your day just got a lot worse.

Was this Mozilla?

Oh no it ran out of disk space because of bug! I will run a command on that instance to free it rather than fix bug. Oh no error now happens half of the time better debug for hours only to find out someone only fixed a single instance…

I will never understand the argument for cloud, other than bragging rights about burning money and then about saving money that never should have been burned to begin with.


Nah, just run Puppet or similar. You’re welcome to run your command to validate what you already tested in stage, but if you don’t also push a PR that changes the IaC, it’s getting wiped out in a few minutes.

I hate not having root access. I don’t want to have to request permission from someone who has no idea how to do what I want to do. Log everything, make everything auditable, and hold everyone accountable - if I fuck up prod, my name will be in logs, and there will be a retro, which I will lead - but don’t make me jump through hoops to do what I want, because odds are I’ll instead find a way around them, because you didn’t know what you were doing when you set up your security system.


But then your next deployment goes, and it all rolls back, right?

And then it's their fault, right?

I might have mild trauma from people complaining their artisanal changes to our environment weren’t preserved.


Weeeeell, if you use Helm the manual change might be preserved, which makes investigations even more... interesting.

In my org nobody has admin rights with the exception of emergencies, but we are ending up with a directory full of GitHub workflows and nobody knows which of them are currently supposed to work.

Nothing beats people knowing what they are doing and cleaning up after themselves.


I'm still a pretty big fan of Docker (compose) behind Caddy as a reverse-proxy... I think that containers do offer a lot in terms of application support... even if it's a slightly bigger hoop to get started with in some ways.

I'm working on an app server that's auto-deploying itself behind Caddy + DNS/SSL auto-config. Caddy is amazing, and there really should be no reason for complex setups for most people these days... I've worked on some huge systems, but most systems can run in trivially simple setups given modern hardware.
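For anyone who hasn't tried it, the basic shape is tiny. A sketch (domain, image, and port are placeholders); Caddy obtains and renews the TLS certificate on its own:

    # Caddyfile
    app.example.com {
        reverse_proxy app:3000
    }

    # docker-compose.yml (abridged)
    services:
      caddy:
        image: caddy:2
        ports: ["80:80", "443:443"]
        volumes: ["./Caddyfile:/etc/caddy/Caddyfile"]
      app:
        image: myorg/myapp:latest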

Have always felt the same.

I’ve seen an entire company proudly proclaim a modern multicore Xeon with 32GB RAM can do basic monitoring tasks that should have been possible with little more than an Arduino.

Except the 32GB Xeon was far too slow for their implementation...


I swear, before I finished reading your comment, this thought jumped into my mind: ‘oh my, they do host everything with a computer similar to my [pretty old by the way, but still beefy] for-work computer! Impressive!’

Which, I still believe, is perfectly possible to do.

Then, I was ‘what?!’


Let me guess: database tables with no indexes, full scans everywhere?

How did they implement it? That's horrendous.

Java - but done so horribly even the devil would be aghast.

There were thousands of threads, murmuring an incessant hum fully occupying a few cores when absolutely nothing was happening. Over 20GB RAM actively used at idle.

I think booting the application took almost half an hour at one point, using a local SSD.

I’m fairly certain at no point in my career could I ever have replicated such a monstrosity.


I totally agree. So much complexity for generally no good reason [0]. I saw so much of this that I ended up starting a company doing the exact opposite. I figured I could do it better and cheaper, so that's now what we do!

If anyone wants to bail out of AWS et al and onto a few beefy servers, save some money, and gain a DevOps team in the process, then drop us an email (adam at domain in bio).

[0] My pet theory about the real reason: the hyper-scalers hire all the engineers who have the skills to deploy-to-a-few-beefy-servers, and then charge a 10x multiplier for compute. Companies can then choose between impossible hiring, or paying more. Paying more is easier to stomach, and plenty of rationalisations are available.


> My pet theory about the real reason: the hyper-scalers hire all the engineers who have the skills to deploy-to-a-few-beefy-servers, and then charge a 10x multiplier for compute.

This is also my pet theory, and it’s maddening. They’ve successfully convinced an entire generation of devs that physical servers are super scary and they shouldn’t ever have to look at them.


Docker compose on a couple nice VPS’s can do a LOT

On the other hand, I know a lot of people who spend more time / salary messing around with their infra than the couple hundred bucks they've saved from not pressing a couple of buttons on Vercel / Cloudflare.

There's a time and place for just deploying quickly to a cloud provider versus trying to manage your infra. It's a nuanced tradeoff that rarely has a clear winner.


I look at what I can do with an old mac mini (2011) and it’s quite good. I think the only issue with hardware is technical maintenance, but at the scale of a small company, that would probably mean having a support contract with Dell and co.

Small companies should never forget to ask Dell, etc for discounts. The list prices at many of these companies are aspirational and, even at very small scale, huge discounts are available.

I think it depends on what you are optimizing for. If you are a VC funded startup trying to get to product market fit, spending a bit more on say AWS probably makes sense so you can be “agile”. The opportunity cost there might outweigh infrastructure cost. If you are bootstrapped and cost matters a lot, then different story.

You can get a server now with, like, five hundred cores and a fifty terabytes of RAM. It's expensive, but you can get one.

A used server with sixty cores and one terabyte of RAM is a lot cheaper. Couple thousand bucks. I mean, that's still a lot of bucks, but a terabyte for only four digits?



The problem with onsite or colo is always the same. You have to keep fighting the same battle again and again and again. In 5 years, when the servers need to be replaced, you fight it all over, even though you have already proven it saves orders of magnitude in costs.

I've never once been rewarded for saving 100k+ a month even though I have done exactly that. I have been punished by having to constantly re-justify the decision, though. I just don't care anymore. I let the "BIG BRAIN MBA's" go ahead and set money on fire in the cloud. It's easier for me. Now I get to hire a team of "cloud architects" to do the infra. At eye-bleeding cost increases for a system that will never ever see more than a few thousand users.


What I say is that we massively underestimate just how fast computers are these days

On the other hand, there is a real crossroad that pops up that HNers tend to dismiss.

A common story is that since day one you just have lightweight app servers handling HTTP requests, doing 99% I/O. And your app servers can be deployed on a cheap box anywhere since they're just doing I/O. Maybe they're on Google Cloud Run or a small cluster of $5 VPSes. You've built them so that they have zero deps on the machine they're running on.

But then one day you need to do some sort of computations.

One incremental option is to create a worker that can sit on a machine that can crunch the tasks and a pipeline to feed it. This can be seen as operationally complex compared to one machine, but it's also simple in other ways.

Another option is to do everything on one beefy server where your app servers just shell out the work on the same machine. This can be operationally simple in some ways, but not necessarily in all ways.


In 2010 I was managing 100 servers, with many Oracle and Postgres DBs, PHP, Apache, all on Solaris and Sun hardware. I was constantly impressed by how people were unable to make even roughly correct estimates. I had a discussion with my boss: he wanted to buy 8 servers, I argued one was more than enough. The system, after growing massively, was still managing the load in 2020 with just 3 servers. So I would argue this was true not just today, but 15 years ago already.

Most younger devs just have no concept of how limited the hardware we used to run services on was...

I used to run a webmail system with 2m accounts on hardware with less total capacity (ram, disk, CPU throughput) than my laptop...

What's more: It was a CGI (so new process for every request), and the storage backend spawned separate processes per user.


If you know anything about hardware and look at the typical instances AWS is serving up (other than the ludicrously expensive ones) it's Skylake and older.

I think people have a warped perception of performance, if only because the cloud providers are serving up a shared VM on equipment I'd practically class as vintage computing. You could throw some of the same parts together from eBay and buy the whole system with less than a few months worth of the hourly on-demand cost.


Indeed - they are incredibly fast, it's just buried under layers upon layers of stuff

No worries, another fifteen layers of software abstraction will soak that up pronto.

Depending on your regulatory environment, it can be cost-effective to not have to maintain your own data center with 24/7 security response, environmental monitoring, fire suppression systems, etc. (of course, the majority of businesses are probably not interested in things like SOC 2)

This argument comes up a lot, but it feels a bit silly to me. If you want a beefy server you start out with renting one. $150/month will give you a server with a 24-core Xeon and 256GB of RAM, in a data center with everything you mentioned, plus a 24/7 hands-on technician you can book. Preferably rent two servers, because reliability. Once you outgrow renting servers you start renting rack space in a certified data center with all the same amenities. Once you outgrow that you start renting entire racks, then rows of racks or small rooms inside the DC. Then you start renting portions of the DC. Once you have outgrown that you have to seriously worry about maintaining your own data center. But at that point you have so much scale that this will be the least of your worries.

> This argument comes up a lot, but it feels a bit silly to me. If you want a beefy server you start out with renting one. $150/month will give you a server with a 24-core Xeon and 256GB of RAM, in a data center with everything you mentioned, plus a 24/7 hands-on technician you can book.

What's the bandwidth and where can I rent one of these??


Hetzner [1]. Bandwidth is 1 GBit/s. You can also get 10 GBit/s; that's hidden away a bit instead of being mentioned on the order page [2].

1: https://www.hetzner.com/dedicated-rootserver/matrix-ex

2: https://docs.hetzner.com/robot/dedicated-server/network/10g-...


I have wished for years that Hetzner would offer their bare metal servers in the U.S., and not just Hetzner Cloud.

Here is US Hetzner: https://ioflood.com/

Their prices have come down a lot. I used them when the servers still cost $200 a piece, but their support at the time was fantastic.


Wow. No joke. I haven’t heard of them, but I like their blurb, and those are Hetzner like prices. Now, I just need to find a use for that much beef.

How is that any different from cloud?

This whole thread was a response to

> Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.

suggesting using a few beefy servers, but if we are renting them from the cloud, we're back where we started.


The difference from the big clouds is that an equivalent instance at AWS costs 10x as much. If you go with few beefy servers AWS offers very little value for the money they charge, they only make sense for "cloud native" architectures. But if you rent raw servers from traditional hosters you can get prices much closer to the amortized costs of running them yourself, with the added convenience of having them in a certified data center with 24/7 security, backup power, etc.

If you want more control than that, colo is also pretty cheap [1]. But I'd consider that a step above what 95% of people need

https://www.hetzner.com/colocation


For me the comparison was not against AWS specifically but against cloud in general; AWS was just an example. Which was the whole reason why I brought up compliance and stuff - it is much cheaper to have someone else handle that for you (even if it is Hetzner!). That was my whole point.

Not ideal when a large part of your userbase is in APAC.

https://us.ovhcloud.com/bare-metal/prices/?display=list

also pretty sure 24 cores is like 48 cloud “cores”, which are usually just hyperthreads, right?


IME, a cloud "core" is even worse than a hyperthread. I'm not sure if they oversubscribe, or underclock, or if it's virtualization overhead... but anyway, not great.

They oversubscribe.

I'm a lot less concerned about CPU and ram and a lot more concerned about replicated object storage (across data centers). High end GPUs are also pretty important.

The only companies directly dealing with that type of stuff are the ones already at such a scale where they need to actually build their own data centers. Everyone else is just renting space somewhere that already takes care of those things and you just need to review their ISO/SOC reports.

This kind of argument comes from the cloud provider marketing playbook, not reality.


This is handled by colo.


