Just to be aware when you say "Even with all 6 environments and other projects running, the server's resource usage remained low. The average CPU load stayed under 10%, and memory usage sat at just ~14 GB of the available 32 GB."
The load average in htop is expressed in units of CPU cores, not as a percentage. So if you have 8 CPU cores like in your screenshot, a load average of 0.1 is actually only 1.25% (10% / 8) of total CPU capacity - even better :).
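If you want to sanity-check that on any Linux/macOS box, here's a tiny sketch (not from the blog, just standard library calls):

    import os

    # Convert the 1-minute load average into a percentage of total CPU
    # capacity by dividing by the core count (works on Linux/macOS).
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count()
    print(f"load {load_1min:.2f} over {cores} cores "
          f"= {100 * load_1min / cores:.1f}% of total capacity")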
Cool blog! I've been having so much success with this type of pattern!
Yes! The varying precisions and maths feel like just the start!
Look at next-gen Rubin with its CPX co-processor chip to see things getting much weirder & more specialized. It's there for prefilling long contexts, which is compute-intensive:
> Something has to give, and that something in the Nvidia product line is now called the "Rubin" CPX GPU accelerator, which is aimed specifically at parts of the inference workload that do not require high bandwidth memory but do need lots of compute and, increasingly, the ability to process video formats for both input and output as part of the AI workflow.
To confirm what you are saying: there is no coherent, unifying way to measure what's getting built other than by power consumption. Some of that budget will go to memory, some to compute (some to interconnect, some to storage), and it's too early to say what ratio each will have, or even to know what compute:memory ratios we're heading towards (and one size won't fit all problems).
Perhaps we end up abandoning HBM & DRAM! Maybe the future belongs to high-bandwidth flash! Maybe with its own Computational Storage! Trying to use figures like FLOPS or bandwidth is applying today's answers to a future that might get weirder on us. https://www.tomshardware.com/tech-industry/sandisk-and-sk-hy...
As a reference for anyone interested - the cost is estimated to be $10 billion for EACH 500MW data center - this includes the cost of the chips and the data center infra.
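As a back-of-the-envelope check, just dividing the two figures above:

    # $10B per 500MW works out to roughly $20 per watt of data center
    # capacity (chips and data center infra combined).
    cost_usd = 10e9
    capacity_w = 500e6
    print(f"${cost_usd / capacity_w:.0f} per watt")  # -> $20 per watt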
1) (Mainly) the huge increase in upstream capacity of residential broadband connections with FTTH. It's not uncommon for homes to have 2gbit/sec up now, and 1gbit/sec is certainly commonplace, which is an enormous amount of bandwidth compared to many interconnects - 10, 40 and 100gbit/sec links are the most common, and a handful of users can totally saturate these (fifty homes at 2gbit/sec up is a full 100gbit/sec link).
2) Many more powerful IoT devices that can handle this level of attack outbound. A $1 SoC can easily handle this these days.
3) Less importantly, CGNAT is a growing problem. If you have 10k (say) users on CGNAT that are compromised, it's likely that there's at least 1 on each CGNAT IP. This means you can't just null route compromised IPs as you are effectively null routing the entire ISP.
I think we probably need more government regulation of these IoT devices. For example, having a "hardware" limit of (say) 10mbit/sec or less for all networking unless otherwise required. 99% of them don't need more than this.
> If you have 10k (say) users on CGNAT that are compromised, it's likely that there's at least 1 on each CGNAT IP. This means you can't just null route compromised IPs as you are effectively null routing the entire ISP.
How about we actually finally roll out IPv6 and bury CGNAT in the graveyard where it belongs?
Suddenly, everybody (ISPs, carriers, end users) can blackhole a compromised IP and/or IP range without affecting non-compromised endpoints.
And DDoS goes poof. And, as a bonus, we get the end to end nature of the internet back again.
From having worked on DDoS mitigation, there's pretty much no difference between CGNAT and IPv6. Block or rate limit an IPv4 address and you might block some legitimate traffic if it's a NAT address. Block a single IPv6 address... and you might discover that the user controls an entire /64 or whatever prefix. So if you're in a situation where you can't filter out attack traffic by stateless signature (which is pretty bad already), you'll probably err on the side of blocking larger prefixes anyway, which potentially affects other users, the same as with CGNAT.
Insofar as it makes a difference for DDoS mitigation, the scarcity of IPv4 is more of a feature than a bug.
(Having also worked on DDoS mitigation services) That "entire /64" is already a hell of a lot more granular than a single CG-NAT range serving everyone on an ISP, though. Most often in these types of attacks it's a single subnet of a single home connection. You'll need to block more total prefixes, sure, but only because you actually know you're only blocking actively attacking source subnets, not entire ISPs. You'll probably still want something signature-based for detecting what to blackhole, but the combination does scale farther on the same amount of DDoS mitigation hardware.
You can heuristically block IPv6 prefixes on a big enough attack by blocking a prefix once a probabilistic % of the nodes under it are themselves blocked. I think it should work fairly well, as long as the attacking traffic has a signature.
Consider simple counters of "IPs with non-malicious traffic" and "IPs with malicious traffic" to probabilistically weigh the cost/benefit of blocking a prefix.
You do need to be able to support huge block lists, but there isn't the same issue as with CGNAT, where many non-malicious users are definitely getting blocked.
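A minimal sketch of that counter idea, with made-up thresholds (the ratio, the minimum host count, and the /64 granularity are all assumptions to make it concrete):

    from collections import defaultdict
    from ipaddress import ip_network

    BLOCK_RATIO = 0.8   # assumed: block a prefix when 80%+ of seen hosts attack
    MIN_HOSTS = 10      # assumed: need some evidence before judging a prefix

    malicious = defaultdict(set)   # /64 prefix -> attacking source addresses
    benign = defaultdict(set)      # /64 prefix -> legitimate source addresses
    blocked = set()

    def observe(src_ip: str, is_attack: bool) -> None:
        prefix = ip_network(f"{src_ip}/64", strict=False)
        (malicious if is_attack else benign)[prefix].add(src_ip)
        seen = len(malicious[prefix]) + len(benign[prefix])
        if seen >= MIN_HOSTS and len(malicious[prefix]) / seen >= BLOCK_RATIO:
            blocked.add(prefix)    # hand this prefix to the real blocking layer

    # A /64 where nearly every observed host is attacking ends up blocked.
    for i in range(12):
        observe(f"2001:db8:0:1::{i:x}", is_attack=(i != 0))
    print(ip_network("2001:db8:0:1::/64") in blocked)  # True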
Presumably a compromised device can request arbitrarily many new IPv6 addresses from DHCP, so the entire block would be compromised. It would be interesting to see if standard DHCP could limit automatic leasing to guard the reputation of the network.
Generally, IPv6 does autoconfiguration (I've never seen a home router with DHCPv6), so there's no need to ask for anything. Even for IPv4, I've never seen a home router enforce DHCP (even though it would force the public IP).
But the point stands, you can't selectively punish a single device, you have to cut off the whole block, which may include well-behaved devices.
This DDoS is claimed to be the result of <300,000 compromised routers.
That would be really easy to block if we were on IPv6. And it would be pretty easy to propagate upstream. And you could probabilistically unblock in an automated way and see if a node was still compromised. etc.
> That would be really easy to block -- if we were on IPv6.
Make that: If the service being attacked was on IPv6-only, and the attacker had no way to fall back to IPv4.
As long as we are dual-stack and IPv6 is optional, no attacker is going to be stupid enough to select the stack which has the highest probability of being defeated. Don't be naive.
It'd be far more acceptable to block the CG-NAT IPv4 addresses if you knew that the other non-compromised hosts could utilize their own IPv6 addresses to connect to your service.
Some time ago I decided not to roll out IPv6 for our site due to these concerns (a couple of million visitors per month). We have Meta ads reps constantly encouraging us to enable it, which also does not sit right with me.
Although I believe fingerprinting is sophisticated enough to work without using IPs, so using IPv6 might not make a meaningful difference.
Is there any money an ISP would make, or save, by sinking money and effort on switching to IPv6? If there's none, why would they act? If there is some, where?
For instance, mobile phone operators, which had to turn into ISPs a decade or two ago, had a natural incentive to switch to IPv6, especially as they grew. Would old ISPs make enough from selling some of their IPv4 pools?
They already lease them out. TELUS in Canada, a traditional old ISP, rents a large portion of their space to „Psychz“, a server provider in LA mostly used for Chinese GFW VPNs.
The ISPs have to submit plans on how their IPs will be used for the public, especially for IPv4; ARIN shouldn't approve this kind of stuff. Unless they lied in their IP block application, in which case their block should be revoked.
I filled out one of these for Cogent to get a /24. I was being honest, but all I had to put down was services that require their own IP. I even listed a few, but nowhere near the 253.
They also never came back with questions like "what about NAT" or "what about host-based routing".
Not sure what you filled out, but blocks are usually handed not to end users but to providers that will sublease the IPs to their clients. So if you are asking for a block for a couple of your own HTTP servers, that's a no. If you rent HTTP servers to, say, local small businesses, then that's a yes.
> How about we actually finally roll out IPv6 and bury CGNAT in the graveyard where it belongs?
That depends on the service you are DDosing actually having an IPv6 presence. And lots of sites really don't.
It doesn't help to have IPv6 if you need to fall back to IPv4 anyway. And if botnet authors know they can hide behind CGNAT, why would they IPv6-enable their bots when all sites and services are guaranteed to be reachable via IPv4 for the next 3 decades?
> 3) Less importantly, CGNAT is a growing problem. If you have 10k (say) users on CGNAT that are compromised, it's likely that there's at least 1 on each CGNAT IP. This means you can't just null route compromised IPs as you are effectively null routing the entire ISP.
Null routing is usually applied to the targets of the attack, not the sources. If one of your IPs is getting attacked, you null route it, so upstream routers drop traffic instead of sending it to you.
Haha, that last part is pretty wild. Rather than worrying about systemic problems in the entire internet, let's just make mandates crippling devices that China, where all these devices are made, will definitely 100% listen to. Sure, seems reasonable. Systems that rely on the goodwill of the entire world to function are generally pretty robust, after all.
I'm sorry that you find thinking about second order dynamics annoying, but that's what you have to do if you actually want effective laws. Just making laws doesn't magically fix problems. In many cases it just makes much more exciting problems.
I'm annoyed because you didn't actually come up with an interesting response. Yes, when you make laws people can break them. But you need to explain why there is an incentive to break them, and whether it will happen to the extent that it will actually be a problem to enforce. Personally, I don't see people scrambling to get DDoS attack vectors in their house by any means necessary.
> I think we probably need more government regulation of these IoT devices. For example, having a "hardware" limit of (say) 10mbit/sec or less for all networking unless otherwise required. 99% all of them don't need more than this.
What about DDoSs that come from sideloaded, unofficial, buggy, or poorly written apps? That's what IoT manufacturers will point to, and where most attacks historically come from. They'll point to whether your Mac really needs more than 100mbps.
The government is far more likely to figure it out along EU lines: Signed firmware, occasional reboots, no default passwords, mandatory security updates for a long-term period, all other applicable "common sense" security measures. Signed firmware and the sideloading ID requirements on Android also helps to prevent stalkerware, which is a growing threat far scarier than some occasional sideloaded virus or DDoS attack. Never assume sideloading is consensual.
>What about DDoSs that come from sideloaded, unofficial, buggy, or poorly written apps? That's what IoT manufacturers will point to, and where most attacks historically come from.
Any source for this claim? Outside of very specific scenarios which differ significantly from the current botnet market (like Manjaro sending too many requests to the AUR, or an Android application embedding a URL to a Wikipedia image), I cannot remember one occurrence of such a bug being versatile enough to create a whole new cybercrime market segment.
>They'll point to whether your Mac really needs more than 100mbps.
It does, because sometimes my computer bursts up to 1gbps for a sustained amount of time, unlike the average IoT device, which has a predictable communication pattern.
>Signed firmware and the sideloading ID requirements on Android also helps to prevent stalkerware, which is a growing threat far scarier than some occasional sideloaded virus or DDoS attack. Never assume sideloading is consensual.
if someone can unlock your phone, go into the settings, enable installation of apps for an application (ex. a browser), download an apk and install it then they can do quite literally anything, from enabling adb to exfiltrating all your files.
Historically, it was called Windows XP and Vista about 15 years ago (Blaster, Sasser, MyDoom, Stuxnet, Conficker?). Microsoft clamped down, hard, across the board, but everyone outside of Big Tech is still catching up.
Despite Microsoft's efforts, 911 S5 was roughly 19 million Windows PCs in 2024, in news that went mostly under the radar. It spread almost entirely through dangerous "free VPN" apps that people installed all over the place. (Why is sideloading under attack so much lately? 19 million people thought it would make them more secure, and instead it turned their home internet into criminal gateways with police visits. I strongly suspect this incident, and how it spread among well-meaning security-minded people, was the invisible turning point in Big Tech against software freedom lately.)
> if someone can unlock your phone, go into the settings, enable installation of apps for an application (ex. a browser), download an apk and install it then they can do quite literally anything, from enabling adb to exfiltrating all your files.
Which is more important, and a growing threat? Dump all her photos once; or install a disguised app that pretends to be a boring stock app nobody uses, that provides ongoing access for years, with everything in real-time up to the minute? Increasingly it's the latter. She'll never suspect the "Samsung Battery Optimizer" or even realize it came from an APK. No amount of sandboxing and permissions can detect an app with a deliberately false identity.
I think there's some exaggeration as few $1 SoC parts come with 10G Ethernet, and >1G to the home is not common, but pretty much any home router can saturate its own uplink - it would be useless if it couldn't!
Not always the case. Generating traffic can be more computationally intensive than routing traffic. I've run speed tests directly on a few consumer routers, and the results were less than stellar compared to the expected results when the same router was just routing traffic. Granted, these tests were a few years ago and things have progressed, but how often are people upgrading their routers?
Also, most 1Gbit/s and faster routers have hardware-accelerated packet forwarding, aka "flow offloading", aka "hardware NAT", where forwarded packets mostly don't touch software at all.
Some routers even have an internal "CPU" port on the packet core with a significantly slower line rate than the external ports. So traffic that terminates or originates at the router is necessarily quite a bit slower, regardless of a possibly extra-beefy processor and efficient software. Not really a problem, since that traffic would normally be limited to the UI, software updates, ARP/NDP/DHCP, and the occasional first packet of a forwarded connection.
Define "most places"? I know I don't get one (UK), and neither does my German friend or my Texan friend.
I've only ever seen one, despite having used 4 different ISPs for gigabit, and that one was special. It was in an apartment I rented in a converted office tower; the line was delivered via a B2B provider and included in the rent.
Nope. Less than a percent of a percent. Symmetric plans cost extra and are offered primarily to businesses.
Almost all homes have no ability to exceed gigabit. In fact, almost all new homes don't even have data wiring. People just want their Netflix to work on WiFi.
This is probably technically true but very misleading. Fiber penetration in the US has been consistently rising for over a decade now and it is not at all uncommon to have either Google Fiber, Fios, or a local fiber provider available to you in a big city. I bet within the next decade most places will have gigabit fiber available.
Does it really matter? The grandparent comment states the bandwidth is becoming even more readily available in the US, while the article itself says the bots were largely hosted by US ISPs, and that's obviously enough bandwidth to already cause global disruptions. But that's just the source of the attack, and who is on the receiving end is another.
I get being too US-centric, but I think it's interesting if the US has the right combination of hosting tons of infected devices and having the bandwidth to use them at a much larger scale than other countries, and what the implications of that might be.
Seems more likely that residential modems will be required to use ISP-provided equipment that has government mandated chips, firmware, etc to filter outbound traffic for DDoS prevention.
Most eyeball networks have a lot of inbound traffic and not very much outbound, but interconnections with other networks are almost always symmetric, so there's a lot of room for excess egress before it causes pain to the ISP.
When I ran a large web site that attracted lots of DDoS, it didn't really seem worthwhile to track down the sources and try to contact ISPs. I had done a lot of work trying to track and stop people sending phishing mail under our name, and it's simply too much effort to write a reasonable abuse report that is unlikely to be followed up on. With email, people mostly accept that the Received headers are probably true; with DDoS, you'd be sending them pcaps, and they'd be telling you it's probably spoofed, and unless I've got lots of peering I'm not going to be able to get captures that are convincing... so I just did my best to manage the inbound and called it a day.
No, it shouldn't. "All" you're doing is having a small model run the prompt and then having the large model "verify" it. When the large model diverges from the small one, you restart the process from that point.
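A toy greedy sketch of that draft-and-verify loop (the stand-in "models" are plain functions here; real speculative decoding verifies the whole draft chunk in one batched forward pass of the large model):

    # Small model drafts a few tokens cheaply; the large model checks them.
    # Matching tokens are kept, and on the first divergence the large
    # model's token wins and drafting restarts from there.
    def speculative_decode(prompt, draft_next, target_next,
                           draft_len=4, max_new=32):
        out = list(prompt)
        while len(out) - len(prompt) < max_new:
            draft = []
            for _ in range(draft_len):
                draft.append(draft_next(out + draft))
            for tok in draft:
                verified = target_next(out)
                out.append(verified)
                if verified != tok:
                    break   # divergence: throw away the rest of the draft
        return out

    # Toy "models" that both just emit successor numbers, so every draft
    # token verifies and whole chunks get accepted at a time.
    nxt = lambda seq: seq[-1] + 1
    print(speculative_decode([1, 2, 3], nxt, nxt, max_new=8))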
Why is having so many bands a bad thing? Demand for data is so much higher now you need (ideally) hundreds of MHz of spectrum in dense areas. You need some way to partition that up as you can't just have one huge static block of spectrum per auction.
The issue with LTE isn't bands, it's the crappy way they have done VoLTE and also seemingly learnt nothing for VoNR.
They should have done something like GET volte.reserved/.well-known/volte-config (each carrier sets up their DNS to resolve volte.reserved to their IMS server, which provides config data to the phone). It would have given pretty much plug-and-play compatibility for all devices.
Instead, the way it works is that every phone has a (usually) hopelessly outdated lookup table of carriers and config files. It sort of works for Apple because they can push updates from one central place, but for Android it's a total mess.
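For illustration, the hypothetical scheme described above would boil down to something like this on the phone side (volte.reserved, the /.well-known/volte-config path, and the JSON response are all assumptions; no such standard exists today):

    import requests

    def fetch_volte_config(timeout=5):
        # Carrier DNS would resolve volte.reserved to its own IMS config
        # server; the phone just asks a fixed, well-known URL.
        resp = requests.get(
            "http://volte.reserved/.well-known/volte-config", timeout=timeout)
        resp.raise_for_status()
        return resp.json()   # assumed: IMS/VoLTE settings returned as JSON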
> Why is having so many bands a bad thing? Demand for data is so much higher now you need (ideally) hundreds of MHz of spectrum in dense areas. You need some way to partition that up as you can't just have one huge static block of spectrum per auction.
Because different countries use different sets of bands. That was true for GSM too, but quad band phones were reasonably available. Many phones were at least tri band, so you would at least have half the bands if you imported a 'wrong region' tri-band.
But now, you'll have a real tough time with coverage in the US if you import a EU or JP phone.
With a "quad band" LTE phone of bands 2, 7, 20 and say 12 you would get pretty much worldwide coverage. It'd just be slower because you can't access other ones. Not sure what the issue is?
The issue is that the import phones I want to buy don't support those bands. An example phone I might want (Xperia 10 IV) supports 12 LTE bands; my carrier (US T-Mobile) supports 6, but the intersection is only 2 bands (the old GSM bands), and I know my carrier doesn't always have coverage on those bands. I've got enough dead zones without throwing out 4 bands.
I think the fibre optic analogy is a bad one. The key reason supply massively outstripped demand was that optical equipment massively improved in efficiency.
We are not seeing that (currently) with GPUs. Perf/watt has basically stalled out recently, while tokens per user in many use cases has gone up 100x+ (take Claude Code usage vs normal chat usage). It's very, very unlikely we will get breakthroughs in compute efficiency the way we did in the late 90s/2000s for fiber optic capacity.
Secondly, I'm not convinced the capex has increased that much. From some brief research, the major tech firms (hyperscalers + Meta) were spending something like $10-15bn a month in capex in 2019. Now, if we assume that spend has all been rebadged as AI and adjust for inflation, it's a big ramp but not quite as big as it seems, especially when you consider that construction inflation has been horrendous virtually everywhere post-COVID.
What I really think is going on is some sort of prisoner's dilemma with capex. If you don't build, you are at serious risk of shortages, assuming demand continues in even the short and medium term. That then potentially means you start churning major non-AI workloads along with the AI work from e.g. AWS. So everyone is booking up all the capacity they can get, and let's keep in mind that only a small fraction of the giant trillion-dollar numbers being thrown around, especially by OpenAI, are actually hard commitments.
To be honest if it wasn't for Claude code I would be extremely skeptical of the demand story but given I now get through millions of tokens a day, if even a small percentage of knowledge workers globally adopt similar tooling it's sort of a given we are in for a very large shortage of compute. I'm sure there will be various market corrections along the way, but I do think we are going to require a shedload more data centres.
> We are not seeing that (currently) with GPUs. Perf/watt has basically stalled out recently, while tokens per user in many use cases has gone up 100x+ (take Claude Code usage vs normal chat usage). It's very, very unlikely we will get breakthroughs in compute efficiency the way we did in the late 90s/2000s for fiber optic capacity.
At least for gaming, GPU performance per dollar has gotten a lot better in the last decade. It hasn't gotten much better in the past couple of years specifically, but I assume a lot of that is due to the increased demand for AI use driving up the price for consumers.
The difference is that with fiber you can put more data on the same piece of glass or plastic just by swapping the parts at the ends, and those are a relatively small part of the cost - most of it is just getting the thing in place.
With GPUs and CPUs, you need to replace the entire thing. And now they are the most expensive part of the system.
The other option is doing more with the same computing power, but we have utterly failed at that in general...
It's been worse than that. Datacentres basically need to be completely rebuilt, especially for Blackwell chips, as they mostly require liquid cooling rather than air cooling as before. So you don't just need to replace the hardware, you need to replace all the power AND provide liquid cooling, which means completely redesigning the entire datacentre.
Yeah, but the question is whether your demand for Claude Code would be as high as it is, if Anthropic were charging enough to cover their costs. Not this fake "the model is profitable if you ignore training the next model" stuff but enough for them to actually be profitable today.
This is a crucial question that often gets overlooked in the AI hype cycle. The article makes a great point about the disconnect between infrastructure investment and actual revenue generation.
A few thoughts:
1. The comparison to previous tech bubbles is apt - we're seeing massive capex without clear paths to profitability for many use cases.
2. The "build it and they will come" mentality might work for foundational models, but the application layer needs more concrete business cases.
3. Enterprise adoption is happening, but at a much slower pace than the investment would suggest. Most companies are still in pilot phases.
4. The real value might come from productivity gains rather than direct revenue - harder to measure but potentially more impactful long-term.
What's your take on which AI applications will actually generate enough value to justify the current spending levels?
Because the weather is very changeable. You may get a lull in the wind for a couple of mins, enough to land.
I've been on a couple of flights like that. Once we did two attempts and landed on the 2nd; the other time we did 3 but then had to divert. Other planes were just managing to land in the winds before and after our attempts.
The other problem is (as I found out on that flight) that mass diversions are not good. The airport I diverted to in the UK had dozens of unexpected arrivals, late at night. There wasn't the ground staff to manage this so it took forever to get people off. It then was too full to accept any more landings, so further flights had to get diverted further and further away.
So, if you had a blanket "must divert" rule, you'd end up with all the diversion airports full (even for flights that could have landed at their original airport) and a much more dangerous situation, as your diversions are now in different countries.
Every time I read one of these articles the main issue I have is that it doesn't take into account the huge shortages of compute that are going on all the time. Anthropic and Google especially have been incredibly unreliable, struggling to keep up with demand.
Each of the main providers could easily use 10x the compute tomorrow (albeit arguably inefficiently) by using more thinking for certain tasks.
Now - does that scale to the 10s of GWs of deals OpenAI is doing? Probably not right now, but the bigger issue as the article does point out in fairness is the huge backlog of power availability worldwide.
Finally, AI adoption outside of software engineering is incredibly limited at work. This is going to change rapidly. Even the Excel agent Microsoft has recently launched has the potential to produce hundred-fold increases in token consumption per user. I'm also suspicious of the AI sell-through rate being read as a sign that it's not popular for Microsoft. The later versions of M365 Copilot (or whatever it is called today) are wildly better than the original ones.
It all sort of reminds me of Apple's goal of getting 1% in cell phone market share, which seemed laughably ambitious at one point - a total stretch goal. Now they are up to 20% and smartphone penetration as a whole is probably close to 90% globally of those that have a phone.
One potential wild card though for the whole market is someone figuring out a very efficient ASIC for inference (maybe with 1.58bit). GPUs are mostly overkill for inference and I would not be surprised if 10-100x efficiency gains could be had on very specialised chips.
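To make the 1.58-bit point concrete, here's a rough sketch of why ternary weights are so ASIC-friendly (illustrative only; real BitNet-style schemes also quantize activations and handle scaling more carefully):

    import numpy as np

    # With weights restricted to {-1, 0, +1}, a matrix-vector product
    # reduces to adds, subtracts and skips - no multipliers needed.
    def ternarize(w):
        scale = np.mean(np.abs(w)) + 1e-8          # assumed: absmean scaling
        return np.clip(np.round(w / scale), -1, 1).astype(np.int8), scale

    def ternary_matvec(w_t, scale, x):
        # Each output is (sum of x where w=+1) minus (sum of x where w=-1).
        return scale * ((w_t == 1) @ x - (w_t == -1) @ x)

    w, x = np.random.randn(4, 8), np.random.randn(8)
    w_t, s = ternarize(w)
    print(ternary_matvec(w_t, s, x))   # rough approximation of w @ x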
The huge demand exists right now because the cost of a token is near zero, and companies have figured out one weird hack for gaining value in the stock market, which is to brag about how many tokens are being crammed into all manner of places where they may or may not belong.
Customer value must eventually flow out of those datacenters in the opposite direction to the energy and capex flowing in.
Do people actually want all this AI? I see Studio Ghibli portraits, huge amounts of internet spam, workslop... where is the value?
> Each of the main providers could easily use 10x the compute tomorrow (albeit arguably inefficiently) by using more thinking for certain tasks.
That's true for everyone with regard to any resource.
The question is whether the 10x increase in resources results in 10x or more increase in profit.
If it doesn't, then it doesn't make sense to pay for the extra resources. For AI right now, the constraint is profit per resource unit, not the number of resource units.
"The later versions of M365 copilot (or whatever it is called today) are wildly better than the original ones."
I find AI agents work very poorly within the Microsoft ecosystem. They can generate great HTML documents (because it's an open format, maybe?), but for Word documents the formatting is so poor I had to turn it off and just do things manually.
Opposing anecdote: I got consistent performance out of Grok and Qwen (17 providers on Openrouter) throughout the day but Gemini gets slow and dumb at times.
BlackRock did not buy “a” data center, it bought a data center company with 78 data centers. I have no comment on whether or not it was a good deal, but your framing is silly.
Interesting, seems to use 'pure' vision and x/y coords for clicking stuff. Most other browser automation with LLMs I've seen uses the dom/accessibility tree which absolutely churns through context, but is much more 'accurate' at clicking stuff because it can use the exact text/elements in a selector.
Unfortunately it really struggled in the demos for me. It took nearly 18 attempts to click the comment link on the HN demo, each a few pixels off.
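For contrast, the two approaches look roughly like this in a browser-automation library (Playwright is an assumption here just to make the difference concrete; the tool in the post may work entirely differently):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        page = p.chromium.launch().new_page()
        page.goto("https://news.ycombinator.com/")

        # DOM/accessibility-tree style: precise, selector-driven clicks,
        # but dumping the tree into the prompt churns through context.
        page.get_by_role("link", name="comments").first.click()

        # Pure-vision style: the model looks at a screenshot and emits raw
        # x/y coordinates - cheap on context, easy to be a few pixels off.
        page.mouse.click(740, 312)   # hypothetical coords from a vision model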