I'm quite surprised the A100 is not much better, since the power levels for the Ampere cards are, I believe, a lot lower.
Does this mean that even a model that fits on a single server and trains for a few weeks will absolutely need a recovery process? Interested in people's experiences around this.
GPU servers have always had crap reliability compared to a normal server (and sticking eight GPUs on a baseboard complicates things further). As I understand it (not my domain), this lack of widespread checkpointing and MPI fault-tolerance support is one of the motivating factors for why ML toolkits eschew MPI (besides accelerator-to-accelerator communication being an afterthought).
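For the single-server case asked about above, the usual recovery process is just periodic checkpointing from the training loop. A minimal sketch, assuming a PyTorch-style loop; the path and interval are illustrative:

```python
import os
import torch

CKPT_PATH = "checkpoint.pt"  # illustrative path

def save_checkpoint(model, optimizer, step):
    # Write to a temp file and rename, so a crash mid-write can't
    # corrupt the last good checkpoint.
    tmp = CKPT_PATH + ".tmp"
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, tmp)
    os.replace(tmp, CKPT_PATH)

def load_checkpoint(model, optimizer):
    # Resume from the last checkpoint if one exists; otherwise start at 0.
    if not os.path.exists(CKPT_PATH):
        return 0
    ckpt = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

# In the loop: resume once, then checkpoint every N steps.
# start = load_checkpoint(model, optimizer)
# for step in range(start, total_steps):
#     ...train...
#     if step % 1000 == 0:
#         save_checkpoint(model, optimizer, step)
```

Whether it's "absolutely needed" mostly comes down to how much you mind losing a few days of progress to one bad DIMM or GPU.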
Exactly, these markets exist in the real world, so as their size and use increase, the more likely they are to influence real-world events. Look at sports betting for a much smaller example: match fixing is well known. Electricity markets are gamed for individual profit to the detriment of everyone and the stability of the system, even with regulators trying to keep things stable. Enough "markets for all the things" already.
See, there are two major flavours of pro-market attitudes. The first one is "if we allow many independent individuals to try their own approaches to a problem, let the people with the 'better' approach personally profit from it handsomely, and make them compete against each other in an environment with objective-ish judgement of 'what is better' instead of 'impress the (inevitably corruptible) officials to be judged victorious and awarded the fortunes', and also manually guard and regulate against the several universally known ways to sabotage such competition, then we'll be able to channel human ingenuity into solving hard technical problems while also rewarding those who are able to come up with (and implement) such solutions, all with low overseeing overhead". Of course, such an attitude isn't strictly speaking "pro-market"; it's been around since ancient times. Hell, the USSR of all places had this attitude in spades until about the 70s or so.
The second one is "Nah, we don't have to try and think about anything ourselves, just let people fend for themselves, they'll figure it out, and it won't have any unforeseen bad side effects, why would it; markets are magical like that!" Yeah, about that...
Right, a market is a small tool within larger systems. That's fine; hard to get right, but it can make systems better. Type two just seems to be cargo-culted everywhere.
I think it's actually not that hard to get right (or at least "right enough"), as evidenced by the fact that markets have successfully run the entire global economy for thousands of years with no central oversight and almost no regulation.
Market failures do happen, and when they do it can be helpful to have an external force step in to nudge the market back onto the rails. But even without such interventions they work remarkably well on balance.
> markets have successfully run the entire global economy for thousands of years
Markets, as they are understood today, are more like 300 years old, even less in some places. The bulk of the world economy has been subsistence farming for most of human history, with some communal mutual help (based on favours and mutual indebtedness) thrown in.
> with no central oversight and almost no regulation
Lol what? E.g. the Roman Republic (and later the Empire) tightly regulated its markets, especially the food trade, throughout its entire existence.
The industrial revolution multiplied the size of the global economy by several orders of magnitude, but it didn't create it. International trade has been happening on a smaller scale since large-scale civilized society existed. And I said "almost no regulation", not "no regulation". Probably something on the order of 99% of the regulations we have today wouldn't have even been feasible to enforce a couple hundred years ago, yet markets still functioned just fine on the whole.
The type of people who have the power to decide these types of events are already able to use that power to make money in a thousand different ways. These markets will change nothing.
Some more benchmarking, and with larger outputs (like writing an entire relatively complex TODO list app) it seems to go down to 4-6 tokens/s. Still impressive.
Decided to run an actual llama-bench run and let it go for the hour or two it needs. I'm posting my full results here (https://github.com/geerlingguy/ai-benchmarks/issues/47), but the short version is 8-10 t/s pp (prompt processing) and 7.99 t/s tg128, on a Pi 5 with no overclocking. Could probably increase the numbers slightly with an overclock.
You need a fan/heatsink to get that speed, of course; it maxes out the CPU the entire time.
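If you'd rather not wait for a full llama-bench run, a rough wall-clock measurement is easy to get with llama-cpp-python. This is not the same methodology as llama-bench's pp/tg numbers, and the model path and thread count are placeholders:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="model.gguf", n_ctx=2048, n_threads=4)  # placeholder path

start = time.time()
out = llm("Write a short poem about benchmarking.", max_tokens=128)
elapsed = time.time() - start

# Count the tokens actually generated (generation can stop early at EOS).
n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated / elapsed:.2f} tokens/s (prompt + generation, wall clock)")
```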
This happens when inequality in buying power gets worse and worse. The most accurate prediction of how this all plays out, I think, is what Gary Stevenson calls "The Squeeze Out" -> https://www.youtube.com/watch?v=pUKaB4P5Qns
Currently we are still at the stage of extraction from upper/middle-class retail investors and pension funds, whose money is being sucked up by all the major tech companies that are only focused on their stock price. They have no incentive to compete, because if they do, it will ruin the game for everyone. This gets worse, and the theory (and, to some extent, history) says it can lead to war.
Agree with the analysis or not, I personally think it maps quite compellingly onto what is happening with AI; worth a watch.
Totally true. I have a trusty old (like 2016-era) X99 setup that I use for 1.2TB of time-series data hosted in a TimescaleDB + PostGIS database. I can fetch all the data I need quickly to crunch on another local machine, and max out my aging network gear to experiment with different model-training scenarios. It cost me ~$500 to build the machine, and it stays off when I'm not using it.
Much easier obviously dealing with a dataset that doesn't change, but doing the same in the cloud would just be throwing money away.
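For anyone curious what pulling data off a box like that looks like, here's a rough sketch of fetching a time/space slice from a TimescaleDB + PostGIS table with Python; the connection string, table, and columns are invented for illustration:

```python
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=tsdata user=me host=x99-box")  # invented details

with conn, conn.cursor() as cur:
    # One month of readings inside a bounding box; 'readings' and its
    # columns are illustrative, not the actual schema.
    cur.execute("""
        SELECT ts, sensor_id, value
        FROM readings
        WHERE ts >= %s AND ts < %s
          AND ST_Within(geom, ST_MakeEnvelope(%s, %s, %s, %s, 4326))
        ORDER BY ts
    """, ("2016-01-01", "2016-02-01", -122.5, 37.5, -122.0, 38.0))
    rows = cur.fetchall()

print(f"fetched {len(rows)} rows")
```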
> The second you turn your head though, your fellow teammates will conspire to replatform onto Go or Rust or NodeJS or GitHub Actions and make everything miserable again.
Curious how you would use Smalltalk in place of GitHub Actions, assuming you need a GitHub-integrated CI runner?
All any build toolkit is, is automation over bash. You can make your own. The GitHub integration need not be any more than the most trivial thing that works. Your coworkers, naturally, won't be disciplined enough to keep the integration trivial and will build super complicated crap that's really hard to troubleshoot, because they can.
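For what it's worth, the "most trivial thing that works" can be a runner that executes your build however you like and posts a commit status back through GitHub's Statuses API. A minimal sketch; the repo, token, and build command are placeholders:

```python
import subprocess
import sys
import requests  # pip install requests

REPO = "myorg/myrepo"   # placeholder
SHA = sys.argv[1]       # commit to report on
TOKEN = "ghp_..."       # token with repo:status scope

# Run the build however you like: bash, a Smalltalk image, anything.
result = subprocess.run(["./build.sh"])
state = "success" if result.returncode == 0 else "failure"

# Report the result back to GitHub; this is the entire "integration".
requests.post(
    f"https://api.github.com/repos/{REPO}/statuses/{SHA}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"state": state, "context": "homegrown-ci", "description": f"build {state}"},
    timeout=30,
)
```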
I have a hard time conceptualizing lossy text compression, but I've recently started to think about the "reasoning"/output as just a byproduct of lossy compression, with weights tending towards an average of the information "around" the main topic of the prompt. What I've found easier is thinking about it like lossy image compression: generating more output tokens via "reasoning" is like subdividing nearby pixels and filling in the gaps with values that have been seen there before. Taking the analogy a bit too far, you can also think of the vocabulary as the pixel bit depth.
I definitely agree that replacing "AI" or "LLMs" with "X driven by compressed training data" starts to make a lot more sense, and is a useful shortcut.
You're right about "reasoning". It's just trying to steer the conversation in a more relevant direction in vector space, hopefully to generate more relevant output tokens. I find it easier to conceptualize this in three dimensions. 3blue1brown has a good video series which covers the overall concept of LLM vectors in machine learning: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_...
To give a concrete example, say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer? By adding more relevant tokens (honey, worker, hive, beeswax) we steer the token generation to the place in the "word cloud" where our next token is more likely to exist.
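A toy illustration of that steering, with hand-made vectors and cosine similarity; the numbers are invented, not real embeddings:

```python
import numpy as np

# Invented "embeddings" for three senses of queen and some context words.
vecs = {
    "queen_monarch": np.array([0.9, 0.1, 0.0]),
    "queen_bee":     np.array([0.1, 0.9, 0.1]),
    "queen_card":    np.array([0.1, 0.1, 0.9]),
    "honey":         np.array([0.2, 0.8, 0.0]),
    "hive":          np.array([0.1, 0.9, 0.2]),
    "worker":        np.array([0.3, 0.7, 0.1]),
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Averaging the context tokens moves the "current position" in the space...
context = np.mean([vecs["honey"], vecs["hive"], vecs["worker"]], axis=0)

# ...so the bee sense of "queen" is now the closest candidate.
for sense in ("queen_monarch", "queen_bee", "queen_card"):
    print(sense, round(float(cos(context, vecs[sense])), 3))
```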
I don't see LLMs as "lossy compression" of text. To me that implies retrieval, and Transformers are a prediction device, not a retrieval device. If one needs retrieval then use a database.
> You're right about "reasoning". It's just trying to steer the conversation in a more relevant direction in vector space, hopefully to generate more relevant output tokens.
I like to frame it as a theater script cycling through the LLM. The "reasoning" difference is just changing the style so that each character has film noir monologues. The underlying process hasn't really changed, and the monologue text isn't fundamentally different from dialogue or stage direction... but more data still means more guidance for each improv cycle.
> say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer?
I'd like to point out that this scheme can result in things that look better to humans in the end... even when the "clarifying" choice is entirely arbitrary and irrational.
In other words, we should be alert to the difference between "explaining what you were thinking" versus "picking a firm direction so future improv makes nicer rationalizations."
It makes sense if you think of the LLM as building a data-aware model that compresses the noisy data by parsimony (the principle that the simplest explanation that fits is best). Typical text compression algorithms are not data-aware and not robust to noise.
In lossy compression the compression itself is the goal. In prediction, compression is the road that leads to parsimonious models.
The way I visualize it is to imagine clipping the high-frequency details of concepts and facts. These things operate on a different plane of abstraction than simple strings of characters or tokens. They operate on ideas and concepts. To compress, you take out all the deep details and leave only the broad strokes.
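A loose numerical analogy for "keeping only the broad strokes", sketched with NumPy: low-pass filter a noisy signal and store only a handful of frequency coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256)
# A smooth underlying trend plus noisy "fine detail".
signal = np.sin(2 * np.pi * 3 * x) + 0.3 * rng.standard_normal(256)

# "Compress" by keeping only the lowest-frequency coefficients.
coeffs = np.fft.rfft(signal)
keep = 8
compressed = coeffs.copy()
compressed[keep:] = 0
broad_strokes = np.fft.irfft(compressed, n=256)

# The reconstruction keeps the overall shape; the fine detail is gone.
print("coefficients kept:", keep, "of", len(coeffs))
print("mean squared error:", round(float(np.mean((signal - broad_strokes) ** 2)), 4))
```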
It is not a useful shortcut because you don't know what the training data is, nothing requires it to be an "average" of anything, and post-training arbitrarily re-weights all of its existing distributions anyway.
> In general once you start thinking about scaling data to larger capacities is when you start considering the cloud
What kind of capacities as a rule of thumb would you use? You can fit an awful lot of storage and compute on a single rack, and the cost for large DBs on AWS and others is extremely high, so savings are larger as well.
Well, if you want proper DR you really need an off-site backup, disk failover/recovery, etc. And if you don't want to be manually maintaining individual drives, then you're looking at one of the big, expensive storage solutions with enterprise-grade hardware, and those will easily cost some large multiple of whatever 2U DB server you end up putting in front of them.
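Even without the big enterprise kit, the off-site backup part can stay pretty small. A sketch assuming Postgres and an S3-compatible bucket; the names and paths are made up, and retention and verification are left out:

```python
import datetime
import subprocess
import boto3  # pip install boto3

BUCKET = "my-offsite-backups"  # made-up bucket
today = datetime.date.today().isoformat()
dump_file = f"/tmp/db-{today}.dump"

# Dump the database in Postgres's compressed custom format.
subprocess.run(["pg_dump", "-Fc", "-f", dump_file, "mydb"], check=True)

# Ship it to object storage in another region/provider.
boto3.client("s3").upload_file(dump_file, BUCKET, f"postgres/{today}.dump")
```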
Same setup here. One issue I've hit (though it will be a rare problem) is StarCraft Remastered: Wine has an issue with audio processing which I can't seem to configure my way out of. It pegs all 32 threads and still stutters. Thankfully this game can likely run on an actual potato, so I have a separate mini PC running Windows for when I want to get my ass kicked on battle.net.
Working at IT places in the late 2000s, it was still pretty commonplace for there to be server rooms. Even for a large org with multiple sites hundreds of km apart, you could manage it with a pretty small team. And it is a lot easier to build resilient applications now than it was back then, from what I remember.
Cloud costs are getting large enough that I know I’ve got one foot out the door and a long term plan to move back to having our own servers and spend the money we save on people. I can only see cloud getting even more expensive, not less.
There is currently a bit of an early shift back to physical infra. Some of this is driven by costs(1), some by geopolitical concerns, and some by performance. However, dealing with physical equipment does introduce a different (old-fashioned, but somewhat atrophied) set of skills and costs that companies need to deal with.
(1) It is shocking how much of a move to the cloud was driven by accountants wanting opex instead of capex, but are now concerned with actual cashflow and are thinking of going back. The cloud is really good at serving web content and storing gobs of data, but once you start wanting to crunch numbers or move that data, it gets expensive fast.
In some orgs the move to the cloud was driven by accountants. In my org it was driven by lawyers. With GDPR on the horizon and murmurs of other data privacy laws that might (but didn't) require data to be stored in that customer's jurisdiction, we needed to host in additional regions.
We had a couple rather large datacenters, but both were in the US. The only infrastructure we had in the EU was one small server closet. We had no hosting capacity in Brazil, China, etc. Multi-region availability drove us to the cloud - just not in the "high availability" sense of the term.
> I can only see cloud getting even more expensive, not less.
When you have three major hyperscalers competing for your dollars this is basically not true and not how markets work...unless they start colluding on prices.
We've already seen web-service prices come down across the three major providers due to this competitive pressure.
And it’ll be so good and cheap that you’ll figure “hell, I could sell our excess compute resources for a fraction of AWS.” And then I’ll buy them, you’ll be the new cloud. And then more people will, and eventually this server infrastructure business will dwarf your actual business. And then some person in 10 years will complain about your IOPS pricing, and start their own server room.