Listening to Jim Keller talk about engineering makes me feel the same way I used to feel when I listened to Jeff Beck playing guitar. I didn't know if I was inspired or if I should quit.
The first Graviton machines (which were worse than the current 64-core RISC-V SG2042) were installed in November 2018. Four and a half years later, they are now up to 20% or 25% of total AWS capacity. Even without any acceleration of adoption, it looks like they could be the majority of AWS within 10 years of first deployment.
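For what it's worth, a back-of-envelope sketch of that extrapolation, treating the 20-25% figure and the 4.5-year timeline above as given and assuming purely linear adoption:

```python
# Linear extrapolation of Graviton's share of AWS capacity.
# Assumes ~22.5% (midpoint of the 20-25% estimate) after 4.5 years
# and no acceleration -- both figures taken from the comment above.
years_since_first_deploy = 4.5
share_now = 0.225

rate_per_year = share_now / years_since_first_deploy   # ~5 points per year
share_at_10_years = rate_per_year * 10
print(f"{share_at_10_years:.0%}")  # ~50%, i.e. roughly the majority of AWS
```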
There seems to be every reason RISC-V could do it faster. Graviton 1 level chips are available right now, with Graviton 3 level or better coming from multiple companies (including Keller's) in 2024-2025. The software is more critical, but once x86-only things have been made portable to Arm (which can be hard work), further porting them to RISC-V is much, much easier.
None of those require user or OS instruction set compatibility for legacy apps that are hard or impossible to recompile.
And most of these applications don't really require gonzo superscalar performance. Add more cores, support more data streams.
If you can eliminate licensing costs for that portion of your fleet, then you only need to expand the ISA compatible portion of your fleet as demanded by paying customers.
As an example, suppose all of a cloud provider's own services can migrate to RISC-V. As organic customer demand for x86 cloud grows, those services can shift incrementally to the cheaper home-grown platform, freeing up x86 machines for customers. And since the freed-up machines are at least partially depreciated, the cost of these servers is much less than what a customer would pay for new servers on-prem (depreciated CapEx, far better OpEx).
The interesting question is the transition rate of end customer apps to the new ISA vs the growth rate of locked ISA apps.
Eventually the locked ISA apps portion becomes a lot like the current IBM mainframe business. Very valuable to a very small number of customers.
The only counter for this is if x86 can crank performance per $TCO so far that the non-x86 branch can't compete in business terms, which has historically been the issue with ARM.
… and fully managed cloud services (serverless databases, API gateways, messaging components, supplementary services, etc.). That is where the hardware replacement is likely most frantic, but it is impossible to get insight into it without insider knowledge. And since fully managed services are charged on a usage basis, the hardware can be utilised more efficiently under the hood, completely transparently to the end user (if properly architected), as opposed to spinning up cloud VMs.
> The only counter for this is if x86 can crank performance per $TCO so far that the non-x86 branch can't compete in business terms, which has historically been the issue with ARM.
If we take AWS for example, isn't the performance per TCO of an Arm-based Graviton instance better than x86? I don't think the historical issue you cite represents the future.
We know what they are selling it for, but that isn't the same.
True TCO needs to include the cost to develop the chip - after all, that is folded into the x86 price.
If you assume that the Graviton project is $250M per chip design for the 3 iterations, and the online estimates of 1 million chips are accurate, then you need to add about $750 per CPU, beyond the probably $250 per chip fabbed and packaged.
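A rough sketch of that amortization, using only the figures assumed above ($250M per design, three iterations, ~1 million chips, ~$250 per chip to fab and package):

```python
# Back-of-envelope Graviton cost per CPU, using the assumptions in the
# comment above -- not Amazon's actual cost profile.
design_cost   = 250e6 * 3    # $250M per design, three generations
chips_shipped = 1e6          # online estimate cited above
fab_and_pkg   = 250          # per chip, fabbed and packaged

design_per_chip = design_cost / chips_shipped           # $750
print(design_per_chip, design_per_chip + fab_and_pkg)   # 750.0 1000.0
```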
I think you are overestimating the value and amount of help from ARM, especially in the more recent Graviton generations, but of course I don't know Amazon's actual chip cost profile.
This man knows exactly what he is talking about. He was responsible for designing the original AMD Athlon 64. He worked at Apple on the transition from generic Samsung ARM SoCs to their own silicon, which is the basis for the modern M1 Apple Silicon. He worked for Intel (we'll see his work in Lunar Lake and Jim Keller's Royal Core project). And most importantly, he worked again at AMD and gave us the Zen architecture.
That's not exactly my takeaway, e.g. he says this in that interview which is pretty consistent:
> So if I was just going to say if I want to build a computer really fast today, and I want it to go fast, RISC-V is the easiest one to choose. It’s the simplest one, it has got all the right features, it has got the right top eight instructions that you actually need to optimize for, and it doesn't have too much junk.
Yes. Because it has less legacy, it is easier to build for if you are starting from scratch. The point is most code is the same handful of instructions.
My optimistic take: “All the data centers” are the Cloud hyperscalers, who are increasingly delivering value through PaaS/SaaS vs. raw VMs and IaaS.
They’re choosing the CPUs they like best, can turn over quickly if it’s worthwhile, and if the performance/economics of RISC-V are suitably appealing will do so.
I wonder how much of, say, S3’s infrastructure is running on Graviton?
I hope RISC-V servers will come with an open BIOS. Given Ron Minnich's stance on the proprietary BIOS issue during his time at Google, I think hyperscalers would like that too.
I recently had to deal with basic BIOS/BMC bugs; it's annoying as hell.
Data centers are one of the best demographics for adopting new architectures because more of the software can be custom-built towards a narrow application: Get a Linux stack to build, add some network I/O, add some virtualization, and you can do all sorts of things.
Client apps have a much harder time making that jump because the environment is more holistic, the hardware more varied, and the need for "must-have" proprietary apps more imperative.
> Data centers are one of the best demographics […]
Hardly. Data centers are a dying breed, and their number has been rapidly dwindling in recent years. DC (and the mythical «on-prem» by extension) has effectively become a dirty word in contemporary times. The most brutal lift-and-shift approach (without discussing the merits of doing so) is most common: create a backup, spin up a cloud «VM», restore from the backup and turn the server in the DC off forever. No-one is going to even remotely consider a new hardware architecture, not even in the cloud.
Moreover, since servers do not exist in a vacuum and either run business apps or do something at least remotely useful, that entails software migration to adopt the new platform. And the adoption has to be force-pushed onto the app developers, otherwise they won't bother; and for them to convert/migrate the app onto a new architecture, they need desktops/laptops that run on the new ISA, and no viable server and desktop hardware exists in June 2023 – it will come along later, with «later» not having a clear definition. Talking about open source is a moot point, as most businesses out there run commercially procured business apps.
Data centers in general are NOT a dying breed; it's more a case of rapidly growing, not dwindling. Perhaps you are referring to individual companies moving to the cloud, and colo-type activity dwindling (though institutions under strict regulation may still require a backup colo)?
However, the cloud resource providers are definitely growing (https://www.statista.com/outlook/tmo/data-center/worldwide#r...), and there is a huge push for more power and heat efficient architecture, whether on the server/network/supporting infrastructure side.
This doesn’t seem to comport with Amazon’s experience, investment, and trajectory with Graviton, based on public reference customers and a few personal anecdotes.
They are, but they are not data centers in the traditional sense of the term. The GP was referring to the traditional data centers as far as I understand.
> You're paying a x10 markup to make accounting shenanigans easier,
Whilst cloud platforms do allow one to accrue an eye-watering cloud bill by virtue of shooting oneself with a double-barrelled gun, the fault always lies with the user, and the «10x markup» is completely bonkers and a fiction.
As an isolated random example, API Gateway in AWS serving 100 million requests of 32 kB each with at least a 99.95% SLA will cost US$100 a month. AWS EventBridge for the same 100 million monthly events with at least 99.99% availability will also cost US$100 a month.
That is US$200 in total monthly for a couple of the most critical components of a modern data processing backbone that scales out nearly indefinitely, requires no maintenance or manual supervision, is always patched up security-wise, and is shielded from DDoS attacks. Compared to the same SLA, scalability and opex costs in a traditional data centre, they are a steal. Again, we are talking about at least 99.95% and 99.99% SLAs for each service.
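A sketch of where those numbers come from: both US$100 figures are consistent with a rate of roughly US$1 per million requests/events. That per-million rate is an inference from the figures above, not quoted AWS pricing, which varies by tier and region:

```python
# Reconstructing the $100/month figures above, assuming ~$1 per million
# requests/events (an assumption inferred from the comment, not a price sheet).
monthly_events   = 100e6
rate_per_million = 1.00

monthly_cost = monthly_events / 1e6 * rate_per_million
events_per_second = monthly_events / (30 * 24 * 3600)
print(monthly_cost)                  # 100.0 (each, for API Gateway and EventBridge)
print(round(events_per_second, 1))   # ~38.6 events/s on average
```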
If one uses the cloud to spin up cloud VMs and databases that run 24x7 and result in an average 10% monthly CPU utilisation, they are cooking the cloud wrong, they are wasting their own money, and they have only themselves to blame; the 10x markup is a delusion caused by ignorance.
> but the technology is exactly the same.
The underlying technology might be the same, but it is abstracted away from the user, who no longer needs to care about it; they use a service and pay only for actual usage. The platform optimises resource utilisation and distribution automatically. That is the value proposition of the cloud today, not 15 years ago.
> Go compare prices of e.g. Hetzner or OVA and come back to me again with that "fiction".
I have given two real examples of two real and highly useful fully managed services with made-up data volumes, along with their respective costs. Feel free to demonstrate which managed API gateway and pub/sub services Hetzner or OVA have to offer that come close or match, functionality- and SLA-wise, so we can compare.
> That's only about 35 events per second.
Irrelevant. I am not running a NASDAQ clone, and most businesses do not come anywhere close to generating 35 events per second anyway. If I happen to have a higher event rate, the service will scale for me without me lifting a finger. Whereas if a server hosted in a data centre has been underprovisioned, it will require a full-time ops engineer to reprovision it, set it up and potentially restore from a backup. That entails resource planning (a human must be available) and time spent doing it. None of that is free, especially operations.
> […] Hosting over at Hetzner will cost you maybe $25 a month.
It is the «maybe» component that invalidates the claim. Other than «go and compare it yourself» and hand-waving, I have seen slightly less than zero evidence as a counter-argument so far.
Most importantly, I am not interested in hosting and daily operations, whereas the business is interested in a working solution, and the business wants it quickly. Hosting and tinkering with, you know, stuff and trinkets on a Linux box is the antithesis of fast delivery.
The vast majority of servers in data centres sit idle most of the time anyway, consuming electricity and generating pollution for no one's gain, so the argument is moot.
It isn't 1992 anymore, people don't "tinker", they have orchestration in 2023.
The orchestration tools for self-hosted are cheaper, more standard and more reliable. (Because Amazon's and Google's stuff is actually built on top of standard stacks, except with extra corporate stupidity added.)
Regardless of whether you use something industry standard or something proprietary, you will need to have an ops team that knows orchestration. (And an AWS orchestration team will be more expensive, because, again, their stuff is non-standard and proprietary.)
There are reasons for using AWS, but cost or time to market is never one of them.
Considering modern processors spend 4-5 years in development before public release, someone would have to be building the game changing RISC-V CPU right now.
Maybe he meant that development on RISC-V CPUs would start in earnest in the next 5-10 years?
>Considering modern processors spend 4-5 years in development before public release, someone would have to be building the game changing RISC-V CPU right now.
And they are.
Tenstorrent is working on Ascalon. Wei-han Lien (lead architect of M1 at Apple) is the lead architect. Ascalon is a RISC-V microarchitecture expected to be released in 2024, with similar performance to projected AMD Zen5 (also 2024), but lower power consumption.
Ventana Veyron is due late 2023. A very high performance server chip AIUI implementing RVA22+V.
Rivos has been working on something RISC-V, with a very strong team, for several years now.
SiFive's next iteration of high performance CPUs is expected to be strong.
Alibaba group has something in the works, too.
And this is all just the tip of the iceberg. There are way more known projects ongoing, and even more that we do not know of.
The reason that processors take so long to develop might have something to do with the complexity of the ISA. Most CPU development effort is spent on verification, and if you have a simpler ISA, I would imagine that verification gets easier.
I didn't downvote you; I think you make a reasonable, albeit exaggerated, point. I think it's also important to look through the lens that a lot of things linux supports are reverse engineered and that's why they take a long time to implement. This is de facto different with everything being open so I expect support will come faster. There's also the fact that this aligns better with the ideologies of a lot of free software enthusiasts so they may be more likely to work on it.
My interpretation of 'take over' would be that a majority of new server installs would be RISC-V based. There is a lead time for development, orders, etc., plus customers have to be content to switch to a new architecture. Amazon's Arm program started, what, six-ish years ago, and they are at 20% of installs (from my recollection).
Still awaiting the retail release of that 75 W PCIe card with out-of-box PyTorch support. If it can run, say, Stable Diffusion / LLaMA and family, many more use cases shall emerge.
Obviously retail price will have to be competitive with Gaming Cards.
We're having great luck running YOLOv8 models on cheap Intel N5105 boxes. 8 GB RAM, 256 GB SSD, USB 3 for ~$150... Ubuntu running ONNX gets us 600 ms per frame. Not great, but good enough for now.
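A minimal sketch of that kind of pipeline: a YOLOv8 model exported to ONNX, run with onnxruntime on CPU. The model filename, the 640x640 input size and the dummy frame are assumptions for illustration, not the poster's actual setup:

```python
import time
import numpy as np
import onnxruntime as ort

# CPU-only session, as on a low-power box like the N5105.
session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Dummy frame: 1x3x640x640 float32, values in [0, 1].
frame = np.random.rand(1, 3, 640, 640).astype(np.float32)

start = time.perf_counter()
outputs = session.run(None, {input_name: frame})
print(f"inference: {(time.perf_counter() - start) * 1000:.0f} ms per frame")
```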
I watched the video, apparently they are targeting $1000-$2000, and said even $500 sounds interesting.
Maybe, just maybe, a highly cut-down version, a 15-watt model, could bring it down to Raspberry Pi level pricing: $45. USB-C interface, if not x4 PCIe. Like, people buy it first and decide what to do with it later on. Guess I am going too far down the fantasy lane. ;)
Almost certainly not. Raspberry Pi economics only work because they used a cheap off-the-shelf CPU. For the first few gens, I highly doubt anything below $200 will make any sense given the unit profit margins they will need.
I liked the part in which he pointed out that on the Linux kernel, he could get a fix within an hour, while on NT he had to wait a year. His general emphasis on open source is very encouraging. I didn't know he came around like that.
Yes, "source code for chips / custom hardware" as a product is mostly referred as IPs. This includes HDL/RTL, netlists and GDS layout for a specific process, in increasing order of closeness to physical reality.
Tangentially related, it is interesting to note that layouts and masks for manufacturing custom chips are not really copyrightable, so there is a thing named "mask rights" and (M) instead of (C) for them.
Yeah they're talking about hardware designs. Chip designers can buy "IP" from vendors for stuff they don't want to implement themselves, e.g. PCIe interfaces, embedded CPUs, etc. It's delivered as SystemVerilog code, probably obfuscated and maybe encrypted (yeah the encryption is theoretically worthless but I guess it is a sort of honesty / deniability thing).
I have a list of some I'm keeping tabs on, I'll share it here. I do it out of a combination of personal obsession plus because I want to source angel investments.
Note though that I'm very biased toward AI companies...
Most established, clear product-market fit:
- OpenAI
- Midjourney
- Character ai
- Runway ML
Ones that are interesting:
- Adept AI
- Modal, Banana.dev
- new.computer
- Magic.dev
- Modular (Mojo)
- tiny corp
- Galileo
- Hippo ML
- Tenstorrent
- contextual.ai
- Chroma
- e2b.dev
- Steamship
- Patterns.app
- GGML
Ones that I want to learn more about before deciding:
- Inflection AI
- GetLindy
- Embra
- Jam.dev
- Vocode.dev
That's about 50% of my list. Happy to clean up the rest and write a post if there's interest
Anthropic feels super underrated.
From my experience, Claude+ is on par with GPT-4 and the 100k context model is amazing, but because they don't have a product exposed to the public, they don't have to burn billions of dollars on things like ChatGPT.
Also there's a chance they might avoid some cases of direct regulation and local bans, since they are under the media radar.
Sally Ward-Foxton (EE Times) and Ian Cutress (TechTechPotato) started a podcast (only 2 episodes so far, I think) going through companies and what they do. I agree it is very hard to follow, with companies coming and going.
I wonder if it's not so much that Jim is so super smart that he can predict the future, but rather that he has such a huge network of peers he talks to that, from that access to industry insider information, he can draw more accurate conclusions about the overall direction of the industry than most.
I noted how, when they first started, they talked mostly about developing 'AI accelerators', and it felt mostly like they were talking about big, GPGPU-style chips to go head to head with Nvidia. Thousands of small SIMD cores doing matrix multiplies, with fast memory and PCIe. Maybe something halfway between Cerebras size and Nvidia Hopper. A tall order, but something really needed.
Then at some point it feels like Jim got hooked on the idea of RISC-V everything, and they pivoted their messaging to talking about these more CPU-like chips with a main RV64 core, 8-wide decode, state-of-the-art OoO execution, etc. That sounds more like a RISC-V competitor to AMD Zen than a competitor to an Nvidia GPU.
And then they talk about that just being the interface to the AI chip later, but... it really feels like they saw 'hey, we can get all this RISC-V stuff essentially for free, and really take over the development of the spec, and that is easier than figuring out how to develop a general-purpose AI chip and a stack that competes with CUDA to go with it, so that is the easier place to start...'
I'm totally a non-expert though and the preceding is just what I've picked up from watching interviews with Jim (who I just find awesome to listen to).
In the interview he runs down the issues they encountered going down the pure AI accelerator path. It sounds like they've decided the opportunity wasn't there (i.e. too hard) so they've pivoted.
It makes more business sense to have more general purpose hardware that can be pivoted to other applications. Lots of AI ASIC vendors are going to go belly up in the coming years as their platforms fail to attract customers. Carving out a tiny niche with limited demand and no IP moat is very risky in the IC world.
Fast CPU performance is necessary for AI workloads too. You need a fast CPU and combine that with lots of Vector or Tensor processing. Lots of applications need both. They have done both for a while.
The logic is simple, in line with reducing power draw from a simpler instruction set.
Move into a space where we have rapid manufacturing of specialized chips, combine that with the concept inherent in Nvidia's DPU, and you have something very interesting.
I think you are oversimplifying. You've chosen an arbitrary abstraction layer. It's like saying it's just transistors again. Or it's just machine learning. What matters is what they can deliver, and Jim's track record is the best in the whole industry. I think they have a pretty good understanding of what the best approach is to bring outstanding results given the available technology, plus I think currently they want to deliver something to get connected with decent clients and then be able to optimize for actual real-world use cases.
I'd love to see chips made out of millions of small computing blocks, neuron-like, without a common clock, with local memory, maybe even with some analog electronics involved. But I'm pretty sure people who are actually working on this kind of stuff could provide me a list of reasons why it's silly (at least given current technology limitations).
Sounds like a variation of that old MIT startup called Tilera: very lightweight CPUs with a high-performance interconnect fabric. At the time, I remember thinking it was a solution looking for a problem.
IDK what's wrong with that architecture for AI/ML, but I feel there's too much overhead in full-on CPUs. I guess that's where lightweight RISC cores come in. Personally, what I'd like to see is a clever grid architecture with a stack-based process for communicating with local nodes, using a bare-minimal language like Forth, so extremely light 'nodes' do matrix math and nothing else.
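As a toy illustration of that idea (purely hypothetical, not any real accelerator's instruction set), a 'node' that only understands a tiny Forth-like stack language could look something like this:

```python
# Toy sketch of an extremely light "node" that only speaks a Forth-like
# stack language -- a thought experiment, not a real ISA.
def run_node(program: str) -> list[float]:
    stack: list[float] = []
    for word in program.split():
        if word == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif word == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            stack.append(float(word))   # anything else is a literal push
    return stack

# Dot product of (1, 2, 3) and (4, 5, 6) expressed as a stack program:
print(run_node("1 4 * 2 5 * + 3 6 * +"))  # [32.0]
```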
I think boring can be good. I'm not an expert, but 'extra units copy-and-pasted in a grid' is exactly how I imagine a hardware AI accelerator to be.
I agree, and I've wanted a grid of 1000+ cores for 25 years now, once I realized that the only bottleneck in computing is the bus between CPU and memory back in the late 90s. The only chip that comes even close to what I want for a reasonable price is Apple's M1 line, but they added GPU and AI cores which defeat the purpose.
The churn with GPUs and now AI cores is too much for me. I just can't waste time manually annotating the code I want to run concurrently, on GPU, on AI core, whatever. To me, it looks like everyone is putting 10-100 times more work into their code than what should be needed. I see the same pattern repeated with web development and the explosion of walled garden platforms for mobile and smart devices. So much work for so long for so few results.
Just give me a big dumb boring grid of cores and a self-parallelizing language to program them. Stuff like Julia/Clojure/MATLAB/Erlang/Go come close, but each have poison pills that make reaching the mainstream untenable. Someday I want to write a language that does what I need, but that day will likely never come, because every day is Quantum Leap for me, I just go to work to make rent, pushing the rock up that hill like Sisyphus, only to watch it roll down and have to start all over again. Inadequate tooling has come to totally dominate every aspect of my work, but there may never be time to write better stuff.
We need operating systems designed to make the resources easily accessible across a network. What we are running today are mainframe operating systems where one computer does all the work for concurrent users.
Using Plan 9 has taught me that we are far from done designing computers and operating systems - we're trying to build the future on obsolete machines running obsolete operating systems, and it's not going well given all the ugly, mutually incompatible nonsense taped and bolted on to hide their age.
Loosely, pretty much all languages today have some form of: mutability, async, special annotations needed for code that should run concurrently or on the GPU, manual memory management, friction around binding to other languages, ugly or unnecessary syntax, the list of poison pills is endless.
I've barely learned portions of Julia, but it does have mutable variables. Once there's a mutable variable, the rest of the language is basically equivalent to transpiling an imperative language like Javascript. As in, there's little advantage over just using higher-order methods within an imperative language, because unexpected behavior can't be prevented, and code can't be statically analyzed beyond where the monad might change.
Clojure uses Lisp's prefix syntax with () with no infix or postfix format available, forcing everyone from a C-style background or even a math background to manually convert their notation to prefix. MATLAB uses 1-indexed arrays, and I don't think that there's a config option to make them 0-indexed. Erlang is too esoteric along a number of design choices to ever be mainstream, although its Actor model is great. Go has trouble with exceptional behavior and doesn't isolate variables between threads, negating most of its usefulness for concurrency.
Basically what I'm looking for is more like a spreadsheet in code, with no imperative behavior whatsoever. I believe that ClojureScript comes closest by executing functionally without side effects, then allowing imperative behavior while suspended, to get/send IO and then execute again. This most closely matches the way the real world works. Unfortunately languages like Haskell go to great lengths to hide the suspended side of things, making it somewhat of a write-only language like Perl (unreadable to outsiders) and too opaque for students to learn in a reasonable amount of time.
The closest language I've encountered that got it almost right was PHP before version 5, when it passed everything by value via copy-on-write and avoided references. Most of the mental load of imperative programming is in reasoning about mutable state. Once classes passed by references arrived, it just became another variation of Javascript. With more syntax warts and inconsistent naming. But since it's conceptually good and mainly ugly in practice, it's still probably my favorite language so far. React and Redux go to great lengths to encapsulate mutable state, but end up with such ugly syntax or syntactic sugar that little is gained over sticking to HTML with classic Javascript handlers sprinkled in. In which case it's better to go with htmx and stay as declarative as possible.
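The hidden-side-effect problem described above is easy to show in a small sketch; Python is used here only for illustration, since (like PHP 5+ objects) it passes object references rather than copies:

```python
import copy

# The callee silently mutates the caller's data through a shared reference.
def add_discount(order):
    order["total"] *= 0.9   # mutates the shared dict, not a local copy
    return order

original = {"total": 100}
discounted = add_discount(original)
print(original["total"])    # 90.0 -- "original" changed too

# Value / copy-on-write semantics avoid this: mutate an explicit copy instead.
safe = copy.deepcopy(original)
safe["total"] *= 0.9
print(original["total"], safe["total"])  # 90.0 81.0
```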
I tried to keep this brief, but I could blabber on about it forever hah!
Well, yeah, back in 1982: "The Berkeley RISC project delivered the RISC-I processor in 1982. Consisting of only 44,420 transistors (compared with averages of about 100,000 in newer CISC designs of the era), RISC-I had only 32 instructions, and yet completely outperformed any other single-chip design" (https://en.wikipedia.org/wiki/Reduced_instruction_set_comput...)
Which was followed by x86 chips being used in nearly all PCs for forty years ...
> Don’t forget his stint at Intel. I met him there and aside from his brilliance, the other thing that stuck out to me is how short he is.
I downvoted you because I have no idea how commenting on his height is significant to this discussion. In fact, calling out an irrelevant physical characteristic is generally done to make fun. I'm not saying that's what you were doing here, but it's just totally irrelevant.
It wasn't an attempt to make fun of him, but I can see how you and others would think so. I simply thought it was an interesting characteristic of his, because there is a trend for CEOs to be taller than average.