Hacker News | EdNutting's comments

Without seeing the actual differences in the bytecode it will be hard to tell what’s really going on. From my experience with other JITs, I’d expect the situation to be something like:

A) Without the typecast, the compiler can’t prove anything about the type, so it has to assume a fully general type. This creates a very “hard” bytecode sequence in the middle of the hotpath which can’t be inlined or optimised.

B) With the typecast, the compiler can assume the type, and thus only needs to emit type guards (as suggested in this thread). However, I’d expect those type guards to be lifted through the function as far as possible - if possible, the JIT will lift them all the way out of the loop if it can, so they’re only checked once, not on every loop iteration. This enables a much shorter sequence for getting the array length each time around the loop, and ideally avoids doing type/class checks every time.

This would avoid pressuring the branch predictor.

Most JITs have thresholds for “effort” that depend on environment and on how hot a path is measured to be at runtime. The hotter the path, the more effort the JIT will apply to optimising it (usually also expanding the scope of what it tries to optimise). But again, without seeing the assembly code (not just bytecode) of what the three different scenarios produce (unoptimised, optimised-in-test, optimised-in-prod) it would be hard to truly know what’s going on.

At best we can just speculate from experience of what these kinds of compilers do.
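To make the speculation concrete, here is a hypothetical Java sketch of the two shapes being discussed (the names are mine, not from the post): when a parameter is only known as `Object`, reading the array's length has to go through reflection, whereas after a cast the type guard is a single `checkcast` and the loop body compiles to plain `arraylength`/`laload` instructions that the JIT can hoist and optimise.

```java
import java.lang.reflect.Array;

public class LengthDemo {
    // Without a cast the compiler only knows "Object", so every length
    // read and element read goes through java.lang.reflect.Array.
    static long sumReflective(Object arr) {
        long total = 0;
        for (int i = 0; i < Array.getLength(arr); i++) {
            total += Array.getLong(arr, i);
        }
        return total;
    }

    // With a cast, the type guard is emitted once (checkcast), and the
    // loop body becomes plain arraylength/laload bytecode that the JIT
    // can hoist out of the loop, inline through, and vectorise.
    static long sumCast(Object arr) {
        long[] a = (long[]) arr;
        long total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    public static void main(String[] args) {
        Object data = new long[]{1, 2, 3, 4};
        System.out.println(sumReflective(data)); // 10
        System.out.println(sumCast(data));       // 10
    }
}
```

Comparing `javap -c` output for the two methods makes the difference in bytecode shape obvious even before the JIT gets involved.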


It's speculation because the author didn't show the bytecode or even just what the code decompiles to in Java.

But even with speculation, it shouldn't be that surprising that dynamic dispatch and reflection [0] are quite expensive compared to a cast and a field access of the length property.

[0] https://bugs.openjdk.org/browse/JDK-8051447


These are both great points. When I wrote the post, it didn't occur to me that I could inspect the emitted bytecode. In hindsight, including that would have made the explanation much stronger.

To be honest, this is my first time really digging into performance on a JIT runtime. I learned to code as an astronomy researcher and the training I received from my mentors was "write Python when possible, and C or Fortran when it needs to be fast." Therefore I spent a lot of time writing C, and I didn't appreciate how aggressively something like HotSpot can optimize.

(I don't mean that as a dig against Python; it's simply the mental model I absorbed.)

The realization that I can have really good performance in a high-level language like Clojure is revolutionary for me.

I'm learning a ton from the comments here. Thanks to everyone sharing their knowledge -- it's genuinely appreciated.


> The realization that I can have really good performance in a high-level language like Clojure is revolutionary for me.

I should try it out some time. The Lisp family takes a bit of a mental reset to work with, but I've done it before.

> ...the training I received from my mentors was "write Python when possible, and C or Fortran when it needs to be fast."... (I don't mean that as a dig against Python; it's simply the mental model I absorbed.)

Well, you know, I've been using Python for over 20 years and that really isn't a "dig" at all. The execution of Python is famously hard to optimize, even compared to other languages you might expect to be comparable. (Seriously, the current performance of JavaScript engines seems almost magical to me.) PyPy is the "JIT runtime" option there; you can easily create micro-benchmarks where it beats the pants off the reference Python implementation (written in C, with fairly straightforward techniques), but on average the improvement is... well, still pretty good ("On average, PyPy is about 3 times faster than CPython 3.11. We currently support python 3.11 and 2.7"), but shrinking over time, and it's definitely not going to put you in the performance realm of native-compiled languages.

The problem is there's really just too much that can be changed at runtime. If you look at the differences between Python and its competitors like Mojo, and the subsets and variants of Python used for things like Shedskin and Cython (and RPython, used internally for PyPy) you quickly get a sense of it.


What happened to Layer 3? Lol

Wild speculation: Could the extra speedup be due to some kind of JIT hotpath optimisation that the previous reflective non-inlinable call prevented, and which the new use of the single `arraylength` bytecode enabled? E.g. in production maybe you're seeing the hotpath hit a JIT threshold for more aggressive inlining of the parent function, or loop unrolling, or similar, which might not be triggered in your test environment (and which is impossible when inlining is prevented)?

Author of the blog post here. That explanation sounds very plausible to me!

If the whole enclosing function became inlinable after the reflective call path disappeared, that could explain why the end-to-end speedup under load was even larger than the isolated microbench.

I admit that I don't understand the JIT optimization deeply enough to say that confidently... as I mentioned in the blog post, I was quite flummoxed by the results. I’d genuinely love to learn more.


I can't tell what message the blog post is trying to convey. It doesn't read like a particularly open-source-friendly approach.

Maybe I'm getting the wrong impression from how the whole thing is framed. This sentence from the opening sums up the tone to me:

> For many years we had to rely on our own internally developed fork of FFmpeg to provide features that have only recently been added to FFmpeg

Like, boohoo for Meta? Could they not have upstreamed those features in the first place? They didn't integrate with upstream and now they're trying to spin this whole thing as a positive "bringing benefits to Meta, the wider industry, and people who use our products"? C'mon.

My take-away from it is not what the article actually says, but what it seems they should've done from Day 0: "upstream early; upstream often".


David May and his various PhD students over the years have retried this pitch repeatedly. And Graphcore had a related architecture. Unfortunately, while it’s great in theory, in practice the performance overall is miles off existing systems running existing code. There is no commercially feasible way that we’ve yet found to build a software ecosystem where all-new code has to be written just for this special theoretically-better processor. As a result, the business proposal dies before it even gets off the ground.

(I was one of David’s students, and I founded/ran a processor design startup, based on a different idea with a much stronger software story, which raised £4m in 2023 and went bust last year.)


Yes, David is the man and afaict has made a decent fist of Xmos (from afar). My current wild-assed hope for this to come to some kind of fruition would be Nvidia realising this opportunity (threat?), making a set of CUDA libraries, and the CUDA boys going to town with Occam-like abstractions at the system level and just their regular AI workloads as the application. No doubt he has tried to pitch this to Jensen and Keller.

How closely is BEAM/OTP related to the foundational work on CSP (and the implementation in Occam/Transputer way back when…)?

Good question! It's a bit of a stretch. BEAM has mailboxes, non-blocking sends, and asynchronous handling of messages, whereas the original CSP is based on blocking sends and symmetric channels. Symmetric means there is no real difference between sends and receives: two processes synchronise when they are willing to send the same data on the same channel. (A "receive" is just a nondeterministic action where you are willing to send anything on a channel.)

Occam added types to channels and distinguished sends/receives, which is the design also inherited by Go.

In principle you can emulate a mailbox/message queue in CSP by a sequence of processes, one per queue slot, but accounting for BEAM's weak-ish ordering guarantees might be complicated (I suppose you should allow queue slots to swap messages under specific conditions).
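As a minimal sketch of that encoding, the following uses Java's `SynchronousQueue` as a stand-in for a blocking CSP-style channel (a `put` only completes when a `take` rendezvouses with it). It illustrates the slot-per-process idea only; it makes no attempt to model BEAM's selective receive or weaker ordering rules.

```java
import java.util.concurrent.SynchronousQueue;

public class CspMailbox {
    // One queue slot = one process that repeatedly does a blocking
    // receive on its input channel followed by a blocking send on its
    // output channel. Chaining N of them yields an N-slot FIFO buffer.
    static Thread slot(SynchronousQueue<String> in, SynchronousQueue<String> out) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    out.put(in.take()); // rendezvous in, rendezvous out
                }
            } catch (InterruptedException e) {
                // channel shut down
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> front = new SynchronousQueue<>();
        SynchronousQueue<String> mid = new SynchronousQueue<>();
        SynchronousQueue<String> back = new SynchronousQueue<>();
        slot(front, mid);
        slot(mid, back);
        // A two-slot mailbox: the sender can get two messages ahead of
        // the receiver without blocking, unlike a bare channel.
        front.put("hello");
        front.put("world");
        System.out.println(back.take()); // hello
        System.out.println(back.take()); // world
    }
}
```

Each extra slot process buys one more message of sender/receiver decoupling, which is exactly why a faithful unbounded-mailbox encoding (with message reordering under selective receive) gets complicated fast.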


My understanding is that BEAM/OTP is not related to CSP, but to the Actor model (although IIRC Hewitt disagreed).

The UK and Europe welcome the US Footgun Operation. Plenty of opportunities for those top researchers and engineers over here.

The EU (which is not the same as Europe), is also looking a bit sharper on AI regulation at the moment (for now… not perfect but sharper etc etc).


The EU and UK are a long way from attracting top AI talent, purely in opportunity and monetary terms.

Not to mention the UK is arguably further down the mass-surveillance pipeline than the US. They’ve always had more aggressive domestic intelligence surveillance laws, which was made clear during the Snowden years; they’ve had Flock-style cameras forever; and they have an anti-encryption law pitched seemingly yearly.

I’d imagine most top engineers would rather try to push back on the US executive branch overreach than move. At least for the time being.


For sure we’re not currently attracting the talent. There’s more to that than just money, but money is a significant factor. When it comes to compensation, AI is too broad a category to have a meaningful debate. Hardware or software or mathematics or what kind of person? Etc.

I’m not gonna dispute the UK being further down some parts of the road.

Not sure what you’d count as top engineers, but I know enough that have been asking about and moving to the UK/EU that it’s been a noticeable reversal of the historic trends. Also, a major slowdown of these kinds of people in the UK/EU wanting to move to the US.


> The EU and UK are a long way from attracting top AI talent, purely in opportunity and monetary terms.

Which is why people are talking about this -- it's about ideology now.

You may personally be motivated solely by money. Not everybody is you.


I’m not an AI engineer but it’s not hard to imagine why some bright talent would want to work at the most exciting AI companies in the US while also making 3-10x what they’d make in Europe.

Ideology is easy to throw around in internet comments, but working on the cutting-edge stuff next to the brightest minds in the space will always be a major personal draw. Just look at the Manhattan Project: I doubt the primary draw for all of those academics was getting to work on a bomb. It was the science, the huge funding, and the interpersonal company.


See my other comments around here. This idea that salaries in the US are so much higher than Europe for all these top AI roles just isn’t true. Even the big American companies have been opening offices in places like London to hire the top talent at high salaries.

This also isn’t hypothetical. I know top-talent engineers and researchers that have moved out of the USA in the last 12 months due to the political climate (which goes beyond just the AI topics).

And you might want to read a few books on the Manhattan project and the people involved before you use that analogy. I don’t think it’s particularly strong.


> I know top-talent engineers and researchers that have moved out of the USA in the last 12 months due to the political climate

Are they working remotely for US companies? In Canada that’s very much still the case everywhere you look

> Even the big American companies have been opening offices in places like London to hire the top talent at high salaries.

I assumed this discussion was about rejecting working for US companies who would be susceptible to the executive branch’s bullying, not whether you can make a US-tier salary off American companies while not living in America. If you’re doing that you might as well live in America among the other talent and maximize your opportunities.


No, it’s a counterpoint on salaries… “Even the American companies”, i.e. they wouldn’t have to open offices here, nor would they have to pay high salaries, to compete for talent if everyone they wanted was in the US or could be so easily attracted to move to the US. The point is clearly that things aren’t so one-sided as people seem to think.

Google's Deepmind is UK based.

It is American owned now but it clearly hired enough talent for Google to buy it.


Exactly. Attracting talent is not the same as having talent.

https://worldpopulationreview.com/country-rankings/education...

You attract talent for the same reasons China attracts sales: at the cost of your very own rights.

Look at the towns suffering around data centres for a start. The rest of us are happy to pay for what you'll do to yourselves.


Do UK and Europe have hardware manufacturing for those researches to work with once US imposes GPU export restrictions to them at the first whiff of competition/threat?

Yes.

And the US can’t realistically stop our well-funded homegrown AI Hardware startups from manufacturing with TSMC. This is part of why there’s funding from the EU to develop Sovereign AI capabilities, currently focused on designing our own hardware. We’re nothing like as far behind as you might expect in terms of tech, just in terms of scale.

Also, while US export restrictions might make things awkward for a short while, it wouldn’t stop European innovation. The chips still flow, our own hardware companies would scale faster due to demand increase, and there’s the adage about adversity being the parent of all innovation (or however it goes).


> And the US can’t realistically stop our well-funded homegrown AI Hardware startups from manufacturing with TSMC

See what happened to Russian Baikal production on TSMC


You mean because of the international sanctions that needed Taiwanese, British and Dutch support to be effective?

Or because of the revoked processor design licenses from the British company Arm (which is still UK headquartered… despite being NASDAQ listed and largely owned by Japanese firm SoftBank)?

Or perhaps you think the US could stop us using the 12nm fabs being built by TSMC on European soil? Or could stop us manufacturing RISC-V-based chips (Swiss-headquartered technology)?

The US is weak in digital-logic silicon fabrication and it knows it. That’s why it’s been so panicked about Intel and been trying to get TSMC to build fabs on US soil. They’re pouring tens of billions of dollars into trying to claw back ownership and control of it, but it’s not like Europe or China or others are standing still on it either.


> Or perhaps you think the US could stop us using the 12nm fabs being built by TSMC on European soil?

Being built as in not operating yet?

A 12nm GPU is what? Nvidia 1080/2060 level? Those top researchers mentioned would love to train on that. Also, how many GPUs would be made annually?

Also, what about CPUs? You gonna use RISC-V? With what toolchain?

Chinese could pull it off in a few years, yeah.

EU? Nah. Started thinking about sovereignty too late compared to China


Things can change quickly. Give it a decade.

Nvidia uses RISC-V as the main controller cores in its GPUs. They’re also exploring replacing their Arm CPU with RISC-V I hear.

Meta recently bought Rivos in a huge show of confidence for RISC-V across processor types for server class.

As for fabrication, the poster above has a lot to learn about both the US’s current weak at-home capabilities (everything they’re building relies on European suppliers for all the key technology and machines) and about the scaling properties of sub-14nm nodes. Any export controls or sanctions to prevent Europe using American-designed, Taiwan-manufactured chips would result in America being cut off from everything it needs to build fabs on US soil. It would backfire massively.

Lastly, the UK and EU already have cutting edge AI Inference chips, and the ones for training are coming this year. Full stack integration (server box, racks, etc) is also being developed this year. We’re not a decade away from doing this - we’re 18 months away. Deployment at scale will take longer - not having Nvidia as competition would be a huge boon for that haha!


The GPUs and AIUs aren't being manufactured in the US.

The EUV and other factory equipment everyone's using is predominantly European. High-end testing tools used in R&D are largely European.

The fabs aren't, and that is no small thing. The tech stack is there though.

It's pretty tiresome that the HN audience keeps assuming Europe doesn't have "tech" because it doesn't have Facebook. Where do you think all the wealth comes from? Europe is all over everyone's R&D and supply chain.


I sometimes wonder whether people realise which country ASML is based in, and which country their major suppliers are in (e.g. optics: Germany)

To make 1/10th the salary they're making now?

You seem to have a very ill-informed view of UK/EU salaries in this particular sector. And also: yeah, people take salary hits to go do things they believe in (this is, like, the entire premise of the underpaid American startup founder model) - it should come as no surprise that people are willing to forgo pay for reasons other than just building their own business / making themselves personally wealthy.

We're talking about the "brightest scientist and engineers" in the AI sector; you may be underestimating US salaries for the people that refers to.

And no, working remotely for US companies doesn't count.


> To make 1/10th the salary they're making now?

Yeah, and also be slapped with some unrealized capital gains tax on assets they acquired while working in the US...


First, the difference isn’t that big in the economically stronger EU countries. Second, you need to factor in cost of living, which by most accounts is lower. Third, meaningful labor laws and a shared appreciation for work-life balance. And finally, to continue the sweeping generalizations, while we celebrate business acumen, we don’t fetishize wealth. People who flaunt money get made fun of, as do sigma grindset hustle bros.

I’ll take a pay cut any day for the ethos of the EU.


> First, the difference isn’t that big in the economically stronger EU countries

It's exactly that big. It's not as big for people with low qualifications, but the more highly qualified the specialist, the greater the difference.

> Second, you need to factor in cost of living, which by most accounts is lower.

But here the difference really isn't that big.

> Third, meaningful labor laws and a shared appreciation for work-life balance.

This works against the EU more than for it. Peak tech skills aren't usually acquired through lazing around and following meaningful labor laws, even in the EU.

> while we celebrate business acumen, we don’t fetishize wealth

An excuse for poor people (who still fetishize wealth)


That much?

No, of course not.

For the "brightest scientist and engineers" in the AI sector? I wouldn't be so sure.

I agree. And even if those workers stay in the U.S., there’s absolutely no guarantee that they’ll do their best to favor the government’s interests — quite the opposite, if anything.

At the end of the day it’s a matter of incentives, and good knowledge work can’t simply be forced out of people that are unwilling to cooperate.


Circular-breathing causes the air to heat up, causing expansion. This is how a balloon can expand even when someone is breathing air from inside it.

s/breathing/investment/g s/balloon/bubble/g s/air/money/g


I performed the suggested substitution. What is the heating up of money in that analogy?

Sarcastically, it's "the vibes intensifying".

(Vibes ~ Vibrations ~ Heat)

Tbf it's a reasonable question... I think it's a little tricky to pin down the equivalent of "kinetic energy" in purely economic terms, though you might look at the rate of flow of money as some analogy for the speed/energy of particles (speed of individual dollars changing hands). In that sense, the more frequent and larger these deals get, the hotter the market is. This is not a novel analogy.


You may also be interested in: https://go.dev/blog/greenteagc

