This is one of those gotchas that trap senior devs far more than junior devs. C was my first language, so I intuitively associate switch with a jump table (i.e. extremely fast). Even though I knew it was unlikely to be as performant in JS, it never occurred to me that it would actually be slow until I did a deep dive into low-level JS optimization.
Related: Matt Godbolt’s talk on emulating 6502 in JS. He also mentions the poor performance of switch for opcodes: https://youtu.be/7WuRq-Wmw5o
I think you've slightly misunderstood TFA. The issue there wasn't that the switch statement was slow, per se - it was that V8's optimizing compiler didn't yet support such statements, so V8 was bailing out of its JIT step and executing the function via a (dog-slow) interpreter. In other words the function would have run painfully slow even if it returned before reaching the switch statement.
It's also maybe worth adding that gotchas like this were super common way back in the early days TFA describes, but AFAIK they are no longer a concern now. Back when V8's optimizer was primitive it used to bail out for all kinds of reasons - because it saw a try/catch, because a function was too long, or even just because it got confused. Nowadays it's a different world - I do a lot of JS perf work, but it's probably been two years since I've seen V8 deopt anything for any reason.
I think you also misread the article. It clearly said that an optimized switch would still run as a series of "if" tests, and that when he got the compiler to generate that code, it wasn't faster.
TFA doesn't cover any cases where switch statements got optimized. It starts with the relevant code getting deopted, then the author tries a fix but the code is still deopted, then he removes the switch statements.
(Mind you I'm not saying switch statements are fast - they may well be slower than the alternatives even after being optimized. I'm just pointing out that TFA isn't about the performance of switch statements, it's about avoiding deopts.)
The EcmaScript switch spec basically says that each case expression is evaluated in turn until one is strictly equal to the switch value or default is reached, so to follow the spec the basic case is equivalent to a bunch of if-elses.
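A minimal sketch of that equivalence (the opcode value and handlers here are hypothetical, just to make the snippet runnable):

// Hypothetical opcode and handlers, for illustration only.
const op = 0x01;
const brk = () => console.log("BRK");
const ora = () => console.log("ORA");

// What the spec describes for this switch...
switch (op) {
  case 0x00: brk(); break;
  case 0x01: ora(); break;
  default: console.log("unknown opcode", op);
}

// ...is observably the same as this chain of strict-equality tests:
if (op === 0x00) { brk(); }
else if (op === 0x01) { ora(); }
else { console.log("unknown opcode", op); }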
To optimize this, the runtime first has to figure out that all case values are constants (and insert guards to de-opt if that stops holding), so in essence the machinery for running this code optimized has to be very conservative (which is probably why they placed arbitrary limits that really aren't suited for emulators).
TFA references a post that states that optimized switch statements in V8 are still just if-elseif-else and not jump tables, at least back when it was written. They might be faster than unoptimized JS, but certainly not as fast as jump tables. The author didn't even bother to split up the functions/switch blocks further when they learned about the requirements to get V8 to actually optimize a switch, as it seemed not worth it.
Sure, none of that is in doubt. But in my experience the performance cost of hot code getting deopted is so massive as to overwhelm virtually any other factor, so realistically, fixing the deopt is what solved TFA's problem. Of course getting rid of the switch statements was presumably an additional speedup, but since TFA doesn't compare those cases I guess that's neither here nor there.
Famously, function inlining decisions were based on the source length of the function (comments included), not the size of its AST, which led to situations where removing comments allowed a function to be inlined and thus run faster.
Interpreters and basic compilers are remarkably similar, and it's easy to transform between them if you do some extra busy work. Once you have your final AST, instead of outputting bytecode instructions, output the implementation of each instruction. Or vice versa.
You have code that handles different types in the interpreter? Great, bake it into the compiled code.
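A toy sketch of that transform (a made-up two-instruction bytecode, nothing resembling V8's actual pipeline): the interpreter dispatches on every instruction each time it runs, while the "compiler" walks the bytecode once, emits each instruction's implementation as source, and builds a function out of it.

// Hypothetical bytecode: PUSH <value> and ADD, for illustration only.
const PUSH = 0, ADD = 1;
const program = [PUSH, 2, PUSH, 3, ADD];

// Interpreter: dispatch happens at run time, on every instruction.
function interpret(code) {
  const stack = [];
  for (let pc = 0; pc < code.length; ) {
    switch (code[pc++]) {
      case PUSH: stack.push(code[pc++]); break;
      case ADD: stack.push(stack.pop() + stack.pop()); break;
    }
  }
  return stack.pop();
}

// "Baseline compiler": walk the bytecode once, emit the body of each
// instruction's handler as source, and build a function from it.
function compile(code) {
  let src = "const stack = [];\n";
  for (let pc = 0; pc < code.length; ) {
    switch (code[pc++]) {
      case PUSH: src += `stack.push(${code[pc++]});\n`; break;
      case ADD: src += "stack.push(stack.pop() + stack.pop());\n"; break;
    }
  }
  return new Function(src + "return stack.pop();");
}

console.log(interpret(program)); // 5
console.log(compile(program)()); // 5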
The transform from bytecode interpreter to compiler is always pretty simple, but the other way around gets more difficult the smarter the compiler is. So yes it is very possible to have a baseline javascript compiler and no interpreter.
I guess I'm differing on this point. By "baking in" the dynamic dispatch the final compiled code is, in my mind, an interpreter. Which is how I always thought of the "Full Codegen" step of the old v8 architecture.
I guess the lines can be a bit blurry especially when the compiled code still relies heavily on dynamic dispatch, but V8 generated machine code for the target CPU, marked the page as executable, and then it would call that just generated code. I think most people would call that a JIT-compiler.
It wasn’t a very advanced compiler in the sense that it generated very inefficient machine code. But it still ran circles around a tree-walking or bytecode interpreter. Over time the compiler became more advanced and multi-tiered, but at the cost of startup time, and mobile was taking off, where memory was more scarce than on desktop. x86 or ARM code is a lot bigger than a specialized bytecode that is optimized for space efficiency. So eventually the Ignition interpreter was added. But unlike some of the other Javascript engines that started as an interpreter and later added native code generation, V8 started out generating native code and only later added a bytecode interpreter.
Well let's say you wrote C++ where most of your data is in a super union of float, string*, bool, int, map*, etc, with gobs of repetitive dynamic dispatch whenever you touch one of those unions. That wouldn't mean your code is being interpreted, would it?
Could you please explain what you mean by type information? I thought you never have type information in JavaScript - except for the base types (bool, number, string, object, array, function), but you always have that information...
Basically, modern JS engines are fast because they do a lot of inferring about types. They do this both statically (by inspecting the code at compile time) and dynamically (by watching the code as it executes), and they do it for objects as well as literals.
For a hand-waving theoretical example, if you have code like:
var a = { foo:1 }
var b = { foo:2 }
var c = { foo:3 }
doSomething(a)
doSomething(b)
doSomething(c)
then modern JS engines can see that the "doSomething" function always gets called with arguments of the same type signature, and if it's a hot function they might optimize the function around that information.
If you want to know more, searching on "V8 hidden classes" should turn up relevant articles.
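One concrete illustration of the sort of thing those articles cover (a common hidden-class example; the functions here are hypothetical): objects whose properties are created in a different order typically end up with different hidden classes, so a call site that previously saw only one shape starts seeing two and may optimize worse.

// Same properties, different creation order - typically different hidden classes.
function makeOne() { return { x: 1, y: 2 }; }
function makeOther() { const o = { y: 2 }; o.x = 1; return o; }

function doSomething(p) { return p.x + p.y; }

doSomething(makeOne());   // the call site sees one shape...
doSomething(makeOther()); // ...then another, becoming polymorphic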
In my first job out of school, I was writing driver code in C. In one of my projects, I had a code block that wasn't working, but all it did was a switch on ints. On a hunch, I tried ifs, and it worked. Turns out I found a bug in the in-house GCC fork. I'm sure my manager thought I was an idiot--I was a new grad saying my project isn't working, and I think it's a compiler bug.
We always assume compilers to be infallible, and for the most part this is right. But I've found that special compilers for special targets, or certified compilers for special uses, often have issues. Why did you have a GCC fork?
I did! The compiler team agreed that it was broken and gave me a flag to disable jump tables. Surprisingly, that didn't cause any noticeable performance regressions.
> would actually be slow, until I did a deep dive into low level JS optimization
Can you explain why this is?
Intuitively it would seem like optimizing a switch statement would be really low-hanging fruit. I would have thought it would be optimized into something almost exactly like what the author hand rolled.
You can switch on arbitrary types in JS and you can add expressions:
switch (3) {
  case true: break;
  case "ab" + "c": break;
  case (Math.random() >= 0.5 ? "heads" : new Error("tails")): break;
}
The optimizer will have to be persuaded that the switch and each case is integral before optimizing with a jump table. On the other hand, Arrays typically have a fast path for integer indices.
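For contrast, a minimal sketch of the array-of-functions alternative (hypothetical handlers, roughly the shape TFA ends up with): the opcode indexes straight into a dense array, so the lookup rides that integer-index fast path instead of depending on the switch getting optimized.

// Dense array of 256 handlers indexed directly by opcode (illustration only).
const handlers = new Array(256).fill(op => { throw new Error("bad opcode " + op); });
handlers[0x00] = () => console.log("BRK");
handlers[0xEA] = () => console.log("NOP");

function step(opcode) {
  handlers[opcode](opcode); // plain integer-indexed element access
}

step(0xEA); // NOP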
The article quotes a Stack Overflow post with the conditions under which V8 will optimize the switch statement:
> @LGB actually in V8 (JS engine used by google chrome) you need to jump through a lot of hoops to get switch case optimized: All the cases must be of same type. All the cases must either be string literals or 31-bit signed integer literals. And there must be less than 128 cases. And even after all those hoops, all you get is what you would have gotten with if-elses anyway (I.E. no jump tables or sth like that). True story.
In the case of this emulator, it wasn’t being optimized because it had more than 128 cases.
That's not quite true, because arbitrary expressions can change meaning, e.g. if a function is redefined.
E.g. Math.random() will have to be called each time that case branch is evaluated, because that function could have been redefined or could return a different value.
I don't know. Maybe it's because I've spent too long thinking about such things, but interpreter dispatch routines are one of those areas of code where micro-optimizations are very pernicious and one should expect to review the landscape each time to rediscover best practice. See all the ways of compiling and executing Forth words for examples.
>> I intuitively associate switch with a jmp table
Me too, but I try to keep the set of values dense and use a mask (op & 0xff) to help the compiler know it doesn't need to do bounds checking. Verify that these changes actually help, of course.
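A quick JS sketch of the masking idea, since that's TFA's context (the fetch here is hypothetical): the mask pins the value to 0..255, so the set of possible case values (or table slots) is dense and bounded.

// Hypothetical fetch of a raw byte, for illustration only.
const fetch = () => 0xEA;
const op = fetch() & 0xFF; // guaranteed to be in 0..255
switch (op) {
  case 0x00: console.log("BRK"); break;
  case 0xEA: console.log("NOP"); break;
  // ...the rest of a dense 0x00..0xFF range...
  default: console.log("unhandled opcode", op);
}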
I like function tables. I think they are cleaner than switch statements
But (maybe this is an invalid test) checking today, my results suggest a switch is faster than a function table in 2022? At least at a base level. Maybe the more complicated the code for each case gets, the less likely it is to get optimized... which is true in general? Large code blocks are less likely to get optimized, and a switch is considered one large code block.
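For reference, this is roughly the shape of micro-benchmark I mean (the handler bodies and iteration count are made up, and JIT warmup or dead-code elimination can easily skew it, so treat the result as suggestive rather than proof):

// Two equivalent dispatchers over four trivial "opcodes" (illustration only).
function viaSwitch(op, x) {
  switch (op) {
    case 0: return x + 1;
    case 1: return x - 1;
    case 2: return x * 2;
    default: return x;
  }
}
const table = [x => x + 1, x => x - 1, x => x * 2, x => x];
function viaTable(op, x) { return table[op](x); }

function bench(name, fn) {
  let acc = 0;
  const t0 = performance.now();
  for (let i = 0; i < 1e7; i++) acc = fn(i & 3, acc);
  console.log(name, (performance.now() - t0).toFixed(1), "ms", acc);
}
bench("switch", viaSwitch);
bench("table", viaTable);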
In the meantime the entire v8 execution pipeline has been rewritten: a new optimising compiler was swapped in, a baseline interpreter was added, and the baseline compiler was removed (in more or less that order iirc).
Wouldn’t at all be surprising if that lack of optimisation had been long fixed.
Are function tables really all that much cleaner? Honest question. If I can offer a counterpoint: a function table requires us to write some dispatch routine that looks at the intention (the keys of the function table), looks up the function, finds a way to bind the arguments to it (language specific), and finally calls it. Isn't that more complexity in the code?
Isn't switch-case simple because it is available as a construct in the language?
From the title, I expected to find one of the frequently called case statements was unoptimized and they fixed it. Instead, the fix was rewriting all case statements in a way the javascript engine would optimize. Very frustrating if you're unaware of the obscure rules around it.
Not quite - the javascript engine failed to optimise reasonable situations, so the dude translated to a code structure (arrays of functions) that didn't require optimisation.
Essentially he was forced to manually code what should've been done under the hood anyway.
I'm unsure. They're different enough in structure that it feels to me it should be a conscious decision. In a switch statement there's usually one large scope, while in a function table there are a lot of little scopes. The pathway through the code during execution is different as well. Massive switch statements are also not reasonable situations from the perspective of a compiler, as they see little use outside of VMs.
Sure, I mean more in terms of producing a compiled form with O(1) rather than O(n) performance.
> Massive switch statements are also not reasonable situations from the perspective of a compiler, as they see little use outside of VMs.
I'm not really following this logic, massive switch statements are those that can benefit most significantly from the generation of jump tables.
And what needs to be done to generate them doesn't differ significantly based on the branch count (once you hit case 0x100 you'll need to switch to 16-bit offsets, so binary size will be impacted).
The fact that its primary use case is reasonably niche is fairly irrelevant given there's bugger all compiler overhead required for support.
Now from a code perspective, I would say that a statement that large probably isn't ideal (depending on the size of your case blocks).
> Now from a code perspective, I would say that a statement that large probably isn't ideal (depending on the size of your case blocks).
This is what I had meant, yes. Optimizing such large blocks is quite difficult. A case label in C and C-like languages does not even need a block, making it harder.
> The fact that it's primary use case is reasonably niche is fairly irrelevant given there's bugger all compiler overhead required for support.
Even static C compilers still struggle with optimizing the switch-in-for-loop pattern, let alone a JIT compiler. Clang does the best job nowadays, but it still lags behind other methods like computed goto or continuation-passing. The Python VM will switch to computed goto if it's available on the C compiler.
It does take work to make a fast VM; it feels unreasonable to expect a compiler to do a good job of optimizing one without effort on the writer's side.
In the grand scheme of things, VMs and interpreters are a class of program that, among the programs people are likely to write, tends toward perhaps surprisingly pathological behaviour.
This kind of stuff is usually at least somewhat frustrating in the moment, but it’s just the nature of things: we’re always mapping some task onto some machine and it wouldn’t be such a great story if it wasn’t such an elegant hack. This one is particularly interesting because it’s one VM mapping onto a second VM and the impedance mismatch is very meta.
My only nitpick with a great read is that I want to know a little more gory details of how the automation was done, it could easily have involved a third or even fourth VM!
But knowing how the JIT deoptimizes is a good thing for anyone programming perf-sensitive code for that JIT. Developing a sense of where branchy logic might deopt is useful even if your own code isn’t perf-sensitive but could be called in a perf-sensitive call stack, because it’ll give you a good heuristic for when optimization is premature. Optimizing for the JIT isn’t and shouldn’t be a guiding principle for every dev, but it’s harmless or good for us to be aware.
> Optimizing for the JIT isn’t and shouldn’t be a guiding principle for every dev, but it’s harmless or good for us to be aware.
No, it can be quite harmful: you could end up overoptimizing for one engine and ruin performance on another engine, or even a different version of the same engine. When making non-trivial optimizations that are designed around the quirks of the optimizing compiler you should be very careful about the tradeoffs and also what the penalty will be if the optimization breaks in the future.
Yep, exactly. There are very few codebases that can actually write code directly to the optimizer and they are usually developed very closely with a specific compiler. For other cases usually you can trust that a number of basic optimizations can be performed and then be cautious about what the various compilers can do. As you gain more experience you can slowly expand the set of things you would expect to be "easy" for the compiler–when those passes don't run, they're generally considered to be bugs.
It is no different from compiled languages; that is one of the reasons why most compilers, regardless of the language, let you look at the generated assembly.
The difference is control. You can choose which compiler to use for your project, even an outdated version if it works for your use case. But you have no control over what JIT engine visits your site.
For inner loops of interpreters in low-level languages, unless you are using some labels-are-values extension, it's often tough to convince the thing to generate the right code.
For anyone looking for an open source web-based TI emulator I just ported wxWabbitEmu (which I believe supports the TI-84) [1]. I only skinned the TI-85 since it's the calc I use, but the skins are just CSS.
I'm curious what the performance is like. I run it on my phone as a basic calculator, haven't messed with running binaries at all.
This is really cool! It's a shame that jsTIfied isn't open source; I would have liked to use it for live demos in my roundup [1] of cool TI software projects. Your project with a skin would be perfect.
Using a map or array instead of switch-case is a common optimization, but please do a benchmark if you're not sure whether you need it.
Shameless plug for my 6502 JS emulator / assembler in an Observable notebook, which takes the opcode, converts it into pseudo-assembly, and then uses a switch statement on the mnemonic. I expected it would be horribly slow, but maybe this switch on the mnemonic (50-something instructions) is OK?
> I was used to dealing with obfuscated code from Minecraft
I wonder how common this is for engineers. I’ve been in industry for the last 4-5 years or so, and it was how I got a pretty big jump into programming. Talked to a few similarly aged engineers at FAANG and a couple others got into programming through Minecraft as well.
Minecraft is too new for me, but I did get most of my pre-professional experience from modding in video games, other people I've worked with can say the same.
I understand that switch-cases may not be optimized well in JS, but what surprises me is that all the instructions in the cases are now grouped into functions and this is faster than regular switch case.
When I write Javascript, I expect function calls to be slow. So I see three possibilities:
- (unoptimized) switch statement is very slow in JS, so slow that function calls will be faster
- function calls are not that slow in the end.
- both
It's a shame that the switch statement cannot be relied on to be fast, though. They are very readable and, indeed, feel like code that can be highly optimized. I guess doing so would risk making the JIT compilation too slow, or not enough code found on the web relies on switch statements for this to matter too much.
Is this still true today? (I've seen a comment that says the 128 limit is still in V8's code)
How is this handled in SpiderMonkey, JavascriptCore or other JS engines?
TFA is basically about a limitation in V8's compiler, circa 2013. Based on the bug report it looks like that limitation was gone a few years later, and either way V8 switched to a whole different compiler back in 2017 or so.
Interestingly I wrote a 6502 (actually Atari 6507) emulator and Z80 emulator for the MIPS R3000 in the original SONY PSX, and the only way to get the speed I needed was to use calculated jump tables rather than relying on a switch case. I know that C provides that calculated jump, but even that wasn't enough. Eventually I rewrote the entire emulator in R3000. And then I eventually switched to using CPU retargeting/recompilation from 6502 to R3000, and then used the same recompilation trick to get 6502 emulation running at a decent speed on the ARM CPU in the Nintendo Gameboy Advance.
That depends a lot on the compiler, I guess in the 90's compilers didn't optimize switch-case that well yet.
In modern compilers a switch-case may end up faster than a function-pointer jump table, because even though the compiler will turn the switch-case into a jump table as well, the code that's called through the jump table doesn't have the prologue/epilogue overhead of regular function call. In my emulators I also haven't seen a difference between a regular switch case, and 'computed goto' (but depending on how the decoder loop looks like, computed goto may still provide have an advantage, just not in my CPU emulators).
To add to this I didn't do a function-pointer jump table. A function pointer jump table was too slow and involved multiple look-ups. I took the opcode of the 6502 or Z80 instruction, shifted it, then added that value to the PC (program counter) register. The new value in the program counter was effectively the goto. All the assembly functions that handled each opcode were aligned on a multiple of powers of two value, 2048 bytes IIRC. Though it might have been 1024 bytes, this is over 25 years ago. The instruction immediately after the PC write set the the return address for the function about to be called, the return address write worked due to the pipelining in the MIPS CPU. The return address was the instruction immediately after the PC write. The opcode function executes, and then returns back to the write instruction, which then falls through to the no-op (opcode zero), which handled the gathering of the next opcode to be processed, any updates to the display and hardware, cleaned up the stack, and then looped to process the next opcode.
The old compilers optimized the switch/case, just not as well as something from 2022, and there were lots of edge cases too. And limited registers, and small caches and so forth. Plus, these were CPUs that were at the boundary of what could be done in terms of emulation of another CPU and everything else about the device being emulated too. Today we have the advantage of far more advanced compiler technology and far more advanced CPU technology that runs crazy fast, and a greater understanding of how to emulate the machines and best practices for creating those emulators. 25 years ago, not so much. The PS2 and the XBOX were the first consoles where you could just take MAME or the 2600 VCS emulator I wrote and compile it to the target console and "it just worked." I ported MAME to the PS2 and ran six different games on a tumbling cube, and then MAME to the XBOX shortly afterwards, and Dolphin emulator on XBOX and PS2 as well - though again, smacking up against what that generation of consoles could realistically emulate.
Don't know the guts of the optimization, but 128 64bit pointers is 1024, which is a round number. Could be to avoid memory or cache blowup, or maybe they're using that 8th bit for something internally
Tangentially related, but I remember reading years ago a piece from a graphics driver developer (nvidia maybe?) that one of the reasons why the opengl driver was slower than their directx driver (I believe) was that the structure of OpenGL was such that you had to put a massive switch in the code which really hurt performance.
Very unlikely that an actual switch statement was the problem (if that actually exists, and wasn't just 'figure of speech'). OpenGL state is a huge collection of tiny knobs and toggles ('switches' if you like) and each state change may in turn influence other state, while D3D (since D3D10 at least) groups the same state into a small number of immutable state objects.
Worth noting that SO comment calls out that the linked line points to the code as it was in 2013 and that there have been changes since then:
> The link into the V8 source code above is pretty old (2013), so I tried to find the modern equivalent. I didn't find a hard limit, but found several heuristics that decide between table lookups and tree (binary search) lookups (ia32, x86). When I plug in my numbers I don't quite get a borderline case where I found it, so I'm not sure this is the actual cause or whether there's another optimization not being triggered elsewhere.
I get the same sensation too. It's kind of funny how these little reminders of people and interactions from cemetech and Omni pop up elsewhere in the tech community. I'm still proud of some of the stuff I wrote back in those days.