Spending 8 months when the business value at the end was mostly improved velocity doesn't sound likely to be a good tradeoff, especially if this is done as a big bang effort which either succeeds holistically or fails. You might have better success in the future by finding ways to integrate maintenance improvements incrementally.
> finding ways to integrate maintenance improvements incrementally
To be fair, it's possible bureaucratic process got in the way. If their commit / deployment process didn't allow for any changes to hit the production branch until it was "finished" then there wasn't really an opportunity for them to increment.
That seems likely considering "the new manager, who turfed all those fixes, including the new functionality" suggests other organizational problems. If one month is too long for the new manager, the new manager's goal seems to be to "do things" rather than to "solve problems".
This illustrates the problem with this 'business value at all costs' perspective. Improved velocity is not the only or even primary takeaway from consolidating 3 mostly identical codebases into 1. There are so many benefits that I struggle to think of a decent justification for leaving triplicated codebases in play, based on what GP has described.
This comes into tension with making names as clear as possible while still keeping them short. Frankly, I don't find coming up with short names for things easy, let alone names that stay good when brevity is taken to the extreme.
Whether something is too short depends on your context. requests.get might be ambiguous to someone who has never seen the code base before, but it quickly becomes obvious with a little exposure, so the question is whether the code base has dedicated maintainers.
The skill of good naming isn't distributed evenly and OP's names are pretty good, so I'd be happy for them to come along to my code base and rename things as long as it was within other constraints (not consuming too much time, doesn't break api, etc).
I'm not sure about the 4090, but most of the GPUs I use have a warp size of 32, and warp divergence only affects those 32 threads. If you have a branch and all threads in a warp agree, you only walk down one branch.
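A toy model of that cost accounting, in plain Python just to illustrate the point (real hardware uses predication and reconvergence stacks, so this is a deliberate simplification):

```python
def warp_branch_cost(lane_conds, cost_true=1, cost_false=1):
    """Cycles a single warp spends on an if/else in this toy SIMT model.

    lane_conds: one boolean per lane (e.g. 32 entries for a warp of 32).
    The warp pays for each side of the branch that at least one lane takes.
    """
    cost = 0
    if any(lane_conds):        # at least one lane takes the 'if' side
        cost += cost_true
    if not all(lane_conds):    # at least one lane takes the 'else' side
        cost += cost_false
    return cost

# All 32 lanes agree: only one side of the branch is executed.
assert warp_branch_cost([True] * 32) == 1
assert warp_branch_cost([False] * 32) == 1
# Divergent warp: both sides run serially, masked per lane.
assert warp_branch_cost([True] * 16 + [False] * 16) == 2
```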
My mental model is a bit more like you have collections of warps in a block, and all warps in a block get scheduled onto an SM. Different GPU architectures allow for different numbers of warps to be simultaneously active or inactive, and each warp has its own instruction pointer and can be suspended while waiting for things like memory. I found the picture on pg 22 here really helpful: https://images.nvidia.com/aem-dam/en-zz/Solutions/data-cente...
Note that although there are 4 schedulers on the A100, they don't dispatch every cycle, iirc.
The tensor cores mostly accelerate matrix operations and are the big blocks you can see, 4 per SM. "CUDA core" refers to the per-thread execution units, which you can see as the FP32 or INT32 units, so there are 32*4 per SM on that diagram.
Like you said, tensor core is similar to a special purpose ALU and is at a lower level of abstraction than something with an instruction pointer.
It is a pretty different way of programming, and that's part of what makes it so fun. Within constraints, you get a really interactive way to design algorithms which are much more parallel than what is feasible to write for a CPU. Debugging is so much better these days with some support for debuggers and printf (at least on CUDA). Maybe the same facilities aren't available for WebGPU?
Conditions and loops are largely fine as long as you avoid warp divergence and such.
Not sure about LSP, but I think if you defined your language's grammar in tree-sitter, you might be able to build a basic autoformatter generically on top of tree-sitter to accelerate bootstrapping your language.
You could also use an existing language agnostic package manager like nix, guix, or conda to bootstrap your language package manager.
LSP is the one I don't know of a way to make easy without overly constraining the design space.
I can't relate to the other things, but I dislike the majority of things that constitute going on vacation. I don't like planning the vacation including what to do, where to stay or how to get there. I don't like having the vacation planned and looming over my head preventing me from making long term plans.
However, after all that, there are moments throughout which make up for it and make me glad I didn't just stay at home.
It's a common sentiment that others don't know how computers work. I'm sure you personally are an exception, but I think most people with this sentiment live in glass houses and don't realize how little they themselves understand about how computers work. Since I don't know how they work either, I can't provide comprehensive examples, but I remember being surprised by details around dynamic linking and relocatable binaries, how the VFS maps to more specific filesystem implementations, hard disk drive firmware and the on-drive caches it manages, how bits are actually stored in terms of magnetic polarity, and many more layers and details besides that I have yet to discover.
No, of course I don't know everything, and there's lots I used to know but have forgotten.
But I do feel fortunate that I got to see a lot of the modern abstractions come into being, so I have a vague idea of what they're abstracting and why. That helps a lot.
I think that ChatGPT could be a big accelerator for creative activity, but I wouldn't trust any output that I've only partially verified. That limits it to human scale problems in its direct output, but there are many ways that human scale output like code snippets can be useful on computer scale data.
For simple things it's pretty safe. I tried pasting in HTML from the homepage of Hacker News and having it turn that into a list of JSON objects each with the title, submitter, number of upvotes and number of comments.
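For comparison, a deterministic version of that HTML-to-JSON task can be sketched in a few lines of Python. The markup below is an assumption loosely modeled on Hacker News's front page (the real page has more attributes and nesting, so the patterns are illustrative, not robust):

```python
import json
import re

# Assumed snippet loosely modeled on HN markup, for illustration only.
html = """
<tr class="athing" id="101"><td><span class="titleline">
<a href="https://example.com/post">Show HN: A thing</a></span></td></tr>
<tr><td class="subtext"><span class="score">120 points</span> by
<a class="hnuser">alice</a> <a>57&nbsp;comments</a></td></tr>
"""

# Regex over HTML is fragile; a real scraper should use an HTML parser.
titles = re.findall(r'class="titleline">\s*<a[^>]*>([^<]+)</a>', html)
scores = re.findall(r'class="score">(\d+) points?</span>', html)
users = re.findall(r'class="hnuser"[^>]*>([^<]+)</a>', html)
comments = re.findall(r'>(\d+)&nbsp;comments?</a>', html)

items = [
    {"title": t, "submitter": u, "upvotes": int(s), "comments": int(c)}
    for t, s, u, c in zip(titles, scores, users, comments)
]
print(json.dumps(items, indent=2))
```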
There are two classes of response: those that are factually "right" or "wrong" (who was the sixteenth president of the U.S.?), and those that are opinion/debatable ("How should I break up with my boyfriend?"). People will focus on the facts, and those will be improved (viz. the melding of ChatGPT with Wolfram Alpha), but the opinion answers are going to be more readily accepted (and harder to optimize?).
I've wondered about the same thing: generating a permutation with a minimum number of samples. I'm not sure how Gray codes help you here, and the multiplication sounds wrong since many outputs aren't reachable from any input.
The idea is just a closed form solution that gets you from a number in 1...nPermutations to a single specific permutation. I was using Gray codes as an example of a sequence that exhausts all permutations with a closed form that gets you to a specific point in the sequence.
Might not work - as I say I never actually got around to implementing it.
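One concrete instance of a closed-form map from an index in [0, n!) to a specific permutation is Lehmer-code unranking via the factorial number system. I'm not claiming this is the scheme being described above, just a sketch showing such a bijection exists:

```python
import math

def unrank(index, items):
    """Map index in [0, len(items)!) to a distinct permutation.

    Uses the factorial number system: each "digit" selects one of the
    remaining elements. Matches lexicographic order for sorted input.
    """
    items = list(items)
    out = []
    for i in range(len(items), 0, -1):
        digit, index = divmod(index, math.factorial(i - 1))
        out.append(items.pop(digit))
    return out

# Every index in [0, 3!) yields a distinct permutation of "abc".
perms = [unrank(k, "abc") for k in range(6)]
assert len(set(map(tuple, perms))) == 6
```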