This comes into tension with making names as clear as possible while still being short. Frankly, I don't find coming up with short names for things easy, let alone names that are both short and good when brevity is taken to the extreme.
Whether something is too short depends on your context. requests.get might be ambiguous for someone who has never seen the code base before, but it quickly becomes obvious with a little exposure, so: does the code base have dedicated maintainers?
The skill of good naming isn't distributed evenly, and OP's names are pretty good, so I'd be happy for them to come along to my code base and rename things, as long as it was within other constraints (not consuming too much time, not breaking the API, etc.).
I'm not sure about the 4090, but most of the GPUs I use have a warp size of 32, and warp divergence affects only up to those 32 threads. If you have a branch and all threads agree, you only walk down one branch.
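To make that concrete, here's a minimal CUDA sketch (the kernel and array names are made up, nothing from a real code base): the branch on blockIdx is uniform across each warp, so only one path is walked, while the branch on threadIdx splits every warp and both sides get executed with half the lanes masked off.

    __global__ void branchy(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Uniform: every thread in a warp shares blockIdx, so the whole
        // warp agrees and only one of these two paths is executed.
        if (blockIdx.x % 2 == 0) {
            out[i] = in[i] * 2.0f;
        } else {
            out[i] = in[i] + 1.0f;
        }

        // Divergent: even and odd lanes disagree, so the warp runs the
        // body with half its lanes masked off (and the implicit "else"
        // with the other half) before reconverging.
        if (threadIdx.x % 2 == 0) {
            out[i] = -out[i];
        }
    }

So divergence is a per-warp, per-branch cost, not something that poisons the whole kernel.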
My mental model is a bit more like: you have collections of warps in a block, and all warps in a block get scheduled onto an SM. Different GPU architectures allow different numbers of warps to be resident at once, whether actively executing or waiting, and each warp has its own instruction pointer and can be suspended while waiting for things like memory. I found the picture on pg 22 here really helpful: https://images.nvidia.com/aem-dam/en-zz/Solutions/data-cente...
Note that although there are 4 schedulers on the A100, they don't dispatch every cycle, iirc.
The tensor core mostly accelerates matrix operations and is the big block you can see; there are 4 per SM. "CUDA core" refers to the per-thread execution units, which you can see as the FP32 or INT32 units, so there are (32*4) of them per SM on that diagram.
Like you said, a tensor core is similar to a special-purpose ALU and is at a lower level of abstraction than something with an instruction pointer.
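To give a feel for that lower level of abstraction, this is roughly what driving a tensor core through CUDA's wmma API looks like (the 16x16x16 half-precision tile is just the standard example shape, not anything specific to the A100 diagram): the whole warp cooperates on one small matrix multiply, and no individual thread owns the result.

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // One warp computes C = A * B + C for a single 16x16x16 tile.
    // These calls must be reached by all 32 threads of the warp together;
    // the fragments live spread across the warp's registers.
    __global__ void tile_mma(const half *a, const half *b, float *c) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        wmma::fill_fragment(c_frag, 0.0f);
        wmma::load_matrix_sync(a_frag, a, 16);   // 16 = leading dimension
        wmma::load_matrix_sync(b_frag, b, 16);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
        wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
    }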
It is a pretty different way of programming, and that's part of what makes it so fun. Within constraints, you get a really interactive way to design algorithms which are much more parallel than what is feasible to write for a CPU. Debugging is so much better these days with some support for debuggers and printf (at least on CUDA). Maybe the same facilities aren't available for WebGPU?
Conditions and loops are largely fine as long as you avoid warp divergence and such.
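For what it's worth, the printf support really is just calling printf from device code, which makes quick-and-dirty debugging surprisingly pleasant. A tiny sketch (filtering on one thread per block is just to keep the output readable):

    #include <cstdio>

    __global__ void debug_me(const float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Let only one thread per block print, otherwise thousands of
        // interleaved lines make the output useless.
        if (threadIdx.x == 0) {
            printf("block %d sees data[%d] = %f\n", blockIdx.x, i, data[i]);
        }
    }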
Not sure about LSP, but I think if you defined your language's grammar in tree-sitter, you might be able to build a basic autoformatter generically on top of tree-sitter to accelerate bootstrapping your language.
You could also use an existing language-agnostic package manager like nix, guix, or conda to bootstrap your language's package manager.
LSP is something I don't know how to make that easy without overly constraining the design space.
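A rough sketch of what I mean, against tree-sitter's C API (using the JSON grammar purely as a stand-in for "your language", with a deliberately dumb rule: print each named leaf token indented by its nesting depth). The point is just that the parse tree already gives you token spans plus structure, which is most of what a basic formatter needs:

    #include <tree_sitter/api.h>
    #include <stdio.h>
    #include <string.h>

    // Provided by whatever grammar you generated (tree-sitter-json here).
    extern "C" const TSLanguage *tree_sitter_json(void);

    static void format_node(TSNode node, const char *src, int depth) {
        uint32_t n = ts_node_named_child_count(node);
        if (n == 0) {
            // Leaf token: print its source text, indented by depth.
            uint32_t start = ts_node_start_byte(node), end = ts_node_end_byte(node);
            printf("%*s%.*s\n", depth * 2, "", (int)(end - start), src + start);
            return;
        }
        for (uint32_t i = 0; i < n; i++)
            format_node(ts_node_named_child(node, i), src, depth + 1);
    }

    int main(void) {
        const char *src = "{\"a\": [1, 2]}";
        TSParser *parser = ts_parser_new();
        ts_parser_set_language(parser, tree_sitter_json());
        TSTree *tree = ts_parser_parse_string(parser, NULL, src, strlen(src));
        format_node(ts_tree_root_node(tree), src, 0);
        ts_tree_delete(tree);
        ts_parser_delete(parser);
        return 0;
    }

A real formatter would also have to decide what to do with comments and unnamed punctuation, but the traversal skeleton stays grammar-agnostic.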
I can't relate to the other things, but I dislike the majority of things that constitute going on vacation. I don't like planning the vacation: what to do, where to stay, how to get there. I don't like having the vacation planned and looming over my head, preventing me from making long-term plans.
However, after all that, there are moments throughout which make up for it and make me glad I didn't just stay at home.
It's a common sentiment that others don't know how computers work. I'm sure you personally are an exception, but I think most people with this sentiment live in glass houses and don't realize how little they understand about how computers work themselves. Since I don't know how they work, I can't provide comprehensive examples, but I remember being surprised by details around dynamic linking and relocatable binaries, how the VFS maps to more specific filesystem implementations, hard disk drive firmware and the on-drive caches it manages, how bits are actually stored as magnetic polarity, and many more layers and details besides that I have yet to discover.
No, of course I don’t know everything, and there’s lots I used to know but have forgotten.
But I do feel fortunate that I got to see a lot of the modern abstractions come into being, so I have a vague idea of what they’re abstracting and why. That helps a lot.
I think that ChatGPT could be a big accelerator for creative activity, but I wouldn't trust any output that I've only partially verified. That limits it to human-scale problems in its direct output, but there are many ways that human-scale output like code snippets can be useful on computer-scale data.
For simple things it's pretty safe. I tried pasting in HTML from the homepage of Hacker News and having it turn that into a list of JSON objects each with the title, submitter, number of upvotes and number of comments.
There are two classes of response: those that are factually "right" or "wrong" (who was the sixteenth president of the U.S.?), and those that are opinion/debatable ("How should I break up with my boyfriend?"). People will focus on the facts, and those will be improved (viz: the melding of ChatGPT with Wolfram Alpha), but the opinion answers are going to be more readily accepted (and harder to optimize?).
I've wondered about the same thing of generating a permutation with a minimum number of samples. I'm not sure how Gray codes help you here, and the multiplication sounds wrong since many outputs aren't reachable from any input.
The idea is just a closed form solution that gets you from a number in 1...nPermutations to a single specific permutation. I was using Gray codes as an example of a sequence that exhausts all permutations with a closed form that gets you to a specific point in the sequence.
Might not work - as I say I never actually got around to implementing it.
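For anyone curious, the construction I know for this is the factorial number system (Lehmer codes): write the index in factorial base and use the digits to pick elements out of a shrinking list. This is not the parent's Gray-code idea, just the textbook index-to-permutation mapping, sketched in C++:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Map an index in [0, n!) to the permutation of {0, ..., n-1} at that
    // position in lexicographic order, via the factorial number system.
    std::vector<int> nth_permutation(int n, std::uint64_t index) {
        std::vector<int> remaining;
        for (int i = 0; i < n; ++i) remaining.push_back(i);

        std::uint64_t fact = 1;
        for (int i = 1; i < n; ++i) fact *= i;   // (n-1)!

        std::vector<int> perm;
        for (int i = n - 1; i >= 0; --i) {
            // Which of the remaining elements to take next.
            std::size_t digit = static_cast<std::size_t>(index / fact);
            index %= fact;
            perm.push_back(remaining[digit]);
            remaining.erase(remaining.begin() + digit);
            if (i > 0) fact /= i;                // next factorial-base place value
        }
        return perm;
    }

    int main() {
        // 3! = 6 permutations of {0,1,2}; index 4 is {2,0,1} in lexicographic order.
        for (int v : nth_permutation(3, 4)) std::cout << v << ' ';
        std::cout << '\n';
    }

Sampling a uniform permutation then reduces to sampling a single integer uniformly from [0, n!), though for n much above 20 that integer no longer fits in 64 bits.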
My motivation is to interact with the content of the website. Form is good only while it improves function. For example, simple css improves my ability to read a website over the default rendering of .txt (often seen in websites about K). However, the new reddit design isn't worth it for me because the improved form decreases the function. With lazy loading, controls that are more difficult to interact with, higher footprint, higher idle cost, etc., it's a less pleasant reading experience.
Beyond the experience of a loaded site (its ability to emphasize the text and not interfere with the current visual field or with searching; badly implemented lazy loading, for example), size alone impacts the experience on bad internet connections. I use HN as a test website to see if the internet works, rather than something like MSN, because it is so lightweight. I measured around 8KB transferred after caching on the home page (>3000KB from MSN, with an idle transfer of 7KB every couple of seconds).