> You'll find thousands of shows with perfect audio quality and professional-grade cover art that contain absolutely zero intellectual nutritional value
This is why I switched to audio books. Many podcasts with real guests contained too much salad and not enough meat (e.g. a machine learning podcast but they talked about going to conferences).
Contrary to what many think, I believe AI-generated content can increase the nutritional value. I've done experiments with turning technical PDFs into podcasts, e.g. summarizing machine learning papers (similar to NotebookLM).
Totally agree. The best thing about AI for me so far has been the audio models that can turn an ebook into an audio book.
I have not only switched from podcasts to audio books but have now moved on to all the books I have wanted to listen to that are too obscure for anyone to ever hire a voice actor for.
This week I have been listening in the car to a PhD thesis on an obscure subject that interests me. In contrast, even what I used to think of as good podcasts now seems more like junk food.
I searched for good programming or, more broadly, IT-related podcasts, but unfortunately haven't found any that aren't either a straight-up ad or a thinly veiled one. I understand that invited guests and podcast producers want compensation, but the end result is off-putting and not attractive to me. Take Software Engineering Radio as an example: I listened to some episodes, but it gave me the impression of slop even before the word "slop" was established.
On the other hand, I know excellent-quality podcasts funded by voluntary Patreon supporters, so I hope the issue is simply that I haven't found IT ones from that part of the spectrum yet.
I was also looking for IT-related podcasts and had the same impression. What seems to work is when people write interesting books and then go on shows to promote their books by talking about the content.
I had some eye strain and think it is because my eye muscles are overused. A doctor told me the muscles in the eye are flat, like tapes, and that I would not feel a muscle ache. I noticed the strain when I focus on different points quickly. I started to pay attention to how I move my eyes and realized I read a lot of text while scrolling, for example reading X posts on mobile while scanning the text at the same time.
Yesterday I was reminded of “Rapid Serial Visual Presentation” for speed reading, where the words are presented so you do not have to move your eyes. I am currently trying it out with a Chrome extension called SwiftRead. I set the text size so it fits into my fovea area. I used a fovea detector website I saw on HN a while ago: https://www.shadertoy.com/view/4dsXzM
(make the pattern full screen, then you can see the size of your fovea).
I also learned that I can reduce some of the strain by moving my head more toward the things I am looking at on the screen.
At the leaves of the branches I'm comfortable just generating code (e.g. a popup dialog), but I want a good grasp of the code that is central to the application.
I wish I had an intuitive understanding of how much I can do with a GPU. E.g. how many points can I move around? A simulation like this would be great for that.
They mention it’s 3x faster when turning collision off. I don’t know what the memory footprint of a block is, but I’d speculate that small round particles (sphere plus radius) are an order of magnitude faster.
Modern GPUs are insanely fast. A higher-end consumer GPU like a 5090 can do over 100 teraflops of fp32 computation if your cache is perfectly utilized and memory access isn’t the bottleneck. Normally, memory is the bottleneck, and at a minimum you need to read and write your particles every frame of a sim, which is why the sibling comments are using memory bandwidth to estimate the number of particles per second. I’d guess that if you were only advecting particles without collision, or colliding against only a small number of big objects (like particles colliding against the planet and not each other), then you could move multiple billions of particles per second, which you would divide by your desired frame rate to see how many particles per frame you can do.
the answer is a big "it depends", but I can give you some ballpark intuition.
perhaps it's easiest to think about regular image processing because it uses the same hardware. you can think about each pixel as a particle.
a typical 4k (3840 x 2160 at 16:9) image contains about 8 million pixels. a trivial compute shader that just writes 4 bytes per pixel of some trivial value (e.g. the compute shader thread ids) will take you anywhere from roughly 0.05ms to 0.5ms on modern-ish GPUs. the spread is wide because the hardware spread is wide. on current high end GPUs you will be very close to 0.05ms, or maybe even a bit faster.
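A quick sanity check of the 4K figure, as a sketch: if the shader is limited purely by memory bandwidth, the time is just bytes written divided by bandwidth. The 300 and 1000 GB/s values are assumed for illustration, and launch overhead and caching are ignored.

```python
# Lower bound on the time for a shader that only writes 4 bytes per pixel,
# assuming memory bandwidth is the sole limit (launch overhead etc. ignored).


def write_time_ms(width: int, height: int, bytes_per_pixel: int, bandwidth_gb_s: float) -> float:
    """Milliseconds needed to write every pixel once at the given bandwidth (GB/s)."""
    total_bytes = width * height * bytes_per_pixel
    return total_bytes / (bandwidth_gb_s * 1e9) * 1e3


pixels = 3840 * 2160  # pixels in a 4K UHD frame
# Assumed bandwidths: roughly a mid-range and a high-end GPU (illustrative only).
for bw in (300.0, 1000.0):
    print(f"{bw:6.0f} GB/s -> {write_time_ms(3840, 2160, 4, bw):.3f} ms")
```

At 1000 GB/s this comes to about 0.033 ms, consistent with "very close to 0.05ms, or maybe even a bit faster" on high-end cards.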
but real world programs like video games do a whole lot more than just write a trivial value. they read and write a lot more data (usually there are many passes, so it's not done just once; in the end maybe a few hundred bytes per pixel), and usually run many thousands of instructions per pixel. I work on a video game everyone's probably heard of, and one of the main material shaders is too large to fit into my work GPU's instruction cache (32kb), to give you an idea of how many instructions are in there (not all of them are executed, of course; there's some branching involved).
and you can still easily do this all at 100+ frames per second on high end GPUs.
so you can in principle simulate a lot of particles. of course, the algorithm scaling matters. most of rendering is somewhere in O(n). anything involving physics will probably involve some kind of interaction between objects which immediately implies O(n log n) at the very least but usually more.
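To make the point about interactions concrete, here is a minimal sketch of a uniform-grid (cell list) neighbor search, one common way to avoid testing all n^2 pairs: particles are bucketed into cells whose size equals the interaction radius, so each particle only checks the 27 surrounding cells. It is plain CPU Python for readability and is checked against a brute-force version; a GPU implementation would be organized differently (for example by sorting particles by cell), and all names here are illustrative.

```python
import itertools
import math
import random
from collections import defaultdict

Point = tuple[float, float, float]
Cell = tuple[int, int, int]


def build_grid(points: list[Point], cell_size: float) -> dict[Cell, list[int]]:
    """Bucket particle indices by the integer grid cell that contains them."""
    grid: dict[Cell, list[int]] = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        key = (math.floor(x / cell_size), math.floor(y / cell_size), math.floor(z / cell_size))
        grid[key].append(i)
    return grid


def neighbor_pairs(points: list[Point], radius: float) -> set[tuple[int, int]]:
    """Pairs (i, j) with i < j closer than `radius`, checking only the 27 surrounding cells."""
    grid = build_grid(points, radius)  # cell size equals the interaction radius
    pairs: set[tuple[int, int]] = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            # .get() does not insert missing keys into the defaultdict while we iterate
            for j in grid.get((cx + dx, cy + dy, cz + dz), []):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) < radius:
                        pairs.add((i, j))
    return pairs


def brute_force_pairs(points: list[Point], radius: float) -> set[tuple[int, int]]:
    """O(n^2) reference: test every pair."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) < radius}


random.seed(0)
pts = [(random.random(), random.random(), random.random()) for _ in range(500)]
assert neighbor_pairs(pts, 0.05) == brute_force_pairs(pts, 0.05)
```

Because the cell size equals the radius, any pair closer than the radius differs by at most one cell index per axis, which is why the 27-cell stencil never misses a neighbor.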
For examples like particle simulations, on a single node with a 4090 GPU everything running on GPU without memory transfer to the CPU:
- The main bottleneck is memory capacity: with 24GB available, storing each particle's 3 position and 3 velocity coordinates at 4 bytes per number (float32) takes 24 bytes, so at most 1B particles.
- Then GPU memory bandwidth: if everything is on the GPU you get between 1000GB/s of global memory access and 10000GB/s when shared-memory caches are hit. The number of memory accesses is roughly proportional to the number of effective collisions between your particles, which is the number of particles times around 12-30 neighbors (see the number of neighbors in optimal 3D sphere packing, and multiply by your overlap factor). All in all, for 1B particles you can collide them all and move them in 1 to 10s.
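The arithmetic behind the two bullets above, as a rough sketch. `collision_step_seconds` is a hypothetical helper that only counts each particle reading its neighbors' state once (no writes, no extra passes, perfect coalescing), so it is an optimistic lower bound rather than a replacement for the 1 to 10s estimate.

```python
FLOAT32_BYTES = 4


def max_particles(memory_bytes: float, bytes_per_particle: int) -> int:
    """Particles that fit if particle state is the only thing in GPU memory."""
    return int(memory_bytes // bytes_per_particle)


def collision_step_seconds(n: int, neighbors: float, bytes_per_particle: int,
                           bandwidth_bytes_s: float) -> float:
    """Optimistic lower bound: each particle reads the state of `neighbors` others once."""
    return n * neighbors * bytes_per_particle / bandwidth_bytes_s


bytes_per_particle = (3 + 3) * FLOAT32_BYTES  # 3 position + 3 velocity components = 24 bytes
n = max_particles(24e9, bytes_per_particle)   # 24GB card
print(n)
for bw in (1000e9, 10000e9):                  # global memory vs. cache-friendly access
    lo = collision_step_seconds(n, 12, bytes_per_particle, bw)
    hi = collision_step_seconds(n, 30, bytes_per_particle, bw)
    print(f"{bw / 1e9:.0f} GB/s: {lo:.3f}-{hi:.3f} s per step")
```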
If you have to transfer things to the CPU, you become limited by PCI Express bandwidth: about 16GB/s per direction for a PCIe 3.0 x16 link, roughly double that for PCIe 4.0 x16. At 16GB/s you can move 1B particles (24GB) to and from the GPU at most about 0.7 times per second.
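The transfer estimate as a sketch, assuming a full-duplex link so that uploading and downloading the 24GB state overlap. The per-direction link rates are assumptions: roughly 16GB/s for PCIe 3.0 x16 and 32GB/s for PCIe 4.0 x16.

```python
def round_trips_per_second(state_bytes: float, link_bytes_s_per_direction: float) -> float:
    """Full copies of the particle state per second, assuming a full-duplex link
    so the upload and the download overlap."""
    return link_bytes_s_per_direction / state_bytes


state = 1_000_000_000 * 24  # 1B particles x 24 bytes
for link in (16e9, 32e9):   # assumed per-direction rates (roughly PCIe 3.0 x16 and 4.0 x16)
    print(f"{link / 1e9:.0f} GB/s -> {round_trips_per_second(state, link):.2f} per second")
```

At 16GB/s this reproduces the roughly 0.7 round trips per second quoted above.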
Then, if your system is too big for RAM and you want to store the particles on disk, you can either use an M.2 SSD (but you will burn through them quickly), which has a theoretical bandwidth of 20GB/s and so is not a bottleneck, or use network storage over 100Gb/s (= 12.5GB/s) Ethernet, via two interfaces to your parameter server, which can be as big as you can afford.
So to summarize so far: 1B particles take 1 to 10s per iteration per GPU. If you want smarter integration schemes like RK4, divide the throughput by 6. If you need 64-bit precision, divide by 2. If you only need 16-bit precision, you can multiply by 2.
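The rules of thumb above, applied to the 1 to 10s float32 baseline in a small helper (`seconds_per_iteration` is a hypothetical name). It assumes cost scales linearly with the number of bytes moved, which is what the precision factors imply, and takes the RK4 factor of 6 as given rather than derived.

```python
def seconds_per_iteration(base_s: float, rk4: bool = False, precision_bits: int = 32) -> float:
    """Apply the rules of thumb above to a float32 base time per iteration."""
    t = base_s
    if rk4:
        t *= 6                # RK4: roughly 6x the cost of a simple step, per the estimate above
    t *= precision_bits / 32  # bytes moved scale with precision: fp64 x2, fp16 x0.5
    return t


for bits in (16, 32, 64):
    print(bits, [seconds_per_iteration(b, rk4=True, precision_bits=bits) for b in (1.0, 10.0)])
```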
The number of particles you need is roughly N = V / h^3, where V is the volume of the box and h is the particle diameter, i.e. the finest detail you want to be able to resolve.
If you use an adaptive scheme, most of your particles are close to the surfaces of objects, so you need O(A / h^2) particles, where A is the surface area of the objects and h is the average resolution of the surface mesh. But adaptive schemes are about 10 times slower.
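The two particle-count estimates side by side, as a sketch with illustrative numbers (a 1 m box at 1 mm resolution, and the 6 m^2 surface of a 1 m cube): about 1e9 particles for the uniform fill versus about 6e6 concentrated on the surface. If the adaptive scheme costs roughly 10x more per particle, it pays off when 10·A/h^2 < V/h^3, i.e. when h < V/(10A), which holds here (1 mm < 1/60 m).

```python
def particles_for_volume(volume: float, h: float) -> float:
    """Uniform fill: N ~ V / h^3, with h the smallest detail to resolve."""
    return volume / h ** 3


def particles_for_surface(area: float, h: float) -> float:
    """Adaptive, surface-concentrated fill: N ~ A / h^2."""
    return area / h ** 2


h = 1e-3  # 1 mm resolution (illustrative)
print(f"{particles_for_volume(1.0, h):.3g}")   # 1 m^3 box
print(f"{particles_for_surface(6.0, h):.3g}")  # surface of a 1 m cube, 6 m^2
```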
The precision of the approximation can be bounded with Taylor's theorem. SPH (smoothed particle hydrodynamics) is typically second order but has issues with boundaries, so h must be small to represent a sharp boundary.
If you want higher order and sharp boundaries, you can use the Finite Element Method instead. But you'll need to tessellate the space with something like a Delaunay/Voronoi mesh and update it as the particles move.
Might be worth starting with a baseline where there’s no collision, only advection, and assuming a rate higher than 1fps, since that gives a higher particles-per-second figure while still fitting in 24GB? I wouldn’t be too surprised if you could advect 100M particles at interactive rates.
The theoretical maximum rate for 1B-particle advection (just doing p[] += v[]*dt) is 1000GB/s / 24GB ≈ 42 iterations per second. If you only have 100M particles, you can do 10 times more iterations.
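The bandwidth-bound rate as a one-line formula, sketched in Python with the comment's numbers (1000GB/s, 24 bytes per particle). `iterations_per_second` is a hypothetical helper.

```python
def iterations_per_second(bandwidth_bytes_s: float, n_particles: int, bytes_per_particle: int) -> float:
    """Upper bound on advection steps per second when every step streams all particle data once."""
    return bandwidth_bytes_s / (n_particles * bytes_per_particle)


for n in (1_000_000_000, 100_000_000):
    print(f"{n:>13,} particles: {iterations_per_second(1000e9, n, 24):.1f} it/s")
```

Here `bytes_per_particle` should really be the traffic per step rather than the storage footprint: with float32 vectors, `p[] += v[]*dt` reads p and v (24 bytes) and writes p back (12 bytes), i.e. 36 bytes, which would lower the bound to about 28 iterations per second for 1B particles.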
But that's without any rendering, and with non-interacting particles, which are extremely boring unless you like fireworks. (You can add a term like v[] += g*dt for free.) And you don't need to store colors for your particles if you can compute them from the particle index with a function.
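A CPU reference of the per-particle work described above, in plain Python for readability: one explicit step of `p[] += v[]*dt` followed by the free gravity term `v[] += g*dt`, plus a color computed from the particle index instead of stored. The gravity vector and the hash (Knuth's multiplicative constant) are illustrative choices; on a GPU each loop iteration would be one thread.

```python
Vec3 = tuple[float, float, float]


def step(p: list[Vec3], v: list[Vec3], dt: float, g: Vec3 = (0.0, -9.81, 0.0)) -> None:
    """One explicit step for non-interacting particles, in place:
    p[i] += v[i]*dt, then v[i] += g*dt."""
    for i in range(len(p)):
        px, py, pz = p[i]
        vx, vy, vz = v[i]
        p[i] = (px + vx * dt, py + vy * dt, pz + vz * dt)
        v[i] = (vx + g[0] * dt, vy + g[1] * dt, vz + g[2] * dt)


def color_of(i: int) -> tuple[int, int, int]:
    """RGB derived from the particle index, so no per-particle color buffer is stored."""
    h = (i * 2654435761) & 0xFFFFFFFF  # Knuth's multiplicative hash, kept to 32 bits
    return (h >> 16) & 0xFF, (h >> 8) & 0xFF, h & 0xFF


p = [(0.0, 0.0, 0.0)]
v = [(1.0, 0.0, 0.0)]
step(p, v, 0.5)
print(p[0], v[0], color_of(1))
```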
Rasterizing is slower, because each pixel of the image might be touched by multiple particles (which means concurrent accesses to the same memory address, which GPUs handle poorly).
Obtaining the screen coordinates is just a matrix multiply, but rendering the particles in the correct depth order requires multiple passes, atomic operations, or z-sorting. Alternatively, you can slice your point cloud by weighting the particles with a peak-shaped function around the desired depth value and using an order-independent reduction like a sum, but the memory accesses are still concurrent.
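A minimal sketch of the z-sorting option: project each particle with a 4x4 matrix, divide by w, and sort back to front so nearer particles are drawn last (the painter's algorithm). It assumes an OpenGL-style perspective matrix with the camera looking down -z, where normalized depth grows with distance; other APIs use different conventions.

```python
import math

Mat4 = list[list[float]]
Vec3 = tuple[float, float, float]


def perspective(fov_y_rad: float, aspect: float, near: float, far: float) -> Mat4:
    """OpenGL-style perspective projection (camera at the origin looking down -z)."""
    f = 1.0 / math.tan(fov_y_rad / 2)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_ndc(m: Mat4, p: Vec3) -> Vec3:
    """Matrix multiply to clip space, then perspective divide by w."""
    x, y, z = p
    cx, cy, cz, cw = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)
    return cx / cw, cy / cw, cz / cw


def back_to_front(m: Mat4, points: list[Vec3]) -> list[int]:
    """Indices sorted farthest first, so drawing in this order puts near particles on top."""
    depth = [to_ndc(m, p)[2] for p in points]
    return sorted(range(len(points)), key=lambda i: depth[i], reverse=True)


proj = perspective(math.radians(60), 16 / 9, 0.1, 100.0)
print(back_to_front(proj, [(0.0, 0.0, -2.0), (0.0, 0.0, -5.0), (0.5, 0.2, -3.0)]))
```

A full sort per frame is the simplest correct approach; the atomic and multi-pass alternatives mentioned above trade that sort for contention on shared pixels.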
For rasterizing, you can also use the particles' space-partitioning indices to render each part of the screen independently, without concurrent-access problems. That's called "tile rendering": each tile renders the subset of particles that may fall in it. (There is plenty of literature on this in the Gaussian Splatting community.)
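A sketch of the binning step behind tile rendering, assuming particles have already been projected to pixel coordinates and are drawn as discs of a fixed radius: each particle is appended to every tile its bounding box overlaps, after which each tile's list can be rendered independently (for example by one GPU workgroup). The 16-pixel tile size and the function name are illustrative.

```python
from collections import defaultdict


def bin_into_tiles(screen_pts: list[tuple[float, float]], radius_px: float,
                   tile_px: int, width: int, height: int) -> dict[tuple[int, int], list[int]]:
    """Map each screen tile (tx, ty) to the indices of particles whose footprint overlaps it.
    A particle near a tile border is listed in every tile it touches."""
    tiles_x = (width + tile_px - 1) // tile_px
    tiles_y = (height + tile_px - 1) // tile_px
    tiles: dict[tuple[int, int], list[int]] = defaultdict(list)
    for i, (x, y) in enumerate(screen_pts):
        # Clamp the particle's bounding box to the tile grid; off-screen particles give empty ranges.
        x0 = max(0, int((x - radius_px) // tile_px))
        x1 = min(tiles_x - 1, int((x + radius_px) // tile_px))
        y0 = max(0, int((y - radius_px) // tile_px))
        y1 = min(tiles_y - 1, int((y + radius_px) // tile_px))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tiles[(tx, ty)].append(i)
    return dict(tiles)


tiles = bin_into_tiles([(8.0, 8.0), (16.0, 8.0), (-50.0, 8.0)], radius_px=2.0,
                       tile_px=16, width=64, height=64)
print(tiles)
```

Since each tile owns its own pixels, writes within a tile never race with writes from other tiles, which is the point of the technique.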
> The theoretical maximum rate for 1B particle advection (Just doing p[] += v[]*dt), is 1000GB/s / 24GB = 42 iteration per second.
Just to clarify, does the 24GB come from multiplying 1B particles by 24 bytes? Why 24 bytes? If we used float3 particle positions, the rate would presumably be mem_bandwidth / particle_footprint. On a 5090, the rate would then be 1790GB/s / 12B ≈ 149B particles/second (or 149fps of 1B particles).
> non interacting particles which are extremely boring
You assumed particle-particle collision above, which is expensive and might be overkill. The top comment simply asked about the maximum rate of moving particles. Since interesting things take time and space, the accurate answer to that question is likely to be less interesting than trading away some time to get the features you proposed; your first answer is definitely interesting, but didn’t quite answer the question asked, right?
Anyway, I’m talking about other possibilities, for example interaction with a field, or collision against large objects. Those are still physically interesting, and a field or a set of large objects (as long as it has a significantly smaller footprint than the particle data) can be engineered to have high cache coherency, and thus not count significantly against your bandwidth budget. You can get significantly more interesting than pure advection for a small fraction of the cost of particle-particle collisions.
Yes, if you need rendering, that will take time out of your budget; true and good point. Getting into the billions of primitives is where ray tracing can sometimes pay off over raster. The BVH update is an O(N) algorithm that replaces the O(N) raster algorithm, but the BVH update is simpler than the rasterization process you described, and it doesn’t have the scatter problem (writing to multiple pixels) that you mentioned; it’s write-once. BVH update on clustered triangles can now be done at pretty close to memory bandwidth. Particles aren’t quite as fast yet, AFAIK, but we might get there soon.
Well, to get that intuition, I guess you have to start experimenting. WebGPU makes it quite easy to get started with the concepts. But in general it obviously depends on what kind of GPU you have.
The critics claim the noise collapses quantum states so quickly that it's not possible to make full use of the quantum effects. The burden of proof is on the chip makers. I think they have not convinced the critics yet.