| „[…] was a friend telling me his LaTeX thesis took 90 seconds to compile towards the end“
Sure, but to iterate you don't have to compile the whole document; you can compile just the chapter you are working on by structuring the document with \include.
Curious too. It feels like the maths parts of programming are the hobby (proof checkers, Haskell) and the cranking out Go/JS etc is the paid bit. I studied maths but never used more than high school level at work.
Yeah, Hacker Factor's multi-post critiques are where I first saw it analyzed. For reference, they run the popular fotoforensics.com image analysis site.
They also have a scathing critique (e.g. [1]) of the Adobe-led C2PA digital provenance signing, having themselves been part of various groups that seek solutions to the provenance problem.
This was a great read, thanks a lot!
On a side note, does anyone have a good guess which tool/software they used to create the visualisations for the matrix multiplications or the memory outline?
"[...] modern neural network (NN) architectures have complex designs with many components [...]"
I find the Transformer architecture actually very simple compared to previous models like LSTMs or other recurrent models. You could argue that their vision counterparts like ViT are conceptually maybe even simpler than ConvNets?
Also, can someone explain why they are so keen to remove the skip connections? At least when it comes to coding, nothing is simpler than adding a skip connection and computationally the effect should be marginal?
Skip connections increase the live range of one intermediate result across the whole part of the network that is skipped:
the tensor at the beginning of a skip connection must be kept in memory for longer while unrelated computation happens, which increases the pressure on the memory hierarchy (either the L2 cache or scratchpad memory).
This is especially true, for example, for vision transformer inference, where it decreases the batch size you can use before hitting the L2 capacity wall.
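A toy liveness count may make the point above concrete. This is only a sketch under stated assumptions: a purely sequential block where every layer reads one activation and writes one of the same size, and a buffer is freed as soon as nothing later needs it. `peak_live_tensors` is a hypothetical helper, not part of any framework, and it ignores weights, operator fusion, and backward-pass storage.

```python
def peak_live_tensors(num_layers: int, skip: bool) -> int:
    """Peak number of same-sized activations alive at once during inference.

    Toy model (assumption): layer i reads the previous activation and writes
    a new one; with skip=True the block input is added back after the last
    layer, so it cannot be freed until then.
    """
    peak = 0
    for layer in range(num_layers):
        live = 2  # the current layer's input and its output
        # After the first layer the block input is no longer the current
        # input, but the final addition still needs it, so it stays resident.
        if skip and layer > 0:
            live += 1
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    for n in (1, 4, 12):
        print(n, peak_live_tensors(n, skip=False), peak_live_tensors(n, skip=True))
```

In this model the overhead is a constant one extra activation, but it stays resident for the whole skipped span, which is what eats into a fixed L2 or scratchpad budget as the batch size grows.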
Okay, I see that for inference. But for training it shouldn't matter because I need to hold on to all my activations for my backwards pass anyways?
But yeah, fair point!
Yes, there are very good theoretical reasons for skip connections. If your initial matrix M is noise centered at 0, then 1+M is a noisy identity operation, while 0+M is a noisy deletion... It's better to do nothing if you don't know what to do, and avoid destroying information.
I appreciate the sibling comment's perspective that memory pressure is a problem, but that can be mitigated by using fewer/longer skip connections across blocks of layers.
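The "noisy identity vs. noisy deletion" argument can be checked numerically with a small sketch. The dimension, noise scale, and seed below are arbitrary assumptions, and `compare_at_init` is a hypothetical helper written for this illustration using only the standard library.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def compare_at_init(dim=64, sigma=0.01, seed=0):
    """Compare (1+M)x and (0+M)x for a freshly initialised noise matrix M.

    Returns (cosine of x with x + Mx, cosine of x with Mx, |Mx| / |x|).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # M: zero-centred noise, standing in for an untrained layer (assumption).
    m = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(dim)]
    mx = [sum(w * xi for w, xi in zip(row, x)) for row in m]
    with_skip = [xi + yi for xi, yi in zip(x, mx)]
    return cosine(x, with_skip), cosine(x, mx), norm(mx) / norm(x)


if __name__ == "__main__":
    skip_cos, plain_cos, shrink = compare_at_init()
    print(f"with skip: cos = {skip_cos:.4f}")
    print(f"no skip:   cos = {plain_cos:.4f}, |Mx|/|x| = {shrink:.4f}")
```

With the skip connection the output stays almost perfectly aligned with the input, while without it the output is a much smaller vector that carries little of the input's direction.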
hey, off topic but can you explain or link a post which explains what the benefits of the alias -> function definition are over just defining the function directly?
Thanks!