Thank you, I too am a big believer in and enjoyer of open research. The actual code conveys things with a clarity that complex research papers were never able to give me.
Regarding the cost, I would sum it up to roughly ~2.5k USD for just the actual execution. Development cost would probably have doubled that sum if I didn't already have a GPU workstation for experiments at home that I take for granted. That cost is made up of:
* ~400 USD for ~2 months of storage and traffic of 7.4 TB (3.2 TB of raw, 3.2 TB of preprocessed training data) on a GCP standard bucket
* ~100 USD for Anthropic Claude API requests, covering experiments to find the right system prompt, test runs, and the actual final execution
* The other ~2k USD went to renting 8x Nvidia RTX 4090s together with a 5 TB SSD from runpod.io for various stages of the experiments. For the actual SAE training I rented the node for 8 days straight, and I would probably allocate an additional ~3-4 days of runtime just to the experiments for determining the best hyperparameters for training.
I included a section at the bottom that provides a sample BibTeX citation. I didn't expect this much attention, so I didn't even bother with a license, but I'll include an MIT license later today and release 0.2.1.
Just bad wording from me, trying to combine too much information in one sentence. The auxiliary loss is supposed to prevent dead latents from occurring in the first place - hence "prevent dead latents" - and it is also supposed to revive latents that are already dead - hence "revive dead latents".
Now that I review that sentence again, I see that I used two verbs on the same object that can be read differently depending on the verb. Mea culpa. I hope you still gained some insights into it =)
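In case it helps, here is a minimal sketch of what such an auxiliary loss can look like, following the AuxK idea from OpenAI's paper: reconstruct the residual error using only the currently dead latents. All names, shapes, and the scaling factor below are illustrative, not copied from my actual training code:

```python
import torch

def auxk_loss(x, x_hat, pre_acts, W_dec, dead_mask, k_aux=256, alpha=1 / 32):
    # Illustrative shapes (not lifted from my repo):
    #   x         (batch, d_model)     input activations
    #   x_hat     (batch, d_model)     main SAE reconstruction
    #   pre_acts  (batch, n_latents)   encoder pre-activations
    #   W_dec     (n_latents, d_model) decoder weights
    #   dead_mask (n_latents,) bool    True = latent hasn't fired recently
    n_dead = int(dead_mask.sum())
    if n_dead == 0:
        return x.new_zeros(())

    # The residual is the error the main reconstruction leaves behind.
    residual = x - x_hat

    # Let only dead latents compete: everything else is set to -inf
    # so topk can never select it.
    dead_pre = pre_acts.masked_fill(~dead_mask, float("-inf"))
    vals, idx = dead_pre.topk(min(k_aux, n_dead), dim=-1)

    # Sparse activations over dead latents only, decoded into a
    # reconstruction of the residual.
    aux_acts = torch.zeros_like(pre_acts).scatter_(-1, idx, torch.relu(vals))
    residual_hat = aux_acts @ W_dec

    # Gradient from this term pushes dead latents to fire on inputs the
    # main SAE reconstructs poorly, which both revives already-dead
    # latents and keeps latents useful enough not to die in the first place.
    return alpha * (residual - residual_hat).pow(2).sum(dim=-1).mean()
```

The nice property is that dead latents only receive gradient where the main SAE is doing poorly, so they get recycled into useful directions instead of arbitrary ones.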
I'm very happy you appreciate it - particularly the documentation. Writing the documentation was much harder for me than writing the code, so I'm glad it is appreciated. I also downloaded your paper and will read through it tomorrow morning - thank you for sharing it!
So evaluating SAEs - determining which SAE creates the most unique features while staying as sparse as possible - is a very complex topic that sits at the heart of current research into LLM interpretability through SAEs.
Assuming you have already solved the problem of finding multiple perfect SAE architectures and trained them to perfection (very much an interesting ML engineering problem, and one this SAE project attempts to solve), deciding which SAE is better comes down to which one performs better on the metrics of your automated interpretability methodology. OpenAI's methodology in particular emphasizes automated interpretability at scale, using a lot of technical metrics on which the SAEs can be scored _and thereby evaluated_.
Since determining the best metrics and methodology is such an open research question - one I could have experimented on for a few additional months - I instead opted for a simple approach in this first release. I discuss my methodology, OpenAI's, and the differences between the two in chapter 4 "Interpretability Analysis" [1] of my Implementation Details & Results section. I can also recommend reading the OpenAI paper directly, or visiting Anthropic's transformer-circuits.pub website, which often publishes smaller blog posts on exactly this topic.
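To give a flavor of the "technical metrics" I mean: almost every SAE evaluation reports at least the L0 sparsity and the fraction of variance unexplained, and the interesting part is the trade-off curve between the two. A rough, illustrative sketch (not my repo's actual eval code):

```python
import torch

@torch.no_grad()
def quick_sae_metrics(x, x_hat, acts):
    # Illustrative shapes:
    #   x      (n_tokens, d_model)    original residual-stream activations
    #   x_hat  (n_tokens, d_model)    SAE reconstruction
    #   acts   (n_tokens, n_latents)  post-activation latent values
    # L0 sparsity: how many latents fire per token on average (lower = sparser).
    l0 = (acts != 0).float().sum(dim=-1).mean().item()

    # Fraction of variance unexplained: reconstruction fidelity (0 = perfect).
    fvu = ((x - x_hat).pow(2).sum() / (x - x.mean(dim=0)).pow(2).sum()).item()

    # Fraction of latents that never fired on this batch - a rough proxy
    # for dead latents / wasted capacity.
    dead_frac = 1.0 - (acts != 0).any(dim=0).float().mean().item()

    return {"l0": l0, "fvu": fvu, "dead_frac": dead_frac}
```

A "better" SAE is then, roughly, one whose sparsity/fidelity curve dominates the other's - and that is before you even get to scoring individual features with automated interpretability.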
Not sure yet, to be honest. I'll definitely consider it, but I'll reorient myself and figure out what I plan to do next in the coming week. I have also been thinking about starting a simpler project, maybe showing people how to build a full, current Llama 3.2 implementation from scratch in pure PyTorch. I love building things from the ground up, and when I looked for documentation for the Llama 3.2 background section of this SAE project, the existing documentation I found was either too superficial or outdated and intended for Llama 1 or 2 - documentation in ML gets outdated so quickly nowadays...
Thank you for saying that! I have a much, much harder time documenting everything and writing out each decision in continuous text than actually writing the code. It took a long time for me to write all of this down, so I'm happy you appreciate it! =)