Competition isn't a waste of resources; it's the best mechanism we have to ensure quality.
Furthermore, I'm happy to be in a golden age with lots of orgs trying things and many options. It's going to suck once the market eventually consolidates and we have to take whatever enshittified thing the oligopolists feed us.
It's a waste if they are mostly all trying the SAME things. Which is mostly what is happening.
I want someone to spend a million on a Chess LLM so we can get a sense of how sophisticated they can get at non-linguistic pattern matching.
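To make that concrete, the training corpus could just be games serialized as move sequences and tokenized per move instead of per word, so the model predicts the next move rather than the next syllable. A rough sketch of the data side (the games, vocab, and encode function are all made up for illustration, not from any real project):

    # Illustrative only: chess games as token sequences for
    # ordinary next-token-prediction training.
    games = [
        "e4 e5 Nf3 Nc6 Bb5 a6",    # Ruy Lopez opening
        "d4 d5 c4 e6 Nc3 Nf6",     # Queen's Gambit Declined
    ]

    # One token per move, plus special tokens.
    vocab = {"<bos>": 0, "<eos>": 1}
    for game in games:
        for move in game.split():
            vocab.setdefault(move, len(vocab))

    def encode(game: str) -> list[int]:
        return [vocab["<bos>"]] + [vocab[m] for m in game.split()] + [vocab["<eos>"]]

    # (context, target) pairs, exactly as for language modelling.
    for game in games:
        ids = encode(game)
        for ctx_len in range(1, len(ids)):
            print(ids[:ctx_len], "->", ids[ctx_len])

The point being: nothing about the transformer recipe assumes the tokens are language, so chess would be a cheap, well-instrumented probe of pure pattern matching.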
I want someone to spend a million on an LLM trained on Python program traces so we can try to teach it cause and effect and "debugging". Maybe it will learn to emulate a Python interpreter and become highly reliable at predicting the outcome of Python code.
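The data-generation side of that is almost trivial with the standard library's sys.settrace. A minimal sketch, assuming you want (source line, local variables) pairs as training text; trace_program and the toy snippet are hypothetical, and a real pipeline would obviously need sandboxing:

    import sys

    def trace_program(source: str) -> str:
        """Run `source` and return a line-by-line trace of locals."""
        lines = source.splitlines()
        events = []

        def tracer(frame, event, arg):
            # Only record line events from the snippet itself.
            if event == "line" and frame.f_code.co_filename == "<snippet>":
                n = frame.f_lineno
                state = {k: v for k, v in frame.f_locals.items()
                         if not k.startswith("__")}
                events.append(f"line {n}: {lines[n - 1].strip()} | locals={state}")
            return tracer

        code = compile(source, "<snippet>", "exec")
        sys.settrace(tracer)
        try:
            exec(code, {})
        finally:
            sys.settrace(None)
        return "\n".join(events)

    print(trace_program("x = 2\ny = x * 3\nz = y - 1\n"))

Pair the source with the trace and the final state, and you have supervision for exactly the cause-and-effect prediction task described above.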
Just spotted this link. To clarify: I (not the original poster; everyone's welcome to share the link, it's a public doc) maintain this list (and the rest of the sheet) manually. While I keep the foundation models I'm interested in fairly up to date, there are obviously too many fine-tunes/datasets to track now. I started this when LLaMA was first released and I was getting myself up to speed on the LLM landscape.
A group at Stanford's CRFM maintains a bigger list of models (their stated goal is cataloguing foundation models, but it looks like they have some fine-tunes mixed in these days): https://crfm.stanford.edu/ecosystem-graphs/
Interesting! That is more than I thought. Honored to have caused a nerdsnipe.
In the grand scheme of things, though, most of these are quite small -- in the 7b range. A 7b model is nothing to sneeze at, but it's not megacorp resources either. It's in the range of "VC check" size.
The "big boys" who are training 70b plus are FAANG or government-scale entities. Microsoft, Google, and Meta have multiple entries on that "big" LLM foundation list -- it's because the GPUs are already bought, have to train something to keep utilization up. Also bear in mind that training of these things is still something closer to an art than a science; you put terabytes of data into the cauldron, let it brew, and only after it's done can you taste what you've made. Makes sense that some of these models will be junk.
There are DOZENS of orgs releasing foundational models, not "a handful."
Salesforce, EleutherAI, NVIDIA, Amazon, Stanford, RedPajama, Cohere, Mistral, MosaicML, Yandex, Huawei, StabilityLM, ...
https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
It's completely bonkers and a huge waste of resources. Most of them will see barely any use at all.