Competition isn't a waste of resources; it's the best mechanism we have to ensure quality.
Furthermore, I'm happy to be in a golden age with lots of orgs trying things and many options. It's going to suck once the market eventually consolidates and we have to take whatever enshittified thing the oligopolists feed us.
It's a waste if they are mostly all trying the SAME things. Which is mostly what is happening.
I want someone to spend a million on a Chess LLM so we can get a sense of how sophisticated they can get at non-linguistic pattern matching.
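To make that concrete, the training corpus could just be games serialized as move sequences and tokenized per move instead of per word, so the model predicts the next move rather than the next syllable. A rough sketch of the data side (the games, vocab, and encode function are all made up for illustration, not from any real project):

    # Illustrative only: chess games as token sequences for
    # ordinary next-token-prediction training.
    games = [
        "e4 e5 Nf3 Nc6 Bb5 a6",    # Ruy Lopez opening
        "d4 d5 c4 e6 Nc3 Nf6",     # Queen's Gambit Declined
    ]

    # One token per move, plus special tokens.
    vocab = {"<bos>": 0, "<eos>": 1}
    for game in games:
        for move in game.split():
            vocab.setdefault(move, len(vocab))

    def encode(game: str) -> list[int]:
        return [vocab["<bos>"]] + [vocab[m] for m in game.split()] + [vocab["<eos>"]]

    # (context, target) pairs, exactly as for language modelling.
    for game in games:
        ids = encode(game)
        for ctx_len in range(1, len(ids)):
            print(ids[:ctx_len], "->", ids[ctx_len])

The point being: nothing about the transformer recipe assumes the tokens are language, so chess would be a cheap, well-instrumented probe of pure pattern matching.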
I want someone to spend a million on an LLM trained on Python program traces so we can try to teach it cause and effect and "debugging". Maybe it will learn to emulate a Python interpreter and become highly reliable at predicting the outcome of Python code.
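The data-generation side of that is almost trivial with the standard library's sys.settrace. A minimal sketch, assuming you want (source line, local variables) pairs as training text; trace_program and the toy snippet are hypothetical, and a real pipeline would obviously need sandboxing:

    import sys

    def trace_program(source: str) -> str:
        """Run `source` and return a line-by-line trace of locals."""
        lines = source.splitlines()
        events = []

        def tracer(frame, event, arg):
            # Only record line events from the snippet itself.
            if event == "line" and frame.f_code.co_filename == "<snippet>":
                n = frame.f_lineno
                state = {k: v for k, v in frame.f_locals.items()
                         if not k.startswith("__")}
                events.append(f"line {n}: {lines[n - 1].strip()} | locals={state}")
            return tracer

        code = compile(source, "<snippet>", "exec")
        sys.settrace(tracer)
        try:
            exec(code, {})
        finally:
            sys.settrace(None)
        return "\n".join(events)

    print(trace_program("x = 2\ny = x * 3\nz = y - 1\n"))

Pair the source with the trace and the final state, and you have supervision for exactly the cause-and-effect prediction task described above.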
Just spotted this link. To clarify: I (not the original poster; everyone's welcome to share the link, it's a public doc) maintain this list (and the rest of the sheet) manually. While I keep the foundation models I'm interested in fairly up to date, there are obviously too many fine-tunes/datasets to track now. I started this when LLaMA was first released and I was getting myself up to speed on the LLM landscape.
A group at Stanford's CRFM maintains a bigger list of models (their stated goal is cataloguing foundation models, but it looks like they have some fine-tunes mixed in these days): https://crfm.stanford.edu/ecosystem-graphs/
Interesting! That is more than I thought. Honored to have caused a nerdsnipe.
In the grand scheme of things, though, most of these are quite small -- in the 7b range. A 7b model is nothing to sneeze at, but it's not megacorp resources either. It's in the range of "VC check" size.
The "big boys" who are training 70b plus are FAANG or government-scale entities. Microsoft, Google, and Meta have multiple entries on that "big" LLM foundation list -- it's because the GPUs are already bought, have to train something to keep utilization up. Also bear in mind that training of these things is still something closer to an art than a science; you put terabytes of data into the cauldron, let it brew, and only after it's done can you taste what you've made. Makes sense that some of these models will be junk.
There are DOZENS of orgs releasing foundational models, not "a handful."
Salesforce, EleutherAI, NVIDIA, Amazon, Stanford, RedPajama, Cohere, Mistral, MosaicML, Yandex, Huawei, StabilityLM, ...
https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
It's completely bonkers and a huge waste of resources. Most of them will see barely any use at all.