The price really is eye watering. At a glance, my first impression is this is something like Llama 3.1 405B, where the primary value may be realized in generating high quality synthetic data for training rather than direct use.
I keep a little google spreadsheet with some charts to help visualize the landscape at a glance in terms of capability/price/throughput, bringing in the various index scores as they become available. Hope folks find it useful, feel free to copy and claim as your own.
That's a nice sentiment, but I'd encourage you to add a license or something. The basic "something" would be adding a canonical URL into the spreadsheet itself somewhere, along with a notification that users can do what they want other than removing that URL. (And the URL would be described as "the original source" or something, not a claim that the particular version/incarnation someone is looking at is the same as what is at that URL.)
The risk is that someone will accidentally introduce errors or unsupportable claims, and people with the modified spreadsheet won't know that it's not The spreadsheet and so will discount its accuracy or trustability. (If people are trying to deceive others into thinking it's the original, they'll remove the notice, but that's a different problem.) It would be a shame for people to lose faith in your work because of crap that other people do that you have no say in.
Not just for training data, but for eval data. If you can spend a few grand on really good labels for benchmarking your attempts at making something feasible work, that’s also super handy.
hey, thank you! bubble charts, annotated with text and shapes using the Drawing tool. Working with the constraints of Google Sheets is its own challenge.
also - love the podcast, one of my favorites. the 3:1 io token price breakdown in my sheet is lifted directly from charts I've seen on latent space.
What gets me is the whole cost structure is based on practically free services due to all the investor money. They’re not pulling in significant revenue with this pricing relative to what it costs to train the models, so the cost may be completely different if they had to recoup those costs, right?
Hey, just FYI, I pasted your url from the spreadsheet title into Safari on macOS and got an SSL warning. Unfortunately I clicked through and now it works, so not sure what the exact cause looked like.
Nice, thank you for that (upvoted in appreciation). Regarding the absence of o1-Pro from the analysis, is that just because there isn't enough public information available?
I keep a little google spreadsheet with some charts to help visualize the landscape at a glance in terms of capability/price/throughput, bringing in the various index scores as they become available. Hope folks find it useful, feel free to copy and claim as your own.
https://docs.google.com/spreadsheets/d/1foc98Jtbi0-GUsNySddv...