
> When you put aside the hype and anecdotes, generative AI has languished in the same place, even in my kindest estimations, for several months, though it's really been years. The one "big thing" that they've been able to do is to use "reasoning" to make the Large Language Models "think" [...]

This is missing the most interesting changes in generative AI space over the last 18 months:

- Multi-modal: LLMs can consume images, audio and (to an extent) video now. This is a huge improvement on the text-only models of 2023 - it opens up so many new applications for this tech. I use both image and audio models (ChatGPT Advanced Voice) on a daily basis.

- Context lengths. GPT-4 could handle 8,000 tokens. Today's leading models are almost all 100,000+ and the largest handle 1 or 2 million tokens. Again, this makes them far more useful.

- Cost. The good models today are 100x cheaper than the GPT-3 era models and massively more capable.



The "iPhone moment" gets used a lot, but maybe it's more analogous to the early internet: we have the basics, but we're still learning what we can do with this new protocol and building the infrastructure around it to be truly useful. And as you've pointed out, our "bandwidth" is increasing exponentially at the same time.

If nothing else, my workflows as a software developer have changed significantly in these past two years with just what's available today, and there is so much work going into making that workflow far more productive.


But if this is like the internet, it’s not refuting the idea that this is a huge bubble. The internet did have a massive investment bubble.

And I'd argue it took decades to actually achieve some of the things we were promised in the early days of the internet. Some have still not come to fruition (the tech behind end-to-end encrypted email was developed decades ago, yet email as most people use it is still ridiculously primitive and janky).


Yes. But this article argues two things at once: that the technology itself is not useful and not used (and that this won't change in the future), and that it's a bad investment, at least in the form of OpenAI.

I have very little idea about the second - it's totally possible OpenAI is a bad investment. I think this article is massively wrong about the first part though - this is an incredible technology, and that should be evident to everyone (I'm a little shocked we're still having an argument of the form "I'm a world-class developer and this increases my productivity" on one side vs. "no, you're wrong!" on the other).


While there was certainly a software bubble in the early internet era, it still took obscene amounts of investment in brand-new technologies in the late '90s. Entire datacenters full of hardware modems. In fact, 'datacenters' had to become a thing.

Then came DSL, then came cable, then came fiber. Countless billions of dollars invested into all these different systems.

This AI stuff is something else. Lots of hardware investment, sure, but also lots of software investment. It is becoming so good and so cheap it's showing up on every single search engine result.

Anyway, my point is, while aspects of the early internet were a bubble, there were real dollars chasing real utility, and I think AI is quite similar in that regard.


Can it be an investment bubble but also a hugely promising technology? The FOMO-frothing herd will over-invest in whatever is new and shiny, regardless of its merits?


I recently compared the buildout of data centers for AI to the railway bubbles of the 1800s.

Nobody will deny the importance of railways to the Industrial Revolution, but they also lost a lot of people a lot of money: https://simonwillison.net/2024/Dec/31/llms-in-2024/#the-envi...


> If nothing else, my workflows as a software developer have changed significantly in these past two years with just what's available today, and there is so much work going into making that workflow far more productive.

this is exactly the problem

The more productivity AI brings to workers, the fewer employees employers need to hire, the less salary employers need to pay, and the less money workers have for consumption.

capitalist mode of production


What's your opinion on the productivity boost open source libraries have brought to developers?

Did all of that free code reduce demand for developers? If not, why not?


> Did all of that free code reduce demand for developers?

the answer is yes, though in the meantime the expansion of the industry offset the surplus of developers.


I think the answer is that open source made developers more valuable because they could build you a whole lot more functionality for the same amount of effort thanks to not having to constantly reinvent every wheel that they needed.

More effective developers results in more demand for custom software, resulting in more jobs for developers.

My hope is that AI-assisted programming will have similar results.


How likely do you think this is, Simon?

I don't really know myself, but I think there's a decent chance that most developer jobs will actually disappear. Your argument isn't wrong, but we're nearing (though still far from) the state where all productive tech work can be handled by LLMs. Once they can effectively and correctly fix bugs and add new well-defined features to a real codebase, things start to look very different for most developers.


Less productivity seems like a worse path.


it depends how you define better or worse

for humanity, the increase in productivity is progress

i'm not saying it's bad, i'm saying it has consequences


> LLMs can consume images,

Not very well in my experience. Last time I checked, ChatGPT/DALL-E couldn't understand its own output well enough to know that what it had drawn was incorrect. Nor could it correct mistakes that were pointed out to it.

For example, when I asked it to draw an image of a bike with rim brakes it could not, nor could it "see" what was wrong with the brakes it had drawn. For all intents and purposes it was just remixing the images it had been trained on without much understanding.


Generating images and consuming images are very different challenges, which for most models use entirely different systems (ChatGPT constructs prompts to DALL-E for example: https://simonwillison.net/2023/Oct/26/add-a-walrus/ )

Evaluating vision LLMs on their ability to improve their own generation of images doesn't make sense to me. That's why I enjoy torturing new models with my pelican on a bicycle SVG benchmark!
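To make the separation concrete, here's a rough sketch using the OpenAI Python client - the model names are just illustrative, and the point is only that generation and understanding are two different calls to two different systems, so the generator never "sees" its own output unless you explicitly feed it back in:

    # Sketch: image *generation* and image *understanding* are separate endpoints.
    from openai import OpenAI

    client = OpenAI()

    # 1. Generation: text prompt in, image out (DALL-E-style endpoint).
    generated = client.images.generate(
        model="dall-e-3",  # illustrative model name
        prompt="a bicycle with rim brakes",
    )
    image_url = generated.data[0].url

    # 2. Understanding: image in, text out (vision-capable chat model).
    critique = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Are the rim brakes on this bike drawn correctly?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    print(critique.choices[0].message.content)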


Cost as in, cost to you? Or cost to serve?

If the cost-to-serve is subsidized by VC money, they aren't getting cheaper, they're just leading you on.


I've heard from insiders that AWS Nova and Google Gemini - both incredibly cheap - are still charging more for inference than they spend on the server costs to run a query. Since those are among the cheapest models I expect this is true of OpenAI and Anthropic as well.

The subsidies are going to the training costs. I don't know if any model is running at a profit once training/research costs are included.


As a society we choose to let the excess wealth pile up in the hands of people who are investing to bring about their own utopia.

If we're stretching, we can talk about opportunity cost. But the people spending and creating the "bubble" don't have better opportunities. They're not nations that see an ROI on things like transportation infrastructure or literacy.

So unless the discussion is taken more broadly and higher taxes are on the table, there really isn't a cost or subsidy imo.


The cost to serve.


> Cost as in, cost to you? Or cost to serve?

This. IIUC to serve an LLM is to perform an O(n^2) computation on the model weights for every single character of user input. These models are 40+GB so that means I need to provision about 40GB RAM per concurrent user and perform hundreds of TB worth of computations per query.

How much would I have to charge for this? Are there any products where the users would actually get enough value out of it to pay what it costs?

Compare to the cost of a user session in a normal database backed web app. Even if that session fans out thousands of backend RPCs across a hundred services, each of those calls executes in milliseconds and requires only a fraction of the LLM's RAM. So I can support thousands of concurrent users per node instead of one.


> IIUC to serve an LLM is to perform an O(n^2) computation on the model weights for every single character of user input.

The computations are not O(n^2) in terms of model weights (parameters), but linear. If it were quadratic, the number would be ludicrously large. Like, "it'll take thousands of years to process a single token" large.

(The classic transformers are quadratic on the context length, but that's a much smaller number. And it seems pretty obvious from the increases in context lengths that this is no longer the case in frontier models.)
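A quick sanity check, using the rough rule of thumb of ~2 FLOPs per parameter per forward-pass token (and a hypothetical 40B-parameter model to match the 40GB figure upthread):

    # Back-of-the-envelope: per-token compute is linear in parameter count.
    params = 40e9  # hypothetical 40B-parameter dense model

    linear_flops = 2 * params      # ~8e10 FLOPs per token: routine for a modern GPU
    quadratic_flops = params ** 2  # what "O(n^2) in the weights" would imply

    print(f"quadratic would cost ~{quadratic_flops / linear_flops:.0e}x more per token")
    # -> roughly 2e10x more compute per token than what we actually observe,
    #    which is clearly not what's happening.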

> These models are 40+GB so that means I need to provision about 40GB RAM per concurrent user

The parameters are static, not mutated during the query. That memory can be shared between the concurrent users. The non-shared per-query memory usage is vastly smaller.
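A rough sketch of the memory math, with made-up but plausible model dimensions, to show that the per-user cost is a KV cache rather than a second copy of the weights:

    # Weights are loaded once and shared; each concurrent user adds only a KV cache.
    layers, kv_heads, head_dim = 40, 8, 128  # illustrative dimensions
    context_tokens = 8_000
    bytes_per_value = 2                      # fp16/bf16

    weights_gb = 40  # loaded once, shared by every concurrent user

    # KV cache = 2 (keys + values) * layers * kv_heads * head_dim * tokens * bytes
    kv_cache_gb = (2 * layers * kv_heads * head_dim
                   * context_tokens * bytes_per_value) / 1e9
    print(f"shared weights: ~{weights_gb} GB, per-user KV cache: ~{kv_cache_gb:.1f} GB")
    # -> on the order of 1 GB per active 8k-token session, not 40 GB per user.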

> How much would I have to charge for this?

Empirically, as little as 0.00001 cents per token.

For context, the Bing search API costs 2.5 cents per query.
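Putting those two numbers side by side (and treating a query as roughly a thousand tokens, which is just a guess at a typical prompt plus response):

    # Compare the cheap-model token price quoted above to the Bing API price.
    llm_cents_per_token = 0.00001
    tokens_per_query = 1_000  # assumed prompt + response size
    bing_cents_per_query = 2.5

    llm_cents_per_query = llm_cents_per_token * tokens_per_query
    print(f"LLM: ~{llm_cents_per_query} cents/query vs Bing API: {bing_cents_per_query} cents/query")
    # -> ~0.01 cents vs 2.5 cents: a couple of orders of magnitude cheaper per query.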


Ah got it, that's more sensible. So is anyone making money with these things yet?


The efficiency gains over the past 18 months have been incredible. Turns out there was a lot of low hanging fruit to make these things faster, cheaper and more resource efficient. https://simonwillison.net/2024/Dec/31/llms-in-2024/#llm-pric...


Interesting. There's obviously been a precipitous drop in the sticker price, but has there really been a concomitant efficiency increase? It's hard to believe the sticker price these companies are charging has anything to do with reality given how massively they're subsidized (free Azure compute, billions upon billions in cash, etc). Is this efficiency trend real? Do you know of any data demonstrating it?


I have personal anecdotal evidence that they're getting more efficient: I've had the same 64GB M2 laptop for three years now. Back in March 2023 it could just about run LLaMA 1, a rubbish model. Today I'm running Mistral Small 3 on the same hardware and it's giving me a March-2023-GPT-4-era experience and using just 12GB of RAM.
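If you want to reproduce that kind of local setup, something like the llama-cpp-python bindings will do it - the model path here is a placeholder for whatever quantized GGUF file you download (a ~4-bit quantization in the 10-15GB range gives roughly the footprint I described):

    # Minimal sketch of running a quantized local model on a laptop.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mistral-small-q4.gguf",  # placeholder: any local GGUF file
        n_ctx=8192,                            # context window to allocate
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain KV caches in two sentences."}]
    )
    print(out["choices"][0]["message"]["content"])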

People who I trust in this space have consistently and credibly talked about these constant efficiency gains. I don't think this is a case of selling compute for less than it costs to run.


People are comparing the current rush to the investments made in the early days of the internet while (purposely?) forgetting how expensive access to it was back then. I'm not saying that AI companies should be making a profit today, but I don't see or hear that AI usage is becoming essential in any way, shape or form.


Yeah that's the big problem. The Internet (e-commerce, specifically) is an obviously good idea. Technologies which facilitate it are profitable because they participate in an ecosystem which is self sustaining. Brick and mortar businesses have something to gain by investing in an online presence. As far as I can tell, there's nothing similar with AI. The speech to text technology in my phone that I'm using right now to write this post is cool but it's not a killer app in the same way that an online shopping cart is.


There's a grain of salt to take with those context lengths: the advertised number has grown, but performance seems to degrade with larger context windows.

Anecdote:

I often front-load a bunch of package.jsons from a monorepo when making tooling / CI focused changes. Even 10 or 20k tokens in, Claude says things like "we should look at the contents of somepackage/package.json to check the specifics of the `dev` script."

But it's already in the context window! Given the reminder (not reloading it, just saying "it's in there"), Claude makes the inference it needs for the immediate problem.

This seems to approximate a 'working memory' for the assistant or models themselves. Curious whether the model is imposing this on the assistant as part of its schema for simulating a thoughtful (but fallible) agent, or if the model itself has the limitation.
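For what it's worth, the front-loading itself is trivial - a sketch like this (assuming a conventional packages/*/package.json monorepo layout) is basically what ends up in the context window:

    # Gather every package.json in a monorepo into one prompt preamble.
    from pathlib import Path

    chunks = []
    for pkg in sorted(Path("packages").glob("*/package.json")):
        chunks.append(f"--- {pkg} ---\n{pkg.read_text()}")

    preamble = "\n\n".join(chunks)
    # Rough token estimate at ~4 characters per token.
    print(f"~{len(preamble) // 4:,} tokens of package.json context")

The point of the anecdote is that even when all of that is demonstrably in the window, the model still acts as if it needs to go look it up.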


> This is missing the most interesting changes in generative AI space over the last 18 months

I agree, though personally I'm liking the "big thing" as well. R1 is able to one-shot a lot of work for me, churning away in the background while I do other things.

> Multi-modal

IMO this is still early days and less reliable. What are some of your daily use cases?

> Context lengths

This is the biggest thing IMO (Models remaining coherent at > 32k contexts)

And whatever improvements have made models like Qwen2.5 able to write valid code reliably, compared to the GPT-4-and-earlier days.

There are also a whole lot of useful smaller niche projects on HF, like extracting vocals/drums/piano from music, etc.


Multi-modal audio is great. I talk to ChatGPT when I'm cooking or walking the dog.

For images I use it for things like helping draft initial alt text for images, extracting tables from screenshots, translating photos of signs in languages I don't speak - and then really fun stuff like "invent a recipe to recreate this plate of food" or "my CSS renders like this, what should I change?" or "How do you think I turn on this oven?" (in an Airbnb).

I've recently started using the share-screen feature provided for Gemini by https://aistudio.google.com/live when I'm reading academic papers and I want help understanding the math. I can say "What does this symbol with the squiggle above it mean?" out loud and Gemini will explain it for me - works really well.


Multi-modal was the absolute game-changer.

Just last night I was digging around in my basement, pulling apart my furnace, showing pics of the inside of it, having GPT explain how it works and what I needed to do to fix it.


I would never trust an LLM to do this unless it was pointing me to pages/sections in a real manual or reputable source I could reference.


I admire your optimism that good manuals and reputable sources exist for the average furnace in the average basement.


If there are no reputable sources to point to, then where exactly is GPT deriving its answer from? And how can we be assured GPT is correct about the furnace in question?


I mean... I fed it all the photos of the unit and every diagram and instruction panel on the thing. I was confident in the information it was giving me about what parts did what and where to look and what to look for. You have to evaluate its output, certainly.

Getting it to fix a mower now. It's surfacing some good YouTube vids.


I use it like that all the time. There's so much information in the world which assumes you have a certain level of understanding already - you can decipher the jargon terms it uses, you can fill in the blanks when it doesn't provide enough detail.

I don't have 100% of the "common sense" knowledge about every field, but good LLMs probably have ~80% of that "common sense" baked in. Which makes them better at interpreting incomplete information than I am.

A couple of examples: a post on some investment forum mentions DCA. A cooking recipe tells me "boil the pasta until done".

I absolutely buy that feeding in a few photos of dusty half-complete manual pages found near my water heater would provide enough context for it to answer questions usefully.


I would accept a link to a YouTube video with a timestamp. Just something connected to the real world.


Oh right, yeah I've done things like this (phone calls to ChatGPT) or the openwebui Whisper -> LLM -> TTS setup. I thought there might be something more than this by now.



