
They haven't "announced" anything. It would make cost-per-query and latency go through the roof, so it's HIGHLY unlikely.



Making a GPT query on every search? I'd say that's indeed unlikely. Luckily, most queries are repeated, so caching can work very well.
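To make that concrete, here's a minimal sketch of response caching keyed by the normalized query. The `ask_gpt` helper is a hypothetical stand-in for whatever model call they'd actually make:

    import functools

    def ask_gpt(prompt: str) -> str:
        # Hypothetical stand-in for the real model call.
        return f"(model response to: {prompt})"

    @functools.lru_cache(maxsize=100_000)
    def cached_answer(normalized_query: str) -> str:
        # Only reached on a cache miss; repeats are served from memory.
        return ask_gpt(normalized_query)

    def answer(raw_query: str) -> str:
        # Normalizing collapses trivially different queries into one entry.
        return cached_answer(raw_query.strip().lower())

In production this would presumably be a shared, distributed cache rather than in-process memoization, but the principle is the same.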

There are many opportunities there: GPT could potentially be used on common queries to expand the results, and even as a way to disambiguate queries. For instance, if I ask it:

"If I make a search query for "go", what are the possible different things I may be looking for?"

I get

> There are many possible things that someone might be looking for when they make a search query for "go". Some possible interpretations of the term "go" include:
>
> - The board game "Go"
> - The programming language "Go"
> - The command "go" or "Go!" as a signal to start or proceed with something
> - The verb "go", as in to move or travel from one place to another
> - The website "GO", which is a popular search engine
>
> It's also possible that the person making the search query is simply looking for information about the word "go" itself, such as its definition, pronunciation, or usage in different contexts.

There may be better prompts, of course.
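For what it's worth, that prompt is easy to template over arbitrary queries; this tiny wrapper is hypothetical, just to show the shape:

    def disambiguation_prompt(query: str) -> str:
        # Wraps any query in the disambiguation question used above.
        return (f'If I make a search query for "{query}", '
                "what are the possible different things I may be looking for?")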

Also, they may identify some queries as GPT-friendly and route those through GPT, possibly augmented with a suitable prompt. The point is that passing the query as-is to the GPT model is not the only option. They can certainly be creative in how they ask GPT and interpret the results; they don't even have to display the GPT response directly, they can use it to improve the results (see the sketch below).
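As a rough illustration of that routing idea (the heuristic, `classic_search`, and `ask_gpt` are all hypothetical placeholders, just to show the shape of the pipeline):

    def classic_search(query: str) -> list:
        # Stand-in for the existing non-GPT search backend.
        return []

    def ask_gpt(prompt: str) -> str:
        # Hypothetical stand-in for the model call.
        return f"(model response to: {prompt})"

    def looks_gpt_friendly(query: str) -> bool:
        # Hypothetical heuristic: natural-language questions benefit most.
        return query.endswith("?") or len(query.split()) >= 4

    def handle(query: str) -> dict:
        results = {"classic_results": classic_search(query)}
        if looks_gpt_friendly(query):
            # Augment the raw query with a suitable prompt, not pass it as-is.
            prompt = f'Answer concisely for a search results page: "{query}"'
            results["gpt_summary"] = ask_gpt(prompt)
        return results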


Exactly. I keep reading all this hype, and folks don't realize how much compute is spent on each ChatGPT query (I'm sure it'll be optimized over time). There's no free lunch.


They announced it to The Information, anonymously, in a leak.


allegedly


Thanks. This will help if Microsoft asks it who to sue.


MS would run their own instance. Can you imagine how much money they're willing to spend to dethrone Google?


Why would it make latency go up? It's incredibly parallelizable; just copy the weights onto more machines.


> It's incredibly parallelizable

No, it isn't. It generates one token at a time in a loop until the response is finished; it's an inherently serial task. Parallelization increases throughput (how many queries can be served simultaneously), but you wouldn't be able to speed up a single query.
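For intuition, here's that loop in miniature (the `model` argument is a hypothetical next-token predictor); step N needs the output of step N-1, which is why a single response can't be sped up by adding machines:

    def generate(model, prompt_tokens: list, eos_token: int,
                 max_new_tokens: int = 256) -> list:
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            # Each step consumes everything generated so far: a serial chain.
            next_token = model(tokens)  # hypothetical: returns next token id
            tokens.append(next_token)
            if next_token == eos_token:
                break
        return tokens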


Ah, the ol' have-a-baby-in-a-month-by-impregnating-nine-women strategy.


The additional CPU-to-GPU-to-CPU round-trip time, I'm guessing?



