
They haven't "announced" anything. It would make cost-per-query and latency go through the roof, so it's HIGHLY unlikely.



Making a GPT query on every search? I'd say that's indeed unlikely. Luckily, most queries are repeated, so caching can work very well.
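To make that concrete, here's a minimal sketch of response caching keyed by the normalized query. The `ask_gpt` helper is a hypothetical stand-in for whatever model call they'd actually make:

    import functools

    def ask_gpt(prompt: str) -> str:
        # Hypothetical stand-in for the real model call.
        return f"(model response to: {prompt})"

    @functools.lru_cache(maxsize=100_000)
    def cached_answer(normalized_query: str) -> str:
        # Only reached on a cache miss; repeats are served from memory.
        return ask_gpt(normalized_query)

    def answer(raw_query: str) -> str:
        # Normalizing collapses trivially different queries into one entry.
        return cached_answer(raw_query.strip().lower())

In production this would presumably be a shared, distributed cache rather than in-process memoization, but the principle is the same.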

There are many opportunities there: GPT could potentially be used on common queries to expand the results, and even as a way to disambiguate queries. For instance, if I ask it:

"If I make a search query for "go", what are the possible different things I may be looking for?"

I get

> There are many possible things that someone might be looking for when they make a search query for "go". Some possible interpretations of the term "go" include:
>
> - The board game "Go"
> - The programming language "Go"
> - The command "go" or "Go!" as a signal to start or proceed with something
> - The verb "go", as in to move or travel from one place to another
> - The website "GO", which is a popular search engine
>
> It's also possible that the person making the search query is simply looking for information about the word "go" itself, such as its definition, pronunciation, or usage in different contexts.

There may be better prompts, of course.
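For what it's worth, that prompt is easy to template over arbitrary queries; this tiny wrapper is hypothetical, just to show the shape:

    def disambiguation_prompt(query: str) -> str:
        # Wraps any query in the disambiguation question used above.
        return (f'If I make a search query for "{query}", '
                "what are the possible different things I may be looking for?")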

Also, they may identify some queries as GPT-friendly and route those through GPT, possibly augmented with a suitable prompt. The point is that passing the query as-is to the GPT model is not the only option. They can certainly be creative in how they ask GPT and interpret the results; they don't even have to display the GPT response directly, they can use it to improve the results (see the sketch below).
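As a rough illustration of that routing idea (the heuristic, `classic_search`, and `ask_gpt` are all hypothetical placeholders, just to show the shape of the pipeline):

    def classic_search(query: str) -> list:
        # Stand-in for the existing non-GPT search backend.
        return []

    def ask_gpt(prompt: str) -> str:
        # Hypothetical stand-in for the model call.
        return f"(model response to: {prompt})"

    def looks_gpt_friendly(query: str) -> bool:
        # Hypothetical heuristic: natural-language questions benefit most.
        return query.endswith("?") or len(query.split()) >= 4

    def handle(query: str) -> dict:
        results = {"classic_results": classic_search(query)}
        if looks_gpt_friendly(query):
            # Augment the raw query with a suitable prompt, not pass it as-is.
            prompt = f'Answer concisely for a search results page: "{query}"'
            results["gpt_summary"] = ask_gpt(prompt)
        return results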


Exactly. I keep reading all this hype, and folks don't realize how much compute is spent on each ChatGPT query (I'm sure it'll be optimized over time). There's no free lunch.


They announced it to The Information, anonymously, in a leak.


allegedly


Thanks. This will help if Microsoft asks it who to sue.


MS would run their own instance. Can you imagine how much money they're willing to spend to dethrone Google?


Why would it make latency go up? It's incredibly parallelizable; just copy the weights onto more machines.


> It's incredibly parallelizable

No, it isn't. It generates one token at a time in a loop until the response is finished; it's an inherently serial task. Parallelization increases throughput (how many queries can be served simultaneously), but you wouldn't be able to speed up a single query.
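For intuition, here's that loop in miniature (the `model` argument is a hypothetical next-token predictor); step N needs the output of step N-1, which is why a single response can't be sped up by adding machines:

    def generate(model, prompt_tokens: list, eos_token: int,
                 max_new_tokens: int = 256) -> list:
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            # Each step consumes everything generated so far: a serial chain.
            next_token = model(tokens)  # hypothetical: returns next token id
            tokens.append(next_token)
            if next_token == eos_token:
                break
        return tokens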


Ah, the ol' have-a-baby-in-a-month-by-impregnating-nine-women strategy.


The additional CPU-to-GPU-to-CPU round-trip time, I'm guessing?



