Came here to ask a similar question: who is this targeted to? We see very differ...

danlenton · on May 23, 2024

One use case is optimizing agentic systems, where a custom router [https://youtu.be/9JYqNbIEac0] is trained end-to-end on the final task (rather than GPT4-as-a-judge). Both the intermediate prompts and the models used can then be learned from data (similar to DSPy), whilst ensuring the final task performance remains high. This is not supported with v0, but it's on the roadmap. Thoughts?

qeternity · on May 24, 2024

We do agentic systems. We already optimize for these things. We route between different models based on various heuristics. I absolutely would not want that to be black box. And doing any sort of vector similarity to determine task complexity is not going to work well.

I would also not try to emulate DSPy, which is a massively overrated bit of kit and of little use in a production pipeline.

tarasglek · on May 28, 2024

Curious, can you explain why you think DSPy is overrated?

weird-eye-issue · on May 23, 2024

Yeah even for a pure chatbot it doesn't make sense because in our experience users want to choose the exact model they are using...

danlenton · on May 23, 2024

Interesting, do you have any hunch as to why this is? We've seen in more verticalized apps where the underlying model is hidden from the user (sales call agent, autopilot tool, support agent etc.) that trying to reach high quality on hard prompts and high speed on the remaining prompts makes routing an appealing option.

weird-eye-issue · on May 24, 2024

We charge users different amounts of credits based on the model used. They also just generally have a personal preference for each model. Some people love Claude, some hate it, etc.

For something like a support agent why couldn't the company just choose a model like GPT-4o and stick with one? Would they really trust some responses going to 3.5 (or similar)?

danlenton · on May 24, 2024

Currently the motivation is mainly speed. The really easy prompts like "hey, how's it going?" or "sorry I didn't hear you, can you repeat?" can easily be sent to Llama3 etc. Of course you could do some clever caching or something, but training a custom router directly on the task to optimize the resulting performance metric doesn't require any manual engineering.
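
To make the speed argument concrete, here is a minimal sketch of the hand-engineered baseline that a trained router would replace. Everything in it is an assumption for illustration: the model names are placeholders, and the `route` function, the patterns, and the word limit are hypothetical rather than part of any real router API.

```python
import re

# Placeholder model identifiers (assumptions); substitute whatever your
# provider actually exposes.
FAST_MODEL = "llama-3-8b"
STRONG_MODEL = "gpt-4o"

# Hand-picked patterns for small talk and repeat requests. This is the
# manual engineering that a learned router is meant to avoid.
EASY_PATTERNS = [
    re.compile(r"^\s*(hi|hey|hello)\b", re.IGNORECASE),
    re.compile(r"\bhow'?s it going\b", re.IGNORECASE),
    re.compile(r"\bcan you repeat\b", re.IGNORECASE),
]

# Arbitrary cutoff: longer prompts always go to the strong model.
MAX_EASY_WORDS = 12


def route(prompt: str) -> str:
    """Return the name of the model a prompt should be sent to."""
    is_short = len(prompt.split()) <= MAX_EASY_WORDS
    looks_easy = any(p.search(prompt) for p in EASY_PATTERNS)
    return FAST_MODEL if is_short and looks_easy else STRONG_MODEL


if __name__ == "__main__":
    for text in [
        "hey, how's it going?",
        "sorry I didn't hear you, can you repeat?",
        "Summarize the attached contract and flag any indemnification clauses.",
    ]:
        print(f"{route(text):12s} <- {text}")
```

A rule set like this is transparent but brittle; the point of training on the final task metric is to learn the decision boundary from data instead of maintaining such patterns by hand.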

Still, I agree that routing in isolation is not thaaat useful in many LLM domains. I think the usefulness will increase when it's applied to multi-step agentic systems, and when it's combined with other optimizations such as end-to-end learning of the intermediate prompts (DSPy etc.).

Thanks again for diving deep, super helpful!