Super helpful feedback, thanks for going so deep! I agree that for the really heavy agentic stuff, the router in its current form might not be the most important innovation.
However, for several use cases speed really is paramount, and latency can directly hurt the UX. Examples include sales call agents, copilots, auto-complete engines, etc. These are some of the areas where we've seen the router really shine: diverting to slow models only when absolutely necessary on complex prompts, but using fast models as often as possible to minimize disruption to the UX.
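To make the idea concrete, here's a rough sketch of latency-aware routing (not our actual router; the complexity heuristic and model names are placeholders, and a real router would use a learned classifier rather than prompt length):

```python
# Rough sketch of latency-aware routing. Placeholder logic only:
# the complexity heuristic and model names are hypothetical.

def estimate_complexity(prompt: str) -> float:
    # Placeholder heuristic; a real router would use a trained
    # classifier rather than word count.
    return min(len(prompt.split()) / 500.0, 1.0)

def route(prompt: str, threshold: float = 0.7) -> str:
    # Default to the fast model; only pay the latency cost of the
    # stronger model when the prompt looks genuinely hard.
    if estimate_complexity(prompt) > threshold:
        return "strong-but-slow-model"
    return "fast-model"

print(route("Complete this sentence for me"))                    # fast-model
print(route("Prove the following theorem step by step. " * 80))  # strong-but-slow-model
```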
Having said that, another major benefit of the platform is the ability to quickly run objective benchmarks for quality, cost and speed across all models and providers, on your own prompts [https://youtu.be/PO4r6ek8U6M]. We have some users who run benchmarks regularly for different checkpoints of their fine-tuned model, comparing against all other custom fine-tuned models, as well as the various foundation models.
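For anyone curious what such a benchmark boils down to, here's a minimal sketch, assuming you supply your own `generate` and `score` functions (everything here is hypothetical, not our actual API):

```python
import time

# Minimal sketch of benchmarking quality, cost and speed across models
# on your own prompts. Assumed user-supplied callables:
#   generate(model, prompt) -> (output, cost_usd)
#   score(prompt, output)   -> quality in [0, 1]

def benchmark(models, prompts, generate, score):
    results = {}
    for model in models:
        latencies, costs, qualities = [], [], []
        for prompt in prompts:
            start = time.perf_counter()
            output, cost = generate(model, prompt)
            latencies.append(time.perf_counter() - start)
            costs.append(cost)
            qualities.append(score(prompt, output))
        n = len(prompts)
        results[model] = {
            "avg_latency_s": sum(latencies) / n,
            "avg_cost_usd": sum(costs) / n,
            "avg_quality": sum(qualities) / n,
        }
    return results
```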
As for the overlap in routing concepts you mentioned, I've actually thought a lot about this. It's our intention to broaden the kinds of routing we're able to handle, where we treat all control flow decisions (routing) and intermediate prompts as latent variables (the DSPy perspective). In the immediate future there's no crossover, though.
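To gesture at what "routing as a latent variable" means in practice: instead of hand-writing the routing rule, you'd search over both the routing decision and the intermediate prompts against a task metric. A purely hypothetical sketch, no existing API implied:

```python
import random

# Hypothetical sketch: treat the routing threshold and the intermediate
# prompt as latent variables optimized jointly against a metric via
# random search. `run` and `dataset` are assumed user-supplied:
#   run(threshold, prompt, example) -> score in [0, 1]

def optimize(dataset, run, candidate_prompts, trials=50, seed=0):
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(trials):
        threshold = rng.uniform(0.0, 1.0)        # latent routing decision
        prompt = rng.choice(candidate_prompts)   # latent intermediate prompt
        score = sum(run(threshold, prompt, ex) for ex in dataset) / len(dataset)
        if score > best_score:
            best, best_score = (threshold, prompt), score
    return best, best_score
```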
I agree cost is often an afterthought. Generally our users either care about improving speed, or they want to know which model or combination of models would be best for their task in terms of output quality (GPT-4, Opus, Gemini, etc.). This is not trivial to gauge without running benchmarks.
As for usually wanting to make a full LLM switch as opposed to routing, what's the primary motivation? Avoiding extra complexity and dependencies in the stack? Worrying that model-specific prompts would no longer work well with a new model? The general loss of control?