What would be the result if the task was given to multiple models? Instead of alloying them together and switching between models in the same chat, just let the models try to complete the task in their own isolated context, and use the result that completed it successfully?
I would say that that’s at least something the alloying should be benchmarked against, which I didn’t find in the article.
It's not a team vs individually, it's specifically a team/duo with similar or same model vs a team/duo with different models. The benefit is seen by having the models be different. Each finds unique things and enhances the other.
Yeah its like a team where the task is switched between developers. In the end everybody provides different point of view to the problem and the whole team learns about the codebase.
I would say that that’s at least something the alloying should be benchmarked against, which I didn’t find in the article.