I don't think an alloy can be as good as a larger model in general, though perhaps in special cases it can be.
Say that you want to translate a string from English to language X. Models A and B, having fewer parameters to spare, have less knowledge of language X. Model C, a larger model, has better knowledge of language X. No matter how A and B collude, they will not exceed the performance of model C.
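A toy sketch of that argument, not a claim about how real models behave: assume "knowledge of language X" is just a set of words a model can translate, and assume the larger model C knows everything A and B know. All names and numbers here are hypothetical.

```python
# Toy illustration only: knowledge is modeled as a set of word IDs.
# All sets and sizes below are made-up assumptions, not measurements.
test_words = set(range(100))   # words we want translated into language X

model_a = set(range(0, 40))    # hypothetical small model A
model_b = set(range(30, 70))   # hypothetical small model B
model_c = set(range(0, 90))    # hypothetical larger model C

# Best case for the alloy: it can use anything either small model knows.
alloy = model_a | model_b


def coverage(known, words):
    """Fraction of the test words that a model can translate."""
    return len(known & words) / len(words)


alloy_score = coverage(alloy, test_words)
c_score = coverage(model_c, test_words)
```

Under these assumptions the alloy can never score higher than C, because its knowledge is a subset of C's. If A or B knew words that C did not, the bound would no longer hold, which may be the kind of special case where an alloy wins.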