When using stuff like this, let the cream rise to the top before you use it. The good ones will surface once enough people and publications praise them; otherwise you waste time evaluating too many models. It's not a fun task, because a model can give you the illusion of being great at some things but end up worse overall. Right now GPT-4 is king, and there hasn't been any consistent chatter saying otherwise.