You tried Gemini 1.5 or just 1.0? I got an invite to try 1.5 Pro which they said...

sema4hacker · on March 1, 2024

>a personal benchmark of 10 questions that resemble common tasks

That is an idea worth expanding on. Someone should develop a "standard" public list of 100 (or more) questions/tasks against which any AI version can be tested to see what the program's current "score" is (although some scoring might have to assign a subjective evaluation when pass/fail isn't clear).

jprete · on March 2, 2024

Thats what a benchmark is, and they're all gamed by everyone training models, even if they don't intend to, because the benchmarks are in the training data.

The advantage of a personal set of questions is that you might be able to keep it out of the training set, if you don't publish it anywhere, and if you make sure cloud-accessed model providers aren't logging the conversations.

a_wild_dandan · on March 1, 2024

Gemini 1.0 Pro < Gemini 1.5 Pro < Gemini 1.0 Ultra < GPT-4V

GPT-4V is still the king. But Google's latest widely available offering (1.5 Pro) is close, if benchmarks indicate capability (questionable). Gemini's writing is evidently better, and vastly more so its context window.

nebula8804 · on March 1, 2024

Its nice to have some more potentially viable competition. Gemini has better OCR capabilities but its computation abilities seem to fall short....so I have it do the work with the OCR and then move the remainder of the work to GPT4 :)