Interesting architecture. For these "large" models, I'm interested in synthesis, fluidity, conceptual flexibility.
A sample prompt: "Tell me a love story about two otters, rendered in the FORTH language".
Or: "Here's a whitepaper, write me a simulator in python that lets me see the state of these variables, step by step".
Or: "Here's a tarball of a program. Write a module that does X, in a unified diff."
These are super hard tasks for any LLM I have access to, BTW. Good for testing current edges of capacity.
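For concreteness, the second prompt is essentially asking for something like this minimal sketch: a toy system with a couple of state variables (the variable names here are invented, not from any particular whitepaper) advanced one step at a time so you can watch the state evolve.

```python
# Toy "step-through simulator" sketch: advance a small state dict one
# step at a time and print it, so each intermediate state is visible.

def step(state):
    """Advance the toy system by one step (illustrative dynamics only)."""
    return {
        "t": state["t"] + 1,
        "x": state["x"] + state["v"],  # position advances by velocity
        "v": state["v"] * 0.9,         # velocity decays each step
    }

state = {"t": 0, "x": 0.0, "v": 1.0}
for _ in range(3):
    state = step(state)
    print(state)
```

A real answer would of course have to extract the state variables and update rules from the paper itself; the hard part is that translation, not the stepping loop.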
Arctic does not do great on these, unfortunately. It's not willing to make 'the leap' to be creative in FORTH where creativity = storytelling, and tries to redirect me to either getting a story about otters, or telling me things about FORTH.
Google made a big deal about emergent sophistication as models grew in parameter count in the original PaLM paper, and I wonder if these horizontally-scaled MoE architectures built from many small experts are somehow architecturally limited. The model weights here, 480B, are close in size to the original PaLM model (540B, if I recall).
Anyway, more and varied architectures are always welcome! I'd be interested to hear from the Snowflake folks if they think the architecture has additional capacity with more training, or if they think it could improve on recall tasks, but not 'sophistication' type tasks.
What you're evaluating is not what you think it is. You're evaluating the model's ability to execute multiple complex steps (think about all of the steps your second example requires), not so much whether it is capable of doing those things at all. If you broke it down into 2-3 different prompts, it could do all of those things easily.
BTW, I wouldn't rate that output very highly, in that it puts out syntactically valid FORTH but doesn't define words (or other constructs) which themselves tell the story.
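For illustration, here's roughly what "words which themselves tell the story" could look like, as a sketch (gforth-style; the word names are invented):

```forth
\ The story lives in the word definitions, not in comments around them.
: otter   ( -- ) ." otter " ;
: meets   ( -- ) ." meets " ;
: splash  ( -- ) ." *splash* " ;
: love-story ( -- ) otter meets otter splash ." and they swim off together" cr ;
love-story
```

The point is that a creative answer would use FORTH's defining mechanism as the narrative device, rather than printing a story from inside one monolithic word.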