
Interesting architecture. For these "large" models, I'm interested in synthesis, fluidity, conceptual flexibility.

A sample prompt: "Tell me a love story about two otters, rendered in the FORTH language".

Or: "Here's a whitepaper, write me a simulator in python that lets me see the state of these variables, step by step".

Or: "Here's a tarball of a program. Write a module that does X, in a unified diff."

These are super hard tasks for any LLM I have access to, BTW. Good for testing current edges of capacity.

Arctic does not do great on these, unfortunately. It's not willing to make 'the leap' to be creative in FORTH where creativity = storytelling, and tries to redirect me to either getting a story about otters, or telling me things about FORTH.

Google made a big deal about emergent sophistication in models as they grew in parameter count with the original PaLM paper, and I wonder if these horizontally scaled MoE models, built from many small experts, are somehow architecturally limited. The model weights here, 480B, are sized close to the original PaLM model (540B if I recall).

Anyway, more and varied architectures are always welcome! I'd be interested to hear from the Snowflake folks if they think the architecture has additional capacity with more training, or if they think it could improve on recall tasks, but not 'sophistication' type tasks.



What you're evaluating is not what you think it is. You're evaluating the model's ability to execute multiple complex steps in a single pass (think about all of the steps your second example takes), not so much whether it is capable of doing each of those things. If you broke the task down into 2-3 separate prompts, it could do all of them easily.
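
A rough sketch of what I mean by breaking it down, using the whitepaper example. Everything here is an assumption for illustration: `complete` is a stand-in for whatever call sends a prompt to your model and returns its text (it is not a real library function), and the three-way split is just one plausible way to cut the task.

```python
from typing import Callable


def run_in_steps(complete: Callable[[str], str], whitepaper: str) -> str:
    """Chain three small prompts instead of one big one.

    `complete` is a hypothetical prompt -> text function supplied by the caller.
    """
    # Step 1: ask only for the state variables the paper describes.
    variables = complete(
        "List the state variables described in this whitepaper:\n" + whitepaper
    )

    # Step 2: ask how each variable changes, feeding in the step 1 answer.
    rules = complete(
        "Describe how each of these variables changes at every step.\n"
        "Variables:\n" + variables + "\n\nWhitepaper:\n" + whitepaper
    )

    # Step 3: ask for the simulator, given the variables and update rules.
    return complete(
        "Write a Python simulator that prints these variables after every step.\n"
        "Variables:\n" + variables + "\n\nUpdate rules:\n" + rules
    )
```

Each call has one job, and the output of one step becomes context for the next, so the model never has to plan the whole thing at once.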


To be fair, GPT did a pretty good job on the otter prompt:

```
\ A love story about two otters, Otty and Lutra

: init ( -- ) CR ." Two lonely otters lived by a great river." ;

: meet ( -- ) CR ." One sunny day, Otty and Lutra met during a playful swim." ;

: play ( -- ) CR ." They splashed, dived, and chased each other joyfully." ;

...continued
```


BTW, I wouldn't rate that very highly: it puts out syntactically valid FORTH, but every word just prints a sentence. It doesn't define verbs or other words that themselves tell the story.

Gemini is significantly better last I checked.



