I read somewhere that the Ryzen AI 370 chip can run Gemma 3 14B at 7 tokens/second, so I would expect the performance to be somewhere in that range for Llama 4 Scout with 17B active parameters.
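For intuition: local decode speed is roughly memory bandwidth divided by the bytes read per token, which is why the active parameter count is what matters. A minimal back-of-envelope sketch, where every hardware number is an assumption on my part, not a measurement:

```python
# Bandwidth-bound estimate of decode speed. All numbers are assumptions:
# ~120 GB/s for the Ryzen AI 370's LPDDR5X, ~0.6 bytes/param for a Q4
# quant including overhead, and every active param read once per token.
bandwidth_gbps = 120            # GB/s, assumed
active_params_b = 17            # billions of active params (Scout)
bytes_per_param = 0.6           # Q4-ish, assumed

bytes_per_token_gb = active_params_b * bytes_per_param   # ~10.2 GB/token
theoretical_tps = bandwidth_gbps / bytes_per_token_gb    # ~11.8 tok/s
realistic_tps = theoretical_tps * 0.6                    # ~60% efficiency
print(f"~{realistic_tps:.1f} tokens/s")                  # ~7.1 tokens/s
```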
Thanks for this, but I'm still unable to reproduce the results from Google AI Studio.
I tried your version, and when I asked it to create a Tetris game in Python, the resulting file had syntax errors. I see strange things like a space in the middle of a variable name/reference, or odd spacing in the code output.
Can you share all the recommended settings to run this LLM? It is clear that the performance is very good when running on AI Studio. If possible, I'd like to use all the same settings (temperature, top-k, top-p, etc.) on Ollama. AI Studio only shows temperature, top-p, and output length.
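For what it's worth, here is a sketch of how those knobs can be passed to a local Ollama server over its HTTP API. The temperature and top-p values mirror what AI Studio shows; `top_k = 64` is the commonly cited Gemma 3 default and `num_predict` stands in for AI Studio's output length, both assumptions on my part:

```python
import requests

# Sketch: passing AI-Studio-style sampler settings to a local Ollama
# server. temperature/top_p mirror AI Studio; top_k and num_predict are
# assumptions, since AI Studio only exposes the other knobs.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma3:27b",
        "messages": [
            {"role": "user", "content": "Can you create the game tetris in python?"}
        ],
        "stream": False,
        "options": {
            "temperature": 1.0,   # AI Studio default
            "top_p": 0.95,        # AI Studio default
            "top_k": 64,          # assumed Gemma 3 default
            "num_predict": 8192,  # maps to AI Studio's "output length"
        },
    },
)
print(resp.json()["message"]["content"])
```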
Same experience here: on AI Studio, this is easily one of the strongest models I have used, including compared to proprietary LLMs.
But Ollama and Open WebUI performance is very bad, even when running the FP16 version. I also tried to mirror some of AI Studio's settings (temperature 1.0 and top-p 0.95) but couldn't get it to produce anything useful.
I suspect there's some bug in the Ollama releases (possibly wrong conversation delimiters?). If this is fixed, I will definitely start using Gemma 3 27B as my main model.
Update: Unsloth is recommending a temperature of 0.1, not 1.0, if using Ollama. I don’t know why Ollama would require a 10x lower value, but it definitely helped. I also read some speculation that there might be an issue with the tokenizer.
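In case it helps anyone else: that low temperature can be baked into a derived model with an Ollama Modelfile, so Open WebUI picks it up by default. A minimal sketch; keep whatever other parameters you already use:

```
FROM gemma3:27b
PARAMETER temperature 0.1
PARAMETER top_p 0.95
```

Then `ollama create gemma3-lowtemp -f Modelfile` and point the UI at `gemma3-lowtemp`.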
My usual non-scientific benchmark is asking it to implement the game Tetris in Python, and then iterating with the LLM to fix/tweak it.
My prompt to Gemma 3 27B (Q4) on Open WebUI + Ollama: "Can you create the game tetris in python?"
It immediately starts writing code. After the code finished, I noticed something very strange: it started a paragraph like this:
"
Key improvements and explanations:
Clearer Code Structure: The code is now organized into a Tetris class, making it much more maintainable and readable. This is essential for any non-trivial game.
"
This is followed by a bunch of fixes/improvements, as if this were not the first iteration of the script.
I also noticed a very obvious error: in the `if __name__ == '__main__':` block, it tries to instantiate a `Tetris` class, but the class it actually defined is named `TetrisGame`.
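A minimal reconstruction of the mismatch (only the two names are from the generated script; the game logic is elided):

```python
class TetrisGame:
    """The class the model actually defined."""
    def run(self):
        pass  # game loop elided

if __name__ == "__main__":
    # What the model wrote, which raises
    # NameError: name 'Tetris' is not defined:
    #     game = Tetris()
    # The fix is just to use the name it actually defined:
    game = TetrisGame()
    game.run()
```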
Nevertheless, I tried to run it and pasted the `NameError: name 'Tetris' is not defined` error along with the traceback pointing at the offending line. Gemma then gave me this response:
"The error message "NameError: name 'Tetris' is not defined" means that the Python interpreter cannot find a class or function named Tetris. This usually happens when:"
It then continues with a generic explanation of how to fix this error in arbitrary programs. It seems to have completely ignored the code it just wrote.
I ran the same prompt on Google AI Studio, and it had the same behavior of talking about improvements as if the code it wrote were not the first version.
Other than that, the experience was completely different:
- The game worked on the first try.
- I iterated with the model on enhancements. The first version worked but didn't show the score, level, or next piece, so I asked it to implement those features. It then produced a new version that almost worked: the only problems were that the level increased whenever a piece fell, and I didn't notice any increase in falling speed (see the sketch after this list).
- So I reported the problems with level tracking and falling speed, and it produced a new version that crashed immediately. I pasted the error, and it was able to fix it in the next version.
- I kept iterating with the model, fixing issues, until it finally produced a perfectly working Tetris game, which I played and eventually lost to the high falling speed.
- As a final request, I asked it to port the latest working version of the game to JS/HTML, with the implementation self-contained in one file. It produced a broken implementation, but I was able to fix it with a little tweaking.
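For the level bug specifically, the eventual fix looked roughly like the sketch below. This is my reconstruction of the idea, not the model's actual code: tie the level to total lines cleared instead of to pieces dropped, and derive the fall delay from the level.

```python
# Hypothetical reconstruction of the level/speed fix, not the model's
# actual code. Buggy behavior: level bumped every time a piece locked.
class LevelTracker:
    def __init__(self):
        self.lines_cleared = 0

    def on_lines_cleared(self, n: int) -> None:
        self.lines_cleared += n

    @property
    def level(self) -> int:
        # Level up every 10 cleared lines, not on every dropped piece.
        return 1 + self.lines_cleared // 10

    def fall_delay_ms(self) -> int:
        # Pieces fall faster as the level rises; floor keeps it playable.
        return max(50, 500 - (self.level - 1) * 40)
```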
Gemma 3 27B on Google AI Studio is easily one of the best LLMs I've used for coding.
Unfortunately, I can't seem to reproduce the same results in Ollama/Open WebUI, even when running the full FP16 version.
I checked this: the whole conversation was about 1,000 tokens.
I suspect the Ollama version might ship with wrong defaults, such as conversation delimiters. The experience with Gemma 3 in AI Studio is completely different.
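For anyone who wants to check their local template: Gemma's documented chat format uses explicit turn delimiters, and a template that omits or mangles them would plausibly explain the gap without throwing any visible error. A sketch, using the prompt from this thread:

```python
# Gemma's documented turn delimiters. A correctly templated single-turn
# prompt should look like this before it reaches the model.
prompt = (
    "<start_of_turn>user\n"
    "Can you create the game tetris in python?<end_of_turn>\n"
    "<start_of_turn>model\n"
)
```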
I've been trying the 109B version on Groq, and it seems less capable than Gemma 3 27B.