
> I'm a little unimpressed by its instruction following

Been trying the 109b version on Groq and it seems less capable than Gemma 3 27b


I read somewhere that the Ryzen AI 370 chip can run Gemma 3 14b at 7 tokens/second, so I would expect the performance to be somewhere in that range for Llama 4 Scout with 17b active parameters

Llama.com has the blog post

AFAIK CPython doesn't JIT compile.

Recent versions of CPython do have an experimental JIT compiler. I'm not sure how widely used it is, though.

https://peps.python.org/pep-0744/


It's disabled by default and not production ready. You need to compile CPython yourself if you want to try it.
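
For reference, a rough sketch of what that looks like (assuming a recent CPython 3.13+ checkout; the JIT also needs a recent LLVM/clang toolchain at build time, and `--enable-experimental-jit` is the flag PEP 744 describes):

    # build CPython with the experimental JIT enabled (it's off by default)
    git clone https://github.com/python/cpython && cd cpython
    ./configure --enable-experimental-jit
    make -j"$(nproc)"

    # sanity check the freshly built interpreter
    ./python -c "import sys; print(sys.version)"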

Thanks for this, but I'm still unable to reproduce the results from Google AI studio.

I tried your version, and when I asked it to create a Tetris game in Python, the resulting file had syntax errors. I see strange things like a space in the middle of a variable name/reference, or weird spacing in the code output.


Some models are more sensitive to quantization than others; presumably AI Studio is running the full 16-bit model.

Maybe try the 8-bit quant if you have the hardware for it?

    ollama run hf.co/unsloth/gemma-3-27b-it-GGUF:Q8_0


I tested the full fp16 gguf


Can you share all the recommended settings to run this LLM? It is clear that the performance is very good when running on AI Studio. If possible, I'd like to use all the same settings (temp, top-k, top-p, etc.) on Ollama. AI Studio only shows temperature, top-p and output length.
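
In the meantime I've just been stuffing the values AI Studio does show into a Modelfile, something like this (a sketch; `gemma3:27b` is a placeholder for whichever Gemma 3 27b tag/GGUF you pulled, and the PARAMETER names are Ollama's):

    # Modelfile: mirror the sampling settings AI Studio exposes
    FROM gemma3:27b
    PARAMETER temperature 1
    PARAMETER top_p 0.95

followed by `ollama create gemma3-aistudio -f Modelfile` and `ollama run gemma3-aistudio`. Temperature and top-p alone don't seem to close the gap, though, hence the question.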


I tried ollama fp16 and it had the same issues.


Same experience here: On AI Studio, this is easily one of the strongest models I have used, including when compared to proprietary LLMs.

But Ollama and Open WebUI performance is very bad, even when running the FP16 version. I also tried to mirror some of AI Studio's settings (temp 1 and top-p 0.95), but couldn't get it to produce anything useful.

I suspect there's some bug in the ollama releases (possibly wrong conversation delimiters?). If this is fixed, I will definitely start using Gemma 3 27b as my main model.


Update: Unsloth is recommending a temperature of 0.1, not 1.0, if using Ollama. I don’t know why Ollama would require a 10x lower value, but it definitely helped. I also read some speculation that there might be an issue with the tokenizer.
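
For anyone else who wants to try it, the quickest way is at runtime in the Ollama REPL (it can also go in a Modelfile as a PARAMETER line):

    /set parameter temperature 0.1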


My usual non-scientific benchmark is asking it to implement the game Tetris in python, and then iterating with the LLM to fix/tweak it.

My prompt to Gemma 27b (q4) on open webui + ollama: "Can you create the game tetris in python?"

It immediately starts writing code. After the code is finished, I noticed something very strange: it starts a paragraph like this:

" Key improvements and explanations:

     Clearer Code Structure:  The code is now organized into a Tetris class, making it much more maintainable and readable.  This is essential for any non-trivial game.
"

Followed by a bunch of fixes/improvements, as if this was not the first iteration of the script.

I also notice a very obvious error: in the `if __name__ == '__main__':` block, it tries to instantiate a `Tetris` class, but the name of the class it created was "TetrisGame".

Nevertheless, I try to run it and paste the `NameError: name 'Tetris' is not defined` error along with stack trace specifying the line. Gemma then gives me this response:

"The error message "NameError: name 'Tetris' is not defined" means that the Python interpreter cannot find a class or function named Tetris. This usually happens when:"

Then it continues with a generic explanation of how to fix this error in arbitrary programs. It seems like it completely ignored the code it just wrote.


I ran the same prompt on Google AI Studio, and it had the same behavior of talking about improvements as if the code it wrote was not the first version.

Other than that, the experience was completely different:

- The game worked on first try

- I iterated with the model making enhancements. The first version worked but didn't show scores, levels or next piece, so I asked it to implement those features. It then produced a new version which almost worked: The only problem was that levels were increasing whenever a piece fell, and I didn't notice any increase in falling speed.

- So I reported the problems with level tracking and falling speed and it produced a new version which crashed immediately. I pasted the error and it was able to fix it in the next version

- I kept iterating with the model, fixing issues until it finally produced a perfectly working tetris game which I played and eventually lost due to high falling speed.

- As a final request, I asked it to port the latest working version of the game to JS/HTML with the implementation self contained in a file. It produced a broken implementation, but I was able to fix it after tweaking it a little bit.

Gemma 3 27b on Google AI studio is easily one of the best LLMs I've used for coding.

Unfortunately, I can't seem to reproduce the same results in ollama/open webui, even when running the full fp16 version.


Those sound like the sort of issues which could be caused by your server silently truncating the middle of your prompts.

By default, Ollama uses a context window size of 2048 tokens.
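
If that is what's happening, the window can be raised per session in the Ollama REPL, or baked into a Modelfile (`num_ctx` is Ollama's name for the context window size; 8192 here is just an example value):

    /set parameter num_ctx 8192

or

    PARAMETER num_ctx 8192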


I checked this; the whole conversation was about 1000 tokens.

I suspect the Ollama version might have wrong default settings, such as conversation delimiters. The experience of Gemma 3 in AI studio is completely different.


Why did this get downvoted? Asking genuinely


In my experience, Gemma models were always bad at coding (but good at other tasks).

