deepsquirrelnet's comments

Having come up through the hard sciences, I never know what to do with these kinds of books. If they were written as memoirs, that would be a different story, and that's usually all they really are. But they're presented in more of an educational/instructional context, yet they're devoid of rigor.

I’ve had ‘Running Things’ on my shelf for quite a while, but just don’t feel compelled to read it. To me it’s just a weird genre. Slightly dishonest or something.


Yep, was there 20 years ago. The secret is having first class universities and a stable, thriving economy that attracts bright students. American exceptionalism is that momentum.

The brightest people in the world want to come here, contribute to our economy, and spend the money they earn at our businesses. That's a privilege, and it's too bad we don't value it more.


> The brightest people in the world want to come here

Not any more. Not the women, the queer, poc...


Maybe once all of this is a bit more mature we can just get down to the minimal subset of features that are really important.

I’d love an nvim plugin that is more or less just a split chat window that makes it easy to paste code I’ve yanked (like yank-to-chat), add my commentary, and maybe easily attach other files for context. That’s it really.


I can highly recommend gp.nvim; it has a few features, but by default it's just a chat window with a yank-to-chat function. It also supports a context file that gets pasted into every chat automatically (for telling the AI about the tools you use, etc.).

Last time I used it, Avante was pretty much nailing what you are describing.

https://github.com/yetone/avante.nvim



That is the dream! I'd love for someone to create a vim plugin for this; if not, I'll do it myself if there is enough demand.

QAT (“quantization-aware training”) means the model was quantized to 4 bits during training, rather than being trained in full or half precision and quantized afterwards. It’s supposedly higher quality, but unfortunately they don’t show any comparisons between QAT and post-training quantization.
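
Roughly, the difference looks like this (illustrative PyTorch, not Google's actual recipe): QAT simulates the low-bit rounding inside the forward pass during training so the weights can adapt to it, while PTQ just rounds the finished model.

    import torch

    def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
        """Simulate symmetric per-tensor int quantization in the forward pass.
        The straight-through trick keeps gradients flowing to the fp weights."""
        qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
        scale = w.abs().max() / qmax
        w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
        return w + (w_q - w).detach()                   # forward: w_q, backward: identity

    class QATLinear(torch.nn.Linear):
        def forward(self, x):
            # QAT: the model trains against its own quantization error,
            # so the final 4-bit weights are what the loss was optimized for.
            return torch.nn.functional.linear(x, fake_quantize(self.weight), self.bias)

    # PTQ, by contrast, trains a plain Linear and only applies the rounding
    # (or a real packing routine) once, after training has finished.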


I understand that, but the QAT models [1] are not new uploads.

How is this more significant now than when they were uploaded 2 weeks ago?

Are we expecting new models? I don’t understand the timing. This post feels like it’s two weeks late.

[1] - https://huggingface.co/collections/google/gemma-3-qat-67ee61...


The official announcement of the QAT models happened on Friday 18th, two days ago. It looks like they uploaded them to HF in advance of that announcement: https://developers.googleblog.com/en/gemma-3-quantized-aware...

The partnership with Ollama and MLX and LM Studio and llama.cpp was revealed in that announcement, which made the models a lot easier for people to use.


8 days is closer to 1 week than 2. And it’s a blog post; nobody owes you realtime updates.


https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf/t...

> 17 days ago

Anywaaay...

I'm literally asking, quite honestly, whether this is just an after-the-fact update, weeks later, that they uploaded a bunch of models, or whether there is something more significant about this that I'm missing.


Hi! Omar from the Gemma team here.

Last time we only released the quantized GGUFs, so only llama.cpp users could use them (+ Ollama, but without vision).

Now we’ve released the unquantized checkpoints, so anyone can quantize them themselves and use them in their favorite tools, including Ollama with vision, MLX, LM Studio, etc. MLX folks also found that the QAT model held up decently at 3 bits compared to naive 3-bit quantization, so releasing the unquantized checkpoints enables further experimentation and research.
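
For example, one low-effort way to "quantize it yourself" from the unquantized checkpoint is on-the-fly 4-bit loading with bitsandbytes via transformers. This is a sketch only: it uses NF4 rather than the GGUF Q4_0 we shipped, and the repo id below is a guess at the naming in the collection.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    repo = "google/gemma-3-27b-it-qat-q4_0-unquantized"  # assumed repo id; check the HF collection

    # Quantize the bf16 QAT weights to 4-bit at load time (bitsandbytes NF4).
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo,
        quantization_config=bnb,
        device_map="auto",
        # Note: for the vision-capable Gemma 3 checkpoints you may need the
        # image-text-to-text auto class instead of AutoModelForCausalLM.
    )

    inputs = tok("Why release unquantized QAT checkpoints?", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))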

TL;DR: the first was a release in a specific format/tool; we followed up with a full release of artifacts that lets the community do much more.


Hey Omar, is there any chance that Gemma 3 might get a speech (ASR/AST/TTS) release?


Probably the former... I see your confusion but it's really only a couple weeks at most. The news cycle is strong in you, grasshopper :)


I think you should look at “in-brand” correlation. My hypothesis is that models from the same vendor undergo similar preference training and hence tend to prefer “in-brand” responses over those from “off-brand” models, which might have significantly different reward training.
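
Something like this is what I have in mind (toy numbers and made-up model names, purely to show the computation): take a judge-vs-candidate win-rate matrix, tag each model with its vendor, and compare mean in-brand vs off-brand win rates.

    import pandas as pd

    # Toy win-rate matrix: rows = judge, columns = candidate (fraction of times the
    # judge preferred that candidate's response). All numbers are made up.
    models = ["gpt-4o", "gpt-4o-mini", "claude-3.5", "claude-3-haiku"]
    brand = {"gpt-4o": "openai", "gpt-4o-mini": "openai",
             "claude-3.5": "anthropic", "claude-3-haiku": "anthropic"}

    prefs = pd.DataFrame(
        [[0.50, 0.62, 0.55, 0.48],
         [0.58, 0.50, 0.49, 0.45],
         [0.52, 0.47, 0.50, 0.60],
         [0.50, 0.46, 0.63, 0.50]],
        index=models, columns=models,
    )

    long = prefs.stack().rename("win_rate").reset_index()
    long.columns = ["judge", "candidate", "win_rate"]
    long = long[long.judge != long.candidate]                     # drop self-pairs
    long["in_brand"] = long.judge.map(brand) == long.candidate.map(brand)

    # If in-brand win rates are systematically higher, that supports the idea
    # that shared preference training makes judges favor their siblings.
    print(long.groupby("in_brand")["win_rate"].mean())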


> Why couldn't this tariff strategy be implemented in a more calculated, predictable, slow way to give everyone time to adjust, at home and abroad. What's to gain from doing it this way over the one I mentioned ?

To me, the more concerning detail is that they don’t have any messaging about what their strategy is, except what appears to be guesswork from various members of the administration, which is often mutually exclusive with other messaging.

The real risk of all that is that people will sit on the cash they have, unwilling to make purchases when they can’t gauge the risk in their future.

My wife and I want to install new flooring, but that’s off the table for us right now, until we have a better idea where this is going.

^ those decisions are disastrous for the economy en masse


Yep. I liquidated my TFSA (tax-free savings account in Canada) 2 weeks ago. 90k. Invested it in risk-free 90-day GICs instead. No way I was going to watch it lose 10-20% of its value in 2 weeks.


If this were indeed the case, I’m not sure tariffs are the right direction to do it.

Principally, a company would need guarantees on its capital investment that extend beyond a single president’s term. Since tariffs can be revoked at any moment, companies will want more assurance that the long path of capital investment will be worthwhile, or else they might find themselves at a disadvantage before even getting off the ground.

They would find themselves lobbying the government to hold back the free market for them within a few years’ time.


This is pretty cool. I have a similar model that’s 8 days into training on msmarco.

So far I only have the “cold start” data posted, but I’m planning on posting a full distillation dataset.

https://huggingface.co/datasets/dleemiller/lm25


What kind of hardware setup would be needed to replicate the paper’s results?


I am training phi-4 (14B) using a single A6000. There are some tricks you have to use to keep VRAM consumption down, mainly LoRA and quantization.

There’s a package called “unsloth” that integrates with Hugging Face’s TRL library and can help.
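
Roughly, the setup looks like this (a sketch from memory based on the unsloth examples; the model repo, dataset, and hyperparameters are placeholders, so check their notebooks for the exact arguments):

    from unsloth import FastLanguageModel
    from trl import SFTTrainer
    from transformers import TrainingArguments
    from datasets import load_dataset

    # Load the base model already quantized to 4-bit so the 14B weights fit in VRAM.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/phi-4",     # placeholder repo id; any phi-4 checkpoint should work
        max_seq_length=4096,
        load_in_4bit=True,
    )

    # LoRA: only small adapter matrices get trained, not the full 14B parameters.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

    # Placeholder dataset: anything with a "text" column formatted for SFT.
    dataset = load_dataset("json", data_files="train.jsonl", split="train")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=4096,
        args=TrainingArguments(
            output_dir="outputs",
            per_device_train_batch_size=2,
            gradient_accumulation_steps=8,   # effective batch size 16 on the single A6000
            learning_rate=2e-4,
            max_steps=1000,
            bf16=True,
            optim="adamw_8bit",              # 8-bit optimizer states also save VRAM
        ),
    )
    trainer.train()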


Yep, this is true. I was poking around on their GitHub and they have examples in their “cookbooks” section, e.g.:

https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/ocr...


I tried a lot of models on openrouter recently, and I have to say that I found Gemini 2.0 flash to be surprisingly useful.

I’d never used one of Google’s proprietary models before that, but it really hits a sweet spot in the quality vs latency space right now.
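
If anyone wants to try it, OpenRouter speaks the standard OpenAI chat-completions API, so switching between models is just a matter of changing the model string (the Gemini slug below is my best guess; check their model list):

    import os
    from openai import OpenAI

    # OpenRouter exposes an OpenAI-compatible endpoint; only the base_url and key differ.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    resp = client.chat.completions.create(
        model="google/gemini-2.0-flash-001",   # assumed slug; see openrouter.ai/models
        messages=[{"role": "user",
                   "content": "Summarize the tradeoff between latency and quality in LLMs."}],
    )
    print(resp.choices[0].message.content)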

