Hacker News | tosh's comments

For the TPC-DS results it would also have been nice to show how the MacBook Neo compares to the AWS instances.

Or am I missing something?


Indeed, it would have been interesting but I really wanted to get the blog post out on the launch day of the MacBook Neo and did not have the bandwidth to run additional cloud experiments.

I ran TPC-DS SF300 now on the c6a.4xlarge. It turns out that it's still quite limited by the EBS disk's IO: while 32 GB memory is much more than 8 GB, DuckDB still needs to spill to disk a lot and this shows in the runtimes. Running all 99 queries took 37 minutes, just under half of the MacBook's 79 minutes.

> Command being timed: "duckdb tpcds-sf300.db -f bench.sql"

> Percent of CPU this job got: 250%

> Elapsed (wall clock) time (h:mm:ss or m:ss): 37:00.96

> Maximum resident set size (kbytes): 25559652
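The stats quoted above read like GNU `/usr/bin/time -v` output; the exact invocation isn't shown, but here's a minimal Python sketch (assuming a Unix system; `time_command` is a made-up helper, not anything from the benchmark) that captures the same two headline numbers for an arbitrary command:

```python
import resource
import subprocess
import sys
import time

def time_command(cmd):
    # run the command and measure elapsed wall-clock time
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    # ru_maxrss is in kilobytes on Linux (bytes on macOS)
    max_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, max_rss

elapsed, max_rss = time_command([sys.executable, "-c", "pass"])
print(f"Elapsed (wall clock): {elapsed:.2f}s, Max RSS: {max_rss} kB")
```

The 25559652 kB figure above (~24 GB peak RSS) is how you can tell the query run really did press up against the instance's 32 GB.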


a bit less capable but ~comparable to Qwen 3.5 122B

~2x faster inference than Qwen 3.5 122B

~7x faster inference than gpt-oss 120B

probably most important: training datasets and training recipe are available (!)

in other words, this is an open-source LLM release (not just open weights!)


Oh. That's nice. Thanks for sharing this in the comments.

aged very well

Tony Hoare on how he came up with Quicksort:

he read the Algol 60 report (Naur, McCarthy, Perlis, …)

and that described "recursion"

=> aaah!

https://www.youtube.com/watch?v=pJgKYn0lcno


SSD is 2x faster read/write

And yet: current state-of-the-art models are also great at navigating and trying out language ecosystems that aren't as mainstream. So if you're curious, it's now great to explore topics, languages, and concepts that, even if not mainstream, were so far a bit out of reach.

I remember these MacBooks did tend to break apart at the corners of the palmrests.

But I like the idea of re-visiting the MacBook plastic chassis with new insides.

I would love to know what the weight is in the end.

Can the old MacBook chassis lead to a lighter computer than the current 1.23 kg MacBook Neo and MacBook Air?


> I remember these MacBooks did tend to break apart at the corners of the palmrests.

Not the corners for me, but the "feet" of the topcase digging into the palmrest, which would splinter the plastic. Then you'd have holes in the case and jagged plastic splinters digging into your wrist as you typed. Not enjoyable.

This: https://ismh.s3.amazonaws.com/2014-02-24-macbook-topcase.jpg is exactly what mine had, on both sides.

Shame, because it was the last MacBook that was really easy to upgrade: the battery was removable (with a simple lock), and behind it were the RAM and 2.5" drive slots.

The next generation was not that hard but you had to unscrew the entire bottom shell, and the battery was glued.


Unscrewing the bottom on the generations after this gave you access to nearly everything, which was vastly superior for most repairs. Getting to the logic board or AirPort card on the polycarbonate MacBook took significantly longer. For the Bluetooth board you had to remove the display cable, optical drive and HDD.

Mine had been upgraded from 4GB of RAM to 8GB, and I replaced the HDD with an SSD, and replaced the DVD drive with the original HDD for more storage. Was a nice machine for uni, I really loved it.

Same issue with mine, and I came to the comments to see how many people were affected. I'm very surprised OP doesn't have the problem with his unit.

That’s what happened to my 2006 Core Duo MacBook after about three or four years of use. It was an excellent laptop that was quite user-serviceable (I upgraded the RAM and hard drive), but I did have problems with the palmrests, and the Ethernet port stopped working after four years.

It was my first Apple laptop and I have fond memories of using it during my college years.


I had one of those machines in university too and had the same stained/cracked palmrests. That said, I also paid for extended AppleCare and had the whole top case swapped for free multiple times throughout the three years that the coverage lasted.

When I was a broke student I would buy MacBooks with broken palm rests for a discounted price, drop them off at Apple for a free repair (under extended warranty) and flip them for a profit. Three hours of my time turned into €100 profit. Minimum wage was €6/hour back then.

Did the same years later buying up first gen iPod Nano and trading them in for sixth gen because of the battery recall.


The plastic by the trackpad would turn pink as well from my sweaty hands. Good times.

From all those long sessions playing Call of Duty and Quake 4!

Incredible when you consider that the next one or two generations of the MacBook Neo will probably come with 16 GB+ RAM and support 5K displays.

A few more generations and we might see < 1 kg, 120 Hz OLED and multi-day battery life.

But I'm most excited about the near future, because if the MacBook Neo becomes a huge success it will hopefully encourage app devs to waste less RAM.


> hopefully encourage app devs to waste less ram.

Plenty of devices with limited RAM existed before this and we didn't see devs cater to them. I highly doubt this temporary spike in memory prices is going to cause a long-lasting change in behavior.



Unsloth have just released benchmarks on how their dynamic quants perform for Qwen 3.5

https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks


I'm aware of that, but that's not the link of the post. The post is linking to their UD 2.0 quants from a few months back.

Also, the benchmarks are because they messed up the first version of their Qwen 3.5 XL quants by quanting some tensors to mxfp4 that should have been kept at higher precision, and this is their bugfix. The post literally starts out with "We updated Qwen3.5-35B Unsloth Dynamic quants being SOTA on nearly all bits" without explaining WHY they needed to update from the original version.


Didn't expect this to be on HN haha - but HN does have older posts come up sometimes.

No, your conclusion is false - only the old Q4_K_XL had slightly higher perplexity; all other quants are fine. We uploaded 9TB of research artifacts to https://huggingface.co/unsloth/Qwen3.5-35B-A3B-Experiments-G... for the community.

If you read our blog, it says KLD and PPL are actually sometimes counterintuitive - for example on MiniMax, some of our quants do worse on PPL and KLD than AesSedai's, yet AesSedai's does worse on LiveCodeBench by a lot - see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks#id-3-...

This is because (see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks#id-1-...) although bitwidths are in general monotonic, i.e. q2_k < q3_k < q4_k < q5_k etc., KLD and PPL are actually not monotonic: q3_k can actually have BETTER PPL than q4_k.

So the main point is bad luck in quantization - sometimes lower bits can get lower PPL and KLD, but this is a ruse, since on actual real-world tasks it's worse.
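For readers unfamiliar with the two metrics being argued about, here is a hedged sketch of how they're typically computed (my own illustration, not Unsloth's actual evaluation code): PPL comes from the probabilities a model assigns to the observed tokens, and KLD compares the full-precision model's distribution against the quantized model's at each position.

```python
import math

def perplexity(token_probs):
    # exp of the mean negative log-likelihood over observed tokens
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def kl_divergence(ref_dist, quant_dist):
    # KL(ref || quant) for one token position's vocab distribution;
    # zero iff the quantized model matches the reference exactly
    return sum(p * math.log(p / q)
               for p, q in zip(ref_dist, quant_dist) if p > 0)
```

The non-monotonicity point is exactly why a single summary number can mislead: a lower-bit quant can land on token probabilities that score well on PPL/KLD while still degrading on real tasks.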


The Q4_K_XL is easily the most popular quant for the model, though.

So then why was Q4_K_XL having issues? Is it just a PPL issue that doesn't reflect in real-world usage? If yes, why not just say that? "The Q4_K_XL had higher PPL, but don't worry, PPL can be wrong, and other benchmarks show it's fine". If it was a real quality issue, then what caused it?

The blog post says "Retiring MXFP4 from all GGUF quants: Q2_K_XL, Q3_K_XL and Q4_K_XL, except for pure MXFP4_MOE" but doesn't say why. The easy assumption that most people would make is "oh, you quanted attention or ssm or something to mxfp4 and that turned out to be bad, so you retire mxfp4" but if you say that it's not that, then what's the actual issue?


Each layer is made up of various weights, and the weights are adjusted to quant it. A pure q8 will have all the weights as q8, likewise for q4, but some are kept as f32, etc. Here's an example of q3_k_xl - https://huggingface.co/unsloth/Kimi-K2-Thinking-GGUF/tree/ma... - we can see certain weights are f32, q8, q5, q3, etc. They used mxfp4 in some weights and mxfp4 doesn't seem to play nicely in quants, so that's why they are retiring it. Read their publication again and it should make more sense.
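A toy illustration of the per-tensor bitwidth idea (my own sketch of plain symmetric absmax quantization, not the GGUF k-quant format): each tensor can be quantized at a different bit width with one shared scale, and lower bit widths cost more reconstruction error.

```python
def quantize(weights, bits):
    # symmetric absmax quantization: map floats to signed `bits`-bit ints
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quants, scale):
    return [q * scale for q in quants]

w = [0.51, -1.0, 0.26, 0.77]
for bits in (2, 4, 8):  # a "mixture" quant picks bits per tensor
    q, s = quantize(w, bits)
    err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
    print(f"{bits}-bit max error: {err:.4f}")
```

Real GGUF quants are fancier (per-block scales and mins), but the tradeoff is the same: the bit width chosen per tensor determines how faithfully that tensor is reconstructed.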

I am aware of all that.

They literally never say “they used mxfp4 in some weights”. What you’re claiming they said doesn’t exist.

This isn’t a postmortem, it’s PR fluff without actually addressing the issue.


It's right there: https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks - I looked at the weights before. It's not PR fluff; they made it clear by showing how badly it really affected various tensors.

"MXFP4 is much worse on many tensors - attn_gate, attn_q, ssm_beta, ssm_alpha using MXFP4 is not a good idea, and rather Q4_K is better - also MXFP4 uses 4.25 bits per weight, whilst Q4_K uses 4.5 bits per weight. It's better to use Q4_K than MXFP4 when choosing between them."

The Q4 quants had a mixture of mxfp4 leading to worse outcomes.
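The bits-per-weight figures in the quote can be sanity-checked with back-of-envelope arithmetic (the block size below is the standard MXFP4 layout of 32 elements sharing one 8-bit scale; the 4.5 for Q4_K is the effective rate stated in the quote, not something I derived):

```python
# MXFP4: 4-bit elements in blocks of 32, each block sharing one
# 8-bit scale -> 4 + 8/32 bits per weight
mxfp4_bpw = 4 + 8 / 32
print(mxfp4_bpw)  # 4.25

# Q4_K's effective rate (per the quote above) is ~4.5 bits/weight,
# i.e. the extra ~0.25 bits pay for richer per-block scales/mins
q4_k_bpw = 4.5
print(q4_k_bpw - mxfp4_bpw)  # 0.25
```

So the size difference between the two 4-bit formats is tiny, which is why the quote argues Q4_K is the better pick when choosing between them.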


Nope. Where do they say something along the lines of "we had MXFP4 tensors in our previous upload" or "that's why we re-uploaded new versions"?

This is a classic non-apology non-explanation of what actually happened. "They made it clear by showing how it really affected various tensors terribly"? Where do they even say they had ever previously uploaded any quant with MXFP4?


Looking at their benchmarks, there doesn't appear to be a meaningful difference between their quants and bartowski's quants.

No, our new Qwen3.5 ones show the opposite - see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks

Am I misreading the table?

  Unsloth Q4_K_M         PPL: 6.6053   KLD 99.9%: 0.5478   KLD mean: 0.0192
  bartowski Qwen_Q4_K_M  PPL: 6.6097   KLD 99.9%: 0.5771   KLD mean: 0.0182

Barely noticeable drop in PPL; noticeable KLD drop (good, 5%); but worse KLD mean (bad, 5%).

You forgot to check the disk space - _M and _XL are not the same size across quants:

  Unsloth Q4_K_M     18.49 GB   KLD 99.9%: 0.5478   KLD mean: 0.0192
  Unsloth Q4_K_XL    19.17 GB   KLD 99.9%: 0.4097   KLD mean: 0.0137
  bartowski Q4_K_M   19.77 GB   KLD 99.9%: 0.5771   KLD mean: 0.0182


The table doesn't have a bartowski Q4_K_XL to compare against, and given the _M metrics aren't universally better, it's unclear whether the smaller size comes at a cost.

I’m curious how NVFP4 compares to their Q4.
