Indeed, it would have been interesting but I really wanted to get the blog post out on the launch day of the MacBook Neo and did not have the bandwidth to run additional cloud experiments.
I ran TPC-DS SF300 now on the c6a.4xlarge. It turns out it's still quite limited by the EBS disk's I/O: while 32 GB of memory is much more than 8 GB, DuckDB still needs to spill to disk a lot, and this shows in the runtimes. Running all 99 queries took 37 minutes, roughly half of the MacBook's 79 minutes.
> Command being timed: "duckdb tpcds-sf300.db -f bench.sql"
> Percent of CPU this job got: 250%
> Elapsed (wall clock) time (h:mm:ss or m:ss): 37:00.96
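One way to read the `time` output above: the job only got 250% CPU, and a c6a.4xlarge has 16 vCPUs (1600% available), so the cores spend most of the wall-clock time waiting, which is consistent with the run being I/O-bound on EBS rather than CPU-bound. A quick sketch of that arithmetic (vCPU count assumed from the instance type):

```python
# Rough CPU-utilization check for the `time` output above.
vcpus = 16                 # c6a.4xlarge: 16 vCPUs (assumed from instance type)
cpu_percent = 250          # "Percent of CPU this job got"

# Fraction of the available CPU capacity the run actually used.
utilization = cpu_percent / (vcpus * 100)

# ~16% utilization: the threads are mostly blocked on disk I/O
# (spilling), not computing.
assert utilization < 0.2
```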
And yet: current state-of-the-art models are also good at navigating language ecosystems that aren't as mainstream. So if you're curious, it's now much easier to explore topics, languages, and concepts that were previously a bit out of reach.
> I remember these MacBooks did tend to break apart at the corners of the palmrests.
Not the corners for me, but the "feet" of the topcase digging into the palmrest, which would splinter the plastic. Then you'd have holes in the case and jagged plastic splinters digging into your wrist as you typed. Not enjoyable.
Shame, because it was the last MacBook that was really easy to upgrade: the battery was removable (with a simple lock), and behind it were the RAM and 2.5" drive slots.
The next generation was not that hard to service either, but you had to unscrew the entire bottom shell, and the battery was glued in.
Unscrewing the bottom on the generations after this gave you access to nearly everything, which was vastly superior for most repairs. Getting to the logic board or AirPort card on the polycarbonate MacBook took significantly longer: for the Bluetooth board you had to remove the display cable, optical drive, and HDD.
Mine had been upgraded from 4 GB of RAM to 8 GB; I replaced the HDD with an SSD and put the original HDD in the DVD drive's bay for more storage. It was a nice machine for uni, I really loved it.
That’s what happened to my 2006 Core Duo MacBook after about three or four years of use. It was an excellent laptop that was quite user-serviceable (I upgraded the RAM and hard drive), but I did have problems with the palmrests, and the Ethernet port stopped working after four years.
It was my first Apple laptop and I have fond memories of using it during my college years.
I had one of those machines in university too and had the same stained/cracked palmrests. That said, I also paid for extended AppleCare and had the whole top case swapped for free multiple times throughout the three years that the coverage lasted.
When I was a broke student I would buy MacBooks with broken palm rests for a discounted price, drop them off at Apple for a free repair (under extended warranty) and flip them for a profit. Three hours of my time turned into €100 profit. Minimum wage was €6/hour back then.
Did the same years later, buying up first-gen iPod Nanos and trading them in for sixth-gen ones because of the battery recall.
Plenty of devices with limited RAM existed before this and we didn't see devs cater to them. I highly doubt this temporary spike in memory prices is going to cause a long lasting change in behavior.
I'm aware of that, but that's not the link of the post. The post is linking to their UD 2.0 quants from a few months back.
Also, the benchmarks are there because they messed up the first version of their Qwen 3.5 XL quants by quanting some tensors to mxfp4 that should have been kept at higher precision, and this is their bugfix. The post literally starts out with "We updated Qwen3.5-35B Unsloth Dynamic quants being SOTA on nearly all bits" without explaining WHY they needed to update from the original version.
If you read our blog, it says KLD and PPL are sometimes counterintuitive. On MiniMax, for example, some of our quants do worse on PPL and KLD than AesSedai's quant, but AesSedai's does worse on LiveCodeBench by a lot; see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks#id-3-...
This is because (see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks#id-1-...), although bit widths are in general monotonic, i.e. q2_k < q3_k < q4_k < q5_k etc., we find KLD and PPL are actually not monotonic, i.e. q3_k can actually have BETTER PPL than q4_k.
So the main point is bad luck on quantization: sometimes lower bits might get lower PPL and KLD, but this is a ruse and misleading, since on actual real-world tasks it's worse.
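For readers unfamiliar with the two metrics being argued about: PPL (perplexity) is the exponentiated mean negative log-likelihood the model assigns to the reference tokens, and KLD compares the quantized model's output distribution against the full-precision model's. A toy sketch of how both are computed (all probability numbers below are hypothetical, just to show the formulas):

```python
import math

def perplexity(token_probs):
    """PPL over the probabilities a model assigned to the true tokens:
    exp(mean negative log-likelihood)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def kl_divergence(p_ref, q_quant):
    """KL(p_ref || q_quant) for one token's distribution over the vocab,
    where p_ref comes from the full-precision model."""
    return sum(p * math.log(p / q) for p, q in zip(p_ref, q_quant) if p > 0)

# Hypothetical per-token probabilities from two quants of the same model:
quant_a = [0.60, 0.55, 0.70, 0.50]   # e.g. a q3_k variant
quant_b = [0.58, 0.57, 0.68, 0.52]   # e.g. a q4_k variant

ppl_a = perplexity(quant_a)
ppl_b = perplexity(quant_b)
# The two can land within noise of each other even at different bit
# widths, which is why a downstream benchmark ends up as the tiebreaker.
```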
The Q4_K_XL is easily the most popular quant for the model, though.
So then why was Q4_K_XL having issues? Is it just a PPL issue that doesn't reflect in real-world usage? If yes, why not just say that? "The Q4_K_XL had lower PPL, but don't worry, PPL can be wrong, and other benchmarks show it's fine." If it was a real quality issue, then what caused it?
The blog post says "Retiring MXFP4 from all GGUF quants: Q2_K_XL, Q3_K_XL and Q4_K_XL, except for pure MXFP4_MOE" but doesn't say why. The easy assumption most people would make is "oh, you quanted attention or ssm tensors or something to mxfp4 and that turned out to be bad, so you're retiring mxfp4", but if you say that's not it, then what's the actual issue?
Each layer is made up of various weight tensors, and the individual tensors are what get quantized. A pure q8 will have all the weights as q8, and likewise for q4, but some are kept as f32, etc. Here's an example of q3_k_xl: https://huggingface.co/unsloth/Kimi-K2-Thinking-GGUF/tree/ma... where we can see certain weights are f32, q8, q5, q3, etc. They used mxfp4 in some weights, and mxfp4 doesn't seem to play nicely in mixed quants, so that's why they are retiring it. Read their publication again and it should make more sense.
"MXFP4 is much worse on many tensors - attn_gate, attn_q, ssm_beta, ssm_alpha using MXFP4 is not a good idea, and rather Q4_K is better - also MXFP4 uses 4.25 bits per weight, whilst Q4_K uses 4.5 bits per weight. It's better to use Q4_K than MXFP4 when choosing between them."
The Q4 quants had a mixture of mxfp4 tensors, leading to worse outcomes.
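The bits-per-weight figures in the quote above (MXFP4 at 4.25, Q4_K at 4.5) also imply the size cost of swapping the affected tensors to Q4_K is tiny. A back-of-envelope sketch, with tensor names and sizes that are purely hypothetical:

```python
# Bits per weight, from the quoted blog text.
BPW = {"MXFP4": 4.25, "Q4_K": 4.5}

def file_size_gb(tensors):
    """tensors: list of (name, n_weights, quant_type) -> size in GB."""
    bits = sum(n * BPW[q] for _, n, q in tensors)
    return bits / 8 / 1e9

# Hypothetical layer mix: the fix is swapping MXFP4 -> Q4_K on the
# sensitive tensors (attn_q, attn_gate) while the bulk stays Q4_K.
before = [("attn_q",    500_000_000, "MXFP4"),
          ("attn_gate", 100_000_000, "MXFP4"),
          ("ffn_down", 2_000_000_000, "Q4_K")]
after  = [("attn_q",    500_000_000, "Q4_K"),
          ("attn_gate", 100_000_000, "Q4_K"),
          ("ffn_down", 2_000_000_000, "Q4_K")]

growth = file_size_gb(after) / file_size_gb(before)
# Only a ~1-2% larger file for the higher-quality tensors.
```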
Nope. Where do they say something along the lines of "we had MXFP4 tensors in our previous upload" or "that's why we re-uploaded new versions"?
This is a famous non-apology non-explanation of what actually happened. "They made it clear by showing how it really affected various tensors terribly"? Where do they even say they had ever previously uploaded any quant with MXFP4?
The table doesn't have bartowski's Q4_K_XL to compare against, and given that the _M variants' metrics aren't universally better, it's unclear whether the smaller size comes for free.
Or am I missing something?