Hacker News | bri3d's comments

They have two products:

* Subscription plans, which are (probably) subsidized and definitely oversubscribed (ie, 100% of subscribers could not use 100% of their tokens 100% of the time).

* Wholesale tokens, which are (probably) profitable.

If you try to use one product as the other product, it breaks their assumptions and business model.

I don't really see how this is weaponized malaise; capacity planning and some form of over-subscription is a widely accepted thing in every industry and product in the universe?
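To put rough numbers on the capacity-planning point, here's a toy Python sketch (all figures invented for illustration): if subscribers are only intermittently active and roughly independent, concurrent demand is approximately binomial, so a provider can provision well below total subscriber count.

```python
import math

# Toy oversubscription model: 1000 subscribers, each actively burning
# tokens only 5% of the time, independently. Concurrent demand is roughly
# binomial, so provisioning ~3 sigma above the mean covers almost every
# moment with far less than 1000 subscribers' worth of capacity.
n, p = 1000, 0.05
mean = n * p                          # expected concurrent users: 50
std = math.sqrt(n * p * (1 - p))      # ~6.9
capacity = math.ceil(mean + 3 * std)  # provision for ~71 concurrent users
print(capacity)
```

The business model breaks exactly when the independence assumption does, i.e. when users hammer the service continuously.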


I am curious to see how this will pan out long-term. Is the quality gap of Opus-4.5 over GPT-5.2 large enough to overcome the fact that OpenAI has merged these two bullet points into one? I think Anthropic might have bet on no other frontier lab daring to disconnect their subscription from their in-house coding agent and OpenAI called their bluff to get some free marketing following Anthropic's crackdown on OpenCode.


It will also be interesting to see which model is more sustainable once the money fire subsidy musical chairs start to shake out; it all depends on how many whales there are in both directions I think (subscription customers using more than expected vs large buys of profitable API tokens).


So, if I rent out my bike to you for an hour a day for really cheap money, and I do so 50 more times with 50 others, so that my bike is oversubscribed and you and the others don't get your hours, that's OK because it is just capacity planning on my side and widely accepted? Good to know.


Let me introduce you to Citibike?

Also, this is more like "I sell a service called take a bike to the grocery store" with a clause in the contract saying "only ride the bike to the grocery store." I do this because I am assuming that most users will ride the bike to the grocery store 1 mile away a few times a week, so they will remain available, even though there is an off chance that some customers will ride laps to the store 24/7. However, I also sell a separate, more expensive service called Bikes By the Hour.

My customers suddenly start using the grocery store plan to ride to a pub 15 miles away, so I kick them off of the grocery store plan and make them buy Bikes By the Hour.


As others pointed out, every business that sells capacity does this, including your ISP.

They could, of course, price your 10GB plan under the assumption that you would max out your connection 24 hours a day.

I fail to see how this would be advantageous to the vast majority of the customers.


Well, if the service price were in any way tied to the cost of transmitting bytes, then even the 24hr scenarios would likely see a reduction in cost to customers. Instead we have overage fees and data caps to help with "network congestion", which tells us all how little they think of their customers.


Yes, correct. Essentially every single industry and tool which rents out capacity of any system or service does this. Your ISP does this. The airline does this. Cruise lines. Cloud computing environments. Restaurants. Rental cars. The list is endless.


I have some bad news for you about your home internet connection.


Hackaday, and you, omit the last sentence in the comment they quote: “The compromise is going all in contained areas where alternatives exist.”


I hope that's true, and remains true.

I mean, this is not exactly the most important issue in the world. I love KDE and hope it remains usable to me, but if things keep going as I fear, there are other DEs that I can live with.


For what it's worth, generally in the US you can still buy a limited time subscription to the manufacturer's own diagnostic software for a reasonable price, for example BMW ISTA is $32/day, VW ODIS is $130/week, and Ford FDRS is $50 / 2 days. This is enough to complete most module replacement or upgrade tasks for a hobbyist, and still cheaper than dealership labor costs.

There's also a standard for the dongles (which specifies a DLL export interface from a driver, amusingly) called J2534 so you don't need a separate hardware interface for each make, although to your point, the way the laws around J2534 were written was too lax and some manufacturers have realized that there is a loophole where only certain diagnostic tasks like module reflashing need to be possible over J2534.

Also worth noting that reverse engineered software has generally not been majorly threatened by manufacturers in the automotive space; Forscan for Ford, VCDS and OBD11 for VW, and so on are all quite popular.

Unfortunately, "security" restrictions, especially in the EU, and the rise of ADAS systems have made things a lot harder; most makes now have some online challenge/response cryptography (ie VW SFD) for diagnostics where previously they had offline login, and most ADAS and camera systems require extremely expensive calibration jigs (this is a valid technical problem, but with no incentive to reduce cost or make these systems accessible, they end up being comically expensive).

Anyway the situation in automotive is way better than the situation in equipment and ag, so I don't think it's entirely fair to say that regulation was a complete failure.


Thank you! Everything you say is accurate and matches up with my experience. The connectors and stuff you can purchase as a regular consumer barely offer the ability to read or clear diagnostic codes, and swapping almost any part requires a specialized connector or tool, for example doing even the most basic sensor replacement or, heaven forbid, an ECM swap. I am thinking of scenarios where Grandma calls and has a Kia where a basic sensor is malfunctioning. You'll be paying out the nose to do Kia-authorized things that only Kia will let you do, and the chance of that being under $1000 is virtually zero.


A KDS subscription is $30 / 3 days.

https://kiatechinfo.snapon.com/J2534DiagnosticsAndProgrammin...

They claim that only an expensive J2534 interface is "recommended" (a weasely way to get around compliance requirements, although J2534 is also a terrible standard and frequently not compatible) but based on what I've read, the Kia software (KDS) is really simple and even cheap J2534 cables like a GD101 ($30) will work fine.

At this point you have the same software as the dealer would have, so any sensor related issues should be solved (besides ADAS, which like I mentioned is definitely a problem due to the need for calibration jigs). You could swap in a new ECM this way. Not even a dealer can swap a used ECM on some Kias; although in this case there are reverse engineered reflashing tools that work in most cases, too (this is kind of an intentional gap in right to repair, especially in Europe - there's a strong drive to eliminate the used car control modules market because it is heavily associated with organized crime).

I really don't find auto diagnostics to be as sinister as people think they are, or the regulation a complete failure. You can, due in large part to regulation, wade into the morass of horrible dealership diagnostics software if you want, for a limited entry fee.


Can you read the article a little more closely?

> - MiniMax can't fit on an iPhone.

They asked MiniMax on their computer to make an iPhone app that didn't work.

It didn't work using the Apple Intelligence API. So then:

* They asked Minimax to use MLX instead. It didn't work.

* They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.

* They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.

> Better to dig in a bit more.

The author already did 100% of the digging and then some.

Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MAC to converge across platforms, but the fact that Apple's own LLM doesn't work on their hardware and its output is an order of magnitude off is a reasonable bug report, in my book.


Emptied out post, thanks for the insight!

Fascinating the claim is Apple Intelligence doesn't work altogether. Quite a scandal.

EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained it wasn't minimax! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.


> Fascinating the claim is Apple Intelligence doesn't work altogether. Quite a scandal.

No, the claim is their particular device has a hardware defect that causes MLX not to work (which includes Apple Intelligence).

> EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.

Your comment originally read:

> This is blinkered.

> - MiniMax can't fit on an iPhone.

> - There's no reason to expect models to share OOMs for output.

> - It is likely this is a graceful failure mode for the model being far too large.

> No fan of Apple's NIH syndrome, or it manifested as MLX.

> I'm also no fan of "I told the robot [vibecoded] to hammer a banana into an apple. [do something impossible]. The result is inedible. Let me post to HN with the title 'My thousand dollars of fruits can't be food' [the result I have has ~nothing to do with the fruits]"

> Better to dig in a bit more.

Rather than erase it, and invite exactly the kind of misreading you don't want, you can leave it... honestly, transparently... with your admission in the replies below. And it won't be downvoted as much as when you're trying to manipulate / make requests of others to try to minimize your downvotes. Weird... voting... manipulating... stuff, like that, tends to be frowned upon on HN.

You have more HN karma than I do, even, so why care so much about downvotes...

If you really want to disown something you consider a terrible mistake, you can email the HN mods to ask for the comment to be dissociated from your account. Then future downvotes won't affect your karma. I did this once.


Oh no, all my meaningless internet points, gone!


> Then future downvotes won't affect your karma.

Who cares? The max amount of karma loss is 4 points, we can afford to eat our downvotes like adults.


Huh. I thought the minimum comment score was -4 (which would make the maximum amount of karma loss 5, since each comment starts at 1 point), but I didn't know if that was a cap on karma loss or just a cap on comment score.


45 years of writing code, and I still commit fence post errors. :eyeroll:


I'm an AI rage enthusiast too. Feel free to downvote me for free.


If you’d read the whole thing, you would go on a debugging journey that both involved bypassing the LLM and was appropriate for HN (vs not dismissing the article), so you might want to do that.


> Or, rather, MiniMax is! The good thing about offloading your work to an LLM is that you can blame it for your shortcomings. Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.

They noticed a discrepancy, then went back and wrote code to perform the same operations by hand, without the use of an LLM at all in the code production step. The results still diverged unpredictably from the baseline.

Normally, expecting floating-point MAC operations to produce deterministic results on modern hardware is a fool's errand; they usually operate asynchronously, and so the non-associative property of floating-point addition rears its head and you get some divergence.

But an order of magnitude difference plus Apple's own LLM not working on this device suggests strongly to me that there is something wrong. Whether it's the silicon or the software would demand more investigation, but this is a well reasoned bug in my book.
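To make the divergence mechanism concrete, here's a minimal Python sketch (nothing here is specific to the article's actual kernel): floating-point addition is not associative, so any pipeline that reorders its accumulation can legitimately produce a different result from a serial reference.

```python
# Floating-point addition is commutative but not associative: regrouping
# the same three terms changes the result, which is why asynchronous MAC
# hardware rarely reproduces a serial reference bit-for-bit.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # the large terms cancel first, so the 1.0 survives
right = a + (b + c)  # the 1.0 is absorbed into -1e16 before cancelling
print(left, right)   # 1.0 0.0
assert left != right
```

Small divergence from reordering is expected; an order of magnitude is not.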


> Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.

https://ia800806.us.archive.org/20/items/TheFeelingOfPower/T...

I should think I'll probably see someone posting this on the front page of HN tomorrow, no doubt. I first read it when it was already enormously old, possibly nearly 30 years old, in the mid 1980s when I was about 11 or 12 and starting high school, and voraciously reading all the Golden Age Sci-Fi I could lay my grubby wee hands on. I still think about it, often.


Dongles were extremely widely used in the 1990s and early 2000s; for anything more advanced than consumer software you'd almost expect them? Almost every DAW, video editor, high-end compiler, engineering/CAD package, or 3D suite used them, certainly.

I think sometime in the late 1990s FlexLM switched from dongles to "hardware identifiers" that were easily spoofed; honestly I don't think this was a terrible idea since to this article's conclusion, if you could reverse one you could reverse the other.

But this concept was insanely prevalent for ~20 years or so.

One of the biggest problems was not having enough ports. Some parallel port dongles tried to ignore communication with other dongles and actually had a port on the back; you'd make a "dongle snake" out of them. Once they moved to USB it was both easier and harder - you couldn't make the snake anymore, but you could ask people to use a hub when they ran out of ports.


P-CAD even had a dongle caddy where you could plug in, I think, about 7 of them to unlock different modules.

I will check if I can find an image of it.

EDIT: here is an old listing of it: https://www.ebay.com/itm/187748130737

Sadly the lid isn't open so you can't see what modules are installed.


> I think sometime in the late 1990s FlexLM switched from dongles to "hardware identifiers" that were easily spoofed; honestly I don't think this was a terrible idea since to this article's conclusion ...

Starting in '97 I worked on some software that used Elan License Manager (elmd) that then moved on to FlexLM in a major release.

Requests for, and problems with, licensing were a considerable source of support tickets but I'm sure it also drove a reasonable amount of sales as customers wanted to play with component X but were prevented from doing so by a lack of license.

When we were acquired by IBM we replaced the licensing code with lawyers and (threats of) audits. It didn't seem to harm the revenue. The product is still being maintained and sold.

> ... if you could reverse one you could reverse the other.

I can confirm it was quite easy with gdb to either skip past the license checks or, in the case of Elan licensing at least, call the license generation function from within the binary to generate whatever licenses for whatever features you liked.

The "hardware identifiers" were a nightmare too. I ended up writing some code that would pull all of the necessary information (primary MAC, IP address, hostid for Sparc machines, hostname, etc) and give it to us in a base64 encoded blob, we also grabbed some CPU and memory information that proved quite useful in seeing how the software was deployed.


Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was jumping at the bit to dismiss it out of hand for that reason.

But, what got me about this is that:

* every other Apple device delivered the same results

* Apple's own LLM silently failed on this device

to me that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.


> floating point accumulation doesn't commute

It is commutative (except for NaN). It isn't associative though.


I think it commutes even when one or both inputs are NaN? The output is always NaN.


NaNs are distinguishable. /Which/ NaN you get doesn't commute.


I guess at the bit level, but not at the level of computation? Anything that relies on bit patterns of nans behaving in a certain way (like how they propagate) is in dangerous territory.


> Anything that relies on bit patterns of nans behaving in a certain way (like how they propagate) is in dangerous territory.

Why? This is well specified by IEEE 754. Many runtimes (e.g. for Javascript) use NaN boxing. Treating floats as a semi-arbitrary selection of rational numbers plus a handful of special values is /more/ correct than treating them as real numbers, but treating them as actually specified does give more flexibility and power.
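For the curious, here's a toy NaN-boxing round-trip sketched in Python (illustrative only; real engines do this in C on raw 64-bit words):

```python
import struct

# A quiet double NaN has all exponent bits set plus the quiet bit, leaving
# ~51 low bits of payload. NaN boxing stashes a small integer (or pointer)
# in those bits; the value still reads as NaN, but the payload is
# recoverable because no arithmetic is ever performed on the boxed value.
QNAN = 0x7FF8000000000000  # canonical quiet-NaN bit pattern

def box(value):
    assert 0 <= value < (1 << 48)
    return struct.unpack("<d", struct.pack("<Q", QNAN | value))[0]

def unbox(f):
    return struct.unpack("<Q", struct.pack("<d", f))[0] & ((1 << 48) - 1)

boxed = box(1234)
print(boxed != boxed)  # True: still a NaN as far as comparisons go
print(unbox(boxed))    # 1234: the payload survives the bit-level round trip
```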


> Many runtimes (e.g. for Javascript) use NaN boxing.

But I've never seen them depend on those NaNs surviving the FPU. Hell, they could use the same trick on bit patterns that overlap with valid float values if they really wanted to.


Can you show me where in the ieee spec this is guaranteed?

My understanding is the exact opposite - that it allows implementations to return any NaN value at all. It need not be any that were inputs.

It may be that JavaScript relies on it and that has become more binding than the actual spec, but I don't think the spec actually guarantees this.

Edit: actually it turns out nan-boxing does not involve arithmetic, which is why it works. I think my original point stands, if you are doing something that relies on how bit values of NaNs are propagated during arithmetic, you are on shaky ground.


See 6.2.3 in the 2019 standard.

> 6.2.3 NaN propagation

> An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.

> If two or more inputs are NaN, then the payload of the resulting NaN should be identical to the payload of one of the input NaNs if representable in the destination format. This standard does not specify which of the input NaNs will provide the payload.


As the comment below notes, the language should means it is recommended, but not required. And there are indeed platforms that do not implement the recommendation.


Oh right sorry. That is confusing.


Don't have the spec handy, but specifically binary operations combining two NaN inputs must result in one of the input NaNs. For all of Intel SSE, AMD SSE, PowerPC, and ARM, the left-hand operand is returned if both are signaling or both are quiet. x87 does weird things (but when doesn't it?), and ARM does weird things when mixing signaling and quiet NaNs.
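If you want to see what your own machine actually does, here's a quick Python probe. The printed payload is implementation-defined (the propagation behavior is only a "should" in the standard), so the only portable assertion is that the result is a NaN at all:

```python
import math
import struct

# Build two quiet NaNs with distinct payloads and add them. Which payload
# (if any) survives is up to the hardware: x86 SSE keeps the left
# operand's, while RISC-V typically returns the canonical NaN (payload 0).
def nan_with_payload(p):
    return struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000 | p))[0]

def payload(f):
    return struct.unpack("<Q", struct.pack("<d", f))[0] & ((1 << 51) - 1)

result = nan_with_payload(0xAAA) + nan_with_payload(0xBBB)
print(hex(payload(result)))  # platform-dependent: maybe 0xaaa, maybe 0
assert math.isnan(result)    # the only guarantee IEEE 754 requires here
```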


I also don't have access to the spec, but the people writing Rust do and they claim this: "IEEE makes almost no guarantees about the sign and payload bits of the NaN"

https://rust-lang.github.io/rfcs/3514-float-semantics.html

See also this section of wikipedia https://en.wikipedia.org/wiki/NaN#Canonical_NaN

"On RISC-V, most floating-point operations only ever generate the canonical NaN, even if a NaN is given as the operand (the payload is not propagated)."

And from the same article:

"IEEE 754-2008 recommends, but does not require, propagation of the NaN payload." (Emphasis mine)

I call bullshit on the statement "specifically binary operations combining two NaN inputs must result in one of the input NaNs." It is definitely not in the spec.


Blame the long and confusing language in the spec:

> For an operation with quiet NaN inputs, other than maximum and minimum operations, if a floating-point result is to be delivered the result shall be a quiet NaN which should be one of the input NaNs.

The same document says:

> shall -- indicates mandatory requirements strictly to be followed in order to conform to the standard and from which no deviation is permitted (“shall” means “is required to”)

> should -- indicates that among several possibilities, one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain course of action is deprecated but not prohibited (“should” means “is recommended to”)

i.e. It is required to be a quiet NaN, and recommended to be one of the input NaNs.


Thanks for the direct evidence that the output NaN is not required to be one of the input NaNs.


Unless you compile with fast-math ofc, because then the compiler will assume that NaN never occurs in the program.


I would go even further and state that "you should never assume that floating point functions will evaluate the same on two different computers, or even on two different versions of the same application", as the results of floating point evaluations can differ depending on platform, compiler optimizations, compilation-flags, run-time FPU environment (rounding mode, &c.), and even memory alignment of run-time data.

There's a C++26 paper about compile time math optimizations with a good overview and discussion about some of these issues [P1383]. The paper explicitly states:

1. It is acceptable for evaluation of mathematical functions to differ between translation time and runtime.

2. It is acceptable for constant evaluation of mathematical functions to differ between platforms.

So C++ has very much accepted the fact that floating point functions should not be presumed to give identical results in all circumstances.

Now, it is of course possible to ensure that floating point-related functions give identical results on all your target machines, but it's usually not worth the hassle.
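Even within a single interpreter you can watch the evaluation strategy change the answer; a quick Python illustration (nothing here is specific to C++ or P1383):

```python
import math

# sum() rounds after every addition; math.fsum() tracks the exact sum and
# rounds once at the end. Same inputs, same machine, different results.
xs = [0.1] * 10
print(sum(xs))        # 0.9999999999999999: accumulated per-step rounding
print(math.fsum(xs))  # 1.0: correctly rounded exact sum
```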

[P1383]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p13...


Even the exact same source code can give different results when compiled with different compilers, or with the same compiler and different compiler options.

The Intel compiler, for example, uses less than IEEE 754 precision for floating-point ops by default.


FYI, the saying is "champing at the bit", it comes from horses being restrained.


Huh. I never knew "champing" was the proper spelling [0]

[0] https://www.npr.org/sections/memmos/2016/06/09/605796769/che...


hey, I appreciate your love of language and sharing with us.

I'm wondering if we couldn't re-think "bit" to the computer science usage instead of the thing that goes in the horse's mouth, and what it would mean for an AI agent to "champ at the bit"?

What new sayings will we want?


Byting at the bit?


chomping at the bit


Actually it was originally "champing" – to grind or gnash teeth. The "chomping" (to bite) alternative cropped up more recently as people misheard and misunderstood, but it's generally accepted as an alternative now.


I see


It’s actually accepted as the primary now and telling people about “champing” is just seen as archaic.


Do you have a source on this, or a definition for what it means to be "primary" here? All I can find is sources confirming that "champing" is the original and more technically correct, but that "chomping" is an accepted variant.


As a sister comment said, floating point computations are commutative, but not associative.

a * b = b * a for all "normal" floating point numbers.
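A quick randomized property check of that claim in Python (a sanity sketch, not a proof; NaN inputs are skipped since payload propagation is where commutativity gets murky, per the subthread above):

```python
import math
import random
import struct

# Compare raw bit patterns rather than ==, so signed zeros and NaN results
# are checked exactly. Commutativity should hold for + and *; it's
# associativity that fails for floats.
def bits(f):
    return struct.pack("<d", f)

random.seed(0)
for _ in range(10000):
    a = struct.unpack("<d", random.randbytes(8))[0]
    b = struct.unpack("<d", random.randbytes(8))[0]
    if math.isnan(a) or math.isnan(b):
        continue  # NaN payload choice is implementation-defined
    assert bits(a + b) == bits(b + a)
    assert bits(a * b) == bits(b * a)
print("no counterexample found")
```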


In fairness, the decompiler didn't work on the protection method :)

I think that both halves of the author's thesis are true: I bet that you could use this device in a more complicated way, but I also bet that the authors of the program deemed this sufficient. I've reversed a lot of software (both professionally and not) from that era and I'd say at least 90% of it really is "that easy," so there's nothing you're missing!


I mean, Apple's LLM also doesn't work on this device, plus the author compared the outputs from each iterative calculation on this device vs. others, and they diverge from every other Apple device. That's a pretty big sign both that something is different about that device, and that this same broken behavior carried across multiple OS versions. Is the hardware or the software "responsible"? Who knows, there's no smoking gun there, but it does seem like something is genuinely wrong.

I don't get the snark about LLMs overall in this context; this author uses an LLM to help write their code, but is also clearly competent enough to dig in and determine why things don't work when the LLM fails, and performed an LLM-out-of-the-loop debugging session once they decided it wasn't trustworthy. What else could you do in this situation?

