
Interesting that compaction is done using an encrypted message that "preserves the model's latent understanding of the original conversation":

> Since then, the Responses API has evolved to support a special /responses/compact endpoint that performs compaction more efficiently. It returns a list of items that can be used in place of the previous input to continue the conversation while freeing up the context window. This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation. Now, Codex automatically uses this endpoint to compact the conversation when the auto_compact_limit is exceeded.
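For concreteness, here's roughly what that flow could look like from a client. This is a hedged sketch in Rust (using the reqwest and serde_json crates), not official sample code: the endpoint path and the type=compaction / encrypted_content items come from the quoted docs, while the request/response field shapes and the model name are my own guesses.

    use serde_json::{json, Value};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let api_key = std::env::var("OPENAI_API_KEY")?;
        let client = reqwest::blocking::Client::new();

        // Stand-in for a long conversation that has outgrown the context window.
        let conversation = vec![json!({ "role": "user", "content": "lots of earlier turns" })];

        // 1. Ask the compact endpoint to replace the history with a smaller list
        //    of items. The docs above only say it "returns a list of items", so
        //    the exact response shape here is a guess.
        let compacted: Value = client
            .post("https://api.openai.com/v1/responses/compact")
            .bearer_auth(&api_key)
            .json(&json!({ "model": "gpt-5.2", "input": conversation })) // payload shape assumed
            .send()?
            .json()?;

        // 2. Continue the conversation using the compacted items (including the
        //    type=compaction item carrying encrypted_content) in place of the
        //    original history, plus the new user turn.
        let mut next_input: Vec<Value> = compacted
            .as_array() // assuming the response body is itself the list of items
            .cloned()
            .unwrap_or_default();
        next_input.push(json!({ "role": "user", "content": "next question" }));

        let _next: Value = client
            .post("https://api.openai.com/v1/responses")
            .bearer_auth(&api_key)
            .json(&json!({ "model": "gpt-5.2", "input": next_input }))
            .send()?
            .json()?;

        Ok(())
    }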


Their compaction endpoint is far and away the best in the industry. Claude's has to be dead last.


Help me understand: how is a compaction endpoint not just a prompt + json_dump of the message history? I would understand if the prompt were the secret sauce, but you make it sound like there's more to a compaction system than just a clever prompt?


They could be operating in latent space entirely maybe? It seems plausible to me that you can just operate on the embedding of the conversation and treat it as an optimization / compression problem.


Yes, Codex compaction is in the latent space (as confirmed in the article):

> the Responses API has evolved to support a special /responses/compact endpoint [...] it returns an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation


Is this what they mean by "encryption" - as in "no human-readable text"? Or are they actually encrypting the compaction outputs before sending them back to the client? If so, why?


"encrypted_content" is just a poorly worded variable name that indicates the content of that "item" should be treated as an opaque foreign key. No actual encryption (in the cryptographic sense) is involved.


This is not correct; encrypted content is in fact encrypted content. For OpenAI to be able to support ZDR, there needs to be a way for you to store reasoning content client-side without being able to see the actual tokens. The tokens need to stay secret because they often contain reasoning related to safety and instruction following. So OpenAI gives them to you encrypted and keeps the decryption keys on their side, so the content can be re-rendered into tokens when given to the model.

There is also another reason: to prevent some attacks related to injecting things into reasoning blocks. Anthropic has published some studies on this. By using encrypted content, OpenAI can rely on it not being modified. OpenAI and Anthropic have also started to validate that you're not removing these messages between requests in certain modes, like extended thinking, for safety and performance reasons.


Are you sure? For reasoning, encrypted_content is for sure actually encrypted.


Hmmm, no, I don't know this for sure. In my testing, the /compact endpoint seems to work almost too well for large/complex conversations, and it feels like it cannot contain the entire latent space, so I assumed it keeps pointers inside it (a la previous_response_id). On the other hand, OpenAI says it's stateless and compatible with Zero Data Retention, so maybe it can contain everything.


They say they do not compress the user messages, but yeah, its purpose is to do very lossy compression of everything else. I'd expect it to be small.


Ah, that makes more sense. Thanks!


Their models are specifically trained for their tools. For example, the `apply_patch` tool: you would think it's just another file-editing tool, but its unique diff format is trained into their models. It also works better than the generic file-editing tools implemented in other clients. I can also confirm their compaction is best in class. I've implemented my own client using their API, and gpt-5.2 can work for hours and process millions of input tokens very effectively.


Maybe it's a model fine tuned for compaction?


Yes, agree completely.


Is it possible to use the compaction endpoint independently? I have my own agent loop I maintain for my domain-specific use case. We built a compaction system, but I imagine this one performs better.


Yes you can and I really like it as a feature. But it ties you to OpenAI…


I would guess you can if you're using their Responses API for inference within your agent.


How does this work for other models that aren’t OpenAI models?


It wouldn’t work for other models if it’s encoded in a latent representation of their own models.



That depends on the content of the SVGs. Of course you can write a script to do a very literal kind of conversion regardless, but in practice a lot of interpretation would be required, and that could be done by an LLM. A simple case is an SVG that's a static presentation of a button; the intended React component could handle hover and click states, change the cursor appropriately, set the aria-label, etc. For anything but trivial cases a script isn't going to get you far.


That's about how it came across for me as well: ignoring my actual content and joking about generalizations related to key words.

Project is cool overall, love the xkcd-like comic idea—but prompting and/or model-selection could use some work. I'd like to take a crack at tuning it myself :)


It sounds more like you just made an overly simplistic interpretation of their statement, "everything works like I think it should," since it's clear from their post that they recognize the difference between some basic level of "working" and a well-engineered system.

Hopefully you aren't discouraged by this, observationist; it's pretty clear hansmayer is just taking potshots. Your first paragraph could very well have been written by a professional SWE who understood what level of robustness was required given the constraints of the specific scenario in which the software was being developed.


I've been on a break from coding for about a month but was last working on a new kind of "uncertainty reducing" hierarchical agent management system. I have a writeup of the project here: https://symbolflux.com/working-group-foundations.html


> So, they found an underlying commonality among the post-training structures in 50 LLaMA3-8B models, 177 GPT-2 models, and 8 Flan-T5 models; and, they demonstrated that the commonality could in every case be substituted for those in the original models with no loss of function; and noted that they seem to be the first to discover this.

Could someone clarify what this means in practice? If there is a 'commonality' why would substituting it do anything? Like if there's some subset of weights X found in all these models, how would substituting X with X be useful?

I see how this could be useful in principle (and obviously it's very interesting), but not clear on how it works in practice. Could you e.g. train new models with that weight subset initialized to this universal set? And how 'universal' is it? Just for models of certain sizes and architectures, or in some way more durable than that?


It might be worth it to use that subset to initialize the weights of future models, but more importantly you could save a huge number of computational cycles by using the lower-dimensional weights at inference time.


Ah interesting, I missed that possibility. Digging a little more, though, my understanding is that what's universal is a shared basis in weight space, and particular models of the same architecture can express their specific weights via coefficients in a lower-dimensional subspace using that universal basis (so we get weight compression and simplified parameter search). But it also sounds like the extent of any gains during inference is still up in the air?

Key point being: the parameters might be picked off a lower-dimensional manifold (in weight space), but this doesn't imply that lower-rank activation-space operators will be found. So the translation to inference time isn't clear.


My understanding differs and I might be wrong. Here's what I inferred:

Let's say you finetune a Mistral-7B. Now, there are hundreds of other fine-tuned Mistral-7Bs, which means it's easy to find the universal subspace U of the weights of all these models combined. You can then decompose the weights of your specific model using U and a coefficient matrix C specific to your model. Then you can convert any operation of the type `out = W h` to `out = U (C h)`. Both U and C are of much smaller dimension than W, so the number of matrix operations as well as the memory required is drastically lower.
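Here's a toy sketch of that identity in Rust, with made-up tiny dimensions (d = 8, r = 2) instead of real model sizes; it only demonstrates that W·h = U·(C·h) and that the factored form needs d·r + r·d multiply-adds instead of d·d. It is not the paper's actual procedure.

    type Mat = Vec<Vec<f64>>;

    fn matvec(m: &Mat, v: &[f64]) -> Vec<f64> {
        m.iter()
            .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum())
            .collect()
    }

    fn matmul(a: &Mat, b: &Mat) -> Mat {
        let (n, k, p) = (a.len(), b.len(), b[0].len());
        (0..n)
            .map(|i| (0..p).map(|j| (0..k).map(|t| a[i][t] * b[t][j]).sum()).collect())
            .collect()
    }

    fn main() {
        let (d, r) = (8usize, 2usize);

        // U: shared ("universal") d x r basis, reused across models.
        let u: Mat = (0..d).map(|i| (0..r).map(|j| ((i + 2 * j) % 5) as f64).collect()).collect();
        // C: r x d coefficients specific to one fine-tuned model.
        let c: Mat = (0..r).map(|i| (0..d).map(|j| ((3 * i + j) % 7) as f64).collect()).collect();
        // W = U*C is the full d x d weight this factorization represents.
        let w = matmul(&u, &c);

        let h: Vec<f64> = (0..d).map(|j| j as f64).collect();

        let full = matvec(&w, &h);                  // d*d multiply-adds
        let factored = matvec(&u, &matvec(&c, &h)); // d*r + r*d multiply-adds

        assert_eq!(full, factored); // exact here because all values are small integers
        println!("{:?}", factored);
    }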


Prior to this paper, no one knew that X existed. If this paper proves sound, then now we know that X exists at all.

No matter how large X is, one copy of X baked into the OS / into the silicon / into the GPU / into CUDA, is less than 50+177+8 copies of X baked into every single model. Would that permit future models to be shipped with #include <X.model> as line 1? How much space would that save us? Could X.model be baked into chip silicon so that we can just take it for granted as we would the mathlib constant "PI"? Can we hardware-accelerate the X.model component of these models more than we can a generic model, if X proves to be a 'mathematical' constant?

Given a common X, theoretically, training for models could now start from X rather than from 0. The cost of developing X could be brutal; we've never known to measure it before. Thousands of dollars of GPU per complete training at minimum? Between Google, Meta, Apple, and ChatGPT, the world has probably spent a billion dollars recalculating X a million times. In theory, they probably would have spent another billion dollars over the next year calculating X from scratch. Perhaps now they won't have to?

We don't have a lot of "in practice" experience here yet, because this was first published 4 days ago, so that's why I'm suggesting possible, plausible ways this could help us in the future. Perhaps the authors are mistaken, or perhaps I'm mistaken, or perhaps we'll find that the human brain has X in it too. As someone who truly loathes today's "AI", and in an alternate timeline would have completed a dual-major CompSci/NeuralNet degree in ~2004, I'm extremely excited to have read this paper, and to consider what future discoveries and optimizations could result from it.

EDIT:

Imagine if you had to calculate 3.14159 from basic principles every single time you wanted to use pi in your program. Draw a circle to the buffer, measure it, divide it, increase the memory usage of your buffer and resolution of your circle if necessary to get a higher-precision pi. Eventually you want pi to a billion digits, so every time your program starts, you calculate pi from scratch to a billion digits. Then, someday, someone realizes that we've all been independently calculating the exact same mathematical constant! Someone publishes Pi: An Encyclopedia (Volume 1 of ∞). It becomes inconceivably easier to render cones and spheres in computer graphics, suddenly! And then someone invents radians, because now we can map 0..360° onto 0..τ; no one predicted radians at all, but they're incredibly obvious in hindsight.

We take for granted knowledge of things like Pi, but there was a time when we did not know it existed at all. And then for a long time it was 3. And then someone realized the underlying commonality of every circle and defined it plainly, and now we have Pi Day, and Tau Day, because not only do we know it exists, but we can argue about it. How cool is that! So if someone has discovered a new 'constant', then that's always a day of celebration in my book, because it means that we're about to see not only things we consider "possible, but difficult" to instead be "so easy that we celebrate their existence with a holiday", but also things that we could never have remotely dreamed of before we knew that X existed at all.

(In less tangible analogies, see also: postfix notation, which was repeatedly reinvented over decades (by e.g. Dijkstra), as a programming advance; or the movie "Arrival" (2016) as a linguistic advance; or the BLIT Parrot (don't look!) as a biological advance. :)


If what you suggest here is even remotely fact, I see two antipodal trajectories the authors secretly huddled and voted on:

1. As John Napier, who freely, generously, gifted his `Mirifici' for the benefit of all.

2. Here we go, patent trolls, have at it: OpenAI et al. burning the midnight oil to grab as much real estate on this as they can to erase any (even future?) debt stress, deprecating the AGI Philosopher's Stone in favor of first owning everything conceivable from a new miraculous `my precious' ring; not `open', closed.


I think the idea is like: it took extra work 'cause Rust makes you be so explicit about allocations and types, but it's also probably faster/more reliable because that work was done.

Of course at the end of the day it's just marketing and doesn't necessarily mean anything. In my experience the average piece of Rust software does seem to be of higher quality, though.


Even forgetting the memory safety and async safety guarantees, the language design produces lower defect code by a wide margin. Google and other orgs have written papers about this.

There are no exceptions. There are no nulls. You're encouraged to return explicit errors. No weird error flags or booleans or unexpected ways of handling abnormal behaviors. It's all standardized. Then the language syntax makes it easy to handle and super ergonomic and pleasurable. It's nice to handle errors in Rust. Fully first class.

Result<T,E>, Option<T>, match, if let, if let Ok, if let Some, while let, `?`, map, map_err, ok_or, ok_or_else, etc. etc. It's all super ergonomic. The language makes this one of its chief concerns, and writing idiomatic Rust encourages you to handle errors smartly.
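For instance, a minimal sketch (the file name and error enum are made up purely to show the constructs above):

    use std::fs;
    use std::num::ParseIntError;

    // A hand-rolled error type; real projects often use thiserror/anyhow instead.
    #[derive(Debug)]
    enum ConfigError {
        Io(std::io::Error),
        Parse(ParseIntError),
    }

    impl From<std::io::Error> for ConfigError {
        fn from(e: std::io::Error) -> Self { ConfigError::Io(e) }
    }

    impl From<ParseIntError> for ConfigError {
        fn from(e: ParseIntError) -> Self { ConfigError::Parse(e) }
    }

    // Failure is part of the signature; `?` converts and propagates either error.
    fn read_port(path: &str) -> Result<u16, ConfigError> {
        let text = fs::read_to_string(path)?;
        let port = text.trim().parse::<u16>()?;
        Ok(port)
    }

    fn main() {
        // The caller has to acknowledge both outcomes; there's no exception to forget.
        match read_port("port.txt") {
            Ok(port) => println!("listening on {port}"),
            Err(err) => eprintln!("bad config: {err:?}"),
        }
    }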

Because errors were so well thought out, you write fewer bugs.

Finally, because of the way the language makes you manage scope, it's almost impossible to write complicated nesting or difficult-to-follow logic. Hard to describe this one unless you have experience writing Rust, but it's a big contributor to high-quality code.

Rust code is highly readable and easy to reason about (once you learn the syntax). There are no surprises with Rust. It's written simply and straightforwardly and does what it says on the tin.


That's not special to Rust in any way or form. Most of the mentioned features are stolen from ML, and in some cases badly. E.g. Rust has unwrap, which is basically a ticking time bomb waiting to blow up. Rust has many other ways to blow up the program. It's not only about memory safety (80% of Rust apps in the wild don't benefit from "memory safety" in any way or form).


> Most of mentioned features are stolen from ML

Rust is the most popular ML-derived language, so if someone is considering Rust vs. some other language, the chances are the other one they're considering does not have all the ML goodies.


Okay, but the alternative isn't ML; virtually all of this software would otherwise be written in C or C++.


IMHO every app benefits from memory safety.

Memory safety doesn't only have security implications, but reduces crashes, misbehavior and corrupt data.

You don't want either in any software that has to fulfill a task in a productive way.


Sure, but a GC gives you that. Having manual memory management (like in C) is something only a very few applications REALLY need. Hell, I have seen web frontends written in Rust. Now the circle is complete.


1. It doesn't give you that necessarily, see Go.

2. Rust doesn't have memory management like in C. In Rust, abstractions and the compiler manage memory for you, except when you opt into C-like memory management using unsafe (see the sketch after this list).

3. The comment was about memory safety, not memory management, and its benefit.

4. In case GC vs. manual memory management was meant as a speed comparison: you might not REALLY need the speed of Rust, but I gladly take it where I can. I am tired of sluggish, resource-hogging Electron apps and similar. Electron has probably destroyed at least 10-15 years of progress in hardware performance gains.
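A minimal sketch of point 2 (the values are arbitrary):

    use std::alloc::{alloc, dealloc, Layout};

    fn main() {
        {
            // Ownership-based management: the Vec's heap buffer is freed
            // automatically when `v` goes out of scope -- no GC, no free().
            let v = vec![1, 2, 3];
            println!("{}", v.iter().sum::<i32>());
        } // <- buffer dropped here, deterministically

        // C-like manual management only exists behind `unsafe`.
        unsafe {
            let layout = Layout::new::<u64>();
            let p = alloc(layout) as *mut u64;
            if !p.is_null() {
                *p = 42;
                println!("{}", *p);
                dealloc(p as *mut u8, layout); // forgetting this would leak
            }
        }
    }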


Electron is on a totally different level. I'm talking about small, compact programs (like Rust/Go/OCaml etc.). Most do not need the 1-2ms faster execution Rust provides.


Are you for real? Rust most definitely is not highly readable, and the language reeks of complexity.


Readable is relative; it’s much less readable than some languages, but much more than the ones it’s largely displacing.

I would much rather try to figure out a bug in unfamiliar Rust than in unfamiliar C++.


It's comparable with Java.

It's significantly easier to parse than C++.


Java doesn't have lifetimes. So no.

And C++ can only be beaten by the likes of K, J and brainfuck. Very low bar to clear.


Doing math is not the same as calculating. LLMs can be very useful in doing math; for calculating they are the wrong tool (and even there they can be very useful, but you ask them to use calculating tools, not to do the calculations themselves—both Claude and ChatGPT are set up to do this).

If you're curious, check out how mathematicians like Robert Ghrist or Terence Tao are using LLMs for math research, both have written about it online repeatedly (along with an increasing number of other researchers).

Apart from assisting with research, their ability on e.g. math olympiad problems is periodically measured and objectively rapidly improving, so this isn't just a matter of opinion.


I actually tweeted like a month ago that I was the reason LLMs use em dashes so much lol: https://x.com/Westoncb/status/1961802304698671407


There are quite a few em-dashes on my WWW site and on StackExchange thanks to me; and I vaguely recall that I might even have written one on Wikipedia once. But I am quite happy for you to take the blame for training the LLMs. (-:


lol no problem. In reality though there's kind of a funny story behind it because I suspect the way I ended up using them so much is similar to how ChatGPT did. When I got into writing I studied grammar, then decided to read a bunch of classics and analyze their usage of punctuation in general until I had a good understanding of every bit of it. Then, in order to practice, I'd apply what I learned to anything I was writing at the time whether journal notes, conversations on AIM/IRC etc. That latter step meant I was translating a lot of casual/natural speech into a form that also had a high level of 'correctness'. And if you faithfully translate natural speech into 'correct'ly punctuated sentences, you end up using a lot of em dashes. Because ChatGPT/LLMs are tuned for natural/authentic style, as well as for a high degree of 'correctness,' you get today's state of affairs. Just a theory.

