>OpenAI is losing a brutal amount of money, possibly on every API request you make to them as they might be offering those at a loss (some sort of "platform play", as business dudes might call it, assuming they'll be able to lock in as many API consumers as possible before becoming profitable).
I believe that if you take out training costs they aren't losing money on each call on its own, though it depends on which model we're talking about. Do you have a source/estimate?
For better or worse, OpenAI removing the capped-profit structure and turning the nonprofit from AGI considerations to just philanthropy feels like the shedding of the last remnants of sanctity.
>Compliant chat models will be trained to start with "Certainly!
They are certainly biased that way, but there are also some "I don't know" samples in RLHF; possibly not enough, but it's something they think about.
At any rate, Gemini 2.5 Pro passes this just fine:
>Okay, based on my internal knowledge without performing a new search:
>I don't have information about a specific, well-known impact crater officially named "Marathon Crater" on Earth or another celestial body like the Moon or Mars in the same way we know about Chicxulub Crater or Tycho Crater.
>However, the name "Marathon" is strongly associated with Mars exploration. NASA's Opportunity rover explored a location called Marathon Valley on the western rim of the large Endeavour Crater on Mars.
There are a few problems with an "I don't know" sample. For starters, what does it map to? Recall that the corpus consists of information we have (affirmatively). You would need to invent a corpus of false stimuli. What you would have, then, is a model that writes "I don't know" based on whether the stimulus better matches something real or one of the negatives.
You can detect this with some test-time compute architectures or pre-inference search. But that's the broader application. This is a trick for the model alone.
The Chain of Thought in the reasoning models (o3, R1, ...) will actually express some self-doubt and backtrack on ideas. That tells me there's at least some capability for self-doubt in LLMs.
A poor man's "thinking" hack was to edit the AI's reply, truncate it at the point where you wanted it to reconsider, append a newline and "Wait...", then hit generate again (rough sketch below).
It was expensive because editing context isn't free: you have to resend (and the model has to re-process) the entire context.
This was injected into the thinking models, I hope programmatically.
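Roughly, the trick looks like this, with `generate` standing in for whatever completion call you were using (a hypothetical prompt-in, text-out helper, not any specific API):

    def think_more(generate, reply: str, cut_at: str) -> str:
        # truncate the model's reply where you want it to reconsider,
        # append a doubt cue, and let it continue from there
        truncated = reply[: reply.index(cut_at)]
        nudged = truncated + "\nWait..."
        # the whole nudged context gets resent and re-processed,
        # which is what made this expensive
        return nudged + generate(nudged)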
The execute function can recognize it as a t-string and prevent SQL injection if the name is coming from user input. f-strings immediately evaluate to a string, whereas t-strings evaluate to a template object which requires further processing to turn it into a string.
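A minimal illustration of the difference, assuming Python 3.14's PEP 750 API where a t-string produces a `string.templatelib.Template` rather than a `str`:

    from string.templatelib import Template, Interpolation

    name = "Robert'); DROP TABLE students;--"

    f_query = f"SELECT * FROM users WHERE name = {name}"  # already just a str
    t_query = t"SELECT * FROM users WHERE name = {name}"  # a Template object

    print(type(f_query))  # <class 'str'>
    print(type(t_query))  # <class 'string.templatelib.Template'>

    # the template keeps static text and interpolated values separate,
    # so whoever consumes it decides how each value is handled
    for part in t_query:
        if isinstance(part, Interpolation):
            print("value:", part.value)
        else:
            print("static:", part)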
Then the useful part is the extra execute function you have to write (it's not just a drop-in substitute like in the comment), and an extra function could confirm the safety of a value going into an f-string just as well.
I get the general case, but even then it seems like an implicit anti-pattern over doing `db.execute(f"QUERY WHERE name = {safe(name)}")`
Problem with that example is where do you get `safe`? Passing a template into `db.execute` lets the `db` instance handle safety specifically for the backend it's connected to. Otherwise, you'd need to create a `safe` function with a db connection to properly sanitize a string.
And further, if `safe` just returns a string, you still lose out on the ability for `db.execute` to pass the parameter a different way -- you've lost the information that a variable is being interpolated into the string.
`db.safe` would be the same as the new `db.execute` with the safety checks you create for the t-string, but yes, I can see some benefits (though I'm still not a fan for my own codebases so far) in using the values further, or in more complex cases than this.
Yeah but it would have to be something like `db.safe("SELECT * FROM table WHERE id = {}", row_id)` instead of `db.execute(t"SELECT * FROM table WHERE id = {row_id}")`.
This is just extra boilerplate though, for what purpose?
I think one thing you might be missing is that in the t-string version, `db.execute` is not taking a string; a t-string resolves to an object of a particular type. So it is doing your `db.safe` operation, but automatically.
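For illustration, a sketch of what such an `execute` could do under the hood (the function name and the qmark placeholder style are assumptions for sqlite3-like drivers, not any library's actual implementation):

    from string.templatelib import Template, Interpolation

    def execute(conn, query: Template):
        # reject plain strings so an f-string can't sneak through
        if not isinstance(query, Template):
            raise TypeError("execute() expects a t-string")
        sql, params = [], []
        for part in query:
            if isinstance(part, Interpolation):
                sql.append("?")            # placeholder stays in the SQL text
                params.append(part.value)  # value gets bound by the driver
            else:
                sql.append(part)
        return conn.execute("".join(sql), params)

    # usage: execute(conn, t"SELECT * FROM users WHERE id = {row_id}")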
Of course you can write code like that. This is about making it easier not to accidentally cause code injection by forgetting to call safe(). JavaScript has the same feature (tagged template literals), and some SQL libraries allow only the passing of template strings, not normal strings, so you can't generate a string with code injection. If you have to dynamically generate queries, they allow a parameter to be another template string, and then those are merged correctly. It's about reducing the likelihood of making mistakes with fewer keystrokes. We could all just write untyped assembly instead and could do it safely by paying really good attention.
agreed. but then you're breaking the existing `db.execute(str)`. if you don't do that, and instead add `db.safe_execute(tpl: Template)`, then you're back to the risk that a user can forget to call the safe function.
also, you're trusting that the library implementer raises a runtime exception if a string is passed where a template is expected. it's not enough to rely on type-checks/linting. and there is probably going to be a temptation to accept `db.execute(sql: Union[str, Template])` (sketched below) because this is non-breaking, and sql without params doesn't need to be templated - so template-only breaks some stuff that doesn't need to be broken.
i'm not saying templates aren't a good step forward, just that they're also susceptible to the same problems we have now if not used correctly.
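Something like this is the tempting non-breaking signature, and it keeps the old footgun around (illustrative names, not any real library's API):

    from string.templatelib import Template

    class DB:
        def execute(self, sql: str | Template):
            if isinstance(sql, Template):
                # safe path: interpolations get bound as parameters
                return self._execute_template(sql)
            # legacy path: an f-string with user input already baked in sails right through
            return self._execute_raw(sql)

        def _execute_template(self, tpl: Template): ...
        def _execute_raw(self, raw_sql: str): ...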
Yeah, you could. I'm just saying that by doing this you're breaking `db.execute` by not allowing it to take a string like it does now. Libraries may not want to add a breaking change for this.
What does `db.safe` do though? How does it know the safe way of escaping at that point in the SQL? It has no idea whether it's going inside a string, whether it's in a field-name position, or whether it denotes a value or a table name.
To illustrate the question further, consider a similar html.safe: `f"<a href={html.safe(url)}>{html.safe(desc)}</a>"` - the two calls to html.safe require completely different escaping; how does it know which to apply?
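To make that concrete: the two positions call for different escaping functions, and a context-free `safe()` has no way to know which one applies.

    import html
    from urllib.parse import quote

    desc = 'Tom & Jerry <"best" duo>'

    print(html.escape(desc))  # &amp;, &lt;, &quot;... -- right for element text or an attribute value
    print(quote(desc))        # percent-encoding -- right for a URL component,
                              # wrong (and unreadable) as element text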
Some SQL engines support accepting parameters separately so that values get bound to the query once the abstract syntax tree is already built, which is way safer than string escapes shenanigans.
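For example, Python's built-in sqlite3 module already binds parameters separately from the SQL text:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", (1, "alice"))

    user_input = "alice' OR '1'='1"
    # the value is bound after the statement is parsed, so it can't alter the query's structure
    rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
    print(rows)  # [] -- the injection attempt is treated as a literal string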
I’d always prefer to use a prepared statement if I can, but sadly that’s also less feasible in the fancy new serverless execution environments where the DB adapter often can’t support them.
For me it just makes it easier to identify as safe, because it might not be obvious at a glance that an interpolated template string is properly sanitised.
> and an extra function can confirm the safety of a value going into a f-string just as well.
Yes, you could require consumers to explicitly sanitize each parameter before it goes into the f-string, or, because a t-string preserves the structure of what is fixed and what is a parameter, `db.execute` can do all of that for every parameter when it gets a t-string.
The latter is far more reliable, and you can't do it with an f-string, because an f-string, once evaluated, is just a plain string with no information about how it was constructed.
> Then the useful part is the extra execute function you have to write
Well, no, the library author writes it. And the library author also gets to detect whether you pass a Template instance as expected, or (erroneously) a string created by whatever formatting method you choose. Having to use `safe(name)` within the f-string loses type information, and risks a greater variety of errors.
Your "old" db.execute (which presumably accepts a regular old string) would not accept a t-string, because it's not a string. In the original example, it's a new db.execute.
Using a t-string in a db.execute which is not compatible with t-strings will result in an error.
Using a t-string in a db.execute which is, should be as safe as using external parameters. And using a non-t-string in that context should (eventually) be rejected.
Yes, but if a function accepts a template (which is a different type of object from a string!), either it is doing sanitization, or it explicitly implemented template support without doing sanitization—hard to do by accident!
The key point here is that a "t-string" isn't a string at all, it's a new kind of literal that's reusing string syntax to create Template objects. That's what makes this new feature fundamentally different from f-strings. Since it's a new type of object, libraries that accept strings will either have to handle it explicitly or raise a TypeError at runtime.
I'm not sure why you think it's harder to use them without sanitization - there is nothing inherent in a t-string that checks the values; that's just one nice use of them.
You might have implemented t-string support to save the value, log it better, or something similar, and not even have thought to check or escape anything, let alone everything (just as people forget to do that elsewhere).
I really think you're misunderstanding the feature. If a method has a signature like:
    from string.templatelib import Template

    class DB:
        def execute(self, query: Template):
            ...
It would be weird for the implementation to just concatenate everything in the template together into a string without doing any processing of the template parameters. If you wanted an unprocessed string, you would just have the parameter be a string.
I'm not. Again, you might be processing the variable for logging, saving, or passing elsewhere, or for many other reasons unrelated to sanitization.
Taking a Template parameter into a database library's `execute` method is a big bright billboard level hint that the method is going to process the template parameters with the intent to make the query safe. The documentation will also describe the behavior.
You're right that the authors of such libraries could choose to do something different with the template parameter. But none of them will, for normal interface design reasons.
A library author could also write an implementation of a `plus` function on a numerical type that takes another numerical type and returns a string with the two numbers concatenated, rather than adding them together.
But nobody will do that, because libraries with extremely surprising behavior like that won't get used by anybody, and library authors don't want to write useless libraries. This is the same.
It's true that in theory `db.execute` could ignore semantics and concatenate together the template and variables to make a string without doing any sanitisation, but isn't the same true of the syntax it was claimed to replace?
Just because a poorly designed library could use templates (or the previous syntax of passing variables in separately) in a way that's no safer than an f-string does not mean they add nothing over f-strings in general. They move the interpolation into `db.execute`, where it can do its own sanitization, and realistically, sqlite3 and other libraries explicitly updated to accept templates will use them to do proper sanitization.
Having errors is not in itself user error - Google will also return bad results, but I'd still consider it user error if someone can't avoid the bad results well enough to find some use for it.
> It just so happens that sometimes that non-deterministic text aligns with reality, but you don’t really know when and neither does the model.
This is overly simplistic and demonstrably false - there are plenty of scenarios where a model will say something false on purpose (e.g. when joking) and, asked afterwards, will tell you whether it was false, and be correct with high probability.
However you want to frame it, the model clearly evaluates truthfulness more accurately than chance.
I don't see how one follows from the other. Being able to lie on purpose doesn't, in my mind, mean that it's also able to tell when a statement is true or false. The first one is just telling a tale, which they are good at.
The model has only a linguistic representation of what is "true" or "false"; you have more than that. This is a limitation of LLMs; human minds have more to them than NLP.
A couple of years of this LLM AI hype train has blinded people to what was actually surprising about LLMs. The surprise wasn't that you could make a language model and it wasn't that a language model could generate text. Those are both rather pedestrian observations, and their implementations are trivial. The surprise of LLMs was that contemporary hardware could scale this far and that an un-curated training set turns out to contain a statistically significant amount of truth. Deep learning was interesting because we didn't expect that amount of computation to be feasible at this time in human history, not because nobody had ever thought of it before.
The surprise of the LLM AI was that they were somewhat truthful at all.
The AI revolution has mostly been a hardware revolution. I studied AI in the 1990s, so I knew about neural networks and backpropagation, so when suddenly everybody was talking about "Deep Learning", I wanted to know what was different about it. Turns out: not much. It's mostly just plain old backpropagation on a much larger scale because we have more powerful hardware.
Of course there have still been plenty of meaningful innovations, like the transformer/attention thing, but it's mostly the fact that affordable graphics cards offer massively-parallel floating point calculations, which turns out to be exactly what we need to scale this up. That and the sheer amount of data that's become available in the age of the Internet.
>The AI revolution has mostly been a hardware revolution.
It's certainly important but this reads as overly simplistic to me. All the hardware we have today won't make an SVM or a random forest scale the way transformers do.
I get it but for what is worth, the accepted behavior here is that you can repost eventually but you should wait significantly longer than a day before doing so.
> Are reposts ok?
>If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates.
Weeeell, technically a fresh new story hasn't got any attention in the last year or so because it was never posted before, so a small number of reposts - like in this case - should be ok.
Seems like the resulting attention on the repost makes for decent justification in this case. I'm glad to have seen this, and I don't like the idea of good content slipping through the cracks because of timing and circumstance.
Note that there is a "second chance" pool for posts which did not get much attention but are perhaps more interesting to the community than the engagement suggests. The mods seem to agree with this point, given that such a pool exists.
I can relate and it is something I'm thinking about as I have a post I'd like to try reposting without coming off as spammy. In this case, the repost was indeed worth it as far as I can tell.
I can empathize. Part of what we require seems to be better detection and signaling of which accounts are most and least likely to be human but I'm not sure if we'll get that in the biggest forums.
LLMs can practically pass the Turing test in this context, so on one hand this should get worse, but on the other hand we are not that far from the point where LLM comments are about as worthwhile as the random real ones anyway. And if you want more than that level, you have to curate better.