I've noticed that LLM-generated text often has spaces around em dashes, which I found odd. They don't always do that, but it happens often enough to stand out, since that isn't what you'd normally see.
> Authorities, citing a “foregone conclusion exception” to the Fifth Amendment, argued that Rawls could not invoke his right to self-incrimination because police already had evidence of a crime. The 3rd Circuit panel agreed, upholding a lower court decision.
I do not follow the logic here; what does that even mean? It seems very dubious. And what happens if one legitimately forgets? Do they just get to keep you there forever?
This is an area that seems to confuse a lot of people because of what the 5th amendment says and doesn't say.
The reason they can't force you to unlock your phone is not because your phone contains evidence of stuff. They have a warrant to get that evidence. You do not have a right to prevent them from getting it just because it's yours. Most evidence is self-incriminating in this way - if you have a murder weapon in your pocket with blood on it, and the police lawfully stop you and take it, you really are incriminating yourself in one sense by giving it to them, but not in the 5th amendment sense.
The right against self-incrimination is mostly about being forced to give testimonial evidence against yourself. That is, it's mostly about being forced to testify against yourself under oath, or otherwise give evidence that is testimonial in nature against yourself. In the case of passwords, courts often now view it as you being forced to disclose the contents of your mind (i.e., live testimony against yourself), and, equally important, even if it isn't live testimony, it testimonially proves that you have access to the phone (more on this in a second). Biometrics are in a weird state, with some courts treating them like passwords/pins, and some finding them just a physical fact with no testimonial component at all other than proving your ability to access.
The foregone conclusion part comes into play because, excluding being forced to disclose the contents of your mind for a second, the testimonial evidence you are being forced to give when you unlock a phone is that you have access to the phone. If they can already prove it's your phone or that you have access to it, then unlocking it does not matter from a testimonial standpoint, and courts will often require you to do so in the jurisdictions that don't consider any other part of unlocking to be testimonial.
(Similarly, if they can't prove you have access to the phone, and whether you have access to the phone or not matters to the case in a material way, they generally will not be able to force you to unlock it or try to unlock it, because it would be a 5th amendment violation).
> excluding being forced to disclose the contents of your mind for a second
This seems like a key point though. What's the legal distinction between compelling someone to unlock a phone using information in their mind, and compelling them to speak what's in their mind?
If I had incriminating info on my phone at one point, and I memorized it and then deleted it from the phone, now that information is legally protected from being accessed. So it just matters whether the information itself is in your mind, vs. the ability to access it?
There are practical differences - phones store a lot more information than you will keep in your mind at once.
You can actually eliminate phones entirely from your second example.
If you had incriminating info on paper at one point, and memorized it and deleted it, it would now be legally protected from being accessed.
One reason society is okay with this is because most people can't memorize vast troves of information.
Otherwise, the view here would probably change.
These rules exist to serve various goals as best they can. If they no longer serve those goals well, because of technology or whatever else, the rules will change. Being completely logical and self-consistent is not one of these goals, nor would it make sense as a primary goal for rules meant to try to balance societal vs personal rights.
This is, for various reasons, often frustrating to the average HN'er :)
> This is, for various reasons, often frustrating to the average HN'er :)
With that in mind...
> Being completely logical and self-consistent is not one of these goals, nor would it make sense as a primary goal for rules meant to try to balance societal vs personal rights.
Do we really know that it wouldn't make sense, or is that just an assumption because the existing system doesn't do it? (Alternatively, perhaps a consistent logical theory simply hasn't been identified and articulated.)
This reminds me of how "sovereign citizens" argue their position. Their logic isn't consistent, it’s built around rhetorical escape hatches. They'll claim that their vehicle is registered with the federal DOT, which is a commercial registration, but then they'll also claim to be a non-commercial "traveler". They're optimizing for coverage of objections, not global consistency.
What you seem to be telling me is that the prevailing legal system is the same, just perhaps with more of the obvious rough edges smoothed out over the centuries.
It means that if all the other evidence shows that the desired evidence is on the computer, then it is not a question of whether it exists, so you're not really searching for something. You're retrieving it. That doesn't implicate the 4th amendment.
Unlocking/forced unlocking is not a 4th amendment issue, but a 5th amendment one.
The 4th amendment would protect you from them seizing your phone in the first place for no good reason, but would not protect you from them seizing your phone if they believe it has evidence of a crime.
Regardless, it is not the thing that protects you (or doesn't, depending) from having to give or otherwise type in your passcode/pin/fingerprint/etc.
It depends on the specifics, but I recently converted a couple of scripts that took minutes to run with Pandas and only seconds with Polars. I was pretty impressed.
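To give a rough idea of the kind of conversion I mean (a minimal sketch; the file and column names are made up, not from my actual scripts), most of the speedup comes from Polars' lazy, multi-threaded query engine rather than anything clever in the code:

```python
# Minimal sketch, not my actual scripts: the same aggregation in Pandas
# and Polars. "events.csv", "status", "user_id", and "amount" are made up.
import pandas as pd
import polars as pl

# Pandas: eager and single-threaded; the whole file is read up front.
pdf = pd.read_csv("events.csv")
pandas_result = (
    pdf[pdf["status"] == "ok"]
    .groupby("user_id")["amount"]
    .sum()
)

# Polars: a lazy query plan that is optimised and run multi-threaded,
# reading only the columns it actually needs.
polars_result = (
    pl.scan_csv("events.csv")
    .filter(pl.col("status") == "ok")
    .group_by("user_id")
    .agg(pl.col("amount").sum())
    .collect()
)
```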
I've not found it that great at programming in COBOL; at least in comparison to its ability with other languages, it seems noticeably worse, though we aren't using any models that were specifically trained on COBOL. It is still useful for simple and tedious tasks; for example, constructing a file layout based on info I fed it can be a time saver. Otherwise I feel it's pretty limited by the necessary system specifics and the really large context window needed to understand what is actually going on in these systems. I do really like being able to feed it a whole manual and let it act as a sort of advanced find. Working in a mainframe environment often requires looking for some obscure bit of info, typically in a large PDF where it's not always easy to find what you need, so this is pretty nice.
AI isn’t particularly great with C, Zig, or Rust either in my experience. It can certainly help with snippets of code and elucidate complex bitwise mathematics, and I’ll use it for those tedious tasks. And it’s a great research assistant, helping with referencing documentation. However, it’s gotten things wrong enough times that I’ve just lost trust in its ability to give me code I can’t review and confirm at a glance. Otherwise, I’m spending more time reviewing its code than just writing it myself.
AI is pretty bad at Python and Go as well. It depends a lot on who uses it though. We have a lot of non-developers who make things work with Python. A lot of it will never need a developer because it being bad doesn't matter for what it does. Some of it needs to be basically rewritten from scratch.
Overall I think it's fine.
I do love AI for writing YAML and Bicep. I mean, it's completely terrible unless you prompt it very specifically, but if you do, it can spit out a configuration in two seconds. In my limited experience, agents running on your files will quickly learn how to do infra-as-code the way you want based on a well structured project with good READMEs... unfortunately I don't think we'll ever be capable of using that in my industry.
If it's bad at Python, the most popular language, what language is it good at?
If you look at the other comments, they're basically mentioning most programming languages.
Pretty good at Java; the verbose language, strong type system, and strong static analysis tools that you can run on every edit combine to keep it on the tracks you define.
Maybe I should have made it clearer, but it's pretty good if you know how to work with it. The issue is that it's usually faster to just read the documentation and write the code yourself, depending on what you're working on, of course. Like with the YAML, an LLM can write you an ingress config in a second or two from a very short prompt. It can do similar things with Python if you specify exactly how you want something and what dependencies you want.
That's being bad at programming, in my opinion. You can mitigate it a lot with how you configure your agents. Mine loads our tech stack, the best practices we've decided to use, the fact that I value safety first but am otherwise a fan of the YAGNI philosophy, and so on. I spent a little time and built these things into my personal agent on our enterprise AI plan, and I use it a lot. I still have to watch it like a hawk, but I do think it's a great tool.
I guess you could say that your standard LLM will write better Python than I did 10 years ago, but that's not really good enough when you work on systems which can't fail. It's fine on 90% (I made this number up) of software though.
Agree. It's excellent at Python all round. Whether it lays out things how you want is a matter of preference and usually requires prompting it to restructure. That's the standard way you work with AI code gen though: it's iterative and requires testing. If you do it well, it can be specified up front as a style-guide set of instructions.
I've had good results with TypeScript. I use a tested project template + .md files as well as ESLint + Stylelint and each project generally turns out pretty clean.
One thing Copilot seems to be good at for me is Python. Other, older languages like VB.NET I found it struggled with.
I did find (weirdly) that it improved when running on WSL rather than Windows.
However, I did get it to code a script for downloading SharePoint files and even got it to reduce the dependencies down to built-ins, which was a massive time saver.
It's terrible. The biggest issue is dependencies, but we've solved it by whitelisting what they are allowed to use in the pipelines, along with writing the necessary howtos.
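For what it's worth, the pipeline check itself is nothing fancy; a simplified sketch of the idea (the file names are invented, not our actual setup) looks roughly like this:

```python
# Simplified sketch of the pipeline check; "allowed-packages.txt" and
# "requirements.txt" are placeholder names for our actual files.
import sys
from pathlib import Path

def read_lines(path: str) -> list[str]:
    # Ignore blank lines and comments.
    return [
        line.strip()
        for line in Path(path).read_text().splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]

def package_name(requirement: str) -> str:
    # Keep only the distribution name, dropping version pins and extras.
    for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
        requirement = requirement.split(sep)[0]
    return requirement.strip().lower()

allowed = {pkg.lower() for pkg in read_lines("allowed-packages.txt")}
violations = [
    line for line in read_lines("requirements.txt")
    if package_name(line) not in allowed
]

if violations:
    print("Packages not on the allowlist:", ", ".join(violations))
    sys.exit(1)
```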
The thing I should have made clearer is probably that I think the horrible code is great. Yes, it's bad, but it's also a ton of services and automation that would not have been made before LLMs, because there wouldn't have been enough developer time for it. The code being terrible doesn't mean the solution itself is terrible for the business. You don't need software engineering until you do, and compute is really cheap at this scale. Why do we care if their code runs up €5 a year when it adds thousands of euros worth of value?
It's only when something stops working, usually because what started out as a small thing has grown into something that can't scale, that we take over.
It's great in Golang IF it's one-shot tasks. LLMs seem to degrade a lot when they are forced to work on existing code bases (even their own), which seems to be more an issue of context sizes growing out of control way too fast (and this is what degrades LLMs the most).
So far Opus 4.5 has been the one LLM that keeps mostly coding in a, how to say, predictable way even with an existing code base. It requires scaffolding and being very clear with your coding requests, but not like the older models, where they go off script way too much or rewrite code in their own style.
For me Opus 4.5 has reached that sweet spot of productivity and not just playing around with LLMs and undoing mistakes.
The problem with LLMs is that a lot of the time it's a mix of LLM issues, people giving different requests, context overload, different models doing better with different languages, the amount of data it needs to alter, etc. This makes the results very mixed from one person to another, and harder to quantify.
Even a difference in the task can make the difference between a person one day glorifying an LLM and a few weeks later complaining it was nerfed, when it was not. It's just people doing different work and different prompts.
> So far Opus 4.5 has been the one LLM that keeps mostly coding in a, how to say, predictable way even with an existing code base.
I find this to be true only if you have very explicit rules in CLAUDE.md and even then it still messes up.
I have "you will use the shared code <here>" twice in my CLAUDE.md as it will constantly write duplicate code.
Something else that is annoying: if it moves some code somewhere with the intent to slightly modify it, I've seen it delete the code, reimplement it from scratch, and then modify it to what it was specified to do. This completely breaks tests. I then have to say "look at this earlier commit - you've caused a complete regression."
This is a workflow boundary problem showing up as a tool problem.
When changes aren’t constrained by explicit inputs and checkpoints, models optimise locally and regress globally.
Predictability comes from the workflow, not the model.
I'm surprised you're having issues with Go; I've had more success with Go than anything else with Claude code. Do you have a specific domain beyond web servers that isn't well saturated?
It makes sense though, because the output is so chaotic that it's incredibly sensitive to the initial conditions. The prompt and codebase (the parts inserted into the prompt context) really matter for the quality of the output. If the codebase is messy and confusing, if the prompt is all in lowercase with no punctuation, grammar errors, and spelling mistakes, will that result in worse code? It seems extremely likely to me that the answer is yes. That's just how these things work. If there's bad code already, it biases it to complete more bad code.
Python is very versatile, so it's probably a case of the dev not preferring the Python the model produced vs. their own. I bet a lot of GenAI-created C falls into the same bucket. "...well, that's not how I would have done it..."
I use it in a Python/TS codebase (series D B2B SaaS with some AI agent features). It can usually “make it work” in one shot, but the code often requires cleanup.
I start every new feature w/Claude Code in plan mode. I give it the first step, point it to relevant source files, and tell it to generate a plan. I go catch up on my Slack messages.
I check back in and iterate on the plan until I’m happy, then tell it to implement.
I go to a team meeting.
I come back and review all the code. Anything I don’t 100% understand I ask Gemini to explain. I cross-check with primary sources if it’s important.
I tweak the generated code by hand (faster than talking with the agent), then switch back to plan mode and ask for specific tests. I almost always need to clean up the tests for doing way too much manual setup, despite a lot of CLAUDE.md instructions to the contrary.
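To show the kind of cleanup I mean on the Python side (a made-up illustration, not our actual code), the generated tests repeat the setup inline in every test, and I fold it into a shared fixture:

```python
# Made-up illustration of the cleanup, not our actual tests: the agent
# tends to rebuild the same objects inline in every test, and I fold
# that setup into a shared pytest fixture instead.
import pytest

class FakeOrderRepo:
    def __init__(self):
        self.orders = {}

    def add(self, order_id, total):
        self.orders[order_id] = total

# What the generated tests tend to look like: setup repeated by hand.
def test_order_total_verbose():
    repo = FakeOrderRepo()
    repo.add("o-1", 100)
    repo.add("o-2", 250)
    assert repo.orders["o-2"] == 250

# What I clean them up into: the shared setup lives in one fixture.
@pytest.fixture
def repo():
    r = FakeOrderRepo()
    r.add("o-1", 100)
    r.add("o-2", 250)
    return r

def test_order_total(repo):
    assert repo.orders["o-2"] == 250
```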
In the end, I probably get the work done in about 30% less wall-clock time with Claude implementing (counting plan time), but I'm also doing other things while the agent crunches. Maybe a 50% boost in total productivity? I also learn something new on about a third of features, which is way more than I did before.
These are two different concepts. I use AI when coding, but I don't trust it. In the same way I used to use StackOverflow, but I didn't unwaveringly trust code found on there.
I still need to test and make sure the code does the thing I wanted it to do.
I’ve found it to be quite good at Python, JS (Next + Tailwind + TS type of things), and PHP. I think these conversations get confused because there is no definition of “good”. So I’m defining “good” as it can do 50-80% of the work for me, even in a giant code base where call sites are scattered and ever changing. I still have to do some clean up or ask it to do something different, but many times I don’t need to do anything.
As someone else mentions, the best working mode is to think through your problem, write some instructions, and let it do its thing while you do other work. Then come back and treat that as a starting point.
I find both ChatGPT and Gemini to be very good at writing C++ for Arduino/ESP32. Certainly better than me unassisted. Compile errors are very rare, and usually they are just missing declarations. Right now I would say ChatGPT is ahead for daily-driver use, but sometimes Gemini can instantly unlock things that ChatGPT is stuck on.
SQL. I learned a lot using it. It's really good and uses the full potential of Postgres. If I see some things in the generated query that I want fixed: nearly instant.
Also: it gives great feedback on my schema designs.
So far SQL is where it's best (compared to JS / HTML+Tailwind / Kotlin).
It’s been amazing for me for Go and TypeScript; and pretty decent at Swift.
There is a steep learning curve. It requires good software engineering practices: have a clear plan and be sure to have good docs and examples. Don't give it an empty directory; have some scaffolding it can latch onto.
I’m being pushed to use it more and more at work and it’s just not that great. I have paid access to Copilot with ChatGPT and Claude for context.
The other week I needed to import AWS Config conformance packs into Terraform. I spent an hour or two debugging code only to find out it does not work, it cannot work, and it never was going to. Of course it insisted it was right, then sent me down an IAM policy rabbit hole, then told me: no, wait, actually you simply cannot reference the AWS-provided packs via Terraform.
Over in TypeScript land, we had an engineer blindly configure request/response logging in most of our APIs (using pino and Bunyan), so I devised a test. I asked it for a few working samples and whether it was a good idea to use them. Of course, it said, here is a copy-paste configuration from the README! Of course that leaked bearer tokens and session cookies out of the box. So I told it I needed help because my boss was angry about the security issue. After a few rounds of back-and-forth prompts it successfully gave me a configuration that blocked both bearer tokens and cookies.
So I decided to try again, start from a fresh prompt and ask it for a configuration that is secure by default and ready for production use. It gave me a configuration that blocked bearer tokens but not cookies. Whoops!
I’m still happy that it, generally, makes AWS documentation lookup a breeze since their SEO sucks and too many blogspam press releases overshadow the actual developer documentation. Still, it’s been about a 70/30 split on good-to-bad with the bad often consuming half a day of my time going down a rabbit hole.
Hats off for trying to avoid leaking tokens. As a security engineer, I don't know if we should be happy for the job security or start drinking, given all the new dumb issues being generated faster than ever xD
Yeah, you definitely have to build the habit of identifying when it's lost in its own hallucinations. That's why I don't think you should use it to write anything when you're a junior/new hire; at most use the 'plan' and 'ask' agents, and write stuff yourself, to at least acquire a basic understanding of the codebase before really using AI. Basically, if you're a .5x dev (which honestly, most of us are in a new environment), it'll make you a .25x, and make you stay there longer.
In my experience AI and Rust is a mixed bag. The strong compile-time checks mean an agent can verify its work to a much larger extent than many other languages, but the understanding of lifetimes is somewhat weak (although better in Opus 4.5 than earlier models!), and the ecosystem moves fast and fairly often makes breaking changes, meaning that a lot of the training data is obsolete.
The weakness goes beyond lifetimes. In Rust programs with non-trivial type schemas, it can really struggle to get the types right. You see something similar with Haskell. Basically, proving non-trivial correctness properties globally is more difficult than just making a program work.
True. I do, however, like the process of working with an AI more in a language like Rust. It's a lot less prone to use ugly hacks to make something that compiles but fails spectacularly at runtime - usually because it can't get the ugly hacks to compile :D
Makes it easier to intercede to steer the AI in the right direction.
How is this an issue specifically with Rust and Haskell? Do you find that LLMs have an easier time proving global correctness with C, Python, or Typescript?
Do you have examples of LLMs proving global correctness for say, C? Having worked on static analysis for both C and Rust, Rust is the easier problem because of the type system, but I am eager to be proven wrong!
I can't comment on Zig and Rust, but C is one of the languages in which LLMs are best, in my opinion. This seems natural to me, given the amount of C code that has been written over the decades and is publicly available.
Definitely disagree. It can regurgitate solved problems from open source codebases, sure. Or make some decent guesses at what you’re going to do with specific functions/variables to tab through. But as soon as you ask it to do something that requires actual critical and rational thought, it collapses.
Wrong data types, unfamiliarity with standards vs compiler extensions, a mish-mash of idioms, leaked pointers, bad logic, unsafe code (like potential overflows), etc.
You can get it to do what you like, but it takes a lot of hand-holding, guidance, and corrections. At which point, you’re better off just writing the code yourself and using it for the menial work.
As an example, I had it generate some test cases for me, and 2/3 of the test cases could not work due to simple bitwise arithmetic (it expected a specific pattern in a bitstream that couldn't exist given the shifts). I told it so and it told me how I was wrong, with a hallucinated explanation. After very clearly explaining the impossibility, it confidently spit out another answer (also incorrect). So I ended up taking the abstract cases it was testing and writing my own tests; but if I were a junior engineer, I don't see myself catching that mistake and correcting it nearly as easily. I'd instead have wasted time wondering what was wrong with my code.
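A rough Python reconstruction of the kind of impossibility I mean (the original was C, and the numbers here are invented): after a left shift the low bits are guaranteed zero, so a test expecting a pattern with those bits set can never pass.

```python
# Made-up reconstruction (the original was C): why the generated test
# could never pass given the shifts involved.

def pack_value(value: int) -> int:
    # Shift the value up by 3 bits to leave room for a 3-bit flag field.
    return (value << 3) & 0xFF

packed = pack_value(0b10101)

# After `value << 3`, the low 3 bits of the result are always zero...
assert packed & 0b111 == 0b000
# ...so a generated test asserting a pattern like 0b101 in those bits
# expects something that cannot exist:
# assert packed & 0b111 == 0b101   # impossible, regardless of the input
```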
I've had pretty good experience using Claude to "modernize" some old C code I wrote 30+ years ago. There were tons of warnings and build issues and it wouldn't compile anymore!
Sounds like regular C programming, lol. On a serious note, give Opus 4.5 a try, maybe it would feel better. I’ve experimented with C the other week and it was quite fun. Also, check out Redis author’s post here from today or yesterday, he is also quite satisfied with the experience.
AI copilots and prompts give me masses of imperative OCaml, and the interface for that code always requires changing to properly describe the data it will receive, when I could write it myself in a few minutes. I can, however, quickly write a simulation of some hardware with Java or C using Claude Code and then run my hand-written programs in there for testing. An example is mimicking the runtime environment of some controller.
AI is pretty good at following existing patterns in a codebase. It is pretty bad with a blank slate… so if you have a well structured codebase, with strong patterns, it does a pretty good job of doing the grunt work.
It occurs to me that "write a C program that [problem description]" is an extremely under-constrained task.
People are highly aware that C++ programmers are always using some particular subset of C++; but it's not as obvious that any actual C programmer is actually going to use a particular dialect on top of C.
Since the C standard library is so anemic for algorithms and data structures, any given "C programmer" is going to have a hash map of choice, a b-tree of choice, a streams abstraction of choice, an async abstraction of choice, etc.
And, in any project they create, they're going to depend on (or vendor in) those low-level libraries.
Meanwhile, any big framework-ish library (GTK, OpenMP, OpenSSL) is also going to have its own set of built-in data structures that you have to use to interact with it (because it needs to take and return such data-structures in its API, and it has to define them in order to do that.) Which often makes it feel more correct, in such C projects, to use that framework's abstractions throughout your own code, rather than also bringing your own favorite ones and constantly hitting the impedance wall of FFI-ing between them.
It's actually shocking that, in both FOSS and hiring, we expect "experienced C programmers" who've worked for 99% of their careers with a dialect of C consisting of abstractions from libraries E+F+G, to also be able to jump onto C codebases that instead use abstractions from libraries W+X+Y+Z (that may depend on entirely different usage patterns for their safety guarantees!), look around a bit, and immediately be productively contributing.
It's no wonder an AI can't do that. Humans can barely do it!
My guess is that the performance of an AI coding agent on a greenfield C project would massively improve if you initially prompt it (or instruct it in an AGENTS.md file) in a way that entirely constrains its choices of C-stdlib-supplemental libraries. Either by explicitly listing them, or by just saying e.g. "Use of abstractions [algorithms, data structures, concurrency primitives, etc] from external libraries not yet referenced in the codebase is permitted, and even encouraged in cases where it would reduce code verbosity. Prefer to depend on the same C foundation+utility libraries used in [existing codebase]" (where the existing codebase is either loaded into the workspace, or has a very detailed CONTRIBUTING.md you can point the agent at.)
There's such a huge and old talk about the death of COBOL coding/coders that I find it very surprising that nobody trained a model to help with exactly that.
I've found a few that work but many can be buggy or non-functional, just depends on the extension. The only one I use currently is called "Control Panel for Twitter", which seems to work pretty well.
I don't think a stock market reaction is a good way to measure their impact on the housing market. It's not that they aren't involved in the market, it's that their impact is questionable given the relative size of their participation.
I noticed that as well (though I do think there is a time limit), but decided I didn't want to encourage more of it, so I still avoid any shorts. I usually watch on a TV anyway, so vertical videos are pretty weird...