Hacker News | eeperson's comments

Everybody produces bugs, but Claude is good at producing code that looks like it solves the problem but doesn't. Developers worth working with grow out of this in a new project. Claude doesn't.

An example I have of this is when I asked Claude to copy some functionality from a front-end application to a back-end application. It got all of the function signatures right but then hallucinated the contents of the functions. Part of this functionality included a lookup map for some values. The new version had entirely hallucinated keys and values, but the values sounded correct if you didn't compare them with the original. A human would have literally copied the original lookup map.


I asked Claude to help me figure out a statistical calculation in Apple Numbers. It helpfully provided the results of the calculation. I ignored those, implemented the calculation in the spreadsheet, and got completely different (correct) results. Claude did help me figure out how to do it correctly, though!

> Developers worth working with, grow out of this in a new project. Claude doesn't.

There is no way this is true. People make fewer bugs with time and guidance, but no human makes zero bugs. Also, bugs are not planned; it's always easy in hindsight to say "a human would have literally copied the original lookup map," but every bug involves some mistake that deviates from the status quo. That's why it's a bug.


No, it's broadly true. Also, that's why we have code review and tests, so that it has to pass a couple of filters.

LLMs don't make mistakes like humans make mistakes.

If you're a SWE at my company, I can assume you have a baseline of skill and you tested the code yourself, so I'm trying to look for any edge cases or gaps or whatever that you might have missed. Do you have good enough tests to make both of us feel confident the code does what it appears to do?

With LLMs, I have to treat its code like it's a hostile adversary trying to sneak in subtle backdoors. I can't trust anything to be done honestly.


Sorry, perhaps I should have been clearer. They don't grow completely out of making bugs (although they do tend to make fewer over time); they grow out of making solutions that look right but don't actually solve the problem. This is because they understand the problem space better over time.

> If you're in a project where buggy behavior wasn't introduced so much as grew (e.g. the behavior evolved A -> B -> C -> D -> E over time and a bug is reported due to undesirable interactions between released/valuable features in A, C, and E), then bisecting to find "when did this start" won't tell you that much useful.

I actually think that is the most useful time to use bisect. In situations where the cause isn't immediately obvious, just reading through the code makes those issues harder to find, and bisect narrows the search mechanically.
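For reference, bisect can be driven entirely by a repro script, which is what makes it useful when the cause isn't obvious. A self-contained toy example (the repo contents, commit messages, and grep-based repro are all made up for illustration; `git init -b` needs git 2.28+):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main
git config user.email dev@example.com
git config user.name dev

# Build five commits; commit 4 is the one that introduces the "bug".
for i in 1 2 3 4 5; do
  echo "ok $i" > state.txt
  if [ "$i" -ge 4 ]; then echo "bug" >> state.txt; fi
  git add state.txt
  git commit -qm "commit $i"
done

# bad = HEAD (commit 5), good = HEAD~4 (commit 1). The repro command
# exits nonzero when the bug is present, so bisect runs unattended.
git bisect start HEAD HEAD~4 >/dev/null 2>&1
git bisect run sh -c '! grep -q bug state.txt' >/dev/null 2>&1

# refs/bisect/bad now points at the first bad commit ("commit 4").
subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
```

The point is that you never read the code during the search: the repro script plus the good/bad endpoints are enough for bisect to converge in O(log n) steps.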


I'm glad it works for you! I may not have described the situation super clearly: most bugs I triage are either very causally shallow (i.e. they line up exactly with a release or merge, or have an otherwise very well-known cause like "negative input in this form field causes ISE on submit"), or else they're causally well understood but not immediately solvable.

For example, take a made-up messaging app. Let's call it ButtsApp. Three big ButtsApp releases happened, in order, adding the features: 1) "send messages"; 2) "oops/undo send"; and 3) "accounts can have multiple users operating on them simultaneously". All of these were deemed to be necessary features and released over successive months.

Most of the bugs that I've spent lots of time diagnosing in my career are of the interacting-known-features variety. In that example, it would be "user A logs in and sends a message, but user B logs in and can undo the sends of user A" or similar. I don't need bisect to tell me that the issue only became problematic when multi-user support was released, but that release isn't getting rolled back. The code triggering the bug is in the undo-send feature that was released months ago, and the offending/buggy action is from the original send-message feature.

Which commit is at fault? Some combination of "none of them" and "all of them". More importantly: is it useful to know commit specifics if we already know that the bug is caused by the interaction of a bunch of separately-released features? In many cases, the "ballistics" of where a bug was added to the codebase are less important.

Again, there are some projects where bisect is solid gold--projects where the bug triage/queue person is more of a traffic cop than a feature/area owner--but in a lot of other projects, bugs are usually some combination of trivially easy to root-cause and/or difficult to fix regardless of whether the causal commit is identified.


> I know a lot of people want to maintain the history of each PR, but you won't need it in your VCS.

I strongly disagree. Losing this discourages swarming on issues and makes bisect worse.

> You should always be able to roll back main to a real state. Having incremental commits between two working stages creates more confusion during incidents.

If you only use merge commits, this shouldn't be any more difficult. You just need to specify that you want the first parent when doing reverts.
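Concretely, rolling back a whole feature that came in via a `--no-ff` merge is one command with `-m 1` (mainline = first parent). A toy repo sketch, with made-up branch and file names:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main            # -b needs git 2.28+
git config user.email dev@example.com
git config user.name dev

echo base > app.txt
git add app.txt
git commit -qm "base"

# A feature branch with incremental commits, merged without squashing.
git checkout -qb feature
echo "wip 1" >> app.txt
git commit -qam "feature wip 1"
echo "wip 2" >> app.txt
git commit -qam "feature wip 2"
git checkout -q main
git merge -q --no-ff --no-edit feature

# Roll back the entire feature as one unit: revert the merge commit,
# keeping parent 1 (the main branch) as the mainline.
git revert -m 1 --no-edit HEAD >/dev/null
cat app.txt   # prints "base"
```

The incremental commits stay in history (useful for bisect and archaeology), while `git log --first-parent` still shows main as a clean sequence of whole features.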


I've heard people say before that it is easier to reason about a linear history, but I can't think of a situation where this would let me solve a problem more easily. All I can think of is a lot of downsides. Can you give an example where it helps?


The major things from Scala that I find useful are:

- higher kinded types

- null in types

- for comprehensions

- macros

- opaque types

- implicits/type classes

- persistent immutable collections

- EDIT named & default params


This always seemed strange to me. If your team can't be trusted not to make spaghetti in a monolith, what stops them from making distributed spaghetti in microservices? In theory, the extra work of making an API call would give you smaller bowls of spaghetti. However, once you add some abstraction over making these calls, developers seem empowered to make the same mess. Except now it is slower and harder to debug.


> This always seemed strange to me. If your team can't be trusted not to make spaghetti in a monolith, what stops them from making distributed spaghetti in microservices?

It's far harder to update multiple services to handle requests they should not handle, let alone update a deployment to allow those requests to happen.

Walls make great neighbors, just like multiple services make teams great at complying with an architecture constraint.


> Walls make great neighbors

I think you're trying to solve a communication problem with a technical solution, which is a recipe for trouble.

If multiple teams working on interdependent components can't communicate well enough to keep from stepping on each other's toes, imposing technical barriers probably isn't going to make things better. Especially once you inevitably realize that you put the walls in the wrong place and functionality has to move across borders, which is now a major pain because you've intentionally made it hard to change.


> If multiple teams working on interdependent components can't communicate well enough to keep from stepping on each other's toes, imposing technical barriers probably isn't going to make things better.

But it actually does, and there is a lot of data to prove it. When you have a big project and a bunch of teams, the first thing you build is boundaries/walls. Then you define interfaces between interdependent services. This frees teams up to hack on their modules in parallel, without stepping on each other's toes. Communication would definitely have helped, but it is far easier for smaller teams to own and operate their services than to get a big organization to plough through a big mess.

That said, microservices are just one way to solve a problem, and not always the right way. But there is always some combination of problem and organization tasked with solving it where they fit just right.


been there, done that.


> It's far harder to update multiple services to handle requests they should not handle, let alone update a deployment to allow those requests to happen.

I'm not sure I follow this. Doesn't this just mean that it is harder to make changes? Why would it be harder to make bad changes and not harder to make good changes?

> Walls make great neighbors, just like multiple services make teams great at complying with an architecture constraint.

I'm not sure I follow this either. Why would multiple services make teams great at complying with an architecture constraint?


It doesn't have to be. If your devices support HDMI-CEC[0], then you can turn on one device and everything sets itself up. For example, I can turn on my PS4 and it automatically turns on the TV and sets the correct input.

[0] https://en.wikipedia.org/wiki/Consumer_Electronics_Control


My Samsung TV's "smart" feature overlays the screen for more than a minute when using HDMI-CEC.


Couldn't a malfunctioning sync process undo a soft delete as well?


Null is more ambiguous than an explicit conflict. If there is literally no record, not even of the delete action, then there's no timestamp for last-write-wins.
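To make that concrete, here is a minimal Python sketch of last-write-wins with a soft-delete tombstone (all names here, like `Record` and `lww_merge`, are hypothetical). Without the tombstone row, the delete would have no `updated_at` of its own and could never win the merge:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A synced record; a soft delete keeps the row as a tombstone."""
    value: Optional[str]
    updated_at: int        # timestamp/logical clock of the last write
    deleted: bool = False  # tombstone flag instead of removing the row


def lww_merge(a: Record, b: Record) -> Record:
    """Last-write-wins: keep whichever side was written later."""
    return a if a.updated_at >= b.updated_at else b


# Device A soft-deletes at t=5; device B edits the value at t=3.
tombstone = Record(value=None, updated_at=5, deleted=True)
edit = Record(value="edited", updated_at=3)

merged = lww_merge(tombstone, edit)
assert merged.deleted  # the delete wins because its timestamp survived
```

If the delete had instead removed the row outright, the sync layer would see only the t=3 edit and silently resurrect the record.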


Why do you feel that Scala has just as much historical baggage? Isn't the whole point of the Scala 2/3 split to remove historical baggage?


Although beware: those pull-through sharpeners are notorious for doing a terrible job sharpening knives. They take off far more material than needed and tend to produce an edge that isn't very sharp. YMMV.


I’m aware.

The context here is someone who does little cooking and just wants a sharp knife three times a year.

