
As a 40yo software engineer (at Microsoft) with no specific domain expertise other than using genAI for fun and some code completion, this essay/blog post articulates very well my gut feelings about where we are.


I am a math PhD student and I already draw some value from recent reasoning models. I strongly believe that in 1-2 years LLMs will become an established tool for scientists to help with coding and math. I really don't think you can call this a con...


41 here, working in healthtech… and Devin has committed more code and closed more tickets on my behalf in the past week than I’ve done on my own in a month.

It’s basically functioning as a team of entry-level junior engineers at this point.

Previously I was having to spend a fair amount of time writing tickets and providing context, but lately I’ve fed all my meeting transcripts and such into an LLM and it interactively creates Jira tickets for me. Each one takes me maybe 30s to read before I confirm them and the assistant creates the actual tickets.


What kind of tickets are these? Even at non-complex tasks, I find agents struggle a lot.

Can you give some examples?


Sure. One task I gave it a couple of days ago was to upgrade the version of Python used in a project. In this case, that was a task suited for a junior engineer - it was simple enough to be described fully, but complex enough to require effort.

Devin was able to recognize that the project used Poetry, was Dockerized, and that the Python version was specified in multiple places (.python-version, pyproject.toml, Dockerfile). It saw that a couple of minor dependencies didn’t support the new version of Python, so it went back and upgraded those to the most recent matching version first.
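
To make the "multiple places" part concrete, here is a minimal sketch of the kind of check involved. It is not what Devin actually runs. The function names are hypothetical, the regexes assume a Poetry-style `python = "..."` constraint and a `FROM python:<version>` base image, and a real project may pin the version differently.

```python
import re


def version_from_python_version_file(text):
    """Return the first non-empty, non-comment line of a .python-version file."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return None


def version_from_pyproject(text):
    """Return the raw Poetry constraint, e.g. '^3.11' (assumes python = "..." syntax)."""
    match = re.search(r'^\s*python\s*=\s*"([^"]+)"', text, re.MULTILINE)
    return match.group(1) if match else None


def version_from_dockerfile(text):
    """Return the tag version of a 'FROM python:<version>' base image, if any."""
    match = re.search(
        r"^\s*FROM\s+(?:\S+/)?python:(\d+(?:\.\d+)*)",
        text,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1) if match else None


def major_minor(spec):
    """Extract (major, minor) from a version or constraint string, or None."""
    match = re.search(r"(\d+)\.(\d+)", spec or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def find_mismatches(pins, target):
    """Return the sorted names of files whose pin does not match the target major.minor."""
    wanted = major_minor(target)
    return sorted(name for name, spec in pins.items() if major_minor(spec) != wanted)


# Example with inline file contents (made up for illustration).
pins = {
    ".python-version": version_from_python_version_file("3.12.1\n"),
    "pyproject.toml": version_from_pyproject('[tool.poetry.dependencies]\npython = "^3.11"\n'),
    "Dockerfile": version_from_dockerfile("FROM python:3.12-slim\n"),
}
stale = find_mismatches(pins, "3.12")
```

In this made-up example only pyproject.toml still points at the old version, so that is the file left to update.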

Devin had never touched the repository in question before getting this task.

I’ve given it more and less complex tasks, and yeah, it struggles with some things. I’d estimate that it consumes about 5-10% of my time but multiplies my overall output by ~3x.


I would be very curious about the size and complexity of this codebase. Every review of Devin I’ve seen has been very negative (burns a ton of money, gets stuck, doesn’t implement the changes you want).

For large codebases (greater than 15k or 20k LOC) the context size seems like a real problem right now.


I’ve used it for everything from “change this text on a webpage” to squashing complex migrations in multiple apps in a Django monolith where migrations in one app depend on migrations in other apps.

My apologies if anyone finds this offensive, but I sorta see Devin as a fresh junior SWE hire. It sometimes struggles with tasks that require deep knowledge, but it has shallow or better knowledge of everything. I would describe it as working with a brand new SWE with an IQ of about 85 who is also on the low end of being high-functioning autistic. By that I mean that it takes most things literally and sometimes has difficulty with nuance.

> burns a ton of money, gets stuck, doesn’t implement the changes you want

The first time you use it, I think that’s pretty fair. Every time it gets stuck or does the wrong thing, when you correct it, it gives you the option to add to its “knowledge base”. That’s a bunch of additional context that it applies in only certain situations. Within a week or so of using it regularly, it’s significantly more valuable. It “learns” much faster than a human.

Example:

About a dozen of our projects all rely on a shared repository (“Enki”) that contains a Composefile, configs, and some light automation. Tests are run in Docker, and you have to navigate to the other repo’s directory to bring up the service. Some of those projects have service names in the Composefile that differ from the project name. I was able to run the steps interactively on “Devin’s machine”, tell Devin what I had done, and then tell it that this is the correct approach for any project that depends on that repository. I didn’t tell it what projects those are, or how to find out.

The next time I used Devin on a project like that, it tried to run the tests directly in a local Python environment. That didn’t work, but it tried the correct approach next. That worked, so it added a line to its knowledge base “Project <foo> uses Enki.” From that point forward it did the right thing the first time.
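
The behavior described above (try something, fall back, remember what worked) can be sketched roughly like this. This is purely illustrative and not how Devin is implemented. `KnowledgeBase`, `run_tests`, and the strategy names are all hypothetical.

```python
class KnowledgeBase:
    """Hypothetical per-project memory of which test strategy worked."""

    def __init__(self):
        self.notes = {}

    def remember(self, project, strategy_name):
        self.notes[project] = strategy_name

    def recall(self, project):
        return self.notes.get(project)


def run_tests(project, strategies, kb):
    """Try each (name, runner) strategy in order, preferring a remembered one.

    Returns the list of strategy names that were attempted. A runner is any
    callable that takes the project name and returns True on success.
    """
    known = kb.recall(project)
    # Stable sort: the remembered strategy (if any) moves to the front,
    # everything else keeps its original order.
    ordered = sorted(strategies, key=lambda item: item[0] != known)
    attempts = []
    for name, runner in ordered:
        attempts.append(name)
        if runner(project):
            kb.remember(project, name)
            break
    return attempts


# Stand-in runners: the local environment fails, the shared-repo approach works.
strategies = [
    ("local-python-env", lambda project: False),
    ("enki-docker", lambda project: True),
]
kb = KnowledgeBase()
first_run = run_tests("foo", strategies, kb)   # tries local first, then the fallback
second_run = run_tests("foo", strategies, kb)  # goes straight to what worked
```

The first run makes the wrong attempt once. After that, the stored note means the correct approach is tried first.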

> For large codebases (greater than 15k or 20k LOC) the context size seems like a real problem right now.

The primary project I’m working on is a Django app. I don’t have it in front of me right now, but it’s about five years old, has been under very active development the entire time, and consists of about twenty apps. It’s not the largest codebase I’ve worked on, but it’s far from the smallest. I can do a line count tomorrow if you’d like.


This terrifies me.


It excites me. The only way it would really terrify me is if I were a very junior engineer right now or in college studying to become one.

I think we’ll see a ton of complaints about how bad the job market is in the next couple of years. That will be true, but only for juniors or for seniors who don’t embrace the tech. For seniors who do embrace it and specialize in implementing these systems, it’ll be a gold mine.

Then, over 5-10 years, our seniors will start to retire or leave the field. No one will be there to replace them. At that point we’ll see a resurgence in the job market.

Things like autocompletion and “chat with your codebase” help juniors more than seniors; agents help seniors much more than juniors. As these systems improve, their failure cases get more and more complex/nuanced - you will always need senior people with the insight necessary to figure out what’s wrong when it breaks. For a while that will help seniors and hurt juniors… right up until businesses realize that they don’t have replacements for their existing senior engineers, at which point they’ll be desperate to hire again.


You have massively misunderstood what I’m terrified by. In fact you’ve described something I find the least terrifying of anything I’ve ever read because it’s all pure fantasy.



