There are many cases in which I already understand the code before it is written. In those cases, AI writing the code is pure gain. I do not need to spend 30 minutes learning how to hold the Bazel rule. I do not need to spend 30 minutes writing client boilerplate. The list goes on. Every broad claim about AI's effect on productivity has counterexamples; it is situational. I think most competent engineers quietly using AI understand this.
No, it isn't. Unless the generated code is just a few lines long and all you are doing is effectively autocompletion, you have to go through the generated code with a fine-toothed comb to be sure it actually does what you think it should do and that there are no typos. If you don't, you are fooling yourself.
Kind of, except that when I review a code submission to my project I can eventually learn to trust the submitter, once I realize they write good code. A code review is how that trust develops. AI code should never earn that trust, and any code review should always be treated as if it came from a first-time submitter I have never met before. The risk is that this does not happen, and that we come to believe AI code submissions will develop like those of a real human. They won't. We'll develop a false sense of security, a false sense of trust. Instead, we should always be on guard.
And as I wrote in my other comment, reviewing the code of a junior developer includes the satisfaction of helping that developer grow through my feedback. AI will never grow. There is no satisfaction in reviewing its code. Instead it feels like a Sisyphean task, because the AI will make the same mistakes over and over again, and make mistakes a human would be very unlikely to make. Unlike human code, with AI code you have to expect the unexpected.
Broadly I agree with you. I think of it in terms of responsibility. Ultimately the commit has my name on it, so I am the responsible party. From that perspective, I do need to "understand" what I am checking in to be reasonably sure it meets my professional standards of quality.
The reason I put scare quotes on "understand" is that we need to acknowledge that there are degrees of understanding, and that different degrees are required in different scenarios. For example, when you call syscall(), how well do you understand what is happening? You understand what's in the manpage; you know that it triggers a switch to kernel space, performs some task, returns some result. Most of us have not read the assembly code, we have a general concept of what is going on but the real understanding pretty much ends at the function call. Yet we check that in because that level of understanding corresponds to the general engineering standard.
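To make that concrete, here is a minimal sketch (assuming Linux, where SYS_getpid is defined) of the kind of call most of us happily check in on the strength of the manpage alone:

```c
/* Minimal sketch: invoking a system call through the generic syscall()
 * wrapper. We rely on the manpage contract (SYS_getpid takes no arguments
 * and returns the calling process's PID) without ever reading the kernel
 * code that actually services the request. Assumes Linux. */
#include <stdio.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

int main(void) {
    /* Both calls end up in the same kernel routine; our "understanding"
     * stops at the documented interface. */
    long raw = syscall(SYS_getpid);
    pid_t wrapped = getpid();

    printf("syscall(SYS_getpid) = %ld, getpid() = %d\n", raw, (int)wrapped);
    return 0;
}
```

Our working understanding stops at the documented interface, and for most purposes that is enough to meet the bar.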
In some cases, with AI, you can be reasonably sure the result is correct without deeply understanding it and still meet the bar. The Bazel rule example is a good one. I prompt, "take this OpenAPI spec and add build rules to generate bindings from it. Follow existing repo conventions." From my years of engineering experience, I already know roughly what the result should look like. I skim the generated diff to ensure it matches that expectation, and skim the model output to see what it referenced as examples. At that point, what the model produced is probably similar to what I would have produced by spending 30 minutes grepping around, reading build rules, et cetera. For this particular task, the model has saved me that time. I don't need to understand it perfectly. Either the code builds or it doesn't.
For other things, my standard is much higher. For example, models don't save me much time on concurrent code because, in order to meet the quality bar, the level of understanding required is much higher. I do need to sit there, read it, re-read it, chew on the concurrency model, et cetera. Like I said, it's situational.
There are many, many other aspects to quantifying the effects of AI on productivity; code quality is just one. It's very holistic and depends on you, how you work, what domain you work in, the technologies you work with, the team you work on, and many other factors.
The problem is, even if all that is true, it says very little about the distribution of AI-generated pull requests to GitHub projects. So far, from what I’ve seen, those are overwhelmingly not done by competent engineers, but by randos who just submit a massive pile of crap and expect you to hurry up and merge it already. It might be rational to auto-close all PRs on GitHub even if tons of engineers are quietly using AI to deliver value.
> There are many cases in which I already understand the code before it is written. In these cases AI writing the code is pure gain.
That's only true if the LLM understands the code in the same way you do - that is, it shares your expectations about architecture and structure. In my experience, once the architecture or design of an application diverges from the average path extracted from training data, performance seriously degrades.
You wind up with the LLM creating duplicate functions for things the code already handles, or pulling in different libraries than the ones your code already uses.
Unless you have made some exceptional advances in LLM agents (if you have, send me the Claude skill?), you can't predict it.
If it were predictable like a transpiler, you wouldn't have to read it. You can think of it as pure gain, but then you're just not reading the code it's outputting.
As an aside, unless you are playing games that need NT kernel anti-cheat or are using a store other than Steam, odds are the overall experience and performance are better on Linux at this point.
And even the Mac is doing well with games; most of my library runs natively. Baldur's Gate 3 runs better on the newer Apple chips than on my somewhat aging gaming PC.
Yeah, it's just the kernel anti-cheat that's keeping me on Windows now. I'm fully ready to switch to Linux, but unfortunately I do like to play games that need it.
> all javascript on this website is optional (light/dark theme, particles background, and image lightboxing) and resides outside of the document body. localstorage is used to persist light/dark theme and mono/sans font state while surfing, as well as handle an over-18 check.
When I was grading labs as a TA, the intent was communicated to me more as "per university teaching guidelines, we mustn't have too many students get the top grade, but we also mustn't have too many students fail."
It also helps avoid populist teachers who give everyone an A+++ to avoid student complaints, and also the idiots who give everyone a C because only God deserves an A and only the teacher deserves a B.
(We don't use that method here; we use a different method to try to avoid both problems.)
Where is the incentive for test makers in academia to accommodate this outcome? I don't think jaded professors or overworked, inexperienced, and stressed TAs have a reason to do this. It sounds nice, but it doesn't actually seem connected to their incentives.
I would be _very_ surprised if any 18Fers were part of the "national design studio." It is, at the very least, DOGE-affiliated, and DOGE is who killed 18F.
So on the surface it seems that we already had many of these types of orgs, but they killed them all and spun up their own renamed and rebadged versions.
That being said, this project does seem like a potential big win.