I'm getting a lot of side-quest productivity out of AI. There's always a bunch of things I could do, but they are tedious. Yet they are still things I wish I could get done. Those kinds of things AI is fantastic at. Building a mock, making tests, abstracting a few things into libraries, documentation.
So it's not like I'm delivering features in one day that would have taken two weeks. But I am delivering features in two weeks that have a bunch of extra niceties attached to them. Reality being what it is, we often release things before they are perfect. Now things are a bit closer to perfect when they are released.
I hope some of that extra work that's done reduces future bug-finding sessions.
What I'm about to discuss is about me, not you. I have no idea what kind of systems you build, what your codebase looks like, use case, business requirements etc. etc. etc. So it is possible writing tests is a great application for LLMs for you.
In my day-to-day work... I wish the developers where I work would stop using LLMs to write tests.
The most typical problem with LLM-generated tests on the codebase where I work is that the test code is almost always extremely tightly coupled to the implementation code. Heavy use of test spies is a common anti-pattern. The result is a test suite that tests implementation details rather than "user-facing" behaviour (where the user could be a code-level consumer of the thing you are testing).
The problem with that type of test is that it is fragile. One of the key benefits of automated tests is that they give you a safety net to refactor the implementation to your heart's content without fear of having broken something. If you change an implementation detail and the "user-facing" behaviour does not change, your tests should pass. When tests are tightly coupled to the implementation, they will fail, and now your tests, in the worst of cases, might actually be creating negative value for you, since every code change requires you to keep the tests up to date even when what you actually care about testing ("is this thing working correctly?") hasn't changed.
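To make the contrast concrete, here is a minimal sketch in Python (pytest-style, with unittest.mock). The ShoppingCart and pricing names are invented purely for illustration, not taken from any codebase discussed here; the point is only the difference between asserting on internal calls and asserting on observable behaviour through the public interface.

```python
from unittest.mock import MagicMock


class ShoppingCart:
    """Toy class invented for illustration."""

    def __init__(self, pricing):
        self._pricing = pricing
        self._items = []

    def add(self, sku, qty=1):
        self._items.append((sku, qty))

    def total(self):
        return sum(self._pricing.price_for(sku) * qty for sku, qty in self._items)


def test_total_fragile():
    # Couples to an implementation detail: that total() calls price_for
    # exactly once per item. Batching or caching lookups later breaks this
    # test even though total() still returns the right number.
    pricing = MagicMock()
    pricing.price_for.return_value = 10
    cart = ShoppingCart(pricing)
    cart.add("apple", 3)
    assert cart.total() == 30
    pricing.price_for.assert_called_once_with("apple")


class FakePricing:
    """Plain fake standing in for the real pricing dependency."""

    def price_for(self, sku):
        return {"apple": 10, "pear": 20}[sku]


def test_total_behavioural():
    # Only checks the user-facing result through the public interface,
    # so internal refactors that preserve behaviour keep it green.
    cart = ShoppingCart(FakePricing())
    cart.add("apple", 3)
    cart.add("pear", 1)
    assert cart.total() == 50
```

The first test breaks the moment total() changes how it looks up prices, even though the behaviour anyone cares about is unchanged; the second keeps passing.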
The root of this problem isn't even the LLM; the LLM just makes it a million times worse. Developers often feel like writing tests is a menial chore that needs to be done after the fact to satisfy a code coverage policy. Few developers, at many organizations, have ever truly worked TDD, learned testing best practices, or learned how to write easily testable implementation code.
There are some patterns you can use that help a bit with this problem. The lowest-hanging fruit is to tell the LLM that its tests should test only through public interfaces where possible. Next after that is to add a step to the workflow: check whether the not-yet-committed tests use any non-public interfaces in places where a public interface exposes the same functionality, and if so, rewrite the tests to use only publicly exposed interfaces. You could likely also add linter rules, though sometimes you genuinely need to test something, like error conditions, that can't reasonably be tested only through public interfaces.
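If you want something mechanical in CI rather than relying on the LLM to police itself, a rough sketch of such a check might look like this (Python; it assumes tests live under tests/ and that "non-public" means underscore-prefixed module names, both of which are conventions I'm assuming rather than rules from any particular codebase):

```python
# Rough sketch of a CI guard: flag test files that import underscore-prefixed
# ("private") modules. A real rule would also need an allowlist for the
# genuine exceptions mentioned above (e.g. error conditions).
import ast
import pathlib
import sys


def private_imports(test_file: pathlib.Path) -> list[str]:
    """Return underscore-prefixed module names imported by a test file."""
    tree = ast.parse(test_file.read_text())
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            if any(part.startswith("_") for part in node.module.split(".")):
                hits.append(node.module)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if any(part.startswith("_") for part in alias.name.split(".")):
                    hits.append(alias.name)
    return hits


if __name__ == "__main__":
    offenders = {}
    for path in pathlib.Path("tests").rglob("test_*.py"):
        hits = private_imports(path)
        if hits:
            offenders[str(path)] = hits
    for path, modules in offenders.items():
        print(f"{path}: imports private module(s): {', '.join(modules)}")
    sys.exit(1 if offenders else 0)
```

That only covers Python and only the import side of the problem, but it gives reviewers a cheap signal before a human even looks at the LLM-generated tests.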
Oh don't get me wrong. I'm sure that an LLM can write a decent test that doesn't have the problems I described. The problem is that LLMs are making a preexisting problem much, MUCH worse.
That problem statement is:
- Not all tests add value
- Some tests can even create dis-value (ex: slow to run, thus increasing CI bills for the business without actually testing anything important)
- Few developers understand what good automated testing looks like
- Developers are incentivized to write tests just to satisfy code coverage metrics
- Therefore writing tests is a chore and an afterthought
- So they reach for an LLM because it solves what they perceive as a problem
- The tests run and pass, and they are completely oblivious to the anti-patterns just introduced and the problems those will create over time
- The LLMs are generating hundreds, if not thousands, of these problems
So yeah, the problem is 100% the developers who don't understand how to evaluate the output of a tool that they are using.
But unlike functional code, these tests are - in many cases - arguably creating disvalue for the business. At least the functional code is a) more likely to be reviewed and code quality problems addressed and b) even if not, it's still providing features for the end user and thus adding some value.
Force the LLM to write property-based tests (whether good libraries are available depends on the language you use, but if they are, 100% make use of them). Iterate with the LLM on the invariants.
Forcing the discussion of invariants, and property-based testing in general, seems to improve on the issues you're mentioning (when using e.g. Opus 4), especially when combined with the "use only the public API" rule or interface abstractions.
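For what that can look like in practice, here is a hedged sketch in Python using the Hypothesis library; normalize_whitespace is an invented function standing in for whatever unit you are actually testing, and the invariants are the kind of thing you would iterate on with the LLM:

```python
import re

from hypothesis import given, strategies as st


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


@given(st.text())
def test_normalization_is_idempotent(text):
    # Invariant: applying the function twice gives the same result as once.
    once = normalize_whitespace(text)
    assert normalize_whitespace(once) == once


@given(st.text())
def test_no_double_spaces_and_no_edge_whitespace(text):
    # Invariants about the output itself, independent of the input.
    result = normalize_whitespace(text)
    assert "  " not in result
    assert result == result.strip()
```

The value is less in the specific assertions and more in the fact that the LLM (and you) have to articulate properties that hold for all inputs, which naturally pulls the tests toward behaviour and away from implementation details.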
Side-quest productivity is a great way to put it... It does feel like AI effectively enables the opposite of "death by a thousand cuts" (life by a thousand bandaids?)
For much of what I build with AI, I'm not saving two weeks. I'm saving infinity weeks — if LLMs didn't exist I would have never built this tool in the first place.