Yes, but you need to set up quite a bit of tooling to provide feedback loops.
It's one thing to get an LLM to do something unattended for long durations; it's another to give it the means of verification.
For example, I'm busy upgrading a 500k LoC Rails 1 codebase to Rails 8 and built several DSLs that give it properly authorised sessions in a headless browser, with basic HTML parsing tooling, so it can "see" what effect its fixes have. Then you somehow need to give it a reliable way to keep track of the past and its own learnings, which sounds simple, but I have yet to see any tool or model solve it at this scale... I'll give Sonnet 4.5 a try this weekend, but yeah, none of the models I've tried are able to produce meaningful results over long periods on this upgrade task without good tooling and strong feedback loops.
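The real tooling is more involved, but the core of the feedback loop is roughly this shape. This is a minimal sketch using Capybara and Nokogiri; the helper names, routes, and selectors are made up for illustration and aren't the actual DSL:

```ruby
require "capybara"
require "capybara/dsl"
require "nokogiri"

# Drive a local dev server with Capybara's built-in headless Chrome driver
# (assumes the selenium-webdriver gem and Chrome are available).
Capybara.run_server = false
Capybara.default_driver = :selenium_chrome_headless
Capybara.app_host = "http://localhost:3000"

class AgentBrowser
  include Capybara::DSL

  # Authorise the session once so the agent sees real pages, not the login wall.
  # The /login path and field labels are illustrative.
  def sign_in(email:, password:)
    visit "/login"
    fill_in "Email", with: email
    fill_in "Password", with: password
    click_button "Sign in"
  end

  # Visit a path and boil the HTML down to a small summary the model can read,
  # instead of dumping the whole page into its context.
  def snapshot(path)
    visit path
    doc = Nokogiri::HTML(page.html)
    {
      path: path,
      title: doc.at("title")&.text&.strip,
      headings: doc.css("h1, h2").map { |h| h.text.strip },
      errors: doc.css(".alert, #error_explanation li").map { |e| e.text.strip }
    }
  end
end
```

Each fix it makes then gets verified by re-running snapshot on a handful of known routes and comparing the summaries before and after.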
Btw, I have upgraded the app and am taking it to alpha testing now, so it is possible.
I've tried asking it to log every request and response to a project_log.md, but it routinely ignores that.
I've also tried using Playwright for testing in a headless browser and taking screenshots for a blog that can effectively act as a log, but it just seems like too tall an order for it.
It sounds like you're streets ahead of where I am. Could you give me some pointers on getting started with a feedback loop, please?
So we have an LLM code scaffold repo we use in a large (2M LoC) production Rails codebase, and it works amazingly well.
Rails, and especially Ruby, lends itself to describing business logic in source code that reads closer to natural language than a lot of typed languages, imo, and that synergizes really well with a lot of different models and with neat LLM uses for code creation and maintenance.
Interesting! What sort of stuff goes in the scaffold repo? Like examples of common patterns?
Definitely agree. I think Ruby's closeness to natural language is a big win, especially with the culture of naming methods in self-explanatory ways, maybe even more so than in most other languages. Swift and Objective-C come to mind as maybe also being very good for LLMs, with their very long method names.
ETL pipelines: we catalogue our custom transformers and link them to bodies of text that describe the business cases for them, with some examples. You can then describe your ETL problem in text and it will scaffold out a pipeline for you.
Full-stack scaffolds that go from models to UI screens: we have a set of standard components and descriptions of how they interact and communicate with our monolith through GraphQL (e.g. server-side pagination, Miller column grouping, sorting, filtering, PDF export, etc.). So if you make a new model it will scaffold the CRUD fully for you, all the way to the UI (it gets some stuff wrong, but it's still a massive time save for us).
Patterns for various admin controls (we use Active Admin, so this thing will scaffold AA resources the way we want).
Refactor recipes for certain things we've deprecated or improved. We generally don't migrate everything to a new pattern at once; instead we make "recipes" that describe the new pattern and point to an example, then run them as we get to that module or lib for new work.
There are more, but these are some off the top of my head.
I think a really big aspect of this, though, is the integration of our scaffolds and recipes in Cursor. We keep these scaffold documents in markdown files that are loaded as Cursor notepads, which reference real source code.
So we rely heavily on the source code describing itself; the recipe or pattern or scaffold just provides a bit of extra context on different usage patterns and links the different pieces or examples together.
You can think of it as giving an LLM "pro tips" about how things are done in each team and repo, which allows for rapid scaffold creation. A lot of this you can do with code generators and good documentation, but we've found this use of Cursor notepads for scaffolds and architecture is a less labour-intensive way to keep it up to date and to evolve a big codebase in a consistent manner.
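To give a rough idea of the shape, here's a made-up notepad recipe; the pattern, file paths, and component names are invented for illustration and aren't our actual docs:

```markdown
# Recipe: server-side pagination on index screens

When to use: any index screen backed by a GraphQL list query.

Canonical example to read first:
- app/graphql/resolvers/orders_resolver.rb
- app/javascript/components/OrdersTable.tsx

Steps:
1. Add `page` / `perPage` arguments to the resolver, mirroring the example above.
2. Return the list plus metadata (total count, current page) the same way the example does.
3. Render with the shared paginated table component instead of a bare table; don't paginate client-side.
```

The point is that the recipe stays tiny because it leans on the linked source files to describe the details.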
---
Edit: something to add: this isn't a crutch; we require our devs to fully understand these patterns. We use it as a tool for consistency, for rapid scaffold creation, and of course for speeding up things we haven't gotten around to streamlining (like repetitive bloat).
There is the downside of having to maintain both schemas now.
Unless you automate it, devs will have to remember to migrate both when making a change, which adds some overhead. Not a lot, but it's just something to consider here imo, as some migrations (schema and/or data) can become nasty and complex.
Beyond being a means to get advice, I've found this question a good gauge of collaboration compatibility.
I often ask it in interviews on both sides to give me insight into what someone currently values.
It's sort of like asking someone to define "better", be that in skill, happiness, or avoidance of pain.
My main interpretation of the post would be a high value placed on pragmatism, gained via a journey of experimentation. Put crudely: there is no silver bullet, but try a few for a while.
I think point 10 is highly underrated.
We've had success using dbt incremental models for this. Deletes aren't supported, though; for our use case that's fine, and we do a nightly refresh of the whole thing.