
My perspective is that if you are unable to find ways to improve your own workflows, productivity, output quality, or any other meaningful metric using the current SOTA LLMs, you should consider the possibility that it is a personal failure at least as much as you consider the possibility that it is a failure of the models.

A more tangible pitfall I see people falling into is testing LLM code generation with something like ChatGPT and never considering more involved usage of LLMs via interfaces better suited for software development. The best results I've managed to realize on our codebase have come not from ChatGPT or IDEs like Cursor, but from a series of processes that iterate over our full codebase multiple times to extract reusable insights at various levels: general development patterns, error handling patterns, RBAC-related patterns, example tasks for common types of work mined from git commit histories (i.e. adding a new API endpoint related to XYZ), and common bugs or failure patterns (again by looking through git commit histories). Together these form a sort of library of higher-level context and reusable concepts.

Feeding this into o1, along with a pre-defined "call graph" of prompts to validate the output, fix identified issues, and consider past errors in similar types of commits and past executions, has produced some very good results for us so far. I've also found much more success with ad-hoc questions after writing a small static analyzer to trace imports, variable references back to their declarations, etc., to isolate the portions of the codebase to use as context, rather than the RAG-based searching that a lot of LLM-centric development tools seem to use. It's also worth mentioning that output quality seems to be very much influenced by language; I thankfully work primarily with Python codebases, though I've had success using this against (smaller) Rust codebases as well.
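For what it's worth, the "small static analyzer" piece doesn't have to be elaborate. A minimal sketch of the idea using Python's stdlib `ast` module is below; the function names (`module_dependencies`, `referenced_names`) are my own illustration, not the commenter's actual tool, and a real version would also resolve module names to file paths and recurse.

```python
# Sketch: walk a module's AST to collect (a) which modules it imports and
# (b) which names it reads, so only the files a change actually touches
# get pulled into the LLM's context window instead of RAG-style search.
import ast


def module_dependencies(source: str) -> set[str]:
    """Return the top-level module names imported by `source`."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                deps.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module.split(".")[0])
    return deps


def referenced_names(source: str) -> set[str]:
    """Return every bare name that `source` loads -- candidates for
    tracing references back to their declarations in other files."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


sample = "import os\nfrom app.models import User\nprint(User, os.sep)\n"
print(module_dependencies(sample))  # {'os', 'app'}
print(referenced_names(sample))     # {'print', 'User', 'os'}
```

From these two sets you can build a dependency graph over the repo and take the transitive closure from the files mentioned in a prompt, which is the "isolate the relevant portion of the codebase" step described above.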



Sometimes, if it's as much work to set up and keep the tooling running as it would be to write the code yourself, it's worth thinking about the tradeoffs.

The interesting piece is a person with the experience to push LLMs into outputting the perfect little function or utility to solve a problem, and to collect enough of those to get somewhere.


This sounds nice, but it also sounds like a ton of work to set up that we don't have time for. Local models that don't require us to send our codebase to Microsoft or OpenAI would be something I'm sure we'd be willing to try out.

I'd love it if more companies were actually considering real engineering needs when building products in this space. Until then, I have yet to see any compelling evidence that the current chatbot models can consistently produce anything useful for my particular work other than the occasional SQL query.



