_pdp_'s comments | Hacker News

So no static compiler checks and apparently no fuzzers used to ensure these rules work as intended?

Such tooling exists for Lua? Didn't know.

I don't have time to look into it right now (def later)!

However, I was curious to see whether GitHub Copilot could reverse engineer it based on the latest commits, and it seems that what it is saying aligns with both advisories. It pointed out that the issue has to do with circular-reference handling, which sounds to me like something that can be easily overlooked.

While this analysis might be completely off, the simple fact that I could get even this information without much effort is mind-boggling. With a better setup it might be able to get even more.

With AI now being commonplace, coordinated, timely disclosure is even more important considering the stakes. It is theoretically possible to get an exploit working within minutes. Considering that we see one of these major vulnerabilities annually (and, it seems to me, around the same time of year), a bad actor can easily capitalise on the opportunity when it presents itself.


While I agree with your conclusion

> While this analysis might be completely off, the simple fact that I could get even this information without much effort is mind-boggling. With a better setup it might be able to get even more.

This can essentially be rephrased as "I don't know if what the LLM said is true or not but the fact it may or may not be correct is amazing!"


I don't know for sure whether what the LLM said is true, but based on my experience in the field it sounds plausible. The only way to know is to verify it.

Btw, LLMs are already used in vulnerability discovery and exploit development.


> The only way to know is to verify it.

Which you should've done before making such statements imo.


Checked. The answer is no (Claude Opus 4.5 with OpenCode). It wasn't even able to write a working scanner to check for the vulnerability. I gave it the diffs and various writeups, and free access to the source and the compiled index.js. It kept trying to cheat by editing the source to add a vulnerability and claiming that it got an RCE.

It's easier for a bad actor to get an exploit working than for an operator to test whether his own site's upgrade succeeded.

An operator might not be able to upgrade at all!

Along with the fixes, the advisories now need to contain detailed workarounds, firewall rules and other ad hoc solutions to ensure they get deployed quickly.


I tend to agree. Cloudflare and Vercel were able to deploy mitigations in the form of WAF rules, but it's not immediately clear what a user or vendor can do to implement mitigations themselves other than updating their dependencies (quickly!).

IMO the CVE announcement could have been better handled. This was a level 10. If other mitigations are viable and you know about them, you have a responsibility to disclose them in order to best protect the safety of the billions of users of React applications.

I wonder how many applications are still vulnerable.


A guide for mitigation is way more useful, so we can backport only the fix and test whether it works.

It makes total sense to me.

I am not surprised at all. I can already see self-improving behaviour in our own work, which means that the next logical step is self-improvement!

I know how this sounds but it seems to me, at least from my own vantage point, that things are moving towards more autonomous and more useful agents.

To be honest, I am excited that we are right in the middle of all of this!




I prefer cross-pollinating, as we're probably diametrically opposed.

There is a far easier way to do this, and one that is perfectly aligned with how these tools work.

It is called documenting your code!

Just write what this file is supposed to do in a clear, concise way. It acts as a prompt, it provides much-needed context specific to the file, and it is used only when necessary.

Another tip is to add README.md files where possible and where it helps. What is this folder for? Nobody knows! Write a README.md file. It is not rocket science.

What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.

You don't have to "prompt it just the right way".

What you have to do is use the good old best practices.


For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.

Sure, readme.md is a great place to put content. But there are things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.

Further, claude.md/agents.md have special quality-of-life mechanics with the coding agent harnesses, e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`.

> What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.

I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how it's presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time into how context is collected and presented to your agent.


> For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.

On Reddit's LLM subreddits, people are rediscovering the very basics of software project management as massive insights daily, or at the very least weekly.

Who would've guessed that proper planning, accessible and up-to-date documentation, and splitting tasks into manageable, testable chunks produces good code? Amazing!

Then they write a massive blog post or even some MCP monstrosity for it and post it everywhere as a new discovery =)


I can totally understand where you are coming from with this comment. It does feel a bit frustrating that people are rediscovering things that were written in books 30/40/50 years ago.

However, I think this is awesome for the industry. People are rediscovering basic things, but if they didn't know about the existing literature this is a perfect opportunity to refer them to it. And if they were aware, but maybe not practicing it, this is a great time for the ideas to be reinforced.

A lot of people, myself included, never really understood which practices were important until we were forced to work on a system that was most definitely not written with any good practices in mind.

My current view of agentic coding is that it's forcing an entire generation of devs to learn software project management or drown under the mountain of debt an LLM can produce. Previously, it took much longer to feel the weight of bad decisions in a project, but an LLM allows you to speed-run this process in a few weeks or months.


So how exactly does one "write what this file is supposed to do in a clear concise way" in a way that is quickly comprehensible to AI? The gist of the article is that when your audience changes from "human" to "AI" the manner in which you write documentation changes. The article is fairly high quality, and presents excellent evidence that simply "documenting your code" won't get you as far as the guidelines it provides.

Your comment comes off as if you're dispensing common-sense advice, but I don't think it actually applies here.


Writing documentation for LLMs is strangely pleasing because you have very linear returns for every bit of effort you spend on improving its quality and the feedback loop is very tight. When writing for humans, especially internal documentation, I’ve found that these returns are quickly diminishing or even negative as it’s difficult to know if people even read it or if they didn’t understand it or if it was incomplete.

This is missing the point. If I want to instruct Claude to never write a database query that doesn't hit a preexisting index, where exactly am I supposed to document that? You can either choose:

1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)

2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)

Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".

CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.

Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything it's learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs in a way so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".


This CLAUDE.md dance feels like herding cats. Except we’re herding a really good autocorrect encyclopedic parrot. Sans intelligence

Relating/personifying an LLM as an engineer doesn't work out.

Maybe the best thought model currently is just "a good way to automate trivial text modifications" and "encyclopedic ramblings".


Unfair characterization.

Think about how this thing is interacting with your codebase. It can read one file at a time. Sections of files.

In this UX, is it ergonomic to go hunting for patterns and conventions? If you have to linearly process every single thing you look at every time you do something, how are you supposed to have "peripheral vision"? If you have amnesia, how do you continue to do good work in a codebase, given you're a skilled engineer?

It is different from you. That is OK. It doesn't mean it's stupid. It means it needs different accommodations to perform as well as you do. Accommodations IRL exist for a reason: different people work differently and have different strengths and weaknesses. Just like with humans, you get the most out of them if you meet and work with them where they're at.


You put a warning where it is most likely to be seen by a human coder.

Besides, no amount of prompting will prevent this situation.

If it is a concern, then you add a linter or unit tests to prevent it altogether, or make a wrapper around the tricky function with a warning in its docstrings.

I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.


Documenting for AI exactly like you would document for a human is ignoring how these tools work

But they are right: Claude routinely ignores stuff from CLAUDE.md, even with warning bells etc. You need a linter preventing things. Like drizzle sql` templates: it just loves them.
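For example, a minimal ESLint sketch of that kind of guard (flat config; assumes drizzle's `sql` tag is imported under that name in your codebase):

```js
// eslint.config.js (sketch): flag raw drizzle sql`` templates so the agent
// gets pushed towards the query builder helpers instead.
export default [
  {
    rules: {
      "no-restricted-syntax": [
        "error",
        {
          selector: "TaggedTemplateExpression[tag.name='sql']",
          message: "Avoid raw sql`` templates; use the query builder helpers instead.",
        },
      ],
    },
  },
];
```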

You can make affordances for agent abilities without deviating from what humans find to be good documentation. Use hyperlinks, organize information, document in layers, use examples, be concise. It's not either/or unless you're being lazy.

Sounds like we should call them tools, not AI!

Agentic AI is LLMs using tools in a loop to achieve a goal.
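A minimal sketch of what that loop looks like (callModel and the tools map here are stand-ins, not any particular vendor's SDK):

```ts
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelReply = { toolCall?: ToolCall; finalAnswer?: string };
type Tool = (args: Record<string, unknown>) => Promise<string>;

// "LLMs using tools in a loop": ask the model, run the tool it picks,
// feed the result back, repeat until it answers or we hit a step cap.
async function runAgent(
  goal: string,
  callModel: (history: string[]) => Promise<ModelReply>,
  tools: Record<string, Tool>,
): Promise<string> {
  const history = [goal];
  for (let step = 0; step < 20; step++) {
    const reply = await callModel(history);
    if (reply.finalAnswer !== undefined) return reply.finalAnswer;
    if (reply.toolCall && tools[reply.toolCall.name]) {
      const result = await tools[reply.toolCall.name](reply.toolCall.args);
      history.push(`tool ${reply.toolCall.name} -> ${result}`);
    }
  }
  return "step limit reached without an answer";
}
```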

Needs a better term than "AI", I agree, but it's 99% marketing; the tech will stay the same.


> no amount of prompting will prevent this situation.

Again, missing the point. If you don't prompt for it and you document it in a place where the tool won't look first, the tool simply won't do it. "No amount of prompting" couldn't be more wrong; it works for me and all my coworkers.

> If it is a concern then you put a linter or unit tests to prevent it altogether

Sure, and then it'll always do things its own way, run the tests, and have to correct itself. Needlessly burning tokens. But if you want to pay for it to waste its time and yours, go for it.

> I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.

It's not about avoiding mistakes! It's about having it follow the norms of your codebase.

- My codebase at work is slowly transitioning from Mocha to Jest. I can't write a linter to ban new mocha tests, and it would be a pain to keep a list of legacy mocha test suites. The solution is to simply have a bullet point in the CLAUDE.md file that says "don't write new Mocha test suites, only write new test suites in Jest". A more robust solution isn't necessary and doesn't avoid mistakes, it avoids the extra step of telling the LLM to rewrite the tests.

- We have a bunch of terraform modules for convenience when defining new S3 buckets. No amount of documenting the modules will have Claude magically know they exist. You tell it that there are convenience modules and to consider using them.

- Our ORM has findOne that returns one record or null. We have a convenience function getOne that returns a record or throws a NotFoundError to return a 404 error. There's no way to exhaustively detect with a linter that you used findOne and checked the result for null and threw a NotFoundError. And the hassle of maybe catching some instances isn't necessary, because avoiding it is just one line in CLAUDE.md.
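For illustration, a minimal sketch of that getOne convention (the ORM interface here is assumed, just to show the shape of the pattern):

```ts
class NotFoundError extends Error {}

// Assumed ORM-ish interface: findOne returns the record or null.
interface Repo<T> {
  findOne(where: Partial<T>): Promise<T | null>;
}

// Convenience wrapper: a missing record becomes a 404-style error
// instead of a null check at every call site.
async function getOne<T>(repo: Repo<T>, where: Partial<T>): Promise<T> {
  const record = await repo.findOne(where);
  if (record === null) {
    throw new NotFoundError(`No record matching ${JSON.stringify(where)}`);
  }
  return record;
}
```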

It's really not that hard.


> There's no way to exhaustively detect with a linter that you used findOne and checked the result for null and threw a NotFoundError

Yes there is? Though this is usually better served with a type checker, it’s still totally feasible with a linter too if that’s your bag

> because avoiding it is just one line in CLAUDE.md.

Except no, it isn’t, because these tools still ignore that line sometimes so I still have to check for it myself.


> Yes there is? Though this is usually better served with a type checker, it’s still totally feasible with a linter too if that’s your bag

It's not, because you would have to implement a full static analyzer that traces where the result of a `findOne` call is checked for `null` and then check that the condition always leads to a `NotFoundError`. At best you've got a linter that only works some of the time, at worst you've just made your linter terribly slow and buggy.

> these tools still ignore that line sometimes so I still have to check for it myself.

this is _literally_ the point of the article


> 1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)

README files are not a new concept, and have been used in software for like 5 decades now, whereas CLAUDE.md files were invented 12 months ago...


You can also use your README (and in my own private project, I do!). But for folks who don't want their README clogged up with lots of facts about the project, you have CLAUDE.md

1. Create a tool that can check whether a query hits a preexisting index (a rough sketch of such a checker is below).

2. Either force Claude to use it (hooks) or suggest it (CLAUDE.md).

3. Profit!
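Roughly what step 1 could look like (just a sketch, assuming Postgres and the node `pg` client; the "Seq Scan" heuristic is deliberately crude):

```ts
import { Client } from "pg";

// Ask Postgres for the query plan and treat any sequential scan as
// "this query does not hit an index". Connection settings come from
// the usual PG* environment variables.
async function hitsAnIndex(sql: string): Promise<boolean> {
  const client = new Client();
  await client.connect();
  try {
    const { rows } = await client.query(`EXPLAIN (FORMAT JSON) ${sql}`);
    const plan = JSON.stringify(rows[0]["QUERY PLAN"]);
    return !plan.includes('"Seq Scan"');
  } finally {
    await client.end();
  }
}

// e.g. wire this into a hook or a pre-commit script:
// hitsAnIndex("SELECT * FROM users WHERE email = 'a@b.c'").then(console.log);
```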

As for "where stuff is", for anything more complex I have a tree-style graph in CLAUDE.md that shows the rough categories of where stuff is. Like the handler for letterboxd is in cmd/handlerletterboxd/ and internal modules are in internal/

Now it doesn't need to go in blind but can narrow down searches when I tell it to "add director and writer to the letterboxd handler output".


Learned this the hard way. Asked Claude Code to run a database migration. It deleted my production database instead, then immediately apologised and started panicking trying to restore it.

Thankfully Azure keeps deleted SQL databases recoverable, so I got it back in under an hour. But yeah - no amount of CLAUDE.md instructions would have prevented that. It no longer gets prod credentials.


Well, no. You run into context limits pretty fast (or attention limits for long-context models), and the model understands pretty well what code does without documentation.

There's also a question of processes: how to format code, what style of catching to use, and how to run the tests, which humans keep in the back of their heads after reading it once or twice, but which need to be a constant reminder for an LLM whose knowledge lifespan is session-limited.


I’m pretty sure Claude would not work well in my code base if I hadn’t meticulously added docstrings, type hints, and module level documentation. Even if you’re stubbing out code for later implementation, it helps to go ahead and document it so that a code assistant will get a hint of what to do next.

I think you’re missing that CLAUDE.md is deterministically injected into the model’s context window

This means that instead of behaving like a file the LLM reads, it effectively lets you customize the model’s prompt

I also didn’t write that you have to “prompt it just the right way”, I think you’re missing the point entirely


Skills are fundamentally different. The only similarity I can see is perhaps the fact that in both cases an LLM is used to find the right skillset.

I am not sure who is the intended customer for this service.

The prompt and the model go hand in hand. If you randomly select the model the likelihood of getting something consistent is basically zero.

Also, model pricing doesn't vary that much. I have never heard of a spot-instance equivalent for inference, although that would be cool. The demand for GPUs is so high right now that I think most datacenters are at 100% utilisation.

Btw, the landing page does not inspire much confidence that this is serious. You might want to change it to communicate better and also to be more attractive to "developers", I guess.


Depends on what you're doing. Something like "read this text and extract all the phone numbers" or "write a 3-point summary of this email" will perform about the same on all good models.

It's a tool, not an SEO-branded, shiny website. It's a utility. And model pricing varies considerably for now. This tool will be useless in another year or two.

The equivalent of "Spot Instance" is basically the OpenAI Batch API

> Also, model pricing doesn't vary that much.

I'm curious when AI pricing will couple with energy markets. Then the location of the datacentre will matter considerably


The vast majority of vibe-coded apps are subpar because they're not built by experienced developers who understand proper software engineering practices.

However...

While I risk sounding like the discussions in r/vibecoding, I am convinced that it is absolutely possible to create high-quality, entirely machine-generated applications. And yes, it bothers me as well, because I am also invested in my craft.

What I found is that the key isn't in how you prompt the LLMs but in the comprehensive tooling and guardrails you build around the development process. This includes custom ESLint rules, specific type constraints, custom MCPs, and even tailored VSCode plugins. It works. Not fully autonomously, but effectively enough that you often only need to perform a final review before deployment.
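As one concrete (and made-up) example of such a guardrail, here is a tiny custom ESLint rule that steers generated code away from calling fetch directly and towards a shared wrapper; the banned call and the wrapper name are illustrative, not from any real codebase:

```js
// eslint-plugin-local/rules/no-direct-fetch.js (illustrative sketch only)
module.exports = {
  meta: {
    type: "problem",
    messages: {
      noDirectFetch: "Use the shared httpClient wrapper instead of calling fetch() directly.",
    },
    schema: [],
  },
  create(context) {
    return {
      CallExpression(node) {
        // Flag any bare fetch(...) call so the agent is nudged to the wrapper.
        if (node.callee.type === "Identifier" && node.callee.name === "fetch") {
          context.report({ node, messageId: "noDirectFetch" });
        }
      },
    };
  },
};
```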

I have no doubt in my mind that the majority of software will be written by machines but only after the right guardrails are set in place - not just better models, magic prompts and fingers crossed.


I've started researching modeling architectures as an intermediate step in TLA+, Prolog or whatever works (didn't find anything reasonable yet) instead of going from plan.md directly to implementation (assuming this is the lowest-level plan.md, of course). I've got a hunch that LLMs being great at boilerplate and translation should make ensuring conformance feasible...

Can you share some tips? Right now I spend 20 minutes vibe coding 3000 lines of code and then 3 hours reviewing every single line.

Exactly. I found that I usually spend significantly more time reviewing the code; most of the time there is a lot of repeated code. Refactoring and cleaning the code also require a lot of time.

I found that the time I spend reviewing and refactoring is marginally less than the time it takes to write the code myself. For very repetitive tasks, there is a sweet spot.

The only case where vibe coding really delivered on the promise of super-high-speed development is when you completely give up on code quality, or you are in a greenfield project where all the business logic hasn't been fully defined.


Work in smaller chunks. 3000 lines of code is horrible to review, regardless of whether it's human- or computer-made. Structure the tasks in a way that will enable the agent to verify and iterate by itself.

Reviewing 100% of generated code is undoubtedly good software engineering practice, but it's not vibe coding, at least by my definition.

I vibe coded a tool this week, and my process was an iterative one of prompting an LLM agent to implement small bits of functionality, then checking the resultant output by hand (the output of the program, not the outputted code). There were shockingly few problems with this process, but when they did arise, I fixed them through a combination of reviewing the relevant code myself, reviewing log files, and additional prompting. I don't think I actually wrote or fixed a single line of code by hand, and I definitely haven't read 100% of it.

Once the project was feature complete, I used more or less the same process to refactor a number of things to implement improvements, simplifications, and optimizations. Including a linter in my automated commit workflow also found a couple minor problems during refactoring such as unused imports that were trivial for the agent to fix.

Is the code perfect, bug-free, and able to handle every imaginable edge case without issue? Almost certainly not, but it works well enough for our use already and is providing real labor savings. But it's not documented very well, nor are any tests written yet. It might or might not be long-term maintainable in its current state, but I certainly wouldn't be comfortable trying to sell or support it (we are not a software company and I am not a software developer).

I should note that while I have been very impressed with my use of agentic coding tools, I am skeptical that they scale well above small projects. The tool we built this week is a bit over 2000 lines of code. I am not nearly skilled enough to work on a large codebase but I suspect this vibe coding style of programming would not work well for larger projects.


That is a lot of code! Maybe that is the problem.

Normally I review 3-5 files in a single change. Tests are done separately, to reduce the impact of writing tests that merely fit whatever was written, and there are a few dozen custom ESLint rules (as ESLint plugins) to enforce certain coding practices, making it harder for the LLM to generate code that I would otherwise reject.

It is not that difficult really.


That's still a great trade in time if you end up keeping most of those 3000 lines.

Depends on what you're trying to make. I suggest trying to vibe code a tool like the one I have, called llm.exe, which takes the contents of a predefined md file and sends it off for a response, which is then added to the end of the file. Then incrementally add new flags to the tool to get more features and use other models, anything from generating audio using audio models to archiving and image input. Then try to create something in a framework you are not familiar with and come up with your own methods allowing you to go much further than one-shotting. I tried to vibe code WinAPI and it's hard, but I think it's doable even for large-scoped projects; the problem is context hoisting, and you need to keep track of a spec. Try to think about what is the minimum text you need to describe what you are doing. Ask models to generate one file or method at a time. I don't use a fancy IDE.
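If it helps, here is roughly the shape of such a tool (a sketch only; callModel is a stand-in for whichever inference API you use, and the file handling assumes Node):

```ts
import { readFile, appendFile } from "node:fs/promises";

// Stand-in for your actual model call (OpenAI, a local model, whatever).
async function callModel(prompt: string): Promise<string> {
  throw new Error("plug your provider's API in here");
}

// Read the predefined markdown file, send its contents as the prompt,
// and append the model's response to the end of the same file.
async function main(path: string): Promise<void> {
  const contents = await readFile(path, "utf8");
  const reply = await callModel(contents);
  await appendFile(path, `\n\n${reply}\n`);
}

main(process.argv[2] ?? "notes.md");
```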

This is what I've never understood about vibe coding. Every attempt I've made (and quite a lot) makes me feel faster but in reality is slower.

With traditional coding, >50% of the debugging is done while writing the lines. If all you are measuring is lines out, then you're discounting more than half the work.

I already have a strategy to optimize this. I work mostly in Python but it translates even when I work in compiled languages.

I write code 3 times.

Step 1: hack in an ipython terminal where I can load the program modules. This is actually where AI can help the most imo. Doesn't matter how messy, your first iteration is always garbage.

Step 2: translate that into a function in the codebase. Use Unix philosophy, keeping functions minimal. This helps you move fast now but more importantly you move fast later. Do minimal cleanup so everything is coherent. Any values should be translated into variables that the function can take as arguments. No hard coding! I guarantee even though you don't know it now you'll want to turn those knobs later. This is your MVP.

Step 3: this is a deceptively critical step! Write quick documentation. Explain your function signature. What is every variable's purpose and expected type. Ditto for the output. Add a short explanation of what the function does. Then you write developer comments (I typically do this outside the docstring). What are the limits? When does it work? When does it fail? What needs to be improved? What are the bottlenecks? Could some abstraction help? Is it too abstract? What tests do you need? What do they show or don't show? Big or small, add it. You add it now because you won't remember any of this after lunch, let alone tomorrow or in a year. This might sound time consuming but if your functions are simple then this entire step takes no more than 5 minutes. If you're taking more than 30 then your function is probably too complex. Either fix that now (goto step 1) or add that to your developer notes and move on.

Step 4: triage and address your notes. If there's low hanging fruit, get it now. A small issue now is a big issue tomorrow, so get it when it's small. If there's critical issues, address now. No code is perfect and you can't nor shouldn't address every issue. You triage! But because you have the notes if they become bigger issues or things change (they always do!) then you or someone else can much more easily jump in and address them.

This sounds complicated, but it moves surprisingly fast when you get into the habit. Steps 2-4 are where all the debugging happens. Step 2 gives you time to breathe and sets you up to be able to think clearly. Step 3 makes you think and sets you up for success in the future, when you'll inevitably come back. (Ironically, the most common use I see for agents is either creating this documentation or being a substitute for it. But the best docs come from those who wrote the code and understand the intent, not just what it does.) Step 4 is the execution, squashing those bugs. The steps aren't always clear-cut and procedural, but more like a guide to what you need to do.

And no, I've never been able to get the AI to do this end to end. I have found it helpful in parts, but I find it best to have it run in parallel to me, not in the foreground. It might catch mistakes I didn't see, but it also tends to create new ones. Importantly, it almost always misses big-picture things. I agree with others: work in smaller chunks, but that's good advice whether working with agents or not. Code is abstraction, and abstraction isn't easy. Code isn't self-documenting, no matter how long or descriptive your variable names are. I can't believe documentation is even a contentious subject, considering how much time we waste on analysis just to figure out what code even does (let alone whether it does it well).


If you're conscious of the guardrails you're putting in, are you really "vibe coding"? The whole idea behind "vibe" coding is that you don't pay close attention to what the LLM gives you, you just go along with the "vibe".

I've built a couple systems where most of the code was generated by LLM prompts, but I wouldn't describe what I did as "vibe coding". Instead I was reviewing judiciously and applying my software engineering experience to ensure that the codebase was evolvable and maintainable. Not everything required careful review because certain subsystems were low risk and tolerant of sloppiness, but knowing where to watch like a hawk or hand code something was part of what I did.

I question the extent to which effective guardrails can be put in place as a general case. My impression matches what I've seen from others, which is that LLMs make senior devs who know how to supervise them more powerful — but I'm less certain about junior devs.


Anecdotally, I use a lot more AI than ever before, at least 5x more, though it's hard to measure.

And to support the claim in the heading you are also invested in AI through your ChatBotKit project, so…

I am sure everyone is invested in their own project - I just happen to be invested in a project that is somewhat connected to this topic.

But yes, I do use a lot more AI than I used to 6 months ago; some of the tools are internally built, many others are sourced externally. I bet I will be using even more AI going forward.

I think it is inevitable!


Based on what he is building, it feels like _pdp_ is actually passionate about AI himself, and ChatBotKit is a by-product of this passion. So I'm pretty sure he'd use AI as much and root for it, even if he weren't involved with that specific project.

A reference architecture for an AI agent that dynamically selects and utilizes skillsets based on user intent.

Search: