
The late author of ZeroMQ, Pieter Hintjens, advocated for a practice called Optimistic Merging[1], where contributions would be merged immediately, without reviewing the code or waiting for CI results. So your approach of having lax merging guidelines is not far off.

While I can see the merits this has in building a community of contributors who are happy to work on a project, I've always felt that it lets a project grow without a clear vision or direction, and ultimately places too much burden on maintainers, who have to fix other people's contributions to bring them up to some common standard (which I expect any project to have; otherwise the mishmash of styles and testing practices would make working on it decidedly not fun). It also delays the actual code review, which Pieter claimed does happen, to some unknown point in the future, when it may or may not be exhaustive, and when it's not clear who is actually responsible for conducting it or for fixing any issues. It all sounds like a recipe for chaos, with no control over what eventually gets shipped to users. But then again, I've never worked on ZeroMQ or any other project that adopted these practices, so perhaps you or someone else here can comment on what the experience is like.

And then there's the issue of malicious code being shipped. This is actually brought up in a comment on that blog post[2], where Pieter describes exactly what happened in the xz case:

> Let's assume Mallory is patient and deceitful and acts like a valid contributor long enough to get control over a project, and then slowly builds in his/her backdoors. Then careful code review won't help you. Mallory simply has to gain enough trust to become a maintainer, which is a matter of how, not if.

And concludes that "the best defense [...] is size and diversity of the community".

Where I think he's wrong is that a careful code review _can_ indeed reduce the chances of this happening. If all contributions are reviewed thoroughly, regardless of whether they're authored by a trusted or an external contributor, then strange behavior, and commits that claim to do one thing but actually do another, are more likely to be spotted earlier rather than later. While OM might lead to a greater community size and diversity, which I think is debatable considering how many projects have a thriving community of contributors while also having strict contribution guidelines, it doesn't address how or when a malicious patch would be caught. If nobody is in charge of reviewing code, there are no testing standards, and maintainers have extra work keeping some kind of control over the project's direction, how does this actually protect against that situation?

The problem with xz wasn't a small community; it was *no* community. A single malicious actor got control of the project, and there was little oversight from anyone else. The project's contribution guidelines weren't a factor in its community size, and this would've happened whether it used OM or not.

[1]: http://hintjens.com/blog:106

[2]: http://hintjens.com/blog:106/comments/show#post-2409627




> The problem with xz wasn't a small community; it was no community. A single malicious actor got control of the project, and there was little oversight from anyone else.

So because of this, a lot of other widely used software was importing and depending on unreviewed code. It's scary to think how common this is. The attack surface seems unmanageable. There need to be tighter policies around what dependencies are included, ensuring that they meet some kind of standard.


> There need to be tighter policies around what dependencies are included, ensuring that they meet some kind of standard.

This is why it's good practice to minimize the number of dependencies and add new ones only when absolutely required. Taking this a step further, doing a cursory review of each dependency and checking the transitive dependencies it introduces is also beneficial. Of course, it's impractical to do this for the entire dependency tree, and at some point we have to trust that the projects we depend on follow the same methodology, but a lax attitude about dependency management is part of the problem that caused the xz situation.
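
As a rough illustration of that cursory check (a sketch of mine, not a vetted tool), even just walking a package's declared requirements shows how quickly the transitive surface grows. This assumes the dependency is installed locally and uses Python's importlib.metadata:

    from importlib.metadata import PackageNotFoundError, requires
    import re

    def transitive_deps(package, seen=None):
        # Recursively collect the declared, non-optional requirements of an
        # installed package (optional extras are skipped).
        if seen is None:
            seen = set()
        try:
            reqs = requires(package) or []
        except PackageNotFoundError:
            return seen  # not installed locally, nothing to inspect
        for req in reqs:
            if "extra ==" in req:
                continue
            name = re.split(r"[<>=!~;\[ ]", req, maxsplit=1)[0]
            if name and name.lower() not in seen:
                seen.add(name.lower())
                transitive_deps(name, seen)
        return seen

    print(sorted(transitive_deps("requests")))  # e.g. certifi, charset-normalizer, idna, urllib3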

One thing that I think would improve this is "maintenance scores": a service that would scan projects on GitHub and elsewhere and assign each one a score indicating how well maintained it is. It would take into account the number of contributors in the past N months, development activity, community size and interaction, etc. Projects could showcase this as a badge in their READMEs, and it could be integrated into package managers and IDEs, which could warn users when they add a dependency with a low maintenance score. Hopefully this would dissuade people from using poorly maintained projects and encourage them to use better maintained ones, or to avoid the dependency altogether. It would also encourage maintainers to improve their score, and projects that are struggling but have a large user base would become more visible as potentially more vulnerable to this type of attack. From there we can work towards figuring out how to provide the help and resources they need to improve.
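
To make the idea concrete, here's a rough sketch of what such a score could look like, built on the public GitHub REST API (the endpoints are real; the weights, caps, and the score_repo helper itself are entirely made up for illustration, and an unauthenticated client will hit rate limits quickly):

    import datetime
    import requests

    API = "https://api.github.com"

    def score_repo(owner, repo, months=6):
        # Hypothetical "maintenance score": arbitrary weights over a few
        # signals the GitHub API already exposes.
        since = (datetime.datetime.now(datetime.timezone.utc)
                 - datetime.timedelta(days=30 * months)).isoformat()

        info = requests.get(f"{API}/repos/{owner}/{repo}").json()
        commits = requests.get(f"{API}/repos/{owner}/{repo}/commits",
                               params={"since": since, "per_page": 100}).json()
        contributors = requests.get(f"{API}/repos/{owner}/{repo}/contributors",
                                    params={"per_page": 100}).json()

        recent_authors = {c["commit"]["author"]["name"] for c in commits}

        score = 0
        score += min(len(recent_authors), 10) * 5            # distinct recent authors
        score += min(len(commits), 50)                       # recent activity
        score += min(len(contributors), 25)                  # overall contributor base
        score += min(info.get("subscribers_count", 0), 20)   # people watching the repo
        return min(score, 100)

    print(score_repo("psf", "requests"))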

Does such a service/concept exist already? I think GitHub should introduce something like this, since they have all the data to power it.


That’s not an effective idea, for the same reason that lines of code is not a good measure of productivity. It’s an easy measure to automate, but it’s purely performative because it doesn’t capture the qualitative value of any of the maintenance work. At best it encourages you to use only popular projects, which is its own danger (a software monoculture is cheaper to attack), without actually resolving the threat: this attack was sophisticated and underhanded enough that it could have slipped through almost any code review.

One real issue is that xz’s build system is so complicated that it’s possible to slip things in, which is an indication that the traditional autoconf-based Linux build mechanism needs to be retired and banned from distros.

But even that’s not enough, because an attack only needs to succeed once. The advice to minimize your dependencies is impractical in a lot of cases (clearly) and not fully in your control, since you may pick up a surprising dependency transitively. And updating your dependencies is a best practice which, in this case, is exactly what introduced the problem.

We need to focus on real ways to improve the supply chain, e.g. having repeatable, idempotent builds with a signed chain of trust backed by real identities that can be prosecuted and burned. For example, it would be a pretty effective counter-incentive for this kind of talent if we could permanently ban this person from ever working on a wide range of projects. That’s typically how humans deal with members of a community who misbehave, and we don’t have a good digital equivalent for software development. Of course that’s also dangerous, as blackball environments tend to become weaponized.
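
For the "signed chain of trust" part, the cryptographic mechanics are the easy half. A minimal sketch with the cryptography package (key distribution and binding the key to a real identity, the hard half, is hand-waved here):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    artifact = b"...release tarball bytes..."

    # Maintainer side: sign the release with a long-lived, identity-backed key.
    private_key = Ed25519PrivateKey.generate()
    signature = private_key.sign(artifact)
    public_key = private_key.public_key()

    # Downstream side: refuse anything that doesn't verify against the known key.
    try:
        public_key.verify(signature, artifact)
        print("signature valid: artifact matches what the maintainer signed")
    except InvalidSignature:
        print("signature invalid: refuse the release")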


> We need to focus on real ways to improve the supply chain, e.g. having repeatable, idempotent builds with a signed chain of trust backed by real identities that can be prosecuted and burned.

So, either no open source development, because nobody will vouch to that degree for others, or absolutely no anonymity, and you'll have to worry about everything you provide, because if you screw up and introduce an RCE, all of a sudden you'll have a bunch of people and companies looking to say it was on purpose so they don't have to own up to any of their own poor practices that allowed it to actually be exploited?


You don’t need anyone to vouch for anyone. mDL is going to be a mechanism for having a government authority vouch for your identity. Of course a state actor like this can forge the identity, but that forgery at least gives a starting point for the investigation to try to figure out who this individual is. There are other technical questions about how you verify that the identity really is tied in some real way to the user at the other end (e.g. not a stolen identity), but there are things coming that will help with that (i.e. authenticated chains of trust for hardware that can attest that the identity was signed on the given key in person, and you require that attestation).

As for people accusing you of an intentional RCE, that may be a hypothetical scenario, but I doubt it’s very likely. Most people have a long history of good contributions and have therefore built up a reputation that would be weighed against the reality on the ground. No one is accusing Lasse Collin of participating in this, even though, for all anyone knows, it arguably could have been him all along.

It doesn’t need to be perfect but directionally it probably helps more than it hurts.

All that being said, this clearly looks like a state actor, which changes the calculus for any attempt like this, since the funding and power involved are completely different from what most people have access to. We likely don’t have any really good countermeasures here beyond making it harder for obfuscated code to make it into repositories.


Your idea sounds nice in theory, but it's absolutely not worth the amount of effort. To put it in perspective, think about the xz case: how would any number of contributors have prevented the release artifact (the tarball) from being modified? Because other people would have used the tarball? Why? The only ones that use tarballs are the ones redistributing the code, and they will not audit it. The ones that could audit it would look at the version control repository, not at the tarballs. In other words, your solution wouldn't even be effective at potentially discovering this issue.

The only thing that would effectively address this is people no longer trusting build artifacts and instead packaging directly from the public repositories. You could figure out whether someone maliciously modified the release artifact by comparing it against the tagged version, but at that point, why not just shallow-clone the entire thing and be done with it.
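
A rough sketch of that comparison (my own script, which assumes the tarball unpacks into a single top-level directory): shallow-clone the tag, unpack the release, and diff the two trees. Note that legitimately generated files, autoconf output for instance, will also show up as "only in the tarball", which is exactly the noise the xz backdoor hid behind:

    import hashlib
    import subprocess
    import sys
    import tarfile
    from pathlib import Path

    def digests(root):
        # Map relative path -> sha256 for every regular file under root.
        return {str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
                for p in sorted(root.rglob("*")) if p.is_file()}

    def main(repo_url, tag, tarball):
        work = Path("verify-release")
        work.mkdir(exist_ok=True)

        # Shallow-clone only the tagged revision from the public repository.
        clone_dir = work / "repo"
        if not clone_dir.exists():
            subprocess.run(["git", "clone", "--depth", "1", "--branch", tag,
                            repo_url, str(clone_dir)], check=True)

        # Unpack the release tarball next to it.
        extract_dir = work / "tarball"
        extract_dir.mkdir(exist_ok=True)
        with tarfile.open(tarball) as tf:
            tf.extractall(extract_dir)
        (tar_root,) = [p for p in extract_dir.iterdir() if p.is_dir()]

        repo = {k: v for k, v in digests(clone_dir).items()
                if not k.startswith(".git/")}
        tar = digests(tar_root)

        print("only in the tarball:", sorted(set(tar) - set(repo)) or "none")
        print("differ from the tag:", sorted(k for k in set(tar) & set(repo)
                                             if tar[k] != repo[k]) or "none")

    if __name__ == "__main__":
        main(*sys.argv[1:4])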


Even if you mandated two code reviewers per merge, the attacker can just have three fake personas backed by the same single human and use them to author and approve malware.


Also, in a more optimistic scenario without sockpuppets, it's unlikely that malicious and underhanded contributions would be caught by anyone who isn't a security researcher.


As always, there's a relevant XKCD: #2347.[0] It's frightening how much of modern infrastructure depends on vulnerable systems.

[0]: https://xkcd.com/2347/


Does that mean we don’t look into changes in dependencies at all when bumping them?


> then strange behavior, and commits that claim to do one thing but actually do another, are more likely to be spotted earlier rather than later.

https://en.wikipedia.org/wiki/Underhanded_C_Contest

Writing code like that is actually an art, but it's not impossible, and it will dodge cursory inspection. It's also possible to construct it in a way that preserves plausible deniability.
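
A toy flavor of it in Python (my own contrived example, not one from the contest): this reads like a whitelist check, but `is` compares object identity rather than value, so for strings coming from user input it is effectively always False and the attacker-controlled value falls through:

    def resolve_host(requested, allowed):
        # Looks like "only return hosts from the allow list"...
        for host in allowed:
            if host is requested:   # identity check, not equality: almost never True
                return host
        # ...but in practice always falls through to the untrusted value.
        return requested

    print(resolve_host(input("host: "), ["internal.example.com", "db.example.com"]))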


I'm not sure why my point is not getting across...

I'm not saying that these manual and automated checks make a project impervious to malicious actors. Successful attacks are always a possibility even in the strictest of environments.

What they do provide is a _chance reduction_ of these attacks being successful.

Just as following all the best security practices doesn't produce 100% secure software, following best development practices doesn't prevent malicious code from ever being merged in. But this doesn't mean that it's OK to ignore these practices altogether, as they do have tangible benefits. I argue that projects that have them are better prepared against this type of attack than those that do not.


I wonder how good automated LLM-based code reviews would be at picking up suspicious patterns in pull requests.
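
A naive version is easy enough to wire up with the OpenAI Python client (the model name and the prompt are placeholders, and whether it would actually flag something as carefully obfuscated as the xz payload is very much an open question):

    import subprocess
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    # Diff of the pull request branch against the main branch.
    diff = subprocess.run(["git", "diff", "origin/main...HEAD"],
                          capture_output=True, text=True, check=True).stdout

    review = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Review this diff for malicious or underhanded changes: "
                        "obfuscation, logic that contradicts the stated purpose, "
                        "tampering with build scripts, test fixtures, or release tooling."},
            {"role": "user", "content": diff[:100000]},  # crude truncation for context limits
        ],
    )
    print(review.choices[0].message.content)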



