Another reminder of how annoying it is for a package system to have unqualified package names.
Having to ask someone to gift a `purescript` package shouldn't even be a thing. It should've been `@shinn/purescript`, and the compiler developers would just create their own `@whatever/purescript`.
This is something Elm and many others got right. https://package.elm-lang.org/ It's just infinitely, obviously better.
You see all sorts of problems because of this, like people "giving packages away" when they quit. Or buying package names. Or coming up with annoying name hacks because the obvious, best name is simply taken. Or people thinking/guessing that `npm install mysql` is the correct/best/canonical package because it's the simplest name, and anyone who publishes a better library has to name it mysql2 or better-mysql, etc. These just shouldn't even be things.
Whenever a large number of skilled people do something for which an alternative is "infinitely, obviously better", there's a good chance that there is more going on than you know.
RubyGems used to be namespaced this way and moved away from it. They didn't do so lightly.
The problem is that ownership, and even names of owners change all the time. In the very very large majority of cases, this change of ownership is an implementation detail that doesn't need to impact package consumers. If you enshrine the owner's name in the package, it means any change of ownership is effectively a breaking change to the package. When you have very large transitive dependency graphs, the result is constant, pointless churn.
It seems like a reasonable fix to this is to prevent individual-name ownership (requiring a group instead) -- this has the benefit that package maintainers who plan to maintain their packages forever can keep the same names, and those that don't can essentially fork their project, stop fixing the older version (@ <maintainer>/<project>), force all changes to go to a new one (@ <group>/<project>), and hand off ownership as necessary.
This doesn't break old consumers and it allows for a pretty graceful migration path (if you want new updates, change your dependency) -- it can even be helped along by marking the old package as deprecated or the repo as archived and what not.
Assume Rubygems chose to operate in that way, where all gems must be owned by a group rather than an individual user.
Then, assume that the most flexible option is for each gem to be owned by a unique group: that way even if two gems are maintained by the same users right now, they use two distinct groups in case that ownership changes in the future.
We might as well just name the “group” the same as the gem name, since only one gem is managed by each group. So now the “purescript” group maintains “purescript”, the “pry” group maintains “pry”, etc.
As syntactic sugar for users, since the group name and gem name will always match, why make them type both? Let’s have all the commands support just referencing the gem name. If somebody wants to fork a gem and release it, their group and gem get a new name.
I think there’s a pretty compelling case that package managers should support group ACLing on publishing (giving multiple humans the first-class right to publish using individual creds to a group namespace, with the ability to add/remove users from the group over time). But once you’ve done that, the distinction between explicit group-name-in-package-path and changing-name-to-fork (so the difference between fork-group/orig-name and orig-name_fork-group) seems to shrink.
For what it’s worth, I think I actually like the idea of mandating a 2-part namespace, as the way to force “long” names (where “long” means “with enough context to make forking easier and more obvious”). I just wanted to call out that “force namespacing” isn’t a silver bullet for the issue.
A better feature-add might be supporting dependency replacements. For example, pre-modules, Go had an issue where if you had a dependency and I forked it, I had to go through all my code and replace references to your import path with mine. If I depended on something that depended on you, I was out of luck (or had to vendor and regex or a variety of other hacks). Now, with go modules, I can do a “replace” in my go.mod and sub in my fork for yours. That has enabled me to be much more flexible with my use of forked repositories, and most languages don’t have a direct parallel.
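A minimal sketch of what that looks like in a go.mod (module paths and versions invented):

    module example.com/myapp

    go 1.12

    require github.com/upstream/somelib v1.2.3

    // build against my fork everywhere, including inside transitive dependencies
    replace github.com/upstream/somelib => github.com/myfork/somelib v1.2.4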
> Assume Rubygems chose to operate in that way, where all gems must be owned by a group rather than an individual user.
Sorry, this wasn't my premise -- I meant to have this as an option. As in <user>/<project> or <group>/<project>.
> Then, assume that the most flexible option is for each gem to be owned by a unique group: that way even if two gems are maintained by the same users right now, they use two distinct groups in case that ownership changes in the future.
> We might as well just name the “group” the same as the gem name, since only one gem is managed by each group. So now the “purescript” group maintains “purescript”, the “pry” group maintains “pry”, etc.
I don't agree -- even if it's always group-a/purescript I think the distinction is still important, because in this case previous-group/purescript still exists, but is frozen/archived. Here's how I'm understanding the scenario you laid out:
1. group-a/purescript is created
2. purescript changes ownership, group-b is going to be publishing it going forward
The fact that "group-b" is the "right" purescript is arbitrary/subjective to some degree.
> I think there’s a pretty compelling case that package managers should support group ACLing on publishing (giving multiple humans the first-class right to publish using individual creds to a group namespace, with the ability to add/remove users from the group over time). But once you’ve done that, the distinction between explicit group-name-in-package-path and changing-name-to-fork (so the difference between fork-group/orig-name and orig-name_fork-group) seems to shrink.
I think these two issues are a bit separate. Letting people dynamically change who owns/can publish to a repository is one way to solve this problem, but I think it's more complex than the fork-and-move approach.
IMO if some user wants to give up/transfer their repo, they:
1. find someone else to take over if they want
2. freeze/archive/whatever their repo
3. let the person fork & continue their work
An ownership change should be opt in, unless it was known @ package creation time that ownership would be a shared/rotated/changing/nebulous thing (which would be demonstrated by a group owning the package from the beginning).
I wasn’t implying you claimed that all gems needed a group, I was proposing it as part of the thought experiment. My apologies if that was unclear.
To your list of examples: my point parallels your own, I think. I’m saying that given the “right” version is arbitrary and subjective, the difference between “group-b/purescript” and “purescript-group-b” is effectively nil. More concretely: if namespacing existed, you could fork “group-a/purescript” to “group-b/purescript”, but if namespacing didn’t, you could fork “purescript” to “purescript-group-b”. In either case, dependent projects need to update where they source their dependencies from.
Namespacing, in my experience, tends to make the forking process slightly “cleaner”, because you avoid having a potentially non-“right” “original” (for example, “purescript” tends to look more legitimate than “purescript-group-b”). But some comments in this thread seem to paint namespacing as a hard requirement, or claim that package managers without namespacing are missing a core, mandatory feature. The case I’m presenting is that this isn’t the case: namespacing is a useful feature for several workflows, but adding namespacing doesn’t fundamentally alter the issue.
> I wasn’t implying you claimed that all gems needed a group, I was proposing it as part of the thought experiment. My apologies if that was unclear.
My apologies, I certainly misread your comment.
> To your list of examples: my point parallels your own, I think. I’m saying that given the “right” version is arbitrary and subjective, the difference between “group-b/purescript” and “purescript-group-b” is effectively nil. More concretely: if namespacing existed, you could fork “group-a/purescript” to “group-b/purescript”, but if namespacing didn’t, you could fork “purescript” to “purescript-group-b”. In either case, dependent projects need to update where they source their dependencies from.
I agree -- the effects are definitely similar and almost equivalent. However, does requiring a group/author change things at all? It seems like it could introduce an abstraction layer.
> Namespacing, in my experience, tends to make the forking process slightly “cleaner”, because you avoid having a potentially non-“right” “original” (for example, “purescript” tends to look more legitimate than “purescript-group-b”). But some comments in this thread seem to paint namespacing as a hard requirement, or claim that package managers without namespacing are missing a core, mandatory feature. The case I’m presenting is that this isn’t the case: namespacing is a useful feature for several workflows, but adding namespacing doesn’t fundamentally alter the issue.
I'm on the fence -- I'm not sure if this is a good counter case, but what about the layer of abstraction introduced by the implied/required existence of <group>? You could write code that imports "purescript", but then resolve it later (as some others mentioned, via go.mod or some other modules file that clarifies mappings) to determine which "purescript" that is. Or you could solve this by "alias"ing "project/purescript" to "purescript" (with some similar extra configuration that says "purescript" -> "project/purescript"). I'm not sure if either is better (so basically, whether this indirection should be a "module resolution feature" or a "module aliasing feature"), whether there's any value in forcing one (requiring the existence of <group> would almost certainly force the module resolution approach, but would also break builds the second similarly named packages were published...), or whether they really are just the same.
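For what it's worth, npm can already express the aliasing flavour of this via the npm: protocol; a rough sketch, with the scoped name and version invented:

    // package.json: the bare name "purescript" becomes an alias for a scoped package
    "dependencies": {
      "purescript": "npm:@some-group/purescript@^1.0.0"
    }

    // application code keeps importing the short name
    const ps = require("purescript");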
I also found the page on this by the rust team pretty convincing[0].
For clarity, I think the comment referencing “go.mod” that you’re describing, at least in this thread, is from me :D
I think I agree that the core feature that impacts this issue is what go.mod solves, and what you’re describing: it should be easy and language-supported to sub in one fork of a dependency for another fork, so that users can flip between “group-a”’s purescript and “group-b”’s purescript, regardless of how the namespacing works on the module registry (notably, golang dispenses with a registry entirely: there’s no central system, except insofar as github is used for lots of people’s packages).
OK, let's walk through how the scenario you lay out works in the context of a package graph.
I have my_app, which depends on foo and bar. Both of those use purescript. My app calls into foo which gets some object created from the purescript library and returns it. I then pass that object to bar. For this to work gracefully, they need to have a shared dependency on the same purescript.
(purescript is a weird example to use here, but imagine the shared dependency is a library that provides something like a reusable data structure.)
What happens to my_app when purescript gets passed from group-a to group-b? If foo wants to be on the latest, they need to move over to group-b. But if they do that and bar doesn't, then my_app can't get the latest version of foo. The foo and bar maintainers know that, which means they know they have a disincentive to move over to group-b. Better to stay on group-a and keep things moving smoothly with their existing users.
You end up in a situation where the choice that is better for a single package in isolation (move to the latest version of a dependency) is harmful to the package in context (it breaks shared dependencies and prevents users from upgrading to your latest version).
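To make the split concrete (manifests invented): the two dependents end up pointing at what the tooling considers two entirely different packages, so my_app gets two copies and objects produced by one aren't interchangeable with the other's.

    // foo/package.json -- moved to the new namespace
    "dependencies": { "@group-b/purescript": "^2.0.0" }

    // bar/package.json -- still on the old name
    "dependencies": { "@group-a/purescript": "^1.9.0" }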
This is one of the most important situations to avoid when designing a package manager. As much as possible, you want to give package maintainers the freedom to evolve their package without it destabilizing the ecosystem. There's an argument that this is fundamentally what a package manager is — a tool to let you reuse changing code. If you don't need to evolve the code being reused, then FTP is a perfectly sufficient package manager.
Enshrining ownership in the package name directly confounds that. And, like another comment suggests, if you do that, maintainers will just route around it by creating an "organization" for each package, putting you right back where you started.
You could go farther and use a DNS name as a group name, then publish packages by signing with the SSL key. Anyone who doesn't want to shell out for a domain name could use a registry service that gives subdomains out. Why reinvent the governance wheel?
Their stance seems pretty reasonable though I'm not sure I would have done the same (and it's obviously very likely I would be wrong to do the opposite of what they did):
> Namespacing
> In the first month with crates.io, a number of people have asked us about the possibility of introducing namespaced packages.
> While namespaced packages allow multiple authors to use a single, generic name, they add complexity to how packages are referenced in Rust code and in human communication about packages. At first glance, they allow multiple authors to claim names like http, but that simply means that people will need to refer to those packages as wycats' http or reem's http, offering little benefit over package names like wycats-http or reem-http.
> When we looked at package ecosystems without namespacing, we found that people tended to go with more creative names (like nokogiri instead of “tenderlove’s libxml2”). These creative names tend to be short and memorable, in part because of the lack of any hierarchy. They make it easier to communicate concisely and unambiguously about packages. They create exciting brands. And we’ve seen the success of several 10,000+ package ecosystems like NPM and RubyGems whose communities are prospering within a single namespace.
> In short, we don’t think the Cargo ecosystem would be better off if Piston chose a name like bvssvni/game-engine (allowing other users to choose wycats/game-engine) instead of simply piston.
> Because namespaces are strictly more complicated in a number of ways, and because they can be added compatibly in the future should they become necessary, we’re going to stick with a single shared namespace.
> this change of ownership is an implementation detail that doesn't need to impact package consumers
it does, because having a dependency means having a trusted path from the developer to the packager to the repository into your software and to your customers
frankly you'd want to do what you call "constant pointless churn" at every single version update, because god knows what comes through your package manager, as this and many other articles have pointed out repeatedly. The fact that you are advocating to skip it, not just for version changes but for whole ownership changes, is the opposite of a security-oriented mindset.
Every change is, period. There's no guarantee that the same owner (individual or group) won't spontaneously go rogue and introduce harmful changes.
This is why modern package managers have lockfiles and give you control over when you upgrade any package version, regardless of ownership change. Ultimately, you are responsible for the code you reuse.
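For reference, this is roughly what a lockfile pins today (package picked arbitrarily, version and hash invented/elided); nothing changes under you until you choose to re-resolve:

    "rate-map": {
      "version": "1.0.3",
      "resolved": "https://registry.npmjs.org/rate-map/-/rate-map-1.0.3.tgz",
      "integrity": "sha512-...elided..."
    }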
> The problem is that ownership, and even names of owners change all the time.
If ownership changes, I want to know. It's perfectly fine to cause a little bit of breakage there that is easily fixed in a semi-automated, supervised way.
If a name changes, there could just be an alias, unless there's literally a trademark dispute underway, in which case the ownership-change process can be applied.
> When you have very large transitive dependency graphs, the result is constant, pointless churn.
A very large transitive dependency graph is a terrible thing to have. It shouldn't be made convenient to have it. Your package manager should scold you for it!
In all seriousness, if you have such a huge graph, then the amount of churn caused by package updates will likely be the dominating factor, not the change of ownership.
> is easily fixed in a semi-automated, supervised way.
It's not. Once you have shared, transitive dependencies, the application author consuming a package is no longer the one who authored or is in control of the dependency on that transferred package.
> A very large transitive dependency graph is a terrible thing to have.
This is a fair subjective preference. Unfortunately, it flies in the face of reality. Anyone who maintains a package manager will tell you real-world package graphs are typically quite large and deep. If users didn't want that, they wouldn't do it.
> Your package manager should scold you for it!
Users don't generally like or use tools that scold them for doing what they want to do.
Why not resolve the owner name to some kind of uuid at the time of package require/install and save it in the lockfile? That way, a package upgrade can detect if the owner name has changed and fetch the correct package.
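Something like this hypothetical lockfile entry (the "ownerId" field doesn't exist in npm today, it's purely illustrative); on upgrade, the client could warn or refuse if the name now resolves to a different owner id:

    "some-package": {
      "version": "1.2.3",
      "resolved": "https://registry.npmjs.org/some-package/-/some-package-1.2.3.tgz",
      "ownerId": "4f7c2a9e-0000-0000-0000-hypothetical"
    }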
When package owners change, or change names, would the original package namespace still stick around for all the existing versions? Also, would there be a pointer built in that let you know about the new owner namespace when you try to upgrade automatically?
If so, I really don't see the practical problem. Actually it seems like useful information. If I'm upgrading and the package owner changed, that's definitely something I want to at least be told so I can look into whether that's important depending on the usecase. In that sense, it is a potentially breaking change. Particularly from a security standpoint.
If it's not the case, and everything including the version history is ported over to the new namespace and you're simply forced to change stuff just to get it working again, I agree this is just pointless churn.
"You're on the latest version 1.2.3 of @package-a/some-package, however @person-a has officially transferred ownership of some-package to @person-b and there is a newer version 1.2.4 available at @person-b/some-package. If you'd like to upgrade, please update your dependency to @person-b/some-package"
That's the mechanism; I was asking about the benefit of namespaces in that case. With this kind of ownership transfer you lose all the alleged security benefit of namespaces.
I think we're asking the same thing, or I've not explained well. I'm saying when ownership transfers, it'd be good to keep all the existing version history on the original namespace, exactly so you're not forced to update all your dependencies in case you don't need to upgrade and/or don't want the updates of the new owner.
> Whenever a large number of skilled people do something for which an alternative is "infinitely, obviously better", there's a good chance that there is more going on than you know.
The Maven ecosystem has been doing things better than NPM for at least 10 years. The onus is on the NPM team to justify their pathologically bad solution; it's also on the JS community to back off their "everybody can contribute" pipe dream. Maven works because in practice, we have a few organizations contributing packages, not thousands of individual, anonymous developers.
I think the problem is that Java people are considered "old and lame", whereas the Javascript people are considered "young and immature", respectively. While this assessment is somewhat accurate, it inhibits the process of learning from each other.
On further reading, it's debatable whether the "maliciousness" is not the community attempting to take away a project from its founder. Similar to how that toxic Twisted dev tried to push responsibility for maintaining their broken branch of Python 2 to the Python core lib dev.
You can fork, but then you're responsible for more than the "fun parts". The project founder might introduce measures to prevent fragmentation due to your bad decisions as part of damage control.
Java has used namespaced packages since 1996, and it's been a roaring success. There have certainly been problems - for example, there is currently some absolute nonsense going on about the handover of big chunk of stuff from Oracle to the Eclipse Foundation [1] - but they have always been manageable, and don't come close to outweighing the benefits.
Java was indeed a success, but its dependency management is archaic at best. Yes it's better than FORTRAN or C, but I wouldn't use it as a reference in 2019…
it's signed, hierarchical, decentralized, supports addressing multiple artifacts within the same module by type, includes a standardized way to retrieve sources or docs, supports pinning the version of transitive dependencies so you can manually resolve conflicts and on top of that can be extended to use completely alien package sources (which is unholy if you ask me, but it helped me survive the great osgi catastrophe of the 2010s thanks to tycho)
How would that have helped things in this case? If we go with the hypothesis of an angry maintainer getting revenge, would they not be just as angry at their project being forked by the community, and just as able to sabotage it via other libraries still in their namespace? (And perhaps more willing, since their original name is still around.) If we go with the hypothesis of a compromised account, people will still be installing the package from the original maintainer's namespace, so they'll still get the malicious code.
And in all likelihood, because the story here is that the maintainer intentionally (though begrudgingly) transferred ownership, they would have intentionally (though begrudgingly) given other people access to the package in their namespace, simply because people value the namespaced name. (If they didn't, and everyone was immediately happy to install anyone/purescript, then namespacing doesn't solve any problems and also creates some!) And the situation would have played out as given.
Why would anyone who owns a namespace be willing to leave a package that's moving to another maintainer within their own namespace? A namespace tends to come with a reputation, and if you give an outsider access to the namespace, the reputation can change without the owner's consent. No, I don't think I'd be allowing @delinka/ExcellentPackage to be maintained by someone else. They can fork it to @fredralphbob/ExcellentPackage and I'll turn off @delinka/ExcellentPackage when I'm done maintaining it. Yep, it'll break dependent installs, but that's the point: get dependents to move to the proper version. I think of a namespace like a domain. If I host a project at project.delinka.engineer, I'm definitely not transferring access to the subdomain to a new maintainer.
Yep, we still have people who would eschew best practice and go against my method above, but that happens everywhere. Just because a solution isn't perfect doesn't mean it's not an improvement.
> Why would anyone who owns a namespace be willing to leave a package that's moving to another maintainer within their own namespace? A namespace tends to come with a reputation, and if you give an outsider access to the namespace, the reputation can change without the owner's consent. No, I don't think....
Good first-principles argument, but in practice, this happens in ecosystems that do have namespaced packages. Off the top of my head:
- Until recently, kennethreitz was a GitHub organization so other people could manage kennethreitz/requests etc.
- Foursquare's Android app is still com.joelapenna.foursquared, which originally was a third-party app that got adopted by Foursquare and turned into their official app. Joe never worked for Foursquare.
> Yep, it'll break dependent installs, but that's the point: get dependents to move to the proper version.
Probably more realistically, you'd put a note in your @delinka/ExcellentPackage repo readme saying that the package is deprecated and to use @fredralphbob/ExcellentPackage instead going forward, and simply not publish any more new versions. In fact I'm pretty sure npm has a built-in system to warn users of a package that its canonical name has changed.
Imo your solution is just a band-aid. The real solution is having a distribution of packages which are maintained and supervised by a group of people, like the Linux distribution maintainers or the group that develops the language.
Community-contributed packages should be declared "install at own risk", like in the Arch Linux AUR.
All this is already solved. But people want to reinvent the wheel and ride the user generated content train.
> That's just shifting the trust to a different (smaller) group of people.
Practically speaking, shifting trust from a large, anonymous group of people, to a small group of people who are known and trusted by the community is a pretty good solution.
You may be correct for javascript, whose ecosystem is notoriously unstable and the equivalent of "building on sand".
eg write some code in a JS framework, then go off and do something for 18 months. Come back and there's a very good chance the entire framework has been obsoleted and replaced, perhaps even several times.
This is a problem for _any_ active development package, to the point where most language communities have (previously had) instructions on how to fix breakage of the language introduced by, for example, arbitrary Debian packaging restrictions without working with the language maintainers.
I've seen this in _any_ environment that has a relatively active package management system of its own (e.g., Ruby, Perl, and Python included)—and it _still_ happens in some Java cases. The only _sane_ thing to do is to completely ignore the OS package management system and to package your applications as relocatable install packages complete with all dependencies you need, modulo the minimal OS stuff required.
It does work, in practice, and has for years. Every package in the Debian (or Ubuntu, or FreeBSD, etc.) package repos depends only on other packages in those repos and on the base OS. It works fine.
Packages being months behind the latest version is a feature, not a bug — it means things will only be randomly changing under your feet rarely, with the exception of security fixes.
> Packages being months behind the latest version is a feature, not a bug
I've struggled to convince devs and management about this in the past few years. Everyone's gone crazy about the latest and greatest set of features with no respect for stability, maintainability, etc.
To be fair, upgrading from very old libraries to very new ones because you hit an already-fixed bug can be a long process... but I have never worked on a project that stayed on new libraries continuously, so... not sure how it works the other way.
> Packages being months behind the latest version is a feature, not a bug — it means things will only be randomly changing under your feet rarely, with the exception of security fixes.
If things are "randomly changing" by updates, that means the upstream package isn't following semantic versioning practices, and more importantly, isn't preserving backwards compatibility with their releases. That's a mark of bad software development practices, and it also isn't solved by an arbitrary wait period: that just means months after the breaking change is done, you finally notice and complain, but most maintainers are going to (rightfully) ignore you by that point.
I'd make the same argument in general software quality: with new features, often come new bugs, so by delaying, you can delay introduction of those bugs. However, bugs are easier to fix the sooner they're found, so again, quick turn-around improves things, even if it sometimes causes short-term pain.
This is one thing where NPM is definitely miles ahead of APT. With APT, you get a single version of each package, so it's all or nothing. With NPM, you can specify `1.1.x` so even if a version 1.2 or 2.0 comes out, you're on the stable old one. The closest thing that seems to happen with linux packaging is on a major version (with backwards-incompatible changes) a new package name is created with a "2" on the end, to signal this incompatibility -- how is that anything but a hacky workaround to not having proper versioning support?
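For example, in package.json (names invented): "1.1.x" takes any 1.1 patch release, "~2.3.0" allows only patch-level updates, and "^4.1.0" allows anything up to (but not including) 5.0.0.

    "dependencies": {
      "some-db-driver": "1.1.x",
      "another-lib": "~2.3.0",
      "yet-another-lib": "^4.1.0"
    }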
>With APT, you get a single version of each package, so it's all or nothing.
It's definitely not used as often as in npm packages, but you can use =version after your package name to apt install a particular version, or apt pinning for more complex setups.
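For example (version string illustrative):

    # install a specific version of a package
    sudo apt install nginx=1.14.2-2+deb10u3

    # or keep whatever is currently installed from being upgraded at all
    sudo apt-mark hold nginx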
The point your parent is making is that with npm, you can install one version of a package per project you’re working on, instead of one version on the system. Especially if you’re working on multiple projects or multiple versions of the same project, this is required.
But it doesn't work for libraries people will be using in development. Often you are waiting for a handful of packages to introduce specific features or bug fixes and need them the moment they are available. NPM isn't user space, it's dev space. Timeliness is the maxim.
I’m not a JS developer so maybe I’m missing something, but how do you use a library during development and not use it in production?
In the C++ world, I can’t imagine a situation where you would need to depend on, for example, libjpg while developing, but not need to read JPEG files in prod/end-user-space.
There are a slew of dev-only dependencies in JS-land. Packages that run local servers for hot-reloading during development. Test runners, linters, TypeScript compilers, SCSS compilers, and so on. None of those things need to be included with the bundled product.
Check out Electron or React starter apps for an example, their boilerplates should have dozens of examples.
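In package.json terms the split looks like this (versions are just examples); `npm install --production` skips the devDependencies block entirely, and none of it ends up in the shipped bundle:

    {
      "dependencies": {
        "react": "^16.8.0"
      },
      "devDependencies": {
        "webpack-dev-server": "^3.7.0",
        "jest": "^24.8.0",
        "eslint": "^6.0.0"
      }
    }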
Since another user explained that point, I would also add that in the "move fast and break things" environment of the web, for better or for worse, responsive and automated library updates are often desirable or even required functionality which can be tied to substantial real-world profits if something suddenly goes wrong somewhere in the stack.
Web development is one of my tasks and thankfully we don't move fast and break things; rather, we rely on proven and established development stacks like JEE, Spring, ASP.NET and VanillaJS.
There are some reasonably stable packages in the npm ecosystem, it's just pretty much the Wild West out there. It really depends on what you're building. It's pretty hard to get a new SPA started these days without a development environment which uses npm, especially if you're running lean.
The problem is keeping your dependency tree reasonable. Even pulling in a couple of packages might lead to hundreds of dependencies. And even if you have the sense not to use an external package for something as simple as left-padding a string, someone somewhere in that dependency tree might not feel the same way. And that's all it takes to bring a chunk of the web down.
All of these problems of course are due to JS not having a mature ecosystem or standard library. People shouldn't need to reinvent the wheel every hour, nor should they be pulling in modules less than a dozen or two lines of code. That, and the constant misguided financial incentive to deploy ever more complex functionality over http are what lead to this aggressive push for cutting edge tools.
Why do you think this? OSes like Debian don’t just pull packages from upstream automatically. Packages have actual maintainers affiliated with the OS, not the upstream community, and it’s those maintainers who build packages for the OS repos.
My software is packaged by Debian, and the update process is me notifying the Debian Developer (DD) responsible for the package of the new version => said DD pulling the new tarball from GitHub. Pretty sure no one’s gonna notice until after it’s pushed to Debian FTP if I introduce some subtle malicious code. Point is DDs don’t review version deltas for the most part, so when the upstream is compromised, they add little to your defense (other than security by outdatedness, I suppose).
Sure, but a very long time will pass between it being uploaded and being merged into stable, so there is a lot of time for people to discover your malicious code.
It is not like npm or crates.io where you can just upload whatever random code you like and people will start picking it up immediately.
This doesn’t contradict my point. I never said that Debian maintainers are more trustworthy than upstream 100% of the time.
I merely said that Debian packages are built, uploaded, and vended by Debian package maintainers, not by upstream. Whether that makes them more trustworthy or less is a different question.
Can't speak for parent, but because they have historically shown themselves to be trustworthy.
I don't know much about the "NPM community," but judging from the bits I do hear, I'm not sure they have done the same. OTOH, I understand that as an outsider that only pay attention when shit blows up, I'm only seeing the bad parts.
Those packages provide pretty much every software capability computing has to offer. It's a pretty vast set of capabilities, though npm/javascript itself has gotten way out of hand over the last few years too. ;)
You're being downvoted because that is mean-spirited, but the reason we needed these packages is to find out what the right size of package is. We now have a lower bound.
I'm not sure what's mean-spirited about the simple statement that the size/complexity of the npm-eco-system arises only from personal/corporate business-decisions.
In my opinion there should be different namespaces, similar to what the parent mentioned. There can still be the public namespace as there is now, but there can also be ones registered, such as google/mysql-golang. The other alternative is that it's just linked to Github/Gitlab, but then you run into potential naming collision issues.
Java got that part right in the 1990s, when it was the cool Web-savvy language. At the time, I especially liked how they piggybacked onto the existing DNS domain name control, avoiding having to create a new centralized registry to keep names unique. (Of course, more could be done beyond that, today.)
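For reference, the reverse-DNS convention shows up directly in Maven coordinates (version here is just an example):

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>28.0-jre</version>
    </dependency>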
But it's easy to connect to domain name control. Allow uploading to maven central only after domain verification. Java did not do that, AFAIK, but other languages can do that.
I should've been more clear about what I was saying with "control". The recommendation is connected directly to domain names you control. What was not done at the time was enforcing that, or using that as a basis of authentication or distribution, which is part of why I said more could be done, today.
The purpose of the scheme was to make namespace collisions less likely and that's about it, though. And people regularly deviate from it, both then and now. Not using it as a basis for authentication or distribution probably remains a really excellent idea.
This is not a package namespacing issue. This is not a technical issue. It is mechanically no more difficult to change from "@shinn/purescript" to "@whatever/purescript" than it is to change from "purescript" to "purescript-whatever", except that in the common cases (where no disputed community moves have occurred without a corresponding name change) everyone has to include an author. The author can still "give away" or sell package names; a lot of the value is not in holding the name, but in the existing installed base and mindshare.
These are the social issues associated with a hostile fork.
I don't actually feel like this solves anything, but instead just adds one more avenue for name hijacking. Now instead of trying to register an angular package first, I'll try to register a google namespace, or similar. If google is taken, I'll make an official_google, or g00gle etc.
Mostly because the vast majority of JS developers don't seem to be aware of the rest of the software universe, and so seem to reinvent the wheel, rediscover the worst of software's history, and discard the most useful of software findings with shocking regularity.
NPM tends to reinforce the worst of the JS world's tendencies.
Primarily, I end up basing this off of the types of libraries being developed for Javascript, and what kinds of articles and thought leaders JS developers tout as innovative.
So by looking at a tiny fraction of the 1 million+ npm libraries, and articles by a few dozen people on the internet, you are able to conclude that the
> vast majority of JS developers don't seem to be aware of the rest of the software universe
Forgive me if I dismiss this as business-as-usual JS bashing.
I beg to differ; witness the fibur[0] Ruby gem. Written by Aaron Patterson (@tenderlove) to show that using threads in Ruby is very easy. The gem consists of this single line:
Fibur = Thread
Of course, he did it as a joke, and it doesn't excuse the serious packages that NPM contains, but it does prove that other languages have single line packages.
I cannot talk for all other languages but at least in Java
1)All packages that are published can never be unpublished or re-released from a different contributor
2)Packages are namespaced
3)Nobody downloads packages directly from the internet. You always use a proxy which in most companies has security scans.
4)There are no "local packages" (like the node_modules dir), so it is impossible for the checked out source code to override your own vetted and secure package.
Not directly related to the incident of the original post, but I was mindblown when I realized that you can unpublish npm packages
1) Same applies for npm (granted, this was only fixed after the left-pad incident, and npm was not the only language's registry to have that issue).
2) As mentioned elsewhere in this thread, npm supports namespaced packages, but they are not mandatory. There are other major languages' registries in same situation.
3) Can you back up 'nobody'? I would suspect a lot of companies don't use a proxy. Some JS teams also use an internal proxy for npm, but it is obviously additional infrastructure to set up/maintain, which has a cost.
4) Never heard anyone raise this as a problem before.
> Not directly related to the incident of the original post, but I was mindblown when I realized that you can unpublish npm packages
You can't, with the exception of a 72 hour window, to allow for accidental publishing [1].
1) The fact that an incident was needed to force something the Maven registry has done since inception actually reinforces the original argument, doesn't it? (that JS developers did not look at what other languages were already doing)
2) Again, whoever thought that namespaces should be optional instead of required "doesn't seem to be aware of the rest of the software universe". Who took this decision? Why?
3) Do a survey on your own. Ask Java developers you know if they use Artifactory/Nexus in their job and note down the percentage. Then ask the same question to JS teams
4) Just because something hasn't been exploited yet doesn't mean it shouldn't be fixed. By that logic, if left-pad hadn't happened, would you say that unpublishing packages had not been raised as a problem yet?
1) As I said, it was not just JS in this situation at the time, it also applied to other major registries like PyPi. So your point does not reinforce the original attack on JS developers. Congrats to Maven for getting this right.
2) Namespaces were added later. It wasn't "a decision to make them optional". Also, check out the discussion here as to how namespaces don't solve this issue; this point is largely moot.
3) I'm not the person blanket attacking a community. Or making unlikely assertions that "nobody" in the Java world installs direct from the internet.
4) You work for a Linux distribution. You have several global npm modules already installed that are safe and secure. You download source code of a killer app in order to package it. You check the source code itself and it is safe. However you didn't realize that there was a local node_modules directory in the git repo that contains package foo-1.2.3 with replaced code that does bad things. That package overrides your global one. You ship a compromised app.
The above scenario is impossible with maven, because there is no concept of local modules. Only the "global" ones will be used when you package an app. So if you check just the source code and it is safe then everything is fine.
Sounds like your argument boils down to "check the source carefully", not "local modules are evil".
If the source was checked carefully, you'd notice a checked-in node_modules dir.
If you didn't check the source properly, you could install a module that seems like it's using a known package, but really is using its own malicious version of the global package.
> Perl (cpan), Python (pip or conda), Ruby (gem), and Rust (cargo) all behave as NPM does
Wrong.
None of them have chosen the "micro-package" way of NPM.
None of them have an average size of 3-10 lines of code per package, which NPM has for many MANY packages. Cargo and pip have many packages, but reasonably sized packages (> 100 lines).
The micro-package philosophy that NPM chose, meaning every single line of code can have its own package, increases the number of dependencies required to create anything in JS by several factors.
A simple hello world in React already has ~100 dependencies.
Nobody is closing their eyes and singing "everything is fine." A large number of small packages is a good thing, and a technically strong ecosystem supports it. Having spent the last week in the internals of glibc chasing bugs in code that has no reason to be jammed into the same library that handles initial program loading, I can attest that there are good, justifiable, technical reasons to do things the NPM way, and I'm glad that there are smart, qualified, talented people implementing that.
I know it's hard for you to imagine, but perhaps the JavaScript ecosystem has some good things about it.
> I know it's hard for you to imagine, but perhaps the JavaScript ecosystem has some good things about it.
There is good in every ecosystem. But a simple search for "node_modules" on Twitter should probably convince you that the good of JS is not in its package system.
To be fair, even the Node.js author agrees on that.
Modularity does not mean "split your code at the atomic level".
Including a package named "one-time", bundled several times in two different versions, to do something highly relevant and technical like "call a function once".
I have no doubt that it is highly complex code that indeed requires two packages..... Irony.
Little question: what would have been the probability of purescript getting malicious code if its dependency tree were something reasonable... let's say 20 packages instead of the current ~200?
The crab-grass-like dependencies of many/most NPM packages are scary enough, and then they (or you?), I guess because of lazy loading to improve responsiveness, update as you watch... It's like a scene out of an alien monster movie, where the creature keeps growing more limbs.
conda uses namespaces (called channels). If you want a package from a random person, you have to explicitly add their channel (either globally or when installing the package).
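For example (package name invented):

    # pull a package from a specific channel for this one install
    conda install -c conda-forge some-package

    # or trust the channel globally
    conda config --add channels conda-forge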
That is why I cited the behaviour and culture of JavaScript-only devs as the primary explanation, with NPM's model reinforcing those issues - issues that do not exist in the same manner in those other languages' ecosystems and cultures.
cargo, at least, does this for the reason Fellshard gives - the Rust team essentially commissioned a copy of Ruby's bundler, without considering whether there was anything to learn from any other language's ecosystem.
Well, okay, if your point of view is that behaving like multiple other major languages and specifically taking the lessons (good and bad) of a specific existing language ecosystem into account before doing your own thing is the same as disregarding other languages and reinventing the wheel, I'm not sure what words mean anymore.
> Well, okay, if your point of view is that behaving like multiple other major languages and specifically taking the lessons (good and bad) of a specific existing language ecosystem into account before doing your own thing is the same as disregarding other languages
If you've only looked at two or three languages that not coincidentally happen to do things pretty much the same as each other, and not looked elsewhere, then you are indeed disregarding other languages.
> and reinventing the wheel
I didn't mention reinventing the wheel.
> I'm not sure what words mean anymore.
Agreed, but i'm not sure i can help you with that.
Cargo is more like npm than Bundler in this regard, as Bundler does not let you have multiple versions of a package at the same time. That lesson was learned from npm, though implemented in a different way.
There's a few different elements of the design space. But, while Cargo can use crates.io, it's distinct from it, so on first principle, this would be a crates.io feature.
That being said, Cargo would also have to understand it, because the Rust language does not understand namespaced external packages, so you'd either have to change the language, or change Cargo to do something to paper over that somehow.
> When we looked at package ecosystems without namespacing, we found that people tended to go with more creative names (like nokogiri instead of “tenderlove’s libxml2”). These creative names tend to be short and memorable, in part because of the lack of any hierarchy. They make it easier to communicate concisely and unambiguously about packages. They create exciting brands.
I will never stop admiring your ability to see the bright side of things.
PHP had the luxury of coming out with a package manager (Composer) later, and learning from others before it. (2012 vs 2010 for npm).
Npm similarly improved on a lot of package managers that came before (e.g it's superior to Pip, which doesn't resolve dependencies [1]).
> It makes me wonder why npm hasn't already moved to namespaced package names.
Also, npm does have namespaced package names, the problem is that it was introduced later, so isn't mandatory. I'm not sure how they could make it mandatory without breaking everyone?
Apache Maven was released in 2004... PHP just looked at what Java was doing. Might be a wrong idea for some things but it definitely helps for stuff such as package management.
There's nothing preventing whatever_purescript.
Even if you have namespacing, anyone who owns a spot in it can sell or rent their spot to a malicious actor.
Java got this right with reverse domain names over a decade ago. To use a namespace you have to own the corresponding domain. Simple and effective, abuse-resistant package naming.
I honestly don't understand why people use packages like this. If I need this functionality, I will simply write my own. Plus, I would never be able to find this specific package. I guess PureScript uses this because its author is also the author of rate-map.
I’ll give you one: it’s code already written and tested by > 1 person, edge cases already figured out. Saves you time. The gains are small but quickly add up.
This is why lately I’ve been a fan of very extensive standard libraries (like Crystal has) - it’s like having a huge repository but vetted by the same team and without any of the package management drawbacks.
> it’s code already written and tested by > 1 person, edge cases already figured out.
It is tested? Edge cases figured out? Essentially for this code?:
return start + val * (end - start);
Sure, if it's running in a hostile environment, you might need to do those sanity checks on your parameters - but I have a hard time imagining such a situation. If you actually need to "Map a number in the range of 0-1 to a new value with a given range" in your own code, can't you guarantee that the variables are all numbers? It's your responsibility as a developer to know your code, and the data your code is handling.
> Saves you time.
There is something really wrong if finding a package to do this niche thing is faster and more optimal than just writing out that one line of code.
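If you do want it as a named helper, it's still a one-liner you can own yourself (a sketch, not necessarily the package's actual API):

    // map a value in [0, 1] onto the range [start, end]
    const rateMap = (val, start, end) => start + val * (end - start);

    rateMap(0.5, 10, 20); // 15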
Also: the way the library handles those edge cases isn't necessarily the way you want. Case in point: rate-map throws exceptions in situations where you might expect it to fail more gracefully.
What gains? That line wouldn't take much resources or time to figure out and test. Now you have yet another dependency that could be injected with bad code in the future...
[C]ode already written and tested by > 1 person, edge cases already figured out wastes time when that person was not you. This costs you time. The losses are small but they add up.
> This is why lately I’ve been a fan of very extensive standard libraries (like Crystal has) - it’s like having a huge repository but vetted by the same team and without any of the package management drawbacks.
Having had some involvement in the early days of node, I had imagined there being something like the Python stdlib. When the npm world grew, I thought "oh, that's pretty cool. It's neat how npm can handle multiple versions of the same package."
Now, I'm in absolute agreement with you. There are definitely downsides to a large standard library, but I think the upsides are worth it if that library is maintained.
There is not even a link to the source code on the npm page for it. I installed it and inspected the source code, but I doubt everyone does this when installing a dependency.
Funnily enough, in my opinion, the JS ecosystem fully embraced the Unix way: have small programs/libs that do only one thing and do it (somewhat) well.
I fear your analogy does not go far enough if you are comparing the JS ecosystem to the UNIX philosophy.
Javascript would offer a library for every single option, variant and logical operator for a UNIX command. Combinatorial explosion will devour the web. It already destroys dev laptops anyways when downloading something via NPM.
The problem is clearly due to vanity metrics like number of packages motivating people to publish an insane number of useless packages to fluff their contributions.
Agreed - I’ve found and been astonished by Github users who maintain hundreds of these little npm packages, all of which have usually under 10 lines of actual code, and sometimes even have chains of dependencies on the user’s other packages.
It seems like the only reason these packages get any significant downloads is when one of them gets depended on by a big package, causing the entire dependency chain of the user’s little packages to be downloaded.
and if you look closely, 70 of those dependents are just small-scale/bs packages and the only sensible real dependency is cross-spawn _via_ the shebang package (which applies this regex to a string). The whole ecosystem could use a purge of all those useless 1-line-requires (which were all introduced by helpful commits from the "i has 1337 downloads"-community); currently this is madness.
Don't know whether to laugh or cry at this. Truth is stranger than fiction in the land of JavaScript. Could any developer 20 years ago have predicted this is what software engineering on the Web would devolve to?
Do not hire people who do this. Most of them will say "over 200m downloads of npm packages". Go and take a look. If you see things like 'is-not-foo, checks if a given string is not equal to the string "foo"', and 0 meaningful contributions, either pay no attention to these claims or pass.
It's an ecosystem full of reinventing the wheel. The most popular library, lodash, includes a reimplementation of a foreach loop for Pete's sake, for reasons passing understanding since it's part of the ecma spec.
JavaScript is just amateur hour, and these things are going to keep happening. It's pathological.
Which foreach are you talking about? Array.prototype.forEach, for...in loops, for...of loops?
The first only works with arrays and array-like objects.
The second works on objects and arrays, but it iterates over all enumerable properties, so you don't really want to use it for arrays. It's also made a lot less useful because it only gives you the keys, not the values.
The third finally provides some sanity, but it's only been around since ES6. Before that, lodash's each method was the most reliable way to iterate over a collection, be it an object or an array.
Just because you don't know the reason for something doesn't mean there isn't one.
`for ... of` only iterates over objects that implement `Symbol.iterator`. Plain objects don’t do that by default, so `_.forEach` is more useful than `for ... of` even if you are only targeting modern browsers and not compiling the code down to an earlier version of the spec.
That said, you can use `Object.keys`, `Object.values`, or `Object.entries` if you want to iterate over objects that don’t implement `Symbol.iterator`, so if you only need `_.forEach` there is no reason to pull in any libraries.
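A quick plain-JS illustration of the above, no library needed on a modern runtime:

    const obj = { a: 1, b: 2 };

    // plain objects don't implement Symbol.iterator, so this would throw:
    // for (const v of obj) {}   // TypeError: obj is not iterable

    // ...but their entries are iterable:
    for (const [key, value] of Object.entries(obj)) {
      console.log(key, value);   // "a 1", then "b 2"
    }

    // arrays are iterable, and also have forEach:
    [10, 20].forEach((v, i) => console.log(i, v));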
Yes, this one. If the object is an Array. According to whatever test they are using for that.
Lodash includes a reimplementation of Array.prototype.forEach because mistakes were made. It also works on other objects because other mistakes were made.
We all make mistakes. But just because there is a reason for something does not mean there is a good reason.
Before "tree shaking" I stored all npm modules in SCM and reviwed all updates as I had to commit after "npm update". I also put ton of files in .ignore as 90% of files in some packages are not required. I also used to include npm modules in distribution/deployment. So my request to npm is to add an option in the main package.json to disable tree shaking.
I wish there was a way to "bless" packages when they were reviewed.
I want a network of trust, such that a Google reviewed package is worth 10 points, a package fuzzed by foobar is worth 2 points, something skimmed by a dependant user is worth 1 point etc.
I can then choose a compromise between highly rated/reviewed dependencies and functionality/risk/cost-to-review.
My own blessing of a package I have reviewed might become a very small signal in a web of trust.
I assume the idea is that, even if Google violates their users’ privacy, they take care not to let others violate their users’ privacy through their dependencies.
Take the most popular npm module lodash for example. It has over one thousand files! But you probably only need one (lodash.js) and that's the one I would commit to SCM.
I think they mean code - just remove all files that your program doesn't touch. With something like JS, this should be doable in cases because of the way libraries tend to be designed.
The equivalent in a compiled language would be to strip unused symbols from the binary.
I wouldn't use the words "malicious" or "exploit" wrt this... It's more like, I dunno, trolling on planet JavaScript? I feel like there should be a big Twitter fight about it...
That was my first thought. But then I realized some guy basically broke something so his stuff would work and someone else's wouldn't. He didn't destroy files, but that was malicious as hell.
mean spirited and dramatic as hell, yes... also, a bad place where real "malicious" things could be done. but "malicious" has a specific meaning and this didn't affect users.
more like dramaticious if you ask me... but also uncovers actual dangerous weaknesses in the npm delivery pipeline...
it's kinda like a cat-fight in the one hundred acre javascript wood... pretty harmless, nobody's shit got pwned, but holy shit, kind of a vulnerable vector they found...
shinnn is claiming his account was hacked, and the hacker added the code. Generally if a hacker hacks someone's npm account and adds harmful code, that would be considered malicious and an exploit.
Yes, but it sounds like Harry Garrood doesn't have enough evidence to outright accuse shinnn of lying, and the NPM team also is acting as if shinnn isn't lying.
So instead of going on a lone and not-fully-backed-up crusade accusing shinnn of lying that could wind up in a he-said-she-said situation with negative fallout, Harry instead decided to use shinnn's words for his own benefit, and hype it up as a serious NPM account hacker inserting malicious code.
The word “exploit” is used several times but none of the code seems to exploit anything. Also, “malicious code” usually has a different meaning than something that intentionally makes the program crash during the installation process.
I feel like this is only part of a wider attack - like by causing this not to download, it meant that users do some other action which opens them up to the real attack.
I think the blog author is implying as much as he can, without directly accusing, that he believes that https://github.com/shinnn was responsible for the bad code, not a random hack.
To quote the article: "As far as we are aware, the only purpose of the malicious code was to sabotage the purescript npm installer to prevent it from running successfully... the purpose of this condition [in the code, hardcoded to include the word 'cli'] seems to be to ensure that the malicious code only runs when our installer is being used (and not @shinnn’s)."
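To make the targeting pattern concrete, here is a purely hypothetical sketch; this is not the actual code from the affected package, just an illustration of a condition keyed on the consumer's name:

```js
// Purely hypothetical illustration -- NOT the real code from load-from-cwd-or-npm.
// It shows how a dependency could sabotage only one specific consumer:
// misbehave only when the requesting module's name matches a hardcoded pattern.
function loadFromCwdOrNpm(moduleId, requestedBy) {
  if (requestedBy.includes('cli')) {
    // Only the targeted installer hits this path; everyone else is unaffected,
    // which makes the breakage look like a bug in that one project.
    throw new Error(`Cannot find module '${moduleId}'`);
  }
  return require(moduleId);
}
```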
>>[[
9 July, around 0100 UTC: @doolse identifies that load-from-cwd-or-npm@3.0.2 is the cause.
See purescript/npm-installer#12 (comment)
@doolse opens an issue on the load-from-cwd-or-npm repo pointing out that the package is breaking the purescript npm installer (although at this stage, none of us spot that the code is malicious). This issue is later deleted by @shinnn.
]]
Hmm indeed. A hack is possible but the timeline of events is dubious.
Unfortunately, based on preexisting cases it really doesn't make much of a difference, the liars will still deflect the blame and the logs will always be "wrong".
I mean, do you really think some hacker compromised @shinnn's account, solely for the purpose of sabotaging a new installer that had only been published for 8 hours?
I mean, I'm all for benefit of the doubt and such, but it's pretty obvious what happened here.
This was my gut reaction, but on further thought, this whole thing seems so needlessly petty that it could have easily been the author attempting to make the other person look bad.
I don't think encryption keys actually are useful for the average case here.
Currently, developers store npm tokens which may be stolen because they're often stored on disk or as environment variables.
Requiring developers to store encryption keys makes almost no difference: the private key will still be stored on disk and will still be vulnerable to effectively the same attacks.
There are some differences of course. Security-conscious users could use hardware tokens to store their encryption keys, and they could password protect the private key in either case. This is not the large majority of users though, so in the average case, it won't matter.
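To make that concrete, a small sketch with Node's built-in crypto module: the private key either sits unprotected on disk (the same exposure as an npm token) or gets a passphrase/hardware token, which most users won't bother with. The file paths and passphrase are placeholders.

```js
const crypto = require('crypto');
const fs = require('fs');

// Generate a signing keypair. Without the cipher/passphrase options the
// private key is written to disk in the clear -- stealable exactly like
// the npm token that already lives in ~/.npmrc or an environment variable.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519', {
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: {
    type: 'pkcs8',
    format: 'pem',
    cipher: 'aes-256-cbc',       // only this passphrase protection (or a hardware
    passphrase: 'correct horse', // token) meaningfully changes the threat model
  },
});

fs.writeFileSync('publish-key.pem', privateKey); // hypothetical paths
fs.writeFileSync('publish-key.pub', publicKey);
```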
What JS needs is a standard library (or libraries), developed and maintained by MS/Google, or preferably a foundation. There's no reason every library should have dependencies 10 levels deep.
High-velocity ecosystems like this make it way too easy to optimise your code for getting pats on the head. It's good that people are pointing out things like this (relatively speaking) not long after the fact, and that a lot of people on here are annoyed about it. The fix is cultural as much as any 2FA implementation.
Part of the problem is the bounty for attacking NPM packages is high. You get a high profile exploit and lots of people talking about it, or you can even get some of your evil JS code running on thousands of sites on the back end or the front end.
Compounded by the fact there is no decent base class library for JS like you'd get for .NET [0]. Want to do anything you could do by default with the .NET BCL? Like open a URL, save a file (with a nice API) or parse some XML?
Then npm i ... it is. And hope it doesn't pull in an exploit.
As a mitigation I recommend people consider writing their own code (NIH) for simple stuff, not npm i all the things (a small sketch follows below).
[0] I'm comparing to .NET but same could be said of Java/Python/Ruby etc.
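For what it's worth, the built-ins have gotten a lot better for the simple cases. A sketch of "open a URL and save a file" with zero npm dependencies (assumes Node 18+, where fetch is global; the URL and filename are placeholders):

```js
const { writeFile } = require('fs/promises');

async function download(url, dest) {
  // fetch is built into Node 18+; no request/axios/node-fetch dependency needed.
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const body = Buffer.from(await res.arrayBuffer());
  await writeFile(dest, body);
}

// Placeholder URL and filename, purely for illustration.
download('https://example.com/data.json', 'data.json')
  .then(() => console.log('saved'))
  .catch(console.error);
```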
This is not the first time this year we see an npm issue, and it could have been much worse than this. All package managers in general create risks, but how the community etiquette evolves around package managers is just as important. Something is wrong with the latter here.
One of the things I like about Purescript is that you can use it without needing any javascript package manager, and without running any javascript outside of a browser. Nix works well for installing the Purescript compiler as well as psc-package. https://nixos.org/nixos/packages.html#purescript
NPM gets a lot of hate for its dependency management, but I'm not sure what a solution to this problem would be.
- They can't curate packages, or else that friction will drastically slow down the ecosystem (thousands of packages get published every day).
- They can't remove/disable packages (most of the time), or dependencies will no longer be strictly immutable.
- They can't disable sub-dependencies, or else this would greatly reduce code reuse and increase redundancy and complexity of packages (every package may have to roll their own X, or compile their package dependencies into bundled JS with no dependencies).
I think the problem is simply this: it's a low-friction dependency management solution -> which made it so popular -> which is making it a target for malicious actors.
After 10 years of nodejs I can honestly say I wouldn’t mind the friction. NPM is a wasteland of abandoned packages and reinventing the wheel. They never solved discovery so you have 500x implementations of the exact same thing. Around 2015 we passed the point where looking for the “right” package takes longer than writing your own.
See, .NET gets a lot of hate from the open source community, but this is one of the big reasons why I prefer it. Even when you compare it to Java, I feel like it's way better in this regard.
If I want to work with JSON, I use Newtonsoft.JSON. If I want an ORM I'll use Dapper (lightweight) or Entity Framework. So many libraries that you would need if you're using Java or JS are just built in to the standard library.
I know it's not a 100% fair comparison, since I am biased. Part of it also might be because .NET is younger and less widely used, but I feel it's pretty true. Though, Python has a ton of open source support and is better in this regard.
I'm not sure that you understand what .NET is. .NET is a platform/ecosystem. I was talking about packages. Packages are managed by nuget, which works across all .NET languages, which is why I didn't specify a language.
Framework, platform, ecosystem, SDK, etc: you're rather neatly sidestepping the point that you're comparing not a programming language to programming languages. A fair comparison would be C# to Java, or Javascript.
What Microsoft chooses to call the .NET Standard Libraries is not at all the same thing as a language's standard library.
So my point is invalid because I wrote .NET instead of C#/VB/F#? You might want to contact nuget to tell them their site is wrong too then.
My point is that finding quality packages is easier with nuget than the JS or Java package managers. And using them is usually easier too. It's a fair comparison.
Uniqueness and trust of package naming is easily solved: use a URI. It doesn't even have to resolve. URI is a naming convention that invokes both uniqueness and universality.
So what you are saying is, all the code I see on blogs, demoing that cool little JS thingy, is actually just demo code, not prod ready. To get prod ready, you need to break the NPM dep and vet everything on your own. So you're telling me npm run serve isn't good enough for prod either?!
The events the article refers to happened two weeks ago and have since been resolved, with further steps taken to reduce the chance of this happening again in the future (e.g. by vendoring a lot of code).
You know a platform doesn't care about security if either:
a. They don't do end-to-end integrity and non-repudiation (not just signed hashes of files, not just https, not just hashes, but signed archives/files that can be verified as coming from the developer, either with gpg, s/mime or x509 certs; a sketch of this follows below)
b. They allow packages to execute code or scripts on download or installation
And, they don't care about your time if they don't automatically offer a prebuilt, reproducible binary mechanism with a build-from-source install/verification option.
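A sketch of the "signed archive" half of (a), using Node's crypto primitives: the publisher signs the tarball, and the installer verifies against a key it already trusts before running anything. Key distribution is hand-waved here and the filenames are placeholders.

```js
const crypto = require('crypto');
const fs = require('fs');

// Publisher side: sign the exact bytes of the release tarball.
function signTarball(tarballPath, privateKeyPem, passphrase) {
  const data = fs.readFileSync(tarballPath);
  // Ed25519 in Node takes `null` as the algorithm argument.
  return crypto.sign(null, data, { key: privateKeyPem, passphrase });
}

// Consumer side: verify before unpacking -- and certainly before letting any
// install script run (point (b) above).
function verifyTarball(tarballPath, signature, publicKeyPem) {
  const data = fs.readFileSync(tarballPath);
  return crypto.verify(null, data, publicKeyPem, signature);
}

// Placeholder filenames; how the trusted public key reaches the consumer is
// exactly the hard part this sketch skips.
```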
Not actually malicious. It doesn't steal user data, drop malware, or damage a computer. Just crashes the library. Looks like another developer-developer slap fight.
I beg your pardon, but if I am using this library as part of a shipping piece of software-as-a-service, and I am in the middle of shipping a new feature when suddenly things mysteriously crash...
If I later discover that the crash was put there deliberately, I am going to call that malice, and malice that has directly impacted a functioning business and its customers.
It's no different than a disgruntled person putting tacks on the road outside of a supplier. If my truck goes there, gets a flat, and crashes into the ditch as a result, I would call that malice as well.
Deliberately crashing software that other people depend upon is malice.
That being said, I always get pushback when I mention this but I think SaaS projects should often be vendoring dependencies. It's safer, it's more secure, it gives you more consistent installs -- and it prevents `leftpad` scenarios. It makes source control slightly more complicated, but the other benefits (often) greatly outweigh that.
This is something that used to be more commonplace in the Javascript community, and it's something that `node_modules` makes very easy, but it's fallen out of style in modern web development.
To the best of my knowledge, this was also commonplace in the original design of Go, since it was coming out of Google, which does vendor all of its dependencies. I'm not sure which way the current Go community leans.
Vendoring as practice isn’t common in some ecosystems (ruby, JavaScript, python, Clojure), but I do see more people using caching proxies and services like JFrog’s to ensure they always have access to particular versions of a dependency.
Sadly, this still doesn’t fix non-repudiation problems we see in ecosystems that don’t enforce things like package signing.
The refund for the amount you paid for the library is on its way.
Once again I'm reminded of that saying someone once said. With random open-source libraries you're dealing with something someone else put out there just because they wanted to; having any kind of expectation that someone will or won't do something is seriously short-sighted and even pretentious. Do you go around running random .exe-s you find on the internet? Why do you do so with the dependencies for your projects and expect a better end result? You may not like to hear this but it's true.
There are two solutions here: either you start reviewing the libraries you use, every release, or you sign a support contract that obliges the maintainer to do what you want.
You are not a lawyer, because if you were, you would be aware that there is case law establishing that just because you don't charge for it, doesn't mean you aren't providing an implied warranty and aren't taking implied liability.
There is absolutely zero chance that you can put malware into an open source project, give it away, and then when sued, stand up and say, "It was free, what do people expect?"
You can call me pretentious until night turns back into day, and maybe I am, but the thing we're discussing is a matter of law, and there are nuänces above and beyond what random people on the Internet would like to believe about how giving software away works.
> just because you don't charge for it, doesn't mean you aren't providing an implied warranty and aren't taking implied liability.
Open source software is almost always distributed with a license that explicitly disavows any such warranty or liability. This is pretty widely understood...
Just because you put it in a license, doesn't mean it will hold up in court.
Example: I distribute a flashlight app. It contains an obfuscated bitcoin miner and a MITM that collects your login credentials. My license does not mention either of these things, but it does say there is no warranty or liability.
What do you think will happen if I am sued in court and/or charged with a crime?
You're talking about an app. I'm talking about open source software freely posted online. Let's apply some common sense here. What you said:
> if I am using this library as part of a shipping piece of software-as-a-service, and I am in the middle of shipping a new feature when suddenly things mysteriously crash...
> If I later discover that the crash was put there deliberately, I am going to call that malice, and malice that has directly impacted a functioning business and its customers.
Now what will happen if you take this library author to court? Let's ask some basic questions that the court might touch on:
* What was the harm caused by the software breakage? You were unable to ship new versions of your software to customers, resulting in reduced revenues
* What general arrangement or expectation did you have with the library author? None, the library author distributed the library as open source and explicitly disavowed (in writing) any obligations to the library's users
* What specific arrangement did you have with the library author? None, you don't know the author personally and you never transacted with them, offered them any compensation, or any other kind of business arrangement to provide you with the library
* What evidence do you have that the author acted maliciously? Almost none–they acted erratically but did try to offer a reasonable non-malicious explanation
I don't think any court in its right mind would find any substance in this case. If it did, every Tom, Dick, and Harry would start crawling out of the woodwork claiming some OSS had maliciously broken their code. It would quickly kill OSS. And not just that, the same principle would apply to any general publication, academic or industrial research, talks and lectures, etc. Society can't function that way.
What if the dev just publishes an update that removes the flashlight functionality? It isn't malware, it just doesn't work. I don't think you could sue the dev and win.
I don't know how you believe laws work or what you hope to discuss, but the reality is that in the case of software, laws offer deterrence and recourse against malicious actions, not prevention. It's absolutely stupid to take a repository by an anonymous person, execute it, and hope it's not malicious or doesn't have any bugs. Not to mention there's nothing obliging a piece of software to be bug-free or maintained - go and try to determine whether a bug that deleted your production data was malicious and whether you have any recourse. I'd love to see any actual cases about software distribution causing damage that don't have anything to do with malware distribution.
You are arguing a strawman. We are not discussing bugs or maintenance, we are discussing a person acting maliciously. Furthermore, you are talking about people being "stupid," which has no place in a discussion of whether a person giving away code has an obligation not to act maliciously.
Never in the history of the courts has a defendant's lawyer gotten up on his hind legs and intoned, "But your honour, the plaintiff was stupid," and had the case summarily dismissed.
Naturally, one can make arguments about what precautions the user of some software ought to reasonably be expected to perform to avoid harm.
I agree it may be prudent to assume that every maintainer is malicious and sits up all night trying to think of ways to put malware in your compiler, but I do not agree that this is going to be an effective defence in a court of law if you actually put malware in a piece of software that you give away.
Now please excuse me, I am about to audit every last line of code in Unix. I have no more time for exchanging pleasantries with you.
You started the thread by saying that if you used a library that is broken by the maintainer you would call that malice. Things being broken is directly related to bugs and maintenance - detecting if and how a breakage is malice is the first problem in your arguments.
I'm also trying to tell you that your whole base premise is wrong, that even expecting some library to work or to keep working is too much (unless you apply one of the solutions I offered). Calling certain behaviors stupid absolutely has a place in a discussion about when people play with fire and then are surprised they get burnt, I think you deliberately missed my point that if you put yourself in danger you only have yourself to blame and most laws do care about that nuance. In the end the job and obligation of keeping the software you write secure is just as much on the person writing some libraries.
We can argue about whether x or y is an effective defense in court, but as I said, that hasn't been tried in the case of open-source software being broken. I also have to repeat that when you look at malicious software and changes in practice, the law only comes into play after the fact, and you have to handle preemptive defense yourself - going back to my first point(s), you have to change the way you develop software instead of hoping that what you randomly execute is good.
Hopefully you now understand what I'm trying to say to you better, English isn't my first language, sorry.
In this specific case, the code was written such that it deliberately broke installation for users. I consider that malicious. The “deliberately” is the important word here.
People make mistakes. Nobody wants this to happen, but my colleagues and I have sometimes pushed a bad deploy that broke our product, and we rushed to revert to a known good state.
That’s not malice, that’s (temporary) incompetence.
But if we deliberately broke something for our users, I would consider that malice.
it would also be some contributory negligence if someone is shipping a service that just grabs and includes unsigned/unvetted/untested code from the internet as a component...
> If I later discover that the crash was put there deliberately, I am going to call that malice, and malice that has directly impacted a functioning business and its customers.
Hey, it's an open source party! Where is your patch? /s
edit: oh, bummer. Not really open source. Still: THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, etc.
How are we supposed to have code free of malice in the current javascript ecosystem?
You touch on an interesting point there: if the typical disclaimer is included, does that actually protect against willfully damaging code?
It's certainly (and reasonably so) very hard to hold a developer accountable for a bug in an open source library they published. But what if they put rm -rf / in the installer with the intent to delete the files of anybody running it? Does "you should've looked" really work there, or is there an assumption of good faith such that malicious behavior voids the disclaimer?
Obviously everything depends on what court has jurisdiction, but aside from the fact that such disclaimers already have little weight, even signed waivers will not protect you from malice or neglect in many places.
It's childish and amounts to an indirect attempt at damaging the reputation of the compiler maintainers. It's playing fast and loose with everyone who needs an install up and working for reasons. I'd say there was a heap of malice and it reflects very badly on mister "someone gained access to my account".
It's truly mind-blowing how this @shinnn character chose to handle this. It's so petty that I would actually have more respect for them had they snuck in something more devastating, like scanning for bitcoin privkeys.
While people give reasonable counter-arguments here to your point, it gets me thinking - I wonder if we could find a truly benign form of "malicious code" that "good guys" could use to find attack vectors before the "bad guys" do.
Perhaps there could be a website set up to get pinged by "malicious" installers? Perhaps it displays stats of some sort? It could turn into a friendly game.
I actually built something like this a few months ago. I called it “OSSassin” (like the game[1]) so devs could sign up & get a unique ID/url to ping with the idea that everyone participating would then try to secretly, and not maliciously, “assassinate” the packages that have agreed to participate by sneaking in code to ping their endpoint. Just as a fun/friendly game to help identify potential vulnerabilities. It has a leaderboard for “kills” but it’s currently empty because I never released it. While I’ve got some experience using packages, I’m still pretty new at software development & have never really been involved maintaining any packages. So I guess what I’m saying is I’m not entirely confident the way I built it made sense until I just saw your comment. I also wanted to add GitHub OAuth as a way to verify/limit who was participating but I never got around to it.
I’m on my phone at the moment but if you (or anyone else) has the knowledge/interest in taking a look at what I’ve done I’d love to release it with a partner.
That sounds like a really fun gamification of an aspect of security research that maybe doesn't get much practical exploration.
I'm not sure, though, how the game would distinguish between vulnerable code pinging the endpoint, as opposed to a player simulating a successful attack by performing the ping themselves.
I feel like the attacker should be able to provide a link to a package on a software repo somewhere and the game should download the package and check for some unique "Player 123456 attacked this package" string.
This reminds me of a similar game someone created where they challenged people to add a certain string to any repo that the challenge creator maintained. An ingenious social engineer then pointed out that the rules of this game weren't in an easy to find place, so the challenge creator added a page to their blog, which was under version control... Game over. Unfortunately I can't remember any detail that would allow me to find a citation for this story.
> I'm not sure, though, how the game would distinguish between vulnerable code pinging the endpoint, as opposed to a player simulating a successful attack by performing the ping themselves.
Yup, you're definitely right. Every time it's pinged I have it download the GitHub repo and then search for the string, but your suggestion makes way more sense: if someone managed to get their ID into a popular package, the repo would be downloaded & searched again and again and again. I was also playing around with it only using one repo, so I hadn't really given any thought as to how it would know which repo to download if there were more than one participating.
I'll dust it off, make some changes you've suggested and then put it on GitHub so @9dev (and anyone else) can take a look and tell me what else needs improvement.
If you're still up for this, publish the source on GitHub. That's an excellent idea and I'd like to participate :)
@dane-pgp's suggestion of verifying the hack by checking the source repository is great too. Maybe it'd be easiest to just name the endpoints something like `/ping/github/<handle>/<package>`.
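A rough sketch of how that endpoint could verify a "kill" instead of trusting the ping, using the route shape suggested above. The player-marker format, query parameter, and use of GitHub's tarball API are assumptions, and it assumes Node 18+ for global fetch.

```js
const express = require('express');
const zlib = require('zlib');

const app = express();

// GET /ping/github/<handle>/<package>?player=<id> -- verify the claimed "kill"
// by fetching the repo tarball and looking for the player's marker string.
app.get('/ping/github/:handle/:pkg', async (req, res) => {
  const { handle, pkg } = req.params;
  const playerId = req.query.player;                 // hypothetical: ?player=123456
  const marker = `Player ${playerId} attacked this package`;

  try {
    // GitHub serves a gzipped tar of the default branch at this API URL.
    const tarRes = await fetch(
      `https://api.github.com/repos/${handle}/${pkg}/tarball`,
      { headers: { 'User-Agent': 'ossassin-verifier' } }
    );
    if (!tarRes.ok) return res.status(502).send('could not fetch repo');

    // Entries inside a .tar.gz are not individually compressed, so a plain
    // substring search over the gunzipped archive is enough for this game.
    const tar = zlib.gunzipSync(Buffer.from(await tarRes.arrayBuffer()));
    const verified = tar.includes(marker);

    res.json({ verified });                          // only record the kill if true
  } catch (err) {
    res.status(500).send(String(err));
  }
});

app.listen(3000);
```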
Awesome! Thanks, a ton! I haven't touched it in a while but I'll take a look at it now. I'm going to take a stab at fixing it up a bit and then I'll put it on my (currently empty!) github. I'm a bit leery of including my username in an HN comment - I once made the mistake of mentioning my twitter handle here and got some weird selfies in my DMs - so I'll put my github username in my HN profile for a day so you'll know where to look.