Show HN: Node.js node_modules file pruning (github.com/tj)
111 points by tjholowaychuk on Nov 19, 2017 | 61 comments



Rather than two giant 440px high images consisting of tiny text in the readme I'd quite love to read exactly what it does and why. Does it prune files in modules that are not needed? If so what heuristics does it use to prune files? Or does it prune modules that are not needed?

Reading the code[1] it appears to delete directories and extensions based on a blacklist.

1. https://github.com/tj/node-prune/blob/master/prune.go#L15-L5...
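
To make the blacklist idea concrete, here is a minimal Python sketch of a dry run that only lists what such a tool would delete. The directory and extension sets below are hypothetical placeholders chosen for illustration, not node-prune's actual lists, and `find_prunable` is an invented name.

```python
import os

# Hypothetical blacklists for illustration only; not node-prune's real lists.
PRUNE_DIRS = {"test", "tests", "docs", "example", "examples"}
PRUNE_EXTS = {".md", ".markdown", ".ts", ".map"}


def find_prunable(root):
    """Return paths under root that match the directory or extension blacklist."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Record blacklisted directories and do not descend into them.
        for name in list(dirnames):
            if name in PRUNE_DIRS:
                matches.append(os.path.join(dirpath, name))
                dirnames.remove(name)
        # Record files whose final extension is blacklisted.
        for name in filenames:
            if os.path.splitext(name)[1].lower() in PRUNE_EXTS:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Printing the result of `find_prunable("node_modules")` before deleting anything gives some visibility into what would go. Note that matching on the final extension means a `.ts` entry also catches `.d.ts` files.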


It looks like it removes the files necessary for "npm install", assuming that the module will do npm install immediately when you install it. This is not true for all cases though, which is why the files should be kept around. Disk space is cheap. This isn't bad for deployments though.


It's especially good for deployments to AWS Lambda.


Love it. Saved 600MB... and test cases still passed :)

I ported it to bash really quickly, as that's probably easier for people to drop into their dotfiles if they don't have a Golang env set up.

https://gist.github.com/gpittarelli/64d1e9b7c1a4af762ec467b1...

If we all just do our part and send a PR setting up .npmignore for one or two of these projects, maybe we won't need this anymore


Great. I wrapped your script into an npm package: https://www.npmjs.com/package/node-prune


And here is a project that helps you decide upfront if you should rely on that bloated module:

https://www.npmjs.com/package/download-size

The tool is available online too:

https://arve0.github.io/npm-download-size/


Great one!


https://github.com/tj/node-prune/blob/master/prune.go#L57

Doesn't this also result in .d.ts files being removed? These are type declaration files (Kinda like C's .h files) that provide types for you without the file size overhead of the full TypeScript source.


They aren't needed during runtime though, that's what this project is attempting to clean up.


ts-node no longer works.


What's the benefit of running ts-node in production?


I don't have to distribute .js files.


Try `ts-node --fast`


If the files are deleted, how will that help?


Oh, right. I was thinking it was only pulling in types.


The Yarn autoclean command also does something like this: https://yarnpkg.com/lang/en/docs/cli/autoclean/


https://github.com/tj/node-prune/blob/20703f18e9a7996683f1b8...:

  // Copied from yarn (mostly).
I infer that TJ used `yarn autoclean` as a starting point for this.


Awesome idea. Especially if you’re packaging your projects via something like Docker this can shave hundreds of megabytes. (Even if you’re using the Docker layer cache, any change to your dependencies will bust the entire node_modules cache.)

The caveat is that this won’t work 100% of the time. There will be a dependency deep down in the tree that needs to read a markdown file from the file system or something of that sort. These issues could be hard to debug if this tool doesn’t provide good visibility into what’s being deleted.


You're looking for `yarn autoclean`. They have a `.yarnclean` file, IIRC.


Is this serious?

I mean, kudos for helping push node module developers to make better use of `.npmignore` et al, but this project strikes me as overly snarky or even trolling...

Seems like what we really need is for some volunteers to tackle the most-installed packages on npm that publish with many extraneous files.

Hell, you could even automate it by writing a bot that (politely) points out likely-extraneous files and opens a pull request with changes to `.npmignore` to clean things up... Hmm... Weekend project brewing...


It's not a troll, this is for AWS Lambda, where size impacts deployments and cold starts. See my comment below about the automation, I think it could/should be done too!


Wouldn't it make more sense to bundle, minify and tree-shake your package than to delete unused files in node_modules? Tree shaking will remove unused code even from files you are using.


You can't always do that (native modules, other non-JS assets...).


Shouldn't this be handled entirely by npm? If these files are unnecessary why do they get downloaded in the first place?


No, it should NOT be handled by npm. Imagine some npm project really needs those files, like a module that renders markdown files to HTML. If npm were "smart" enough to strip your .md files, you would be pissed and blame npm. npm is awesome as is. It's the developer's responsibility to write a proper .npmignore, because only they know what can be excluded.


Ideally people use the "files" array. It might be nice if someone writes a bot to go around and fix large packages by adding that.


Many authors are strangely against it for some reason. One time I sorted my node_modules by size and opened issues for the top offenders, you can see their resistance here:

https://github.com/crypto-browserify/sha.js/issues/5

https://github.com/medikoo/es5-ext/issues/11
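
For anyone who wants to repeat the "sort node_modules by size" exercise, a rough Python sketch follows. The function names are made up for illustration, and it makes simplifying assumptions: only top-level entries are ranked, so a scoped directory such as `@scope` counts as one entry, and nested `node_modules` are charged to their parent.

```python
import os


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_packages(node_modules, limit=10):
    """Return (name, bytes) pairs for top-level entries, largest first."""
    sizes = []
    for entry in os.scandir(node_modules):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:limit]
```

Calling `largest_packages("node_modules")` returns the ten biggest top-level entries, which is a reasonable starting list for opening issues or PRs.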


Speaking as very-much-not-a-JS-developer: Isn't this essentially the same problem Linux package managers solve with -dev, -doc, and -dbg packages? I.e. the default install only contains the minimum necessary to use a program, and if you need the header files/documentation/debug symbols, you can just install them separately.

Is it too hard to meaningfully separate these parts of a package, or is it more of a philosophical issue?


It really is not too hard to separate these things in npm, but very few packages do.


What gave you a reason to believe npm was well-architected?


It's more an issue with the module authors; npm already has a documented way to exclude files from being published.

Furthermore, it would be quite irritating if it started assuming specific file extensions are forbidden instead of using the explicit list provided by the author.


See my top-level comment. It should be handled by library authors, and npm does provide the tooling to do it, but most authors choose to include the files. Seems to be a difference of principle.


No disrespect to TJ (he did many awesome projects I use almost daily), but wouldn't a simple 'find node_modules -not -regex ".*.[js,json]" -delete'

do the same job? Why write this project in Go?
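
A caution on that one-liner: `[js,json]` is a bracket expression that matches a single character (j, s, a comma, o, or n), and the unescaped `.` matches any character, so the pattern keeps any path that ends in one of those characters. The Python sketch below illustrates this with `re.fullmatch`, since `find -regex` matches against the whole path. Python's regex dialect differs from find's default one, and the "intended" pattern is only a guess at what was meant.

```python
import re

# The pattern from the command above, exactly as written.
AS_WRITTEN = re.compile(r".*.[js,json]")
# One possible intended pattern (an assumption): paths ending in .js or .json.
INTENDED = re.compile(r".*\.(js|json)")


def kept(pattern, path):
    """True if the whole path matches, i.e. `-not -regex ... -delete` would keep it."""
    return pattern.fullmatch(path) is not None


if __name__ == "__main__":
    for path in ["pkg/index.js", "pkg/README.markdown", "pkg/binding.node"]:
        print(path, kept(AS_WRITTEN, path), kept(INTENDED, path))
```

Under the pattern as written, `README.markdown` would survive because it ends in "n", while a native `binding.node` file would be deleted under either pattern. That second case is one reason a pure extension filter is risky.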


Maybe it's meta satire :D


ok then my comment can be forgotten. :D


Oh wow TJ is working in nodeland again. I thought he had moved entirely towards Go.

Glad to see more of his work!

Edit: It's Go lol.


Even better: submit a PR to auto-fix these issues on the corresponding GitHub repo when pruning.


TJ, https://apex.sh/ is so beautiful. Did you design it entirely yourself or did you get help and if so who helped? Also mind if I ask what your inspirations were?


Thanks! I do all the design stuff myself. I wouldn't call myself a designer, but I do enjoy it either way!


Just one thing—please increase the font size. `--font-size: 14px` is just too small for comfort. The standard default of 16px is a good balance. (`--font-size-small` also naturally needs to increase.) Other than that, it’s a pleasant minimal design.


I think it's safe to start calling yourself that... love it and will definitely be using it as inspiration.


I've been thinking about this in a different way: what about bundling things with rollup.js? Then each package in require() would be just a package.json and an index.js. I think this might even help with performance.

Edit: lightened up (;


Symptomatic of the npm community’s overloading of “npm install” to serve both users and contributors. It’s easy to configure “npm publish” to do essentially this for all users, but it’s considered bad practice. Which I don’t understand, since the dev use case is still fulfilled - and better - by “git clone”. And merging the two doesn’t scale - Chrome’s dev download is several GB and takes hours to build.


We've started publishing modules with a whitelist in the npmignore to help combat this.

    *
    !dist/**/*


Nice job TJ. This definitely scratches an itch. Thought you were a Go dev these days. ;)


Sarcasm?


https://medium.com/@tjholowaychuk/farewell-node-js-4ba9e7f3e...

Update: I see now it's written in go... :-P



Usually I don't care about what language an executable is written in. After all, as a user I'm just interested in whether it executes or not.

But these small-software situations amaze me. Someone with a node_modules problem will have readily available sh, node and maybe python. So why golang? What could those not do, or golang can do better to such extent that it trumps availability? Similarly there's a price to pay in terms of people contributing with a fix: who is interested in pruning node_modules and will send in a golang PR?

In other words, if a dev would prefer Java (specifically chosen because it exacerbates the startup time), would it still pass as ok? Luckily golang can compile to binaries but that implies you give up availability on the other end, now being confined to someone compiling and publishing on a regular basis, as opposed to just pushing a fix commit to a git url.

None of the above would be of importance if this would be a personal-quality repo with a note: hey i did this at 2am out of frustration, i chose the tools that i knew best, use it at your own risk, opensourced to share knowledge and to access it from my own projects, not as a "productified" software.

EDIT: I would much prefer a commenter's solution in sh for the reasons above but also readability: https://gist.github.com/gpittarelli/64d1e9b7c1a4af762ec467b1... :clap:


Man it's free code, you're reading into this far too much.


I'm trying really hard to see what price has to do with anything I've said.


Well, that's your problem right there.

This line particularly reveals your very sad flavor of entitlement:

> hey i did this at 2am out of frustration, i chose the tools that i knew best, use it at your own risk, opensourced to share knowledge and to access it from my own projects, not as a "productified" software.


This is completely useless because npm still downloads those files and the HDD space is pretty negligible. It should definitely go under the category of troll-driven development. This might be useful for Docker images that will be distributed and downloaded a lot, but even there it's a stretch.

That being said, it would be great if npm had some functionality around packaging only the necessary files for actually running the module and removing all unnecessary files (tests, source code, documentation) and have an opt-in option to install those.


It's for Lambda


Maybe the documentation should include use cases? Newer users of npm + node might think this does something different. Just a suggestion.


Maybe documentation for everything should contain use cases, first, before anything.

Every Javascript tool has documentation the wrong way round. Quick start guides and installation instructions are useless if you don't even have a good reason to use the tool in the first place.


Dunno, if you need to be convinced to use a tool, then maybe you aren't yet in the market for said tool. I think it's a cherry on top to describe use-cases but not so critical that there's some sort of ecosystem problem that you describe.


The purpose is to un-convince people from using it. Way too often people have already made up their mind by the time they reach the docs, based purely on how popular the tool is, how flashy the front page is, or whatever bad advice they've received from someone who feels the need to justify their library of choice.


It's good to have critical thinking skills, if people can't figure out if they want to use a tool or not, they should probably work on that a bit!


For sure! Doesn't mean we shouldn't strive to give them a helping hand though. I know I fell victim to the exact same problem when I was learning front-end dev. I'm sure everyone in that space has had a "why on earth have I been using this for the past 6 months?" moment for one library or another. I just feel like we should be taking more preventative measures against that.


I guess there are the "files" field and ".npmignore", which handle these, but they don't seem to be widely used in the community.



