
I think the logic is: if we didn't rely on containers and prebuilt VMs, Hadoop would have had to be easier to build in order to be useful.



The point everyone seems to be missing, and the one I think is most important, is that we're no longer building from trusted sources.

Build systems just download and run random code from the internet without verifying that it's the correct code, from the correct source.

It's a ticking time bomb.


There is SSL/TLS; unless it's done wrong (e.g. the dependency manager ignores invalid certificates), it's safer than the old "md5 of the file" systems.

Now, some dependencies are fraudulent (especially in the JavaScript world, because code there ultimately ends up in a lot of users' browsers), but nobody ever checked the sources anyway...


TLS only verifies that you have connected to the correct server. It can't verify whether the package on that server has been replaced by a malicious one. For that, you need an "md5 of the file" (these days, a sha256, because md5 has long been broken).
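
Roughly the check being described, as a minimal Python sketch (the archive name and expected digest are placeholders for illustration, not values from any real release):

    import hashlib

    # Digest published by the project out of band (placeholder value).
    EXPECTED_SHA256 = "replace-with-the-published-hex-digest"

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Hash in chunks so large archives don't have to fit in memory.
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    if sha256_of("downloaded-package.tar.gz") != EXPECTED_SHA256:
        raise SystemExit("digest mismatch: refusing to unpack the archive")

Of course, as the next comment points out, this only helps if the expected digest itself comes from somewhere you trust.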


You need to make sure the hash is also not tampered with, both on the server and in flight to the user. How do you do that?

If the answer is "use TLS", then there is no point in having the file hash at all.


No, the answer is to use PGP signatures and a manifest of hashes.

This is how package managers work; TLS doesn't replace it.
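
A rough sketch of that two-step check (assuming gpg is installed with the project's signing key already imported, and that the SHA256SUMS / SHA256SUMS.asc file names are just illustrative):

    import hashlib
    import subprocess

    # 1. Verify the detached PGP signature on the manifest itself.
    #    gpg exits non-zero if the signature doesn't check out, so
    #    check=True raises and aborts the whole process.
    subprocess.run(["gpg", "--verify", "SHA256SUMS.asc", "SHA256SUMS"], check=True)

    # 2. Check the downloaded file against the now-verified manifest.
    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    expected = {}
    with open("SHA256SUMS") as manifest:
        for line in manifest:
            if not line.strip():
                continue
            digest, name = line.split()
            expected[name.lstrip("*")] = digest

    artifact = "downloaded-package.tar.gz"  # placeholder file name
    if sha256_of(artifact) != expected.get(artifact):
        raise SystemExit("hash mismatch: file does not match the signed manifest")

The trust then rests on the signing key rather than on whichever mirror or CDN served the file, which is exactly the property TLS alone can't give you.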


Which isn't really true; as a sysadmin (I'd say "former", but once you're a sysadmin, you're always a sysadmin), I've seen plenty of software with horrible build and dependency nightmares, and that was before package managers, containers, and virtual machine images became de rigueur.

Think of a self-hosting programming language: you can't build it without a running installation of a previous version. (Anyone remember "Reflections on Trusting Trust" at this point?) Or any application in an image-based language like Smalltalk. Development becomes path-dependent. You inevitably end up in a situation where A and B cannot be made to work together, except in a derivative of a version that someone, somewhere made while holding their mouth the right way.

Pre-built containers and VMs are an admission that path-dependence is the way stuff is supposed to be.


I think that is the author's logic. Except it isn't very logical, since Hadoop (or Bigtop) uses neither.


Picture this: you need to use Hadoop. Do you:

A) work through building it yourself, or

B) get a container that claims to have a running Hadoop and hope it works for you?

If B wasn't on the table, what would happen?


If I need to use Hadoop, I'll download one of the pre-built binaries that they offer on their site.

You'll notice on the Debian Wiki that people gave up on building it back in 2010. That was three years before Docker even appeared; almost nobody was using containers back then.


Hadoop isn't even that difficult to set up. I've built it from source and installed it from binaries.

Containers are totally unnecessary here, just as they are for most Java apps.


That's like saying, "If we hadn't invented the internet, we would never have had privacy issues." OK, so if we didn't rely on containers, would Hadoop have had a perfect set of packages for every distribution? Let's say the packages for Arch Linux were broken. What next?

That's the whole problem with the article. It takes a problem (building Hadoop was bad), correlates it with a completely different tool (because we have Docker, Hadoop's build scripts are bad), and goes on to rant about everything else.



