Decode Like It's 1999: MPEG-1 Decoder in JavaScript (phoboslab.org)
211 points by ronsor on Feb 28, 2018 | 62 comments



Regardless of language, writing a decoder for a standard image or video format is a highly recommended exercise for developing your skills at implementing a specification (and optimisation, if you choose). You don't have to jump into the deep end with H.264 or similar, and even MPEG-1 is beyond what I'd recommend for a beginner; something like H.261 would be good to start with, with MPEG-1 being the next step up.

The fun thing about experimenting with video codecs is that your results are visible in a very real way.


Along similar lines, writing a simple emulator (e.g. Chip-8) or a 3D model loader/renderer/animator (e.g. for md2 or md3 files, which have an open spec and lots of models available) would be similarly good I think, depending on your interest. In any case, you still get practice implementing a spec and very visible results.


Compilers or interpreters are also great. In my experience the knowledge directly translates to everyday programming tasks.


Can definitely recommend this!

A few years ago I wrote a JPEG encoder for a class, and it was one of the most fun projects I've had at the university. Seeing a picture with those iconic compression artifacts produced by your own implementation feels pretty nice :)


"And suddenly you spent two hours of your life and downloaded several GB of tools. All to build a 20kb library, for a language that doesn't even need compiling. How do I build this library 2 years from now? 5 years?"

:D


But the alternative the author suggests is to use UglifyJS - which you also need "node, homebrew and gigabytes of tools" for.


The author addressed this by saying that you can also just replace it with "cat" and it will all still work.


Or just spend 20 seconds copy-pasting on one of the numerous free web services like uglifyjs.net.


This is not under any realistic or sane conditions a solution.


In particular, worrying about not being able to reproduce something and then proposing to use a web app for it seems... counterproductive.


[flagged]


My experiences with node include it being extremely difficult to install on CentOS because a dependency was removed from the EPEL repo, and random npm packages curling things from random git repos and other sites as part of their install process.

gross


I always use CentOS too. I'll be honest, I never tried to install it via the package manager (because chances are its version would lag behind the official release).

You can use nvm (https://github.com/creationix/nvm) to install only for one user (inside $HOME), or do a manual install from the release page for all users (cp -r */ /usr from inside the unpacked dir): https://nodejs.org/en/download/

It will work flawlessly both ways without using any package managers for your system. What is so hard about that for people?


Sometimes you want to run something written in NodeJS as a service user which naturally and usually does not have a home folder at all.

Additionally, having to sync up the user installation and the service installation if you then create the home folder is a source of bugs, in both the service and the user environment, that can't be cross-replicated between the two.

A global installation is desirable when multiple users are required to operate on the exact same versions of a service.


> Sometimes you want to run something written in NodeJS as a service user which naturally and usually does not have a home folder at all.

The idea of using an isolated user and a shared filesystem makes no sense; it's the security model you come up with due to random accidents of unix history, not one that you'd design from the ground up. If you want your NodeJS service isolated, do it properly: use a jail or container, which work beautifully with npm (and are much more awkward to do with RPM).

> Additionally, having to sync up the user installation and service installation if you then create the home folder is a source for bugs both in service and user that can't be crossreplicated between the two.

On the contrary, it forces good practice by ensuring that you always know how to install the set of dependencies that you need; you use the same source of truth to install the dependencies for both. When you rely on system installs of packages, you create exactly the same problem when you come to run the program on multiple servers, only less fail-fast and harder to diagnose.

> A global installation is desirable when multiple users are required to operate on the exact same versions of a service.

If you get in a position where that's your requirement, you've done something wrong. Take a step back and figure out what you really need.


>The idea of using an isolated user and a shared filesystem makes no sense;

I do run my stuff in containers; I have about 30 of them set up, each one with its own service.

However, I still have to login into these containers, which happens as root (for the container). And then having to pivot to the app user is bothersome. Luckily or strangely a lot of applications don't work as root because nobody should be running any service as root.

>On the contrary, it forces good practice by ensuring that you always know how to install the set of dependencies that you need;

I have automated deployment for that. I simply add a python script to my fabric repository and signal which container should install it, the rest is fully automated and tested.

User-local installs are complicated here since it means the deployment script will have to pivot to another user temporarily.

>When you rely on system installs of packages, you create exactly the same problem when you come to run the program on multiple servers, only less fail-fast and harder to diagnose.

I also create a reproducible problem; user-local installs are less reproducible, in my experience.

I can replicate the exact environment of an app server installed via apt-get within a few keystrokes (I wrote a fabric task for that; "fabric copy-deployment <sourcecontainer> <targetcontainer>")

>If you get in a position where that's your requirement, you've done something wrong. Take a step back and figure out what you really need.

You mean like running multiple instances of the same service? That happens. Maybe you want to deploy three separate instances of AppSomething for three domains with three separate datasets. Instead of deploying three containers, I deploy one container.


> And then having to pivot to the app user is bothersome. Luckily or strangely a lot of applications don't work as root because nobody should be running any service as root.

Again that's accidents of Unix history rather than sensible security design. In a single-purpose container running as root is fine (https://xkcd.com/1200/) - indeed we could go further in the unikernel direction and just not have multiple users inside the container. In the meantime it's easy enough to paper over the extra user transition in whatever tool you're using to log in.

> I have automated deployment for that. I simply add a python script to my fabric repository and signal which container should install it, the rest is fully automated and tested.

Great, but apt doesn't really guide you in the direction of doing that, whereas with a language package manager in the same role, automating it tends to be the natural default. http://www.haskellforall.com/2016/04/worst-practices-should-...

> I also create a reproducible problem; user-local installs are less reproducible, in my experience.

> I can replicate the exact environment of an app server installed via apt-get within a few keystrokes (I wrote a fabric task for that; "fabric copy-deployment <sourcecontainer> <targetcontainer>")

Interesting, I've found exactly the opposite (though I mostly work with maven which has always been good at reproducibility, maybe npm is less good).

> You mean like running multiple instances of the same service? That happens. Maybe you want to deploy three separate instances of AppSomething for three domains with three separate datasets. Instead of deploying three containers, I deploy one container.

Why? I guess you'll save a little bit of memory and disk space, but you've created a new intermediate level of isolation with weird characteristics that you'll need to keep in mind when debugging - those three instances of AppSomething are now able to interfere with each other a bit, but they're not quite as similar as you'd expect either. Do you really need that much complex granularity in your isolation model?


>Again that's accidents of Unix history rather than sensible security design. In a single-purpose container running as root is fine

In an ideal world, a user would be a container with a shared filesystem. There are lots of uses for a shared filesystem (primarily backup tools).

A lot of container tools make it very hard to properly backup and restore containers.

>Interesting, I've found exactly the opposite (though I mostly work with maven which has always been good at reproducibility, maybe npm is less good).

Reproducible here means: I can pull up the exact same server environment, including all versions.

Build systems only do that for the language itself, but system dependencies (OpenSSL, curl, etc.) might have different versions, and they don't help much in fixing that if it causes a bug to appear or disappear.

>those three instances of AppSomething are now able to interfere with each other a bit, but they're not quite as similar as you'd expect either. Do you really need that much complex granularity in your isolation model?

Interference does not happen if you properly isolate the users which is certainly possible on a modern system.

Systemd makes it extremely easy to essentially mount everything but the immediate app data as read-only. No interference.

And yes, I need such complex granularity.


> In an ideal world, a user would be a container with a shared filesystem. There are lots of uses for a shared filesystem (primarily backup tools).

I see the shared filesystem as much more of a liability than an asset. It's too easy to have hidden dependencies between seemingly unrelated processes that communicate via the filesystem; when a file that should be there isn't, there's no way to ask why.

> Build systems only do that for the language itself, but system dependencies (OpenSSL, curl, etc.) might have different versions, and they don't help much in fixing that if it causes a bug to appear or disappear.

Having a single dependency manager that can bring up consistent versions of all relevant dependencies is important, agreed, but I think the language dependency managers are closer to having the needed featureset than operating system package managers are. Operating systems are far too slow to update library dependencies and have far too little support for making local or per-user installs of a bunch of packages - of course the ideal system would support both, but I can live without global installs more easily than I can live without local installs. I'm lucky already in that the JVM culture is more isolated from the rest of the system - often the only "native" dependency is the JVM itself, and so using the same versions of all jars (and possibly the JVM) will almost always reproduce an issue. My inclination would be to move further in that direction, integrating support for containers or unikernels into the language build tools so that those tools can build executable images that are completely isolated from the host system.

> Interference does not happen if you properly isolate the users which is certainly possible on a modern system.

> Systemd makes it extremely easy to essentially mount everything but the immediate app data as read-only. No interference.

Sure it's possible, but again it's not the natural path, it's not the way the OS or a lot of the traditional-unix tools you get expect it to be. Things like CPU quotas for users feel very bolted-on.

> And yes, I need such complex granularity.

Why? What does all that extra complexity gain you?


I don't see what you're getting at. But if I understood correctly the answer would be:

You can just download node, put it anywhere and it will work as long as you add './bin' to 'PATH'.

This can be done on a per-command basis, as you probably know:

PATH=$PATH:/node/location/bin npm -g i leftpad


Suppose I want to run the service leftpad.io

For this purpose I would have a service user called "leftpad-srv" under which my leftpad.io server runs.

When I login, I am "root".

When I want to say, change leftpadding from spaces to tabs, I'd call `leftpad-io-ctl set-padding \t` which would use a socket to communicate with the leftpad.io server.

For this purpose it would be very important that leftpad-io-ctl and the leftpad-io server are the same version, otherwise the -ctl might support setting a rightpad even though the server hasn't implemented this yet because they have two different versions.

A global install is necessary for many deployments.

(This is hypothetical but many apps have special ctl-tools to control or monitor the running application and it can be useful to, for example, have moderators in your app that can access the console with limited permissions)


So what is tricky about this? Global install is cp -r */ /usr inside the downloaded release folder. I'm probably still missing something.


A package manager ensures that the install is correctly available for all users.

With apps, same story, it ensures everyone has the app.

Install via cp -r */ /usr is not a good idea, as the package manager has no idea you are doing this and won't help you out.

In a worst case, the package manager will trample all over the install.

Additionally, a simple cp -r */ /usr will probably not set correct permissions automatically, which means either users can edit the binary or they won't be able to execute it.

Lastly, it means any update will have to be installed manually for every single release.

Package managers do this automatically and with much less friction.


> Installing nodejs is one download. After that it's running npm install and everything just works.

Yep. Even when it works as expected it might become a shitshow.

A couple of years ago I tried to install a small library (~50kB or less), and npm install pulled in 900MB of crap. I didn't expect npm to even contain so many packages back then.


That’s why I created npm-download-size recently, to quickly determine the total download size of a package. It has a CLI and a web frontend: https://arve0.github.io/npm-download-size/


Then use another library. Is it the fault of npm or node that the people developing some libs don't know what they're doing? The point here is that the ecosystem allows your package to be one 'npm install' away from the user. The fact that you choose to tangle yourself up in complexity is no one's fault but yours.

(for the record, 900MB seems to me like a grossly overestimated number; I'd request a link for that)


> Is that a fault of npm or node

Nobody said it was.


It was one of the async/await precompilers. I don't remember which one; I tried all of them, except this one, which I removed right after npm install.


A colleague of mine "wrote" an H265 decoder in JavaScript.

I put that in quotes, because it was a byproduct of a much cleverer project: http://www.argondesign.com/products/argon-streams-hevc/ ; he wrote a parser for the "human readable" H265 specification, using the O-Meta system. This supported a number of pluggable backends, one of which emitted a JavaScript program, and another of which was the actual saleable product: an H265 decoder validation test suite.


Wow that is a very interesting approach!


I cannot find anything under "O-Meta"; can you provide a few references?



> Decode Like It's 1999

Here in the future, both Firefox and Chromium are still decoding MPEG in software, not hardware.[1]

Chromium is actively working to fix that[2]. Mozilla, not so much.[3]

I've been hitting this lately, as webcams use MPEG at high resolutions (to fit USB 2 bandwidth). It's painful for AR.

[1] https://wiki.archlinux.org/index.php/Hardware_video_accelera... [2] https://chromium-review.googlesource.com/c/chromium/src/+/53... [3] https://bugzilla.mozilla.org/buglist.cgi?quicksearch=linux+h...


Good writeup.

This is probably a silly question, but: the article says he has to use performance.now. But isn't one of the mitigations for Spectre and/or Meltdown to reduce the accuracy and/or resolution of performance.now? If so, would that affect his code?
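
Purely to illustrate the worry (this isn't from the article), here's a rough way to observe the timer granularity a browser actually gives you; the function name and sample count are made up:

    // Sketch: estimate the smallest non-zero step performance.now() reports.
    function observedTimerResolution(samples = 100) {
      let min = Infinity;
      for (let i = 0; i < samples; i++) {
        const a = performance.now();
        let b = performance.now();
        while (b === a) b = performance.now();  // spin until the clock visibly ticks
        min = Math.min(min, b - a);
      }
      return min;
    }
    // Post-Spectre browsers typically clamp this to somewhere between tens of
    // microseconds and a millisecond, depending on the browser - still fine for
    // per-frame decode timing, but useless for very fine-grained measurements.
    console.log('observed resolution (ms):', observedTimerResolution());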


I remember when this came out, because at that time we were in desperate need to shrink the assets of a mobile slot game we were porting from Flash to HTML5 for a customer - we had roughly 50MB of data (mostly animations), so the game failed to load on a 2G connection, which apparently was the best thing one could get at the conference where the customer's sales rep was at the time.

We squeezed them down to 10MB with minimal quality reduction, but another problem quickly arose - this was late 2013 and the phones back then could barely handle such a load and heated up tremendously.

All in all we negotiated a pretty deep cut in special effects which yielded ~25MB in reductions. A rotten compromise, but a compromise nonetheless.


On a phone, could you even see a difference from reducing quality? I've always intentionally picked the lower quality streams (e.g. on YouTube) on mobile, because on a 4-5" screen I'll never miss those pixels.


Sort of. We had a separate file for each animation that contained the grayscale alpha channel, and some artifacts were showing.
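
One way to recombine such a pair on the client looks roughly like this (ctx/actx, the frame sources and w/h are placeholders, not our actual code):

    // Sketch: merge a colour frame and a greyscale alpha frame into RGBA pixels.
    // ctx/actx are 2D contexts of two offscreen canvases, w/h the frame size.
    ctx.drawImage(colorFrame, 0, 0);
    actx.drawImage(alphaFrame, 0, 0);
    const rgba  = ctx.getImageData(0, 0, w, h);
    const alpha = actx.getImageData(0, 0, w, h);
    for (let i = 0; i < rgba.data.length; i += 4) {
      rgba.data[i + 3] = alpha.data[i];  // take the grey (R) value as the alpha channel
    }
    ctx.putImageData(rgba, 0, 0);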


Please be careful, this library has hidden GPL encumbrance issues. See my prior comment for details: https://news.ycombinator.com/item?id=13556718



Cool project. Now if I could just find some VCDs I would be in business.


I've actually been using a WebAssembly port of FFmpeg (https://github.com/Kagami/ffmpeg.js/blob/master/README.md) successfully for a while now in my WIP Clojure(Script) client-side meme editor: https://www.ultime.me/edit For reasonably sized mp4, mov and gif files it decodes and encodes at a good speed.
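
For reference, the basic usage pattern goes roughly like this, based on the project's README (which codecs are available depends on the build flavour you load, and the file names and options here are just examples):

    // Sketch of the ffmpeg.js in-memory filesystem (MEMFS) API.
    const ffmpeg = require('ffmpeg.js');   // default build; there are also mp4/webm flavours
    const fs = require('fs');

    const input = new Uint8Array(fs.readFileSync('test.webm'));
    const result = ffmpeg({
      MEMFS: [{ name: 'test.webm', data: input }],                  // virtual input file
      arguments: ['-i', 'test.webm', '-c:v', 'libvpx', '-an', 'out.webm'],
      print: () => {}, printErr: () => {},                          // silence ffmpeg's console output
    });
    fs.writeFileSync('out.webm', Buffer.from(result.MEMFS[0].data)); // virtual output file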


Pretty fantastic post, thanks for sharing. I have a love/hate relationship with JS (which I think is normal?) However, this is one of the love moments. Excellent choice on the sample video too, the Director's Series DVDs have also been my test material for various encoding/decoding adventures.


Oh, the shade on webpack and npm. It's brutal, but for a 20kb library that's genuinely too much time to set up. Webpack 4 has improved a lot, as it now supports a zero-config mode. All that's needed is a src folder with an index.js in it. A much welcomed change for smaller projects!


Building an MPEG decoder, video and audio, in JavaScript, over years? Sure!

Using JavaScript build tools? Lol nope!


See also Route9, a VP8/WebM decoder in JavaScript - https://people.xiph.org/~bens/route9/route9.html


Could anyone provide good resources to learn more about audio/video codecs? I know nothing about how they work at a technical level/how to write something like this myself.


If I remember correctly, MPEG-2 patents are supposed to expire this year. So maybe it's time for JSMPEG2? (Ok, just kidding, the whole MPEG-2 spec looks huge).


If the Björk video, All is Full of Love, on the http://jsmpeg.com site looks sort of secondhand familiar, it was cited as a large influence on the makers of the Westworld title sequence: http://www.artofthetitle.com/title/westworld/


MPEG-1 is more of a 1993-1995 Peak Usenet thing..


Oo-wee-oo, I look just like Buddy Holly...


My stupid robot site uses this! Although I haven't had much time to keep any robots online, it can be found at the following:

http://robot247.io https://github.com/mbrumlow/webbot


JSMpeg is a great product. I used it for low-latency live streaming before MSE (Media Source Extensions) arrived in the major browsers. Nowadays the plugin-free alternatives for live streaming are much more efficient: push fragmented MP4s to the browser and append them to a MediaSource buffer, using any modern codec the browser supports.
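
For anyone who hasn't seen it, the MSE approach is roughly this (the codec string and segment URL are placeholders):

    // Minimal MSE sketch: feed fetched fMP4 segments into a SourceBuffer.
    const video = document.querySelector('video');
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener('sourceopen', async () => {
      // The codec string must match how the segments were actually encoded.
      const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');
      const segment = await fetch('/live/init-plus-first-segment.m4s'); // hypothetical endpoint
      sb.appendBuffer(await segment.arrayBuffer());
      sb.addEventListener('updateend', () => {
        // fetch and append the next segment here for continuous playback
      });
    });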


Cool project! But one question:

> A bug that still stands in some Browsers (Chrome and Safari) prevents WebGL from using the Uint8ClampedArray directly. Instead, for these browsers we have to create a Uint8Array view for each array for each frame. This operation is pretty fast since nothing needs to be copied, but I'd still like to do without it.

Why worry about it if you're not actually copying the buffer? It's like allocating one extra tiny object per frame, right? It seems like it would be totally insignificant compared to the rest of the workload.
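
For context, the per-frame view being created is presumably just a re-wrap of the same underlying buffer, something like this (gl, width and height stand in for the real objects):

    // Sketch: wrap the Uint8ClampedArray's buffer in a Uint8Array view - no bytes are copied.
    const clamped = new Uint8ClampedArray(width * height);   // e.g. one decoded luma plane
    const view = new Uint8Array(clamped.buffer, clamped.byteOffset, clamped.byteLength);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.LUMINANCE, width, height, 0,
                  gl.LUMINANCE, gl.UNSIGNED_BYTE, view);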


Compare it to a compiler that refuses to compile invocations of methods with optional parameters correctly. Sure, you can just insert the missing arguments with their default values, but you shouldn't have to. It's an inconvenience and annoying, especially for something that should just work.


When you don't captcha your comment submission system...



I would think that building ffmpeg with emscripten to produce WASM would be the way to go.

If the goal is anything but learning that is.


What is the resource usage of the decoder in terms of 1999 hardware versus 2018?


This is great educational content. Thanks!


Does mencoder support encoding of MPEG-1?



ffmpeg does too, and it has a built-in vcd preset.


> Why use JSMpeg?

> WebRTC is not supported on iOS.

No longer true.



