This looks nice! I especially appreciate the example images with simple code.
This looks like it could be used to re-implement a lot of ImageMagick fairly easily. I love ImageMagick (well, except for hijacking really generic command names like 'convert' and 'identify'), but re-implementing those features in a memory-managed language like Go is a real security bonus for servers processing images from untrusted sources. See ImageTragick.
Another major benefit would be to drop the ImageMagick dependency entirely and just deploy with a native Go library. Right now I have to use convert on some (PNG) files because I've experienced issues with the native Go libs.
Memory management for a one-and-done image conversion isn't much of an issue. The program will pretty much allocate, run a few very fast GCs since almost all memory usage is in a handful of big arrays (some people seem to have this idea that all GCs are multi-millisecond ordeals, which isn't the case), and then at program termination just throw everything away. A program like this could get away with an allocator implementation that allocated normally but whose "free" is a no-op [1].
What would slow something like this down is not using the relevant assembly code to accelerate working with these images. A very quick glance at the convolution package, chosen on the grounds that if anything is ASM it probably is, shows that this really is "pure Go", in the sense that it doesn't use Go's ASM support. The compiler will not optimize that much; there's certainly no autovectorization. So it's going to be significantly slower than an optimized program that makes correct use of vectorized operations.
Though, in theory, this can be cleaned up over time. Go does have the support built-in for fixing that in a reasonably nice way, as long as you don't mind that it has its own assembler dialect.
Less than the anti-GC crowd thinks, especially when the right algorithms get used: one does not allocate like crazy, and, just as in Rust, one pays attention to where data gets allocated and how.
Last time I looked, the `image` crate was the defacto standard for doing image manipulation in Rust, and in my opinion its API is/was pretty bad. (Had several different types for "an image," which made things annoying, and I recall hitting unreachable!() in a public function once).
When processing millions of pixels, and deciding to trade off handwritten vector assembly for nice Go, it's valuable to know exactly how much performance we're losing in return.
Maybe a good start, but it seems this only compares rotations at 90-degree angles and resizes by whole-number multiples (2, 3, 4, ...). If so, that's not enough to call it production-ready.
@amzans any reason you’re doing a show HN now? Been following this for a while, and just noticed a new release... is there something in it that makes you feel it’s ready for prime time?
Question related to this topic, about video encoding: We are spending quite a bit of money on Amazon ElasticTranscoder for video encoding.
Wondering if anyone has experience or advice on self-hosting that kind of service? Any projects I should consider for a proof of concept?
This article outlines how to run FFmpeg on AWS Lambda:
https://intoli.com/blog/transcoding-on-aws-lambda/
With favorable cost comparisons to Elastic Transcoder. Not self-hosted, but looks like a significant cost improvement. Also, one could set up a 'transcode farm' with virtual/real machines running FFmpeg in parallel, with some light scripting for automation. Looking up 'render farm' might yield some ideas for distributed image computation. Just some ideas...
I ended up building a transcoder within the past few months using Lambda that is handling, as of today, 500k+ videos per month and growing. This was replacing a set of EC2 instances similar to what you described.
A few caveats I ran into:
1) You are limited in how much /tmp on a Lambda instance can hold (512MB total). For the transcode I'm doing on these boxes, some of the videos exceed that size just to download, so those runs will fail. I keep a fallback using the older EC2-based method to handle these very large files.
2) These, obviously, need a pretty decent amount of RAM to run.
But the benefits are really worth it for us. Elastic Transcoder, for our needs, was going to be >$15k/mo. Our current transcoder cost is around $400/mo. The previous EC2-based iteration was a step up for us, but it sometimes took a while to start. Doing it with Lambda has actually been less expensive, provided a faster experience for our customers, and was significantly easier to build than the EC2-based option.
Not bad for a project to see if I could write something in Go.
Couldn’t you attach an EBS or EFS volume and use that for temporary storage?
Sure, even if it’s on SSD, it’s still likely to be more expensive than local storage on the lambda instance, but that might still be better than using ec2.
If your tasks are interruptible and can occasionally fail, this is possible. However, you have to be very careful with the capital outlays: being stuck with old, unsupported hardware is not good, nor is the human effort required to maintain such a system cheap. You would effectively be running a data centre with a particularly good tolerance for occasional downtime, but it would still be a data centre to operate nevertheless.