Hacker News
How Uploadcare Built a Stack That Handles 350M File API Requests per Day (stackshare.io)
135 points by awartani on Aug 28, 2017 | 40 comments


User-generated content (especially images) is a great attack vector. What do you do to isolate it or mitigate attacks like that?


How so? Obviously they won’t be executing (or likely even analysing) any of the uploaded content.

Similarly, browsers should not generally be particularly vulnerable to malicious content loaded with appropriate MIME types in appropriate containers (e.g. <img>).

It sounds like you should be asking how browsers protect users from malicious content. Perhaps you could elaborate?
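One way a file service can keep uploads in the "appropriate MIME type, appropriate container" situation described above is to control the headers it serves them with. A minimal sketch follows; the allowlist and function name are hypothetical, not Uploadcare's. It relies on two standard mechanisms: `X-Content-Type-Options: nosniff` tells browsers not to guess a different type from the bytes, and `Content-Disposition: attachment` makes anything outside the allowlist download instead of render. SVG is deliberately left out because it can embed script.

```python
# Hypothetical allowlist of types that are safe to render inline.
# SVG is excluded on purpose: it is XML and can carry <script>.
SAFE_INLINE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def delivery_headers(content_type: str) -> dict:
    """Build response headers for serving a stored user upload."""
    # Stop browsers from MIME-sniffing the body into something executable.
    headers = {"X-Content-Type-Options": "nosniff"}
    if content_type in SAFE_INLINE_TYPES:
        headers["Content-Type"] = content_type
        headers["Content-Disposition"] = "inline"
    else:
        # Anything else is served as opaque bytes and downloaded, not rendered.
        headers["Content-Type"] = "application/octet-stream"
        headers["Content-Disposition"] = "attachment"
    return headers
```

The `content_type` argument is assumed to come from the server's own detection of the stored file, not from whatever the client claimed at upload time.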


You can perform a denial-of-service attack on a naive server with a maliciously crafted PNG. Just send a zip bomb and see what happens when the server decompresses it. The naive approach will crash the server as it tries to malloc successively larger buffers.

https://www.bamsoftware.com/hacks/deflate.html
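PNG image data is a zlib (deflate) stream, so the core defence against the zip-bomb case is to cap how much the decoder is allowed to inflate instead of letting buffers grow without bound. A sketch of the idea with Python's standard `zlib` module; the 64 MiB cap and the names are illustrative assumptions. (Pillow has a related guard of its own, `PIL.Image.MAX_IMAGE_PIXELS`, which warns about, or rejects, images whose declared pixel count is excessive.)

```python
import zlib


class DecompressionBomb(Exception):
    """Raised when a stream inflates past the configured output cap."""


def bounded_inflate(data: bytes, max_output: int = 64 * 1024 * 1024,
                    chunk: int = 64 * 1024) -> bytes:
    """Inflate a zlib stream, but never produce more than max_output bytes."""
    d = zlib.decompressobj()
    out = bytearray()
    pending = data
    while True:
        # max_length=chunk bounds how much output this call may produce;
        # input that could not be processed yet is kept in unconsumed_tail.
        piece = d.decompress(pending, chunk)
        out += piece
        if len(out) > max_output:
            raise DecompressionBomb(f"inflated size exceeds {max_output} bytes")
        if d.eof:
            return bytes(out)
        pending = d.unconsumed_tail
        if not piece and not pending:
            # No output, no input left, and no end-of-stream: truncated data.
            raise ValueError("truncated zlib stream")
```

Because output is produced in bounded chunks, memory use stays near `max_output` no matter how extreme the compression ratio is.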


They say right in the article that they're doing image resizes on the server, for instance. With a customized library, too... Hope it's sandboxed well!
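A cheap first layer of that kind of sandboxing is to run the decoder in a separate process under kernel-enforced resource limits, so a malicious file can at worst kill one worker. A minimal, Unix-only sketch using Python's `resource` and `subprocess` modules; the limits and function names are assumptions, and real isolation would add containers, seccomp filters, or a dedicated unprivileged user on top.

```python
import resource
import subprocess
import sys


def _apply_limits(mem_bytes: int, cpu_seconds: int):
    """Return a callable that sets rlimits in the child before exec."""
    def apply() -> None:
        # Cap virtual address space: oversized allocations fail in the child.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        # Cap CPU time: the kernel signals the child once it is exceeded.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    return apply


def run_limited(cmd, mem_bytes=1024 ** 3, cpu_seconds=10, timeout=30):
    """Run cmd in a child process under memory, CPU and wall-clock limits."""
    # Note: preexec_fn is not safe if the parent process uses threads.
    return subprocess.run(
        cmd,
        preexec_fn=_apply_limits(mem_bytes, cpu_seconds),
        capture_output=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired in the parent
    )


if __name__ == "__main__":
    # Stand-in for a decoder that tries to allocate 8 GiB.
    result = run_limited([sys.executable, "-c", "b = bytearray(8 * 1024 ** 3)"])
    print("exit code:", result.returncode)
```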


Image and video codecs come under attack quite often; see https://blog.sucuri.net/2016/05/imagemagick-remote-command-e...

In this context, the image manipulation they do with Pillow and the underlying libjpeg would be a potential source of vulnerabilities.


Significantly, it's not just libjpeg but every format supported by Pillow (http://pillow.readthedocs.io/en/3.4.x/handbook/image-file-fo...) — many of those vulnerabilities have historically been in obscure formats where the implementation has had far less attention than the mainline JPEG or PNG support.
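Since the obscure formats are where such bugs tend to hide, one mitigation is to pick a small allowlist and check a file's magic bytes before any decoder sees it, rather than trusting the extension or the client-supplied Content-Type. A sketch; the allowlist is an assumption. Recent Pillow versions can also restrict which format plugins `Image.open` tries through its `formats` argument.

```python
from typing import Optional

# Hypothetical allowlist: only these formats are ever handed to the decoder.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_allowed_format(header: bytes) -> Optional[str]:
    """Return the format name if the leading bytes match the allowlist, else None."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    # WebP is a RIFF container: "RIFF", a 4-byte little-endian size, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None
```

A caller would read the first 16 or so bytes of the upload and reject it outright when the result is `None`.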


Yep, I remember multiple smaller art-centric sites getting hit in a wave by an ImageMagick RCE vulnerability. Database dumps, full source leaks, the works. Unsure whether it was the one you linked; it seems more recent than I thought.



I find it interesting that these sorts of stacks have tons of moving parts. Maybe it's the nature of highly scalable systems? Or does it come from starting with one particular technology and then having to drag in lots of other things to make it work?


In the article we tried to convey the main idea behind that: take the best tool for the job at hand. There's no "one size fits all" framework or product to put your money on. It's much easier to handle this zoo than to make something do what it's not supposed to do.

Furthermore, to get high scalability, you have to make things as loosely coupled as possible. This means you have to make some choices.
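A toy illustration of that kind of loose coupling, not Uploadcare's actual design: the upload handler only enqueues a job and returns, while a separate worker does the processing, so either side can be scaled or restarted independently. In production the in-process `queue.Queue` would be replaced by a message broker; all names here are hypothetical.

```python
import queue
import threading
from typing import List, Tuple


def upload_handler(jobs: queue.Queue, file_id: str) -> str:
    """Accept an upload and hand off processing without waiting for it."""
    jobs.put(file_id)
    return f"accepted {file_id}"


def worker(jobs: queue.Queue, results: List[str]) -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        file_id = jobs.get()
        if file_id is None:  # sentinel: shut down cleanly
            jobs.task_done()
            break
        # Placeholder for the real work (resize, scan, push to CDN, ...).
        results.append(f"processed {file_id}")
        jobs.task_done()


def demo(file_ids: List[str]) -> Tuple[List[str], List[str]]:
    jobs: queue.Queue = queue.Queue()
    results: List[str] = []
    t = threading.Thread(target=worker, args=(jobs, results))
    t.start()
    acks = [upload_handler(jobs, f) for f in file_ids]
    jobs.put(None)
    t.join()
    return acks, results
```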

Hope that makes sense and answers the question :)


It wouldn't need many moving parts if you didn't want to mess with the data, for example if you functioned essentially as a blob store. But as soon as you start touching data formats... things get fun, and you should prepare to be a hacker playground.


That's a great read. I've always been interested in learning how such tech-oriented companies found their initial traction. Are there any blog posts / articles / podcasts about Uploadcare's early days and the search for the product/market fit?


I just found out that we at Uploadcare will soon release an article with more info about the early days :) And, I believe, a podcast or two. Thanks for this question, btw. Could you elaborate on what you would like to know? It'll help us compile a great article, thanks :)


Great to hear that!

The reason it's particularly interesting to me, i.e. to someone with a dev background, is that lean startup wisdom says you should be very specific about the customer you're after, and Uploadcare seems like a solution targeted at a broad spectrum of customer segments. Of course, I'm happy to be proved wrong if there are one or two dominant customer segments that you target with Uploadcare. Also, you might as well have started out with a very specific customer persona and spread to other segments. Whatever the case, I'm curious to know.

I guess many developers dream up products targeted at developers like themselves. Selling to fellow developers is hard. It would be great to read a success story for a change.


Short answer is:

- believing in your own service
- perseverance
- fanatical customer support

This helped us to stay focused and our happy customers brought us new happy customers. We didn't know much about marketing in the beginning.


I wonder what the breakdown is between unique files delivered and files delivered from the CDN cache. Also, what's the breakdown between file uploads, manipulation, and delivery? The 350M API requests per day would make more sense with this breakdown.


Cached/uncached file delivery is close to the universal 80/20 ratio. Cached operations are not included in that number.

Unfortunately, I can't say anything more than that.


Curious: does that mean you serve close to 1.75 billion requests per day, of which 350M are unique requests that exercise your stack instead of being served from a CDN? It'd be interesting to know more about the number of transformations you do at peak, if you can talk about it.
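The 1.75 billion figure follows from reading the 80/20 split literally: if the 350M uncached requests are 20% of all deliveries, the total is 350M / 0.2. A quick check, treating the ratio as exact even though the reply only says it is "close":

```python
uncached_per_day = 350_000_000
uncached_percent = 20  # share that reaches the stack, per the 80/20 reply

# Integer arithmetic avoids float rounding in the division.
total_per_day = uncached_per_day * 100 // uncached_percent
cached_per_day = total_per_day - uncached_per_day

print(f"total:  {total_per_day:,}")   # total:  1,750,000,000
print(f"cached: {cached_per_day:,}")  # cached: 1,400,000,000
```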


350M/day = just about 4K QPS. Is that considered impressive nowadays?
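The arithmetic behind the 4K figure; note that it is a 24-hour average, not a peak rate:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day: int) -> float:
    """Mean requests per second if the daily load were spread perfectly evenly."""
    return requests_per_day / SECONDS_PER_DAY


print(f"{average_qps(350_000_000):.1f}")  # 4050.9
```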


It's important to define what the "Q" is.

4K QPS where Q = file uploads -> Definitely.

4K QPS where Q = HEAD request? -> Not so much.


Assuming most transactions are largish file transfers, it seems impressive. And I assume the transactions aren't evenly spaced, so the peak is likely much higher.

4K QPS of DNS, for example, would be less interesting.


In the case of large responses, outbound bandwidth is a much more interesting metric. I'm sure that number would sound even more impressive, though.
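Outbound bandwidth is simply request rate times average response size. The thread gives no figure for Uploadcare's average response size, so the 200 KB below is a made-up placeholder used only to show the calculation:

```python
def egress_gbit_per_s(qps: float, avg_response_bytes: float) -> float:
    """Outbound bandwidth in Gbit/s (decimal units: 1 Gbit = 10**9 bits)."""
    return qps * avg_response_bytes * 8 / 1e9


# Hypothetical: ~4,051 req/s (the daily average) at an assumed 200 KB per response.
print(f"{egress_gbit_per_s(4051, 200_000):.2f} Gbit/s")  # 6.48 Gbit/s
```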


If the traffic follows a natural daily distribution, the peak QPS will be well above the 4K QPS average.
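To see why the peak exceeds the average, model a day's traffic as a sine wave around the mean. The shape and the 0.5 amplitude are purely illustrative assumptions (real traffic has sharper, irregular peaks); the point is that the daily mean stays at about 4K QPS while the peak is the mean times (1 + amplitude).

```python
import math

SECONDS_PER_DAY = 86_400


def diurnal_qps(t: float, mean_qps: float, amplitude: float) -> float:
    """Hypothetical request rate at second t of the day (0 <= amplitude <= 1)."""
    return mean_qps * (1 + amplitude * math.sin(2 * math.pi * t / SECONDS_PER_DAY))


mean_qps = 350_000_000 / SECONDS_PER_DAY
rates = [diurnal_qps(t, mean_qps, 0.5) for t in range(SECONDS_PER_DAY)]

print(f"mean: {sum(rates) / len(rates):.0f} QPS")  # mean: 4051 QPS
print(f"peak: {max(rates):.0f} QPS")               # peak: 6076 QPS
```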


Impressive or not I just wish people would stop using monthly averages in the headline like this. You can't really make the case that this in-depth stack dive is for a layman audience, so you have to know what a meaningless metric it is.


I totally agree, but a "good, marketable" headline was one of the editor's requirements. And we have to admit that it worked quite well.

On the other hand, I feel that the article is very light and is indeed more for a layman audience :) An in-depth one would be 10-15 times longer (and 100 times harder to write).


The QPS is just a number that doesn't say much. Impressive or not, it's still interesting to read stories like these. My nitpick is that they, like many others, are only using one cloud provider. Don't put all your eggs in one basket.


It looks like your certificate expired.

Since it's just a DV certificate from Comodo, have you considered switching to Let's Encrypt? Its automated systems could have helped you automatically update.


Yeah, totally on me for letting this expire. Sorry everyone :/ Lesson learned. We may switch to Let's Encrypt once they add wildcard support, which seems to be next year.
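Whether or not the CA automates renewal (as Let's Encrypt's ACME clients do), a small expiry check in monitoring would flag this in advance. A sketch using only Python's standard `ssl` and `socket` modules; the 14-day threshold and the function names are assumptions.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's notAfter field, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # An already-expired certificate fails verification right here
        # with ssl.SSLCertVerificationError, which should also page someone.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from now (Unix time) until the notAfter timestamp."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86_400


def needs_renewal(not_after: str, threshold_days: float = 14) -> bool:
    # Hypothetical alerting rule: warn two weeks before expiry.
    return days_until_expiry(not_after) < threshold_days
```

Usage would be along the lines of `needs_renewal(fetch_not_after("example.com"))`, run on a schedule.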


It's a wildcard certificate, which isn't supported by Let's Encrypt yet.


Site appears to be hosted in AWS. Lots of free SSL goodness to be had from Amazon as well.


It's stretching into off-topic land here, but could you suggest any AWS SSL resources that are worth investigating?


AWS has Certificate Manager, which provisions free certificates and manages renewals automatically. It's usable across ELB, CloudFront, etc.

https://aws.amazon.com/certificate-manager/


StackShare itself is hosted on Heroku: https://stackshare.io/stackshare/stackshare

Uploadcare does not have issues with certificates and we're indeed going to switch to ACM for some of the endpoints.


By using AWS.


It's not that hard to get high numbers with AWS, indeed :p

The hard thing is to make it cost-effective. To that end, I can proudly say that the AWS bill is not among Uploadcare's top expenses.


AWS is just infrastructure; it could be Google Cloud or something else. When writing the article, we wanted to convey how the frameworks are interconnected and to estimate the number of "moving parts" (I liked this one) :)


All I can say is they don't seem to donate to the Django project.


Uploadcare contributes to open source, however. Their fast, production-ready Pillow-SIMD fork is a great example. You can search for "Pillow-SIMD" here for the discussion or check out the original article: https://blog.uploadcare.com/the-fastest-production-ready-ima...


If that's all you can say, don't say it at all.


Oh the irony.



