Hugo and IPFS: how this blog works (and scales to serve 5k% spikes instantly) (withblue.ink)
100 points by mscasts on Sept 9, 2019 | 36 comments



The basis of this post isn't too sound:

* The "5k% spikes" being discussed here are on a tiny base load. His peak is 6K page views in a day. That's 4 pages a MINUTE. A raspberry pi could handle that. Anything can handle that. There's no load here.

* It's not clear what IPFS is getting him, but it's certainly not performance. Any performance improvements are coming from using the Cloudflare caching proxy, which is basically just using Cloudflare as a CDN. You can get that with a normal website behind Cloudflare with a lot less hassle.

It's fine to check out the implementation details for interest's sake, but let's just be clear that it all could have been done with a $5 VPS.


> It's fine to check out the implementation details for interest's sake, but let's just be clear that it all could have been done with a $5 VPS.

If you're just hosting a static site, it can be done for free. You can use GitHub Pages (or any similar free static site host) + Cloudflare for free. Your site will survive getting Slashdotted, HNed, or Reddit-hugged just fine, for free.

So to recap:

* Use a static site, maybe with a static site generator like Hugo

* Use a static site host, which could be GitHub Pages (free), S3 (cheap), or a $5 VPS anywhere running Apache or nginx (a minimal sketch of what that origin amounts to follows this list)

* Use a free CDN, likely Cloudflare, to handle the brunt of the bandwidth.
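
For the static-host bullet above: a minimal sketch in Go of what that origin amounts to (nginx or Apache would do the same job; "./public" is just Hugo's default output directory, used here for illustration):

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Serve the generated static files with a cache lifetime so a CDN
	// like Cloudflare can absorb almost every request at the edge.
	fs := http.FileServer(http.Dir("./public"))
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=3600")
		fs.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```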

The IPFS angle is interesting but currently way more work than any of this is worth. And if people are using IPFS-compatible browsers, the load is being transferred to the IPFS servers anyway, which is worse than just using a regular hosting method and the Cloudflare CDN. If IPFS actually had decent adoption, the method described would fall to a crippling internet hug.

The goal of IPFS is to distribute the load (for people with IPFS browsers), and it's possible the article's loadout would do that, but given he's seeing 160GB of transfer from his servers, it certainly doesn't seem to be doing that very effectively currently. And this setup does not in any way demonstrate IPFS is ready to do that, because Cloudflare is saving his servers' butts.


In IPFS "the servers" aren't restricted to the original content hosts / large-scale CDNs / a single corp, so saying "the load is being transferred to the IPFS servers" doesn't take into account the fact that you, your roommate, your neighbor, or 6K other people on the internet could be storing/serving this same data, helping compensate for increased load when accessing/re-accessing a spiking piece of content. There's still a ton of room for performance improvement here - but where this will really shine is not where Cloudflare and existing tools do great, but in the edge cases where a more resilient solution matters (be that for offline caching, local-first collaboration, or censorship resistance).


Sure, it is possible that IPFS would spread out the usage across the other users, and this could still work for people bypassing Cloudflare. But there's no demonstration here of that really working. Just that Cloudflare is really great at what it does.

And yes, it's a chicken/egg scenario -- how do you prove that IPFS adoption would negate the need for CDNs without widespread IPFS adoption? But currently this isn't some amazing Slashdot-proof solution. They needed to set up 3 servers and a whole bunch of software when a single server + free CDN would've accomplished the same thing, faster and cheaper. It's presented as though CDN+IPFS is why this works so great, when really it's just the CDN, and IPFS just makes it more complicated.

If the goal is to learn, try out new tech, or break dependence on CDNs, then this is great, but a single note in the article that this is a learning opportunity and that it can be done more easily without IPFS would've saved the author a lot of back and forth in the comments here.


Valid point - I'm a huge consumer of posts where people experiment with new solutions and share it back with the community (like the author's other blog post on how to set up static website hosting with IPFS: http://127.0.0.1:8080/ipns/withblue.ink/2019/03/20/hugo-and-... ), so I think we should generally encourage/support this sort of experimentation and documentation. But I agree the timing here is likely coincidence - the IPFS component is likely more for learning/experimentation/resiliency than specifically for the described performance gains. (Unless a large portion of users are browsing with IPFS Companion: https://chrome.google.com/webstore/detail/ipfs-companion/nib... | https://addons.mozilla.org/en-US/firefox/addon/ipfs-companio...)


Just a little nitpick:

> That's 4 pages a MINUTE.

That's assuming that those 6k views were distributed evenly throughout the day. The article mentions the Hacker News / Reddit effect, so I'm assuming there was, indeed, a sizable increase in traffic for a short duration that a raspberry pi wouldn't be able to handle.

I do wish it showed precisely what happened on that 6k view day, however


Even if he got 6k views in the course of 30 minutes, I still think a raspberry pi running nginx and serving static pages could handle it. It would have trouble if everyone opened the link within 5s or so, but I doubt that's what happened. I think we'd have seen that graph if that were the case.
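
(For scale: 6k views spread over 30 minutes averages out to roughly 3-4 page views per second; a static nginx setup, even on a Pi, can typically serve hundreds of requests per second.)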


If you take a step back and look at it from the context that the data is somewhat being served by untrusted distributed devices in a verifiable way, it's pretty freaking amazing it works at all!

Sure, a raspberry pi could replace this in a heartbeat, but it's not about current throughput being good; it's about making it work as intended, ensuring the design is theoretically scalable, and marching toward a fully distributed network.

I realize the author is talking mostly about performance, but the thing that is exciting about IPFS is that it's not some monolithic project trying to take over the world but rather a modular set of standards (IPLD, Multiformats, libp2p) that generally do a very good job of integrating with legacy systems and have a reasonably future-proofed design.

IPFS is just one application bringing all these things together in a particular configuration, but many more interoperable variations are likely to come soon.


Not sure if this post showed that, though. I think it can be safely assumed that people coming from hacker news or reddit are not likely to be using ipfs clients. Therefore, this didn't work any different than having a server behind a CDN.


The question is, is CloudFlare caching IPFS or not?

If CloudFlare is running IPFS naked, then IPFS seems to be scaling.

If CloudFlare is caching IPFS, then it is CloudFlare saving their butts.

Can we confirm which it is?

I'm trying to chat with CloudFlare about my own P2P protocol, which I've seen handle HackerNoon's 15M monthly users (I saw about 10K concurrent users per second at peak load), to run GUN ( https://github.com/amark/gun ).

Because I do think it is important to test all these protocols out at bigger scales; that certainly makes it easier to debug and fix problems (assuming CF isn't caching).


> If CloudFlare is running IPFS naked, then IPFS seems to be scaling.

> If CloudFlare is caching IPFS, then it is CloudFlare saving their butts.

They are caching IPFS for people who don't support IPFS. They are not caching IPFS for people who do support IPFS, and instead passing the user through to the IPFS servers.

Which means Cloudflare is indeed saving their butts, because as soon as IPFS has widespread support, this falls flat on its face. Same as traditional hosting would without a CDN. Their IPFS setup may be able to distribute the load enough to avoid an internet hug crippling it (that's part of IPFS's point), but this setup in no way demonstrates that.


Yes, it's the CloudFlare caching that is providing the site to the majority of people. For the IPFS version, people need to have a plugin installed or have the IPFS software running on their machines.


Datapoint: I installed the FF extension mentioned in the article, and the performance of /ipns/withblue.ink was inferior to what cloudflare provided.


For perspective, I was seeing 50-200 pages/second sustained for PHP + MySQL on a single modest Linux server 20 years ago. We had clients get well over 6k hits within minutes of an announcement (IPO details launching at market open) and had plenty of headroom.


So the way I think of IPFS (disclosure: haven't deepdived) is something like the world we could have if we had a broadly supported CDN protocol for the Internet (this seems really overdue).

This very naturally answers the repeated questions from people about the why of IPFS.

Yes you COULD serve this via 1 or 5 of the other 20-30 incredibly individually complex stacks with CDN functionality as well. But why would you if there was a common CDN protocol?

What would the upsides be if your ISP could offer you an opt in localish IPFS instance?

IPFS is a functional expression of the long-term academic research into creating an internet oriented around content blocks, vs. the server/socket orientation it has today -

https://en.wikipedia.org/wiki/Content_centric_networking

We, of course, need both, so it's not an either/or.

We are backing our way into this world via JavaScript SRI -

https://en.wikipedia.org/wiki/Subresource_Integrity

Look at that: we need blocks of code that we know are what we want, but we want flexibility in where they come from.
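
To make that concrete: the SRI integrity value is just a base64-encoded digest of the file, so any origin that serves matching bytes is acceptable to the browser. A rough sketch in Go of producing a sha384 value (the "app.js" filename is only a placeholder):

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
	"os"
)

func main() {
	// Read the script you intend to reference with <script integrity="...">.
	data, err := os.ReadFile("app.js")
	if err != nil {
		panic(err)
	}
	// The "sha384-..." value is the base64 encoding of the raw SHA-384 digest.
	sum := sha512.Sum384(data)
	fmt.Println("sha384-" + base64.StdEncoding.EncodeToString(sum[:]))
}
```

That's the same content-addressing idea IPFS generalizes: trust the hash, be flexible about the source.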


I agree this is an interesting research area, and it's likely to stay one for a while until the underlying economic blockers are well solved: storage, bandwidth, and support all cost money.

If you host it yourself you have easy answers for that, but if you're relying on generous strangers you are going to hit limits: how many people are going to mirror large amounts of content, deal with DMCA takedowns and other legal requirements, etc. before giving up? Performance could in theory be competitive, but there's no guarantee, and it's notoriously hard to deal with unreliable nodes, which again can be handled if you have skilled support staff available. You'll get some volunteers for the right cause, but that only gets you so far; that's why there are a couple decades of these projects dreaming big and failing once they get popular.


I've handled being Slashdotted back when being Slashdotted was still a thing.

Yeah, thanks, I'm ancient. Before I fart myself to sleep reminiscing about the good old days: we got ~10k visits in an hour, on an ASP.NET website (admittedly with good caching) but on the absolute dirt-cheapest of shared hosting. It didn't skip a beat. I've done similar with PHP and Django. As long as you can avoid hitting the database every request, you can scale to the moon with very little work.

Hugo is static. Your toaster should be able to support 5k users.


> one issue I’ve experienced with running an IPFS node is that it can use quite a bit of bandwidth, just for making the network work (not even for serving your content!). This has been greatly mitigated with IPFS 0.4.19, but my Azure VMs are still measuring around 160GB/month of outbound traffic (it was over 400 GB with IPFS 0.4.18).

I'm not sure if that's 160gb/vm or 160gb across all vms - but either way that's 10-30 usd/month in bandwidth?
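
(Back-of-envelope, assuming Azure egress on the order of $0.08-0.09/GB: 160 GB would be roughly $13-14/month for one VM, or around $40/month if each of the three VMs pushes that much.)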

Pretty steep compared to a (single) 5usd/month vm or free (free tier of cdn provider).

Interesting write-up, though. Nice to see CF support IPFS.


Definitely check out the latest version: https://blog.ipfs.io/054-go-ipfs-0.4.22 - significant bandwidth improvements since 0.4.19 and more coming in the next minor release too!


I wonder why EC2 bandwidth costs so much more than S3 bandwidth (and the equivalent question for Azure).


Good on the author for an excellent write-up but I really don't see the point of this except as a learning exercise.

Instead of running a single static server, they now have a cluster of three servers plus Cloudflare to serve up a few static files to a fairly small number of people. And even then they have problems with caching and IPFS-related bandwidth. And almost nobody actually accesses the site via IPFS, making the whole thing rather pointless.


What proportion of the load was just Cloudflare CDN being a CDN and loading from cache?

This article seems to be mostly about the benefits of IPFS, but I can only see the benefits of Cloudflare caching. What am I missing?


Where is the analysis of the percentage of traffic served via IPFS to back up any claims about the benefits of IPFS?

It appears to me that Cloudflare accepts IPFS as (another) source for its CDN operations, and any benefits of flat CPU utilization are likely from the CDN and not IPFS.

In fact, 3 VMs in 3 regions is complete overkill. If the traffic numbers are to be believed, one could simply post their Hugo site to GitHub Pages or Netlify with zero extra steps or dollars spent. No IPFS needed


The stupid-simplest solution I ever saw for this was all the way back in the Slashdot era, where people would filter on Referer headers and redirect all traffic from news sites to the Google cache version of their page.

That works great for static pages. Hugo and Jekyll are basically precompiling static pages.

For a more dynamic site? I know some people who do special handling for bot and spider traffic. The bots get not exactly static content but much less dynamic content. I could see rerouting all traffic, especially for everyone without session cookies, to that version during a big spike.

Those solutions behave a little bit like the eventual consistency you see on very large websites, where values are approximated or cached with a very short TTL.

As others have commented, the simplest way to get that on a small site is to pony up money for a CDN. Maybe not the cheapest, but certainly the simplest.
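
For the curious, a rough sketch of that Referer trick as a Go middleware (the news-site list and the static.example.com target are placeholders, not anything the original sites actually used):

```go
package main

import (
	"net/http"
	"strings"
)

// redirectHotTraffic sends visitors arriving via known news-site Referers to a
// static/cached copy of the page and lets everyone else hit the dynamic handler.
func redirectHotTraffic(next http.Handler) http.Handler {
	newsSites := []string{"news.ycombinator.com", "reddit.com", "slashdot.org"}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ref := r.Header.Get("Referer")
		for _, site := range newsSites {
			if strings.Contains(ref, site) {
				// "static.example.com" stands in for wherever the cached copy
				// lives (historically, the Google cache version of the page).
				http.Redirect(w, r, "https://static.example.com"+r.URL.Path, http.StatusFound)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	dynamic := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("dynamic page\n"))
	})
	http.ListenAndServe(":8080", redirectHotTraffic(dynamic))
}
```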


Genius. Why focus on Referer headers from news sites instead of using some content identifier to indicate whether to route to a more static or more dynamic page?


Nice write-up, and it includes a lot of links with technical details on how to deploy your own IPFS node(s).

I agree that the benefit of this implementation (3x IPFS + Cloudflare) over a dedicated VM/VPS might not be obvious for the amount of traffic/visits this specific blog is getting, but it's a good alternative to know about :-)

On a side note... Wouldn't it be great if OpenStreetMap/Google Maps/your_favorite_map_provider were hosted on or via IPFS and there was an easy interface between the IPFS network and the www? This way the distribution of the tiles would be peer-to-peer, the CDN would be huge, and each user would locally have the tiles they use most frequently and serve them at the same time. No more dependency on big companies/providers, immune to DDoS/blocking/restrictions, and free for all ;-)


I really appreciate the work of getting this up and running; it's certainly a good exercise in making a fully-distributed content site.

What I'm missing is where a collection of static files couldn't just be served up from an S3 bucket and Cloudfront or Cloudflare on the front end -- you arguably have the same caching performance if not better since Amazon and Cloudflare have real SLAs for getting your bits from your bucket to a user's browser.

IPFS seems like un-needed complexity when there are a huge amount of options available. If you personally don't like Amazon, you can use Github pages, or DigitalOcean, or Netlify, or Fastly, or the list goes on and on.

Does anyone have a use for IPFS that isn't already covered by existing hosting+cdn options?


You can have both. Cloudflare has an IPFS gateway that you can use with a Cloudflare domain. Content hosted in IPFS and CDN-ed via Cloudflare's network. An IPFS-capable client will bypass Cloudflare via the dnslink entry.

See https://developers.cloudflare.com/distributed-web/ipfs-gatew...
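
For anyone curious how the bypass works: dnslink is just a TXT record on _dnslink.<domain> pointing at an IPFS/IPNS path, which IPFS-aware clients resolve and then fetch over IPFS instead of going through the gateway. An illustrative sketch of the lookup in Go (the exact record contents for this blog may differ):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

func main() {
	// IPFS-aware clients resolve the _dnslink TXT record for the domain and
	// follow the "dnslink=/ipns/..." or "dnslink=/ipfs/<CID>" path it contains.
	records, err := net.LookupTXT("_dnslink.withblue.ink")
	if err != nil {
		panic(err)
	}
	for _, r := range records {
		if strings.HasPrefix(r, "dnslink=") {
			fmt.Println("content path:", strings.TrimPrefix(r, "dnslink="))
		}
	}
}
```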


Except then you still have to cover the demand from requests which bypass Cloudflare through the dnslink entry.

Essentially it's a gimmicky system propped up by Cloudflare, one that ultimately fails once it actually gets user adoption, in a way the traditional setup (CDN + basic or even free hosting) would not. This only works because most users don't have IPFS support currently; if they did, the servers would grind to a halt just as they would with no CDN at all.


IIUC the theory is that the clients using native IPFS will be caching the content for a period of time and serving it to other IPFS clients. Therefore the load should be shared across viewers, leading to a minimal increase in "origin" load.


Yes, theoretically that's the goal. Given they're showing 160GB of transfer on their servers, it doesn't seem like that's working particularly effectively. Regardless, this implementation does not demonstrate at all that IPFS would save a site from an internet hug -- just that Cloudflare can.


The JavaScript node is coming along surprisingly well. Still not 100% production ready, but light enough that you could run it in the background of pages without affecting the user experience.

The addressing mechanism spun out into IPLD is quite clever, and IPFS is nearing the cusp of having a noticeable impact on the web when you consider that your browser is potentially a node running in the background.


I think the main point is if Cloudflare goes down or decides they (or the government) don't like the content hosted on this site, it will still theoretically continue to work over IPFS. Probably not important for this particular blog, but it's a valuable resource for helping other people with similar setups.


My needs are pretty undemanding, but Netlify has been great for me. It’s hard to beat for both plain old HTML/JS/CSS pages as well as for generators like Hugo.


I love Hugo and the direction it has been heading, and I always appreciate notes about how easy it is to install and how fast it is (mainly because it is written in Go).

I have been struggling to get adoption with students working on websites at the university I work at, but when we get it up and working it is so great for 90% of the things we do.


IPFS looks great, but the whole article smells like (not that) subtle Cloudflare advertising.



