
Indeed. It has always amazed me that so many people treat their static content as something that needs to involve a database.

Look at how many blog entries come through here and have the entire site fall over from a few thousand people going to read them. Somewhere there's a poor slicehost instance performing "select * from blog_entry where key like '10_reasons_rails_is_awesome'" thousands of times in a row until something melts.

Worse, the proposed solution when this happens is to add caching.

No, the thing to do would be to add a .html file to your webserver. I defy anybody to find a modern web server that can't serve a static file a thousand times per second from the smallest VPS slice on the market.

It's a solved problem. But people keep unsolving it.




> I defy anybody to find a modern web server that can't serve a static file a thousand times per second from the smallest VPS slice on the market.

Apache 2, with KeepAlive on, using the MPM that will be installed if you have PHP running on your system. Specifically, when you try serving a static file a thousand times per second to a thousand clients, your first 150 clients are going to saturate all available worker processes for 15 seconds, then your next 150 clients are going to saturate all available worker processes for 15 seconds, then ... most of your users will see their browsers time out and it will appear that the server has crashed.

n.b. This is the default configuration you'll get if you do weird crazy things like "sudo apt-get install apache2 php5" on Ubuntu.
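Concretely, that default configuration looks roughly like this (from memory of a stock Ubuntu apache2 install of that era; exact values may differ by version):

    # /etc/apache2/apache2.conf (approximate stock values)
    KeepAlive On
    KeepAliveTimeout 15        # an idle client holds its worker for up to 15s

    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        MaxClients          150  # at most 150 worker processes, total
        MaxRequestsPerChild   0
    </IfModule>

With each keep-alive connection pinning a whole process, throughput tops out at roughly MaxClients / KeepAliveTimeout = 150 / 15 = 10 new clients per second, no matter how cheap the static file is to serve.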

Edit: Added victim site below if you want to test this.

http://50.57.160.126/

[Edit the second: I fail at reading comprehension of my config file at 4:30 AM, and underestimated the number of simultaneous connections it could take at once. Still, I'm pretty certain this is technical reality after the numbers are corrected.]


Regarding your second edit:

If anyone actually tried to run Apache/mpm_prefork with "MaxClients 150" on an average Linode or cheap dedicated server, they'd OOM and start thrashing as soon as someone started requesting a couple dozen PHP pages -- even if the rest of the requests were to static resources. That's just another way to get yourself DoS'd. Received wisdom on the Linode forums is that you should handle no more than 10-15 simultaneous connections if you're stuck with Apache/mpm_prefork and PHP. So your original point stands.
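Rough back-of-the-envelope, assuming a fairly typical ~40 MB resident size per Apache/mod_php prefork child (the real figure depends on your PHP extensions):

    150 workers x 40 MB ~= 6 GB    -> hopeless on a 1 GB box: OOM and thrashing
     15 workers x 40 MB ~= 600 MB  -> fits, with room left for MySQL and the page cache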


Ah, now I remember why it is set to 24 on my blog's (old) production server, which had 1 GB of RAM. You're right.


I was under the impression that the 15-second timeout for PHP meant that if a page took longer than that to load, Apache would kill it. I thought the worker processes would be available to serve content immediately after they're done serving a page.


No, KeepAlive operates at the HTTP level; it works just the same even if no PHP is involved. PHP timeouts are governed by a different setting.
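For the curious, these are the two knobs being conflated (locations and default values are from memory and may differ per distro):

    # Apache (e.g. /etc/apache2/apache2.conf): how long an idle keep-alive
    # connection holds on to a worker after a response has been sent
    KeepAliveTimeout 15

    # PHP (php.ini): how long a single script may run before PHP kills it
    max_execution_time = 30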


Oh interesting, why would you want to keep a connection alive for that long? Page assets and other requests?


I disagree that high-performance blogs are a solved problem in distros - depends on who's linking to you.

Wordpress on Apache melts on a small VPS under a few hundred hits per second, using gobs of memory for each call. So you turn on supercache etc. and it gets a little better, for a lot of application complexity.

Now put varnish in front, override some of the cache-ability headers from your application, and my experience is that when e.g. Stephen Fry's twitter links to the site, your site becomes CPU or network-bound instead.

From memory, from maintaining a friend's site, the number of simultaneous connections needed to melt down the server (using siege, on a 4 GB system) was something like 300 without any optimisation, double that if Wordpress had spat out .html files and Apache was serving them, but with Varnish in front it only started to slow down at around 2000 connections.
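For what it's worth, the "override some of the cache-ability headers" part doesn't take much. A minimal sketch in Varnish 3-style VCL, assuming you're happy to cache anything outside wp-admin/wp-login for a couple of minutes:

    sub vcl_recv {
        # never cache the admin or login pages
        if (req.url ~ "^/wp-(admin|login)") {
            return (pass);
        }
        # drop cookies so anonymous page views are cacheable
        unset req.http.Cookie;
    }

    sub vcl_fetch {
        # ignore the application's no-cache headers and keep pages for 2 minutes
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 120s;
    }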

There's a reason you want to serve your application from a database - it's nice and easy to change your pages on the fly, but serving static files through Apache is hardly the best you can do for optimisation.


This single comment taught me more than the original article.


Although I take your point that people do a lot of needless computation these days, I don't quite understand the dig at caching. Isn't adding an HTML file to your server just manual caching?


Pretty much. Only chances are it's not actually manual, either.

The way I do this for the CMS/Blog stuff on my products is to set up a 404 handler that looks for URLs that seem like they should be blog entries. So if it sees a request for the non-existent http://mysite.com/blog/caching_is_awesome.html, it'll do a quick check for that article in the database, and if necessary create the static .html file before redirecting to it.

It's a little nicer than caching because you only ever need to look up/create the thing once each time it changes. From there, it's just the webserver being a webserver. No need to involve any application layer at all.
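A minimal sketch of that handler in PHP, assuming Apache's ErrorDocument points 404s at it; get_blog_entry() and render_entry() are stand-ins for whatever your own lookup and templating code looks like:

    <?php
    // Hooked up with: ErrorDocument 404 /generate.php
    // Apache leaves the originally requested URI in REQUEST_URI.
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    if (preg_match('#^/blog/([a-z0-9_]+)\.html$#', $path, $m)) {
        $entry = get_blog_entry($m[1]);            // stand-in: one DB lookup by slug
        if ($entry) {
            $file = $_SERVER['DOCUMENT_ROOT'] . $path;
            file_put_contents($file, render_entry($entry)); // write the static .html
            header('Location: ' . $path);                   // future hits never touch PHP
            exit;
        }
    }

    header('HTTP/1.0 404 Not Found');
    echo 'Not found.';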


> The way I do this for the CMS/Blog stuff on my products is to set up a 404 handler that looks for URLs that seem like they should be blog entries. So if it sees a request for the non-existent http://mysite.com/blog/caching_is_awesome.html, it'll do a quick check for that article in the database, and if necessary create the static .html file before redirecting to it.

You have done the webserver equivalent of method_missing trickery in Ruby.


Even a minimal blog written in Rails can cache to .html files with a page caching line in the PostsController and some lines for expiration when new posts are created/updated/destroyed.


Why not go the whole way and only edit/serve up the static file instead of the database entry?

That has the added benefit that you can use version control on it.


Serving static files is more about pre-processing, or rather:

Don't do at run-time what can be done at compile-time.

This doesn't need to be any more manual than the labor involved in uploading content into a CMS or data store by another name (e.g., blog engine). It's more of a substitution of one task for another.

I prefer to think of the static HTML approach as the difference between using Makefiles versus a shell script for compiling code. Once you understand that the shell script would have to duplicate the nuances that 'make' offers, using Makefiles for compiling code usually gets viewed as the better approach. (There are always edge-cases, of course.)


But what I'm saying is, doesn't caching accomplish essentially the same thing? You process the resource once, cache the result and serve that. Full static site generation just seems like a more aggressive version of the same thing, and AFAIK you have to give up some of the benefits of being partially dynamic (like the ability to have a "Latest comments" sidebar without unnecessarily relying on JavaScript).


I think there's a semantic hangup here: what you're describing is accurate, but it's probably "better" explained (in the case of something like the popular WordPress caching plugins) as "compiling" your content to flat HTML, which is saved on disk. The web server/CDN can then "cache" that file.


Yes, end results are equivalent, but complexity shifts.

If a consistent stack is more important, caching may be the best option. If fewer layers of complexity at run-time is more important, out-of-band/pre-processing may be the best option.


> It has always amazed me that so many people treat their static content as something that needs to involve a database.

It's not static. Maybe the text is, but the entire HTML document is plenty dynamic. You always have your list of recent blog posts, often some dynamically calculated dates ("posted yesterday, 25 hours ago"), some quote-of-the-day or other banner, and maybe advertisements. Personal blogs might be able to get away with plain HTML, but that doesn't fly at a corporate level (even for company blogs like MSDN's), where tons of hands are in the pot, all with their own widgets and contributions that must automatically go on every page of a site.

What you really want is a CMS with a compilation step that outputs static HTML each time something changes. That gives you the flexibility of database-driven CMS with the runtime performance of static HTML.
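Sketching that compile step in PHP, with hypothetical get_all_posts()/render_post() standing in for the CMS's own data layer and templates:

    <?php
    // Run on every save/publish, not on every page view.
    foreach (get_all_posts() as $post) {
        $html = render_post($post);  // templates, sidebars, widgets: all evaluated once, here
        file_put_contents("/var/www/blog/{$post['slug']}.html", $html);
    }
    // From here on, the webserver just serves flat .html files.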


I'd argue that you can get away with it in many cases.

* Disqus for comments.

* You re-generate the static site on every deploy. "Recent" can be regenerated every new post.

* It's easy to convert a date into "2 days ago" with javascript.

* Quote of the day, or other banner can be javascript too.

* Advertisements are almost always javascript.

I'm not seeing the need for a database in anything you described.


I've had this same problem myself. You can indeed get around some of it by using JavaScript, but I'd rather not use JS for something unless I really have to, because you often end up with a slower site for the end user as you end up serving extra JS files (sometimes including jQuery), not to mention noscript users, etc.

I think the problem is that stuff that starts out pretty static often ends up slowly getting more dynamic, and at some point you have to re-engineer it, which becomes a pain, so it's easier to plan around this from the start.

I took over a project which used static content to serve everything, which included a lot of fwrite() PHP calls. As this got more complicated we ended up with cache-invalidation-type issues, where there was a hierarchy of content that needed to be re-written back to various files every time something was changed on the site. This meant that saving changes to the site became incredibly slow: we often erred on the safe side, which meant we ended up re-writing some things multiple times, and the code that checked files to see which parts needed to regenerate became exponentially more complex.

In the end I just generated everything from the DB and used memcache in a few select places. Performance for serving content was about the same as serving the static files, the code was much cleaner (which helped make performance better in the long run), and usability from the content administrators' POV was much improved.

You should always aim to be serving your most common content directly from RAM anyway, so whether this is from the kernel's pagecache or memcache doesn't matter so much, you can probably solve this problem using clever proxying too.


> What you really want is a CMS with a compilation step that outputs static HTML each time something changes. That gives you the flexibility of database-driven CMS with the runtime performance of static HTML.

Did you just define "caching"?


> Worse, the proposed solution when this happens is to add caching.
>
> No, the thing to do would be to add a .html file to your webserver.

Isn't adding an HTML file the same as caching, only with different tools? I see no difference between writing a blog article in an HTML editor -> saving it to an HTML file -> uploading it to the server, and writing a blog article in blog software -> saving it to the database -> publishing it as an HTML file.


Any time I set up a new website, I use WordPress with a theme from WooThemes. It's just the easiest thing to set up and make beautiful. A thousand times easier than using static HTML to make sites.

And ease of initial deployment is the only thing that matters to most of the population. The fact that this configuration may theoretically break one day? Who cares, as long as it takes me 3 hours to get it up and it's fixable.


Thanks to modern JS & HTML, static sites can also be a lot more dynamic than the boring old static sites of the past. You can integrate things like DISQUS without any dynamic content at all.



