Lessons from Building a Static Site Generator (2020)

eatonphil · on June 30, 2021

This is a good review of a fairly complex piece of software. But don't let this convince you that all static site generators must be complex.

Out of laziness, most sites I run have their own 100-200 line Python static site generator that takes Markdown (if I'm really feeling it) or HTML files with Jinja templates and generates pages around them. The core generator code hardly ever changes year by year. Here's an example [0].

This isn't to say that everyone should always write their own. I am just a bit surprised by all the debate around each generator because they all produce the same thing and the only (or major) variables are the template language and what themes are built in (though of course you can always bring your own CSS).

But _using_ a static site generator is a very good idea. If you have no other inclinations, I think the stack that makes sense for anyone with multiple contributors is to use WordPress for editing and then have a plugin that will generate static pages from it so not every request to your site hits the database.

[0] https://github.com/eatonphil/notes.eatonphil.com/blob/master...

pjc50 · on June 30, 2021

I've become convinced that it's easier to write a SSG than understand someone else's, and it's definitely quicker than trying to evaluate the market and pick one.

schwartzworld · on July 1, 2021

This is absolutely true. My "SSG" is basically just pandoc paired with a script to update the index on the main page and add the new post to the RSS feed.
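A rough sketch of the "add the new post to the RSS feed" step, using only Python's standard library. The feed contents, the `add_post` name, and the example.com URLs are all made up for illustration; this is not the commenter's actual script.

```python
import xml.etree.ElementTree as ET

def add_post(feed_xml: str, title: str, link: str, pub_date: str) -> str:
    """Insert a new <item> ahead of the existing items in an RSS 2.0 channel."""
    rss = ET.fromstring(feed_xml)
    channel = rss.find("channel")
    item = ET.Element("item")
    for tag, text in (("title", title), ("link", link), ("pubDate", pub_date)):
        ET.SubElement(item, tag).text = text
    # Place the new item before the first existing one, or at the end if none.
    children = list(channel)
    first = next((i for i, el in enumerate(children) if el.tag == "item"),
                 len(children))
    channel.insert(first, item)
    return ET.tostring(rss, encoding="unicode")

# Hypothetical starting feed with one existing post.
FEED = ('<rss version="2.0"><channel><title>Example</title>'
        '<link>https://example.com/</link><description>Posts</description>'
        '<item><title>Old</title><link>https://example.com/old.html</link></item>'
        '</channel></rss>')

updated = add_post(FEED, "New", "https://example.com/new.html",
                   "Thu, 01 Jul 2021 00:00:00 GMT")
```

The same pattern (parse, insert at the top, serialize) would probably work for the index page too, if it is kept as well-formed markup.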

susam · on June 30, 2021

> But don't let this convince you that all static site generators must be complex.

Indeed! For example, https://github.com/sunainapai/makesite is a simple and lightweight static site generator written in Python. It can be customized easily by modifying the Python source code and adapting it to one's needs. I like that when I need a new feature, I can add it quite easily by writing a few Python functions. It is meant to be programmer-friendly.

Disclosure: My wife wrote this project. I am just a happy user of the project.

mturmon · on June 30, 2021

I used `makesite.py` as a template for a small site of 100-200 pages that I maintain. It has worked quite well.

The animating idea ("use this as a template but don't be afraid to customize or reinvent certain parts") liberated me from feature-by-feature evaluation of a bunch of complex config/templating systems.

For a small site like mine, getting the content right is the main thing and the site generator should mostly get out of the way. I update the site sporadically and don't want to re-learn a complex templating and config-file system every time I go back to it.

nickreese · on June 30, 2021

Author here. At its core most static site generators are just fancy "string concatenation" tools.

In my experience playing with several generators before building Elder.js, it isn't so much the output that matters as how the static site generator lets you do non-trivial customization: things that would be hard without a larger framework.

More importantly, when building a major project on a static site generator, it is important to have an upgrade path beyond a static site generator should you require it. Elder.js was built with that use case in mind though I didn't cover that in the article.

Being able to move to SSR should the project require it is a huge plus in my book.

eatonphil · on June 30, 2021

> At it's core most static site generators are just fancy "string concatenation" tools.

I think that's a bit reductive (especially since template libraries themselves are already complicated, not in a bad way). To me a static site generator is: 1) a string template library, plus 2) a file system walker/trigger to generate from a template, plus 3) additional data to feed the templates, plus 4) the actual content.

Then there's of course the additional features you may or may not need: a tagging system, a comment system, a subscription system, etc.

Thankfully despite all these components you don't need to write much of the actual code since they all exist as builtin (file system walking) or major OSS (Jinja, Mustache.js, etc) libraries. The SSG is primarily glue.
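To make the "primarily glue" point concrete, here is a minimal sketch of the four pieces in Python. It assumes the standard library's `string.Template` in place of Jinja, and the `PAGE` template, `build` function, and directory names are invented for the example.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would likely use Jinja or similar.
PAGE = Template("<html><head><title>$title - $site</title></head>"
                "<body>$body</body></html>")

def build(src: Path, out: Path, site_data: dict) -> list:
    """Walk src, wrap every .html fragment in PAGE, write results under out."""
    written = []
    for fragment in sorted(src.rglob("*.html")):        # 2) file system walker
        target = out / fragment.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        html = PAGE.substitute(                          # 1) template library
            site_data,                                   # 3) additional data
            title=fragment.stem.replace("-", " ").title(),
            body=fragment.read_text(encoding="utf-8"),   # 4) the content
        )
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written

def demo_build() -> str:
    """Build a one-page site in a temporary directory and return the page."""
    with tempfile.TemporaryDirectory() as tmp:
        src, out = Path(tmp, "src"), Path(tmp, "out")
        (src / "posts").mkdir(parents=True)
        (src / "posts" / "hello-world.html").write_text("<p>Hi</p>",
                                                        encoding="utf-8")
        (page,) = build(src, out, {"site": "Example"})
        return page.read_text(encoding="utf-8")
```

Tags, comments, or subscriptions would sit on top of this loop as extra data passed into the template.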

AndrewStephens · on June 30, 2021

> Out of laziness, most sites I run have their own 100-200 line Python static site generator that takes Markdown (if I'm really feeling it) or HTML files with Jinja templates and generates pages around them. The core generator code hardly ever changes year by year.

I maintain my site exactly the same way (minus the Jinja) and it is a workflow that works for me. Simplicity is best (even beating flexibility) when it comes to tools that help you express yourself. Otherwise you spend all your time wrestling with your tools rather than creating.

Bayart · on June 30, 2021

Yesterday someone ran a thread where people posted tons of headless CMS options[1]. There might be a few ideas there for people interested in your comment.

[1] https://news.ycombinator.com/item?id=27674105

jjjbokma · on June 30, 2021

Mine is just over 1KLOC :-) But it includes both a RSS and a JSON feed, support for Twitter card / Facebook sharing, a calendar view, and a tag cloud. Live demo: https://plurrrr.com/

Code is available at github: https://github.com/john-bokma/tumblelog

jazzyjackson · on June 30, 2021

The choice of using one markdown document to render all the pages is really interesting to me. Do you just edit the document in your terminal? Trying to imagine the usability of that, I guess if I was quick at jumping from one "page" to another it might be faster than opening and closing files.

jjjbokma · on July 1, 2021

I use Emacs to edit the page in Markdown mode with some additional font locking I wrote. It looks very similar to the actual blog.

Currently the input file is 1.4MB (828 blog entries) which Emacs can handle without any problems. Searching works very fast. I keep the file open and writing a new entry is just adding stuff to the top of the file.

_tom_ · on June 30, 2021

My last site’s generator is 71 lines of JavaScript. Just enough to walk the directory and run mustache over each file.

_ofdw · on June 30, 2021

I went down this rabbit hole and found everything to be overcomplicated for my use case. I'm so sick of static site generators that have seven layers of templating engines and complicated build systems.

Most of the static site builders I tried were either way too complex, or else just straight-up didn't work at all (looking at you, coleslaw).

I tried to go full emacs and use org export (org being my favorite text format) but the default export is horrendous and the documentation for org to html export is so bad it might as well not exist.

Software is simultaneously awesome and infuriating.

So after three days I just gave up on org-export and now I have pandoc shit out an html snippet that I concatenate with a hand-rolled html preamble and postamble via a Makefile.

Found a few rough edges, probably because org is underspecified. It's not elegant but it works for my use case.

jstrieb · on June 30, 2021

For what it's worth, if you're using Pandoc, you can set the HTML output to be "standalone" based on a simple template.[0] You can also include a standard header and footer to be automatically inserted for each generated page.

    pandoc \
      --standalone \
      --css=/style.css \
      --highlight-style=code-highlight.theme \
      --variable=lang:en \
      --include-before-body=navbar.html \
      --include-after-body=footer.html \
      --template=template.html \
      "$MD" -o "$HTML"

I use a variation of this command in a bash script to generate my entire static site.[1] A friend improved upon my script with a Go implementation that does some more advanced stuff, but still compiles Markdown to HTML using this command under the hood.[2]

0: https://pandoc.org/MANUAL.html#option--standalone

1: https://github.com/jstrieb/personal-site/blob/master/compile...

2: https://github.com/lsnow99/dudu

breck · on June 30, 2021

Try https://scroll.pub

It uses Scrolldown instead of markdown, which is simpler, cleaner, and incredibly extensible.

The command line app has just a few commands, and they all take zero params.

A site is just a single folder, and because content is written in Scrolldown, it works great with git and is great for content sites or collaborative strongly typed databases.

It's fast (I get about 300 pages per second), there's not a lot of code (sub 1k excluding dependencies), and the code is tested.

It’s in nodejs now but no reason scroll can’t be language agnostic.

I’ve been around SSGs for over a decade and am designing this one to be simple, reliable, and to stay out of the creators’ way. I think it could be the last SSG you’ll ever need.

wishinghand · on June 30, 2021

Is this hyperbole or am I just lucky to avoid whatever you tried? What SSGs require 7 layers of templating and complicated build systems? Whenever I tried out Sergey, Nuxt, Docpad, and a few others, there was just one templating engine each and a build command for the CLI.

nickreese · on June 30, 2021

Hey all -- Author here. This was a reflection on building Elder.js.[0]

Happy to answer any questions. I'll be going through and adding context to the questions I see.

[0]: https://elderguide.com/tech/elderjs/

tsegratis · on June 30, 2021

Really appreciate your frank assessment of the design and current status. For instance:

> the ElderGuide.com team expects to maintain this project at least until 2023-2024

So nice to see the absence of a hype train, and also so nice to get an insight into your view of the design space

I would never be your market for elder.js. But I appreciate learning from you, and appreciate building software alongside you

pier25 · on June 30, 2021

This is a great post, but why doesn't it have a date?

It's infuriating to see blog posts or even news that don't clearly display the publish date near the title.

mkr-hn · on June 30, 2021

Some marketing blog long ago said to remove dates so people can't tell when a post was written, and it's plagued blogging ever since. The idea is that "evergreen content" shouldn't need a date, but a date is context. No one is so good at writing that their writing has nothing anchoring it to the context of a point on a timeline.

kevincox · on June 30, 2021

I agree. You can always update the date, or add a "refresh" date if you do update the content (or just verify that it still applies).

nickreese · on June 30, 2021

Hey, author here. I banged this out quickly and didn't update the template to include the date as an oversight. It was written in Nov 2020 as shown on the homepage.

pier25 · on June 30, 2021

> It was written in Nov 2020 as shown on the homepage.

Nobody visits homepages anymore. :)

nickreese · on June 30, 2021

Not arguing that. Updated the article template to keep others from getting infuriated. ;)

pier25 · on June 30, 2021

Awesome!

yarinr · on June 30, 2021

The publish date is 2020-11-02, as appears on the articles list at https://nicholasreese.com/

I agree they should probably make it visible on the article page itself...

corobo · on June 30, 2021

Probably forgotten rather than whatever the other comments have heard haha. I didn't even realise my own blog was missing dates till just now

Infuriating is a bit much

pier25 · on June 30, 2021

> Infuriating is a bit much

Ok maybe I was exaggerating :) but it really annoys me.

tomjen3 · on June 30, 2021

If it is a great post, why does it matter when it was written?

How to win friends and influence people was written close to 100 years ago, but it is still recommended in plenty of places.

The dragon book is older than me, but it is still one of the recommended books for writing a compiler.

Why does a blog post have to have a date displayed on it? If it is about a specific version of some software, I can understand, and agree with you, why it should be mentioned in the post.

pier25 · on June 30, 2021

> How to win friends and influence people was written close to 100 years ago, but it is still recommended in plenty of places.

Because human nature hasn't changed. Front end stuff changes every day.

klodolph · on June 30, 2021

I wrote a static site generator for my own personal site. I've been using it for over 10 years, and it's gone through several major refactors / redesigns. A few comments:

1. Template system

There are tons of different template systems out there for things like the "shortcodes" in the article.

    {{youtube id="123asdf4" /}}

My conclusion is that the correct way to do this is with custom tags,

    <embed-youtube id="123asdf4"></embed-youtube>

I apologize for the verbosity... but this is completely valid HTML5, and you do not need anything but an ordinary HTML5 parser to parse this. This maximizes your choices for the libraries you use in the static site generator and it maximizes the level of support in whatever editor you choose to author the site in. For example, you can just use the HTML mode in Vim or Emacs, or you can use VS Code, TextMate, Sublime Text, etc. and get a ton of features: syntax highlighting, indenting, etc.

While on the surface it looks verbose because of the closing tag, in most editors, you only have to press a key or two to close the tag. HTML5, strictly speaking, does not support self-closing tag syntax for custom tags. That syntax is only supported for void elements. There are only 16 void elements in HTML5.

I use "prefix-suffix" syntax to avoid ambiguity... any tag with a hyphen is obviously a custom tag.
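As a sketch of the "ordinary HTML parser" claim, Python's standard `html.parser` can expand such a tag with no custom grammar. The `EmbedExpander` class and the iframe markup it emits are assumptions made for the example, not the output of the generator described here; comments and doctype declarations are simply dropped to keep it short.

```python
from html import escape
from html.parser import HTMLParser

class EmbedExpander(HTMLParser):
    """Re-emit HTML unchanged, except that <embed-youtube> becomes an iframe."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "embed-youtube":
            video_id = escape(dict(attrs).get("id") or "", quote=True)
            # Illustrative replacement markup only.
            self.out.append(
                f'<iframe src="https://www.youtube.com/embed/{video_id}"></iframe>'
            )
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> are passed through verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "embed-youtube":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def expand(html: str) -> str:
    """Return html with every <embed-youtube> element expanded."""
    parser = EmbedExpander()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Any other hyphenated tag could be handled the same way by adding a branch in `handle_starttag`.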

2. Routing

Something you can use to tackle the routing complexity is to place your source files in the same path as the canonical URL. You only need routes for generated content, like index pages and such.
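In code, the path-as-route idea can be as small as one function. This sketch assumes sources live under a hypothetical `content/` directory and that `index.html` files are served at their directory URL; neither detail comes from the original setup.

```python
from pathlib import PurePosixPath

def url_for(source: str, root: str = "content") -> str:
    """Map a source file to its canonical URL by mirroring the directory layout."""
    rel = PurePosixPath(source).relative_to(root)
    if rel.name == "index.html":
        # content/blog/index.html is served as /blog/
        parent = rel.parent.as_posix()
        return "/" if parent == "." else f"/{parent}/"
    return f"/{rel.as_posix()}"
```

With this in place, only generated pages (tag indexes, feeds) need an explicit route table.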

3. Index data

You'll naturally want to generate indexes and create previews for links. I suggest that you start by looking at the schema.org schema for web pages and work with a useful subset of that. This way, you can generate indexes on your web page using the same exact data, same exact schema, that you use for the JSON-LD data you provide for search engines like Google.

This is a minor point, but it reduces duplicated effort between the code for generating content for your website and the code for generating JSON-LD metadata.

Don't dive too deep into the schema.org schema, just take a couple bits and pieces that you need, and refer to the feature guides in Google's documentation:

https://developers.google.com/search/docs/guides/intro-struc...
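A small sketch of reusing one metadata record for both purposes. It assumes a handful of schema.org `WebPage` properties (`headline`, `url`, `datePublished`, `description`); the `PAGES` data and function names are hypothetical.

```python
import json
from html import escape

# Hypothetical per-page metadata using a small subset of schema.org properties.
PAGES = [
    {"headline": "Hello", "url": "/hello.html", "datePublished": "2021-06-30",
     "description": "First post."},
    {"headline": "Routing", "url": "/routing.html", "datePublished": "2021-07-01",
     "description": "Paths as URLs."},
]

def json_ld(page: dict) -> str:
    """Serialize one page's metadata as a JSON-LD script element."""
    doc = {"@context": "https://schema.org", "@type": "WebPage", **page}
    return f'<script type="application/ld+json">{json.dumps(doc)}</script>'

def index_html(pages: list) -> str:
    """Render a newest-first index from the same dictionaries."""
    items = [
        f'<li><a href="{escape(p["url"])}">{escape(p["headline"])}</a> '
        f'({p["datePublished"]})</li>'
        for p in sorted(pages, key=lambda p: p["datePublished"], reverse=True)
    ]
    return "<ul>" + "".join(items) + "</ul>"
```

Because both functions read the same keys, adding a field to the metadata makes it available to the index and to search engines at once.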

nickreese · on June 30, 2021

Author of the post. This is really an interesting take on shortcodes. I've been struggling with a format that the svelte compiler likes and can be used in markdown. This may be the answer.

> <embed-youtube id="123asdf4"></embed-youtube>

Thank you.

EricE · on June 30, 2021

"2. Routing Something you can use to tackle the routing complexity is to place your source files in the same path as the canonical URL. You only need routes for generated content, like index pages and such." Thank you! The only saving grace for a Gatsby site I recently did was it leveraged a template that did that automatically. To say that it dramatically simplified things is a gross understatement. Template for those interested: https://github.com/18F/federalist-uswds-gatsby

z3t4 · on July 1, 2021

If you are going static with a SSG I think you should embrace HTML5 semantic elements. Write your web site like it's 1999! Only there are many more elements to choose from now with HTML5! Ohh and we have CSS! Don't complicate things! The only real problem is how you get the marketing team to write HTML and use a version control system (git, mercurial, et al.). Or do you give up and give them a WYSIWYG content editor? The thing with semantic HTML is that it's much nicer than a pile of divs with classes sprinkled over it.

klelatti · on June 30, 2021

A big thank you to the author for open sourcing this - I've been playing with this to implement a largish static site (11,000 pages) and (as a definite non expert) have found it relatively easy to understand and use - and it's lightning fast.

Just one comment: found implementing a Svelte Leaflet Map component a bit of a struggle - an addition to the docs on this would be very useful.

encryptluks2 · on June 30, 2021

I'm a huge fan of Hugo because it is a single binary and full of useful features, however I certainly think there is a lot of room for improvement. While Go templating may not be everyone's cup of tea, it is simple enough and the additional functions added by Hugo cover a lot of use cases.

Hugo has shown me that my ideal generator will likely be built with Go, although I could see a C, C++, or possibly Rust equivalent being fine for my needs. However, what I desire more than anything is to have a really performant default template using pagination and hierarchy based on the file system.

I don't want to have to define titles or empty _index.md files to define sections, just want to point it at a nestable directory on my file system and use filenames for pages and directories for sections. This significantly improves the usefulness for anyone, because at that point it can be used as a knowledgebase/note-taking system from the get-go without needing any custom templating to make it work out of the box.
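A sketch of that convention in Python, to show how little it takes: directories become sections, Markdown filenames become page titles, and nothing needs front matter. The `site_tree` and `title_from` names and the sample `notes/` layout are invented here and do not reflect how Hugo works.

```python
import tempfile
from pathlib import Path

def title_from(name: str) -> str:
    """Turn 'getting-started' or 'getting_started' into 'Getting started'."""
    return name.replace("-", " ").replace("_", " ").capitalize()

def site_tree(root: Path) -> dict:
    """Directories become sections, .md files become pages; no front matter."""
    tree = {"title": title_from(root.name), "pages": [], "sections": []}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            tree["sections"].append(site_tree(entry))
        elif entry.suffix == ".md":
            tree["pages"].append(title_from(entry.stem))
    return tree

def demo_tree() -> dict:
    """Create a tiny notes directory and return its derived structure."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp, "notes")
        (root / "linux").mkdir(parents=True)
        (root / "todo.md").write_text("")
        (root / "linux" / "disk-usage.md").write_text("")
        return site_tree(root)
```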

The other important factor is building in search capabilities. While I think that client-side JavaScript and a JSON output works for this goal and can be optimized, Xapian would be a great alternative and extend the usefulness outside of the JavaScript browserbase. So far Hugo has left serving production sites up to third-party web servers, but I see no reason why they couldn't finish implementing the `serve` components to be production-ready and implement a search component and possibly other dynamic components.
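For the JSON route, the build step could emit something like the word-to-URL index sketched here, which a small client-side script would then load. The function name and the sample pages are hypothetical, and this is not a description of anything Hugo ships.

```python
import json
import re

def build_search_index(pages: dict) -> str:
    """Map each lowercase word to the sorted URLs of the pages containing it."""
    index = {}
    for url, text in pages.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, set()).add(url)
    return json.dumps({w: sorted(urls) for w, urls in sorted(index.items())})

# Hypothetical page texts keyed by URL.
SEARCH_JSON = build_search_index({
    "/a.html": "Hugo is fast",
    "/b.html": "Fast search",
})
```

An index like this grows with the vocabulary of the site, which is one reason a server-side engine can become attractive for large collections.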

I really hope to see someone with a strong understanding of efficiency come in and own this arena building off Hugo's single binary approach, but with an open mind to make it robust to handle all kinds of self-hosted content. Even implementing a bookmark manager using this approach would be awesome, having it auto-sort YAML/TOML/HCL or whatever and be able to serve up searchable bookmarks without even relying on JavaScript. The possibilities are endless with the types of apps that can be built with this design using readable files as data for the content.

whydoineedthis · on June 30, 2021

What is hydration? You talk a lot about it, but I've never had to add water to my websites before.

nickreese · on June 30, 2021

Good question. Hydration is where a JavaScript component has been rendered to HTML statically or on the server, and the JavaScript on the client then needs to take over that HTML.

Traditional frameworks like Next.js, Gatsby, Nuxt.js all "fully hydrate" the client.

This means that every bit of HTML that is sent to the browser is taken over by JS on the client.

This has its costs but it is done to give interactivity.

Partial hydration is where you are only adding interactivity to the parts of the site that need it... think of it like the good old days of jquery but with a modern front end framework... Svelte.

whydoineedthis · on June 30, 2021

Thank you, very helpful explanation.

Santosh83 · on June 30, 2021

Here you go: https://en.wikipedia.org/wiki/Hydration_(web_development)

ildon · on July 1, 2021

I still encounter many of the issues that they also mention.

In my experience, writing a static site generator is very much a continuous trade between flexibility and ease of use.

I also agree with most comments that it is often easier to write your own static site generator rather than finding the one to use amongst all the available choices.

That is why I made my own, and obviously I ended up writing a whole bunch of documentation because you can't escape the need to clarify how things work.

If you're interested you can check it out at https://yassb-foss.github.io/

jrm4 · on June 30, 2021

Alright, I'm skimming this whole idea of newfangled "static site generators" that involve a lot of Javascript and I'm left with a whole lot of "Isn't this just ____ with extra steps?"

I'm seeing "shortcodes" and I'm like -- as in variables and/or configuration files?

Or, more broadly -- why Javascript for the backend? This looks like a silly level of complexity. I'd start thinking about it in Bash and then probably head over to e.g. Python once databases et al start getting involved. What am I missing here?

nickreese · on June 30, 2021

Shortcodes as they are implemented in Elder.js (what the article is about) are borrowed from WordPress. Basically they are a placeholder such as [[lastestTweet/]] that let you add dynamic content into otherwise static content.
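The placeholder idea can be sketched generically in a few lines of Python. This is not Elder.js's implementation; the `[[name/]]` pattern follows the example above, and the `siteName` handler is made up.

```python
import re

# Matches self-closing placeholders of the form [[name/]].
SHORTCODE = re.compile(r"\[\[(\w+)/\]\]")

def render_shortcodes(text: str, handlers: dict) -> str:
    """Replace each known [[name/]] with its handler's output; leave others as-is."""
    def substitute(match: re.Match) -> str:
        handler = handlers.get(match.group(1))
        return handler() if handler else match.group(0)
    return SHORTCODE.sub(substitute, text)

# Hypothetical handler; a real one might fetch data at build time.
HANDLERS = {"siteName": lambda: "Example"}
```

Unknown shortcodes are left untouched so that a typo shows up in the output rather than disappearing silently.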

While Elder.js does allow for full server side rendering, making "Javascript the backend", the goal of the static site generator is to generate static HTML/CSS/JS that can be hosted from a CDN, S3, or other static host.

That said, one of the biggest pitfalls with building a major site on a static site generator is there is often no upgrade path to server rendering. Elder.js does offer that out of the box.

brundolf · on June 30, 2021

> why Javascript for the backend?...[I'd] probably head over to e.g. Python once databases et al start getting involved

I could ask the same thing: why Python for the back-end?

Both languages have similar feature-sets and are roughly equally suitable to this task. Given that, a person should use whichever they're most comfortable with. Lots of people are comfortable with JavaScript these days, ergo there's lots of activity in the JavaScript SSG space.

jrm4 · on June 30, 2021

Fair -- but let me be more precise then. I do like the idea of "use what you learned and are most comfortable in."

That being said, I think what I mean by "e.g. Python" is specifically - "older slash more text oriented slash proven"

If you like Javascript, fine. But I think there's a case to be made that there is a completely unnecessary "bloat" to Javascript -- especially even as the author himself has suggested that at the end of the day it's all string concatenation.

If that's the case, (as it frequently is) it ends up boiling down to "what handles text well and in an established and smooth way," and Javascript does not score high there, I'd suggest.

brundolf · on June 30, 2021

It doesn't sound like you write much JavaScript :)

> I think there's a case to be made that there is a completely unnecessary "bloat" to Javascript

"Bloat in JS" is something of a trope, and I'm not really even sure what you mean by it here. Typically people mean "too many JS dependencies are often drawn in for web pages", but that's not really relevant. Maybe you mean syntax bloat (method calls instead of list comprehensions)? If so, you're not totally wrong, but it's also not really a big deal in my experience. If you're talking about runtime/performance, well... V8 tends to be faster than CPython in raw compute (excluding native modules) because Google has put so much work into optimizing it (not that that matters much for a SSG either)

> it ends up boiling down to "what handles text well and in an established and smooth way," and Javascript does not score high there

Whenever I need to process some nontrivial text, the first thing I do is open up a Chrome tab for the JS repl. I think the following is pretty smooth:

  const csv = `
  Name,Email,Phone Number,Address
  Bob Smith,bob@example.com,123-456-7890,123 Fake Street
  Mike Jones,mike@example.com,098-765-4321,321 Fake Avenue`

  const lines = csv.trim().split('\n').map(line => line.split(','))

  const [headings, ...data] = lines

  const objs = data.map(datum => Object.fromEntries(
    headings.map((heading, index) => [heading, datum[index]])))

  console.log(objs)


  // output:
  [
    {
      "Name": "Bob Smith",
      "Email": "bob@example.com",
      "Phone Number": "123-456-7890",
      "Address": "123 Fake Street"
    },
    {
      "Name": "Mike Jones",
      "Email": "mike@example.com",
      "Phone Number": "098-765-4321",
      "Address": "321 Fake Avenue"
    }
  ]

Not to mention template-strings, which I use extensively in my own website:

  const header = `
    <h1>Welcome to ${pageTitle}</h1>
  `

Of course it's all subjective and Python does have format strings and some slick dedicated syntaxes. But I don't think it's fair to say "JavaScript does not score high" when it comes to text-shuffling.
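For comparison, a rough Python counterpart to the two JavaScript snippets above, using the standard `csv` module and an f-string. The `page_title` value is a placeholder, just as `pageTitle` is left undefined in the JavaScript.

```python
import csv
import io

CSV_TEXT = """\
Name,Email,Phone Number,Address
Bob Smith,bob@example.com,123-456-7890,123 Fake Street
Mike Jones,mike@example.com,098-765-4321,321 Fake Avenue
"""

# DictReader uses the first row as keys, like the headings/data split above.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

page_title = "my site"  # hypothetical value
header = f"<h1>Welcome to {page_title}</h1>"
```

Both versions are about the same length, which fits the point that the choice is largely a matter of familiarity.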

jrm4 · on July 1, 2021

When I say bloat, I do mostly mean "human-side," not "computer-side" of things. Again, if it works for you it works, but from my POV the idea of opening up a massive application, a web browser, that's mostly unrelated to the task at hand -- when leaner, older, well-proven approaches exist is super-odd to me. I have actually seen this before, and here's the observation:

Consider someone who starts off text-mangling in Javascript, and then later has a go at a more "classic" approach, e.g. command-line with Bash or Python or even Perl. Then consider the reverse; someone who starts off classic and learns/tries this Javascript approach.

I'd be willing to bet huge sums of money that a decent number of the Javascript people will end up adopting the classic approach, and that very few, possibly zero, of the classic people will adopt the Javascript approach.

z3t4 · on July 1, 2021

I remember when string concatenations were slow in JavaScript and you used array join. But string concatenation is very optimized now in all major JS engines.

frabert · on June 30, 2021

Curious you mention bash! I did exactly that some time ago when I thought about starting blogging... And then never started blogging!

https://blog.frabert.me/posts/2018/11/11/blash.html

valenterry · on June 30, 2021

> What am I missing here?

People usually start their projects with the language they feel most comfortable in, not the one that is best suited for the project.

EricE · on June 30, 2021

Having recently done a small Gatsby site I can identify with the comments about complexity over time. And graphql does seem like utter overkill too!