Writing a Book with Pandoc, Make, and Vim (keleshev.com)
423 points by halst on April 13, 2020 | 60 comments


I’m always interested in reading articles like this, as I like to see the setups that people come up with to produce books and documents. I didn’t know about set virtualedit=all in vim!

If you learn how to extend Pandoc with your own filters, which you can write in several languages, there is no limit to what you can do. Here’s the description, published in the sadly defunct Linux Journal, of the system I created to help me write a book about gnuplot:

https://lee-phillips.org/panflute-gnuplot/
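
For anyone who hasn't seen one: a filter is just a program that reads pandoc's JSON document tree on stdin and writes a modified tree to stdout. Below is a minimal sketch using only the Python standard library (not panflute, and not the gnuplot system from the article). It assumes pandoc's JSON AST layout, where every element is an object with a "t" (type) key and usually a "c" (content) key. The function names are made up for illustration.

```python
import json
import sys


def walk(node, action):
    """Recursively apply `action` to every element of a pandoc JSON AST."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:  # objects with a "t" key are AST elements
            node = action(node)
        return {key: walk(value, action) for key, value in node.items()}
    return node  # strings, numbers, etc. pass through unchanged


def emph_to_strong(element):
    """Illustrative action: turn emphasis into strong emphasis."""
    if element.get("t") == "Emph":
        return {"t": "Strong", "c": element["c"]}
    return element


def run_filter(text):
    """Take the JSON document as a string, return the transformed JSON."""
    return json.dumps(walk(json.loads(text), emph_to_strong))


def main():
    # pandoc pipes the document in on stdin and reads the result from stdout.
    sys.stdout.write(run_filter(sys.stdin.read()))
```

To try it, save it under some name (say `emph_filter.py`, a hypothetical file), add a final `main()` call, make it executable, and pass it to pandoc with `--filter ./emph_filter.py`. Anything more interesting, such as running an external program on tagged code blocks, is the same walk with a different action.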


Everything in that link can be done with InDesign, given the data. Doing the same from the console or with alternative applications would mean hacking the InDesign app or finding some IFTTT-style automation when needed, then saving the result as a high-res image and referencing it as a link in your console layout doc. In the end it would have to be compiled as an image into something (might as well be InDesign), and at that point why not just lay out the book in InDesign from the start? You would write the book in a text doc with some tagged markdown for rules, link the text so it flows into styled text boxes that have rules assigned to them, and generate all the charts and sheets needed to complete it. Visual communication isn’t a strong point of code interfaces.


I’m not sure I understand your comment, but I believe InDesign is a proprietary, closed-source product, probably driven mainly through a GUI. My goal was to write my book in vim. All I need to do is type, and the book comes out, including a visual index of all the plots in the book. Every link in the chain, and every tool I used, is open source (and free). The result is exactly what I want. To each his own, but the project, described in my article, is to create an interface for me as an author. That interface is typing in vim, using a set of tags I created for the purpose.


What does InDesign have to do with anything? This article isn't talking about using Windows or Mac software, and especially not proprietary Adobe software.

Also, if he's writing a book, why would he want to save anything as a "high res image" (implying raster graphics)?


I know this is tangential, but I would love for someone to talk about writing a more visual type of book, full of images, tables and charts for the business world.

A table like the one in the first screenshot of this post works well because the author is not repeatedly iterating on it, there's very little text and information flows top-to-bottom very neatly. That's great, but it's also extremely basic.

Take a look at something like https://www.jpmorgan.com/jpmpdf/1320605428574.pdf and imagine writing that. How do you lay things out on a page? How do you make content fit a layout? There's no grid.

The reality is people use PowerPoint to do that, but PowerPoint is a slide authoring tool that assumes you have a few bullets, maybe one or two images per slide.

Dense presentations make its shortcomings obvious and quite painful.

It boggles the mind that with all of the resources dumped into CSS/JS and web development in general, nobody has leveraged that experience to build an authoring tool that's 21st-century ready, with version control, with a clear separation but nonetheless linked relationship of raw data, actual content output and formatting and final publishing into PDF.

What am I missing?

EDIT: one more example for good measure https://www.jefferies.com/CMSFiles/Jefferies.com/files/W%201...


Those aren't books, they are presentation slides.

In PowerPoint, the author chose a (potentially) different template for every slide (2×1 columns, 2×2, etc.). They have complete freedom to "break" the structure, for example with callouts pointing to the "other" column or images going beyond the margins.

An automatic template removes this flexibility but allows scripting or rebuilding the document with different text/data. That's the compromise.

Remark.js achieves some of the most basic parts of this, but would need some fiddling to add some CSS grid support and/or default templates: https://remarkjs.com/ (Except for being ugly, http://mobmad.github.io/js-tdd-erfaringer/ shows some possible structure with Remark.js).


Going by the strict definition of a book [1] a file, a webpage or a website isn’t a book either.

[1] https://en.m.wikipedia.org/wiki/Book


I'm not so sure... I make them pretty much daily, and we print them and call them "books".

I'm not saying you shouldn't be able to tweak them manually, but there's got to be a more ergonomic language for drafting pages than literally dragging objects pixel by pixel, especially when most of the content comes in four forms: tables pasted in from Excel, charts pasted in from Excel, bullet lists, and simple graphics around text like circles and squares.



Learning LaTeX and TikZ helps with this. It looks like a presentation, so the LaTeX beamer package with some custom templates would do. The downside is that LaTeX and TikZ have a bit of a learning curve, but it is worth it in the long run.


Have you used KaTeX in place of LaTeX? The former is much lighter, especially if you're trying to publish a tome for the web.


No, I haven't. But I'll check it out! I had to go the full route a while back. Yes, it was bulky, but got the job done.


That's part of it, but the content still flows top-to-bottom for the most part and is not quite enterprise-ready, packaged into a standalone app.


I assume organisations that need such complex layout also have a budget to pull together immersive HTML pages, infographic design or comprehensive reports. For instance the WHO has some pretty complex and visually pleasing reports; they seem to be using InDesign and I'm sure they use actual designers and researchers to produce them.

I like the idea of using open-source tools to create books and documentation because you could incorporate the process into a workflow that pulls in actual code or on-the-fly generated graphics. I don't own any commercial software, so I don't know how feasible that process would be with something like InDesign.


> I assume organisations that need such complex layout also have a budget to pull together immersive HTML pages

You'd be surprised

> For instance the WHO has some pretty complex and visually pleasing reports; they seem to be using InDesign and I'm sure they use actual designers and researchers to produce them.

That workflow works for once-a-quarter, or maybe once-a-year books. When you need to crank out 3-4 books a week and go through 50 versions in 10 days, InDesign is just too slow, so we resort to PowerPoint.


It's even possible to replace (Xe)LaTeX with WeasyPrint¹, a Python HTML-to-PDF converter. It supports two-column layouts via CSS, automatic CSS hyphenation, CSS page counters, and embedded SVGs. I just needed an HTML header with CSS in the markdown file.

    $ pandoc --filter pandoc-citeproc --csl ieee.csl --bibliography=paper.bib --smart --normalize -f markdown+multiline_tables+inline_notes -t html5 -V margin-top:0.5in -V margin-bottom:0.5in -V margin-left:0.5in -V margin-right:0.5in -o output.html input.md
    $ python3 -c "from weasyprint import HTML; HTML('output.html').write_pdf('output.pdf', presentational_hints=True)"

For LaTeX-style math equations I added mathjax-pandoc-filter² as a filter to the pandoc args:

    --filter ~/node_modules/.bin/mathjax-pandoc-filter -Mmathjax.centerDisplayMath -Mmathjax.noInlineSVG

¹ https://weasyprint.org/

² https://github.com/lierdakil/mathjax-pandoc-filter


This is a very interesting (open source) project that I didn’t know about; thank you for mentioning it.

But it doesn’t replace LaTeX, as it doesn’t produce the same results. A glance at the sample documents reveals the ugly typography resulting from the word-processing layout strategy employed in web browsers. This is confirmed in the documentation. So this could be useful if you have an existing set of HTML pages that you need to convert to PDFs, but, if you’re starting a project where you want to produce both HTML and PDF, this should not be part of the solution.


It looks nice for graphics-heavy documents, but the quality of the typographical output doesn't come close to LaTeX with microtype. I do wish that LaTeX had something similar to CSS, however. The separation of markup and styling makes the web easier to use for complex layouts, which are not generally TeX's strong suit.


I can't tell the difference between this layout quality and LaTeX. What are you noticing?


The first things that jump out are the large and uneven gaps between words and the “color” variations among paragraphs. What I mean by the “word-processing layout strategy” is the algorithm where, when you run out of space on a line, you simply break the line at the end of the previous word, fill up the space (for justified text) by expanding the spaces between words, and begin the next line. When you get to the end of the paragraph you go on to the next one. The TeX layout engine, in contrast, makes several passes over each paragraph, adjusting the line breaking (including hyphenation) in order to optimize its appearance (which includes such things as trying to avoid successive hyphenated lines); then, when the page is set, it goes over the entire page to try to equalize the density, or color, among paragraphs.
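
The difference between the two strategies can be sketched in a few lines. This is a toy model, not TeX's actual algorithm (which also handles stretchable glue, hyphenation and penalties): it works on monospaced text, and it measures "badness" as the squared leftover space on each line except the last. Both of those are assumptions made for illustration. The greedy function is the word-processor approach; the other one considers the whole paragraph before committing to any break.

```python
def greedy_breaks(words, width):
    """First-fit: fill each line as far as it goes, then move on."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)
    return lines


def badness(lines, width):
    """Sum of squared leftover space on every line except the last."""
    return sum((width - len(" ".join(line))) ** 2 for line in lines[:-1])


def optimal_breaks(words, width):
    """Pick the breaks that minimise badness over the whole paragraph."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])  # width of words[i:j] with spaces
            if length > width and j > i + 1:
                break  # too long (a single overlong word still gets its own line)
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines
```

With the words `aaa bb cc ddddd` and a width of 6, the greedy version sets "aaa bb / cc / ddddd" (badness 16), while the paragraph-wide version prefers "aaa / bb cc / ddddd" (badness 10): it accepts a slightly looser first line to avoid a very loose second one.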


Maybe you already knew about it, but the microtype package improves the aspect of your documents even more: https://ctan.org/pkg/microtype


Skimming the examples, the typographical quality is that of a webpage (which is to be expected), miles below TeX-quality typesetting.


I think it really depends on the font you use and the CSS rules you apply. LaTeX needs tweaking, too, even with a good template.

E.g., if your font supports it, you can enable ligatures:

    text-rendering: optimizeLegibility;


Ligatures don't make good typesetting. In fact, I'd expect a text to have ligatures enabled by default; that's not something to be proud of in any way.

Typesetting is about how the words are placed in the given space, how they're broken up, how the spacing is done so that it doesn't line up between lines, managing punctuation, figures, etc. Browsers do none of that, and one shouldn't expect them to, because that's not their job. A browser's job is to present content fast, not to spend minutes at a time figuring out how to do it as beautifully as possible; that's what TeX is for. TeX and a browser's rendering engine are different tools for different jobs, and thinking one could achieve the same result as the other is not realistic at present.


Pandoc can even free you of the second step by using WeasyPrint as the PDF engine:

    pandoc --pdf-engine=weasyprint -t html …


To emulate the live preview, there is a neat piece of software called entr[1]. Going by its main page, you can do something like:

    ls | entr make

And whenever you save a change, the build is triggered and the preview is updated.

[1]: http://eradman.com/entrproject/
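
If installing another tool isn't an option, the same idea can be approximated with a small polling loop. This is only a sketch using the Python standard library: entr itself reacts to kernel file events rather than polling, and the file names in the usage note are hypothetical.

```python
import os
import subprocess
import time


def snapshot(paths):
    """Map each existing path to its last-modified time."""
    return {p: os.stat(p).st_mtime for p in paths if os.path.exists(p)}


def changed(before, after):
    """Return the paths that were added, removed or modified."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))


def watch(paths, command, interval=0.5):
    """Re-run `command` whenever one of `paths` changes. Runs until interrupted."""
    before = snapshot(paths)
    while True:
        time.sleep(interval)
        after = snapshot(paths)
        if changed(before, after):
            subprocess.run(command)  # e.g. ["make"]
        before = after
```

Calling `watch(["book.md"], ["make"])` would rebuild every time `book.md` is saved. Passing an explicit list of source files also sidesteps the problem of the build's own output retriggering the build.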


The pipeline approach here is interesting, but it seems you'd be on your own for filtering out changes in the build directory, etc. I typically use watchexec [1] for this.

    watchexec make

By default, watchexec will filter out changes in files based on `.gitignore`.

[1]: https://github.com/watchexec/watchexec


This is pretty good, I didn't know about watchexec. But you can achieve the same with entr by carefully choosing the command you pipe from, for example:

    ls *.md | entr make

This will only trigger the build if an md file is modified, which is what I think the author is interested in.


I personally use fswatch for this. The invocation is probably something like this:

    fswatch -0 ***.md | xargs -0 make

I invoke a variant of this from my Makefile with a phony watch target. I think the main change is that I also echo a bell character, to provide some feedback.

Fun tip: Preview will automatically reload PDFs on disk, though with some limitations that I work around by waiting for the bell.


For any interested, here is my Pandoc book writing setup.

I have a couple bash scripts that I use to call pandoc to generate PDFs, HTML, or ePub.

Here is the repo https://gitlab.com/pianomanfrazier/pandoc-markdown-book and here is my blog post https://pianomanfrazier.com/post/write-a-book-with-markdown/


After reading a couple posts here on HN about building a "second brain", I found a surprisingly effective setup to do that:

- Vim with vimwiki (https://github.com/vimwiki/vimwiki)

- A private Gitlab repo

- A simple cron job to commit all changes in `~/.vimwiki` to my private repo

And that's it! It would be possible to publish the wiki on the web using Gitlab pages, but so far it is working nicely for me.


Similar story here. I wrote and self-published a novel, both for e-readers and paperback, using only open-source tools, mainly around Pandoc. I wrote some more details here: https://gabrielgambetta.com/tgl_open_source.html


"SVG is well supported with EPUB"

SVG is part of the standard, but not well supported by all epub reading systems. Some displays will fail, some will display as small non-scalable images. Apple's iBooks reader is one of the better ones in that regard.


Pretty cool. I'm using a similar setup that gives a real-time preview of every change by means of the `entr`[1] command, triggered by saving the markdown file.

    ls ./presentation.md | entr -c bash -c "pandoc --pdf-engine=xelatex --toc -N presentation.md -t beamer -o presentation.pdf; killall -HUP mupdf"

This re-runs the pipeline and updates the PDF output. Easy as 1-2-3 (no Makefile though, which would be another step).

[1] https://www.systutorials.com/docs/linux/man/1-entr/


I was building a new API recently, and was looking for a good documentation solution.

The commercial cloud-based solutions (Gitlab, Confluence, et al) are pretty good, but you have to keep paying or your documentation disappears. Self-hosted wiki or documentation solutions were also out, due to the pain of migrating content in and out.

We ended up with a very simple solution of Markdown + CSS + Pandoc + make. Pandoc takes the CSS and MD files as input, and outputs HTML. The MD files are in the API repository, and deployment has been set up so that the latest documentation is deployed automatically with each API update.
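
For readers who want to picture the moving parts, here is a rough sketch of that kind of build, written in Python rather than make so the rebuild rule is explicit. It is not the setup described above: the directory layout, file names and function names are all assumptions. The only pandoc options used are `--standalone`, `--css` and `--output`.

```python
import subprocess
from pathlib import Path


def html_target(markdown_file, out_dir):
    """Map a Markdown source to its HTML output path (same stem, .html)."""
    return Path(out_dir) / (Path(markdown_file).stem + ".html")


def pandoc_command(markdown_file, css_file, html_file):
    """Build the pandoc call for one page as an argument list."""
    # --css only links the stylesheet, so its path must be reachable
    # from wherever the generated HTML ends up being served.
    return ["pandoc", "--standalone", f"--css={css_file}",
            f"--output={html_file}", str(markdown_file)]


def is_stale(target, sources):
    """Rebuild if the target is missing or older than any source (make's rule)."""
    target = Path(target)
    if not target.exists():
        return True
    built = target.stat().st_mtime
    return any(Path(s).stat().st_mtime > built for s in sources)


def build(docs_dir, css_file, out_dir):
    """Convert every out-of-date Markdown file in docs_dir to HTML."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for md in sorted(Path(docs_dir).glob("*.md")):
        html = html_target(md, out_dir)
        if is_stale(html, [md, css_file]):
            subprocess.run(pandoc_command(md, css_file, html), check=True)
```

Something like `build("docs", "style.css", "public")` in a deploy step would regenerate only the pages whose Markdown or stylesheet changed, which is what a make pattern rule would do with far less code.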


Excuse me if this is a dumb question but did you consider swagger?


There are no dumb questions.

I did have a look at swagger, but it felt way too bloated and complex for what we wanted. With Markdown we know that even in 10 years' time, when services like swagger are long gone, it'll be possible to view markdown files. Also, there is barely any learning curve with Markdown.


Nice and thanks for sharing your setup. The footer is very informative, but I use GitHub style markdown, need to check if there's some workaround. For epub customization, this article [0] might help. Good luck for your book.

Here's how I generate PDFs with pandoc+xelatex [1]. I use gvim as my editor and have mapped a key (which then executes a shell script) to generate the book.

[0] https://cmichel.io/how-to-create-beautiful-epub-programming-...

[1] https://learnbyexample.github.io/tutorial/ebook-generation/c...


I did this too, although the makefile presented in the article is much cleaner than mine was. Definitely recommend XeLaTeX. You're not going to get very far without unicode. I had to drop down to LaTeX often to control formatting, but Pandoc helpfully lets you do that.


>It allows to move the cursor past the last character. If you insert a new character there, it is automatically padded with spaces. It is easier to see it than to explain it:

>My first programming environment was Turbo Pascal, and this is exactly how the cursor works there, which I grew accustomed to.

Holy shit! What a rush of memories reading that unlocked :)


Vim + Pandoc + Beamer + pdfpc is also the best way to write Presentations I have found so far:

https://github.com/maxmunzel/talk-algorithms-for-np-hard-pro...


Nice to learn how others approach this task. Curious, with so many extensions to markdown, why not use something like ascii doc or rst instead?


Same here. I don't understand why AsciiDoc doesn't get more attention.


A couple of things that might be of interest:

1) pandoc is awesome.

2) There are integrated development environments that allow you to write in markdown and output to PDF, HTML, and Word with the flick of a switch. RStudio with knitr, bookdown, and markdown has some nice functionality. Plus you can do graphs and drawings and embed them in the rmd (R Markdown) text.

3) There is an earlier post on HN from Gilles Castel on how to speedily write text with the UltiSnips package. Very much a game changer in how I use vim to work with anything text related.

https://castel.dev/post/lecture-notes-1/

Nice post!



Can you refer to figures and have the name rendered? For instance, will a piece of text referring to a figure and the figure’s label both get rendered as “Figure 2.8” regardless of paragraph edits and figures inserted or deleted before it?


For anything beyond the most basic document, use AsciiDoc rather than Markdown.

I prefer to use the Asciidoctor toolchain, but it's compatible with AsciiDoc.



So far, I got away with "In the following figure…"


Nice article. I'm all for it! I use Pandoc and a Makefile as well, except I use Emacs, plus Inkscape for SVG graphics. IMHO this is the way to produce documents in the 21st century.


Pandoc and make, totally. I have a personal workflow for all my academic articles, which is expanding to books, and which involves chapters stored in individual markdown files on GitHub, pandoc to build with a makefile to tie it all together[1], and Zotero spitting out CSL JSON bibliographies, etc.

Honestly, though, I find writing in Markdown in vim/emacs to be really unergonomic. The unit of writing in code is the line (or the function or the block or the s-expression or whatever depending on language and task---something comprehensible to vim and emacs anyway); but with prose in Markdown the unit of writing is much less defined---sometimes sentence, sometimes paragraph, sometimes clause... and the movements just don't work for me there.

So I just do the writing in Sublime Text. Seems to work for me.

[1] But I'm not nearly good enough with make to do complicated things like back up or build every file in a directory. I hacked together my own backup utility @ https://github.com/paultopia/writingBackup


Thanks for sharing this workflow and also for writing this book! I sometimes think I would like to write a book about microcontroller basics (on which I've collected knowledge from countless blogs and white papers), but I know it's a huge project!

Also it is cool that you can run draw.io yourself! I've used yed for documentation but this looks nice and is capable.


I used LiterateMarkdown [1] to write a PhD thesis in Markdown with reasonable success. Its main feature is being able to read Jupyter notebooks and include computations and plots using R, Python and Groovy inline, so pretty handy for a thesis with lots of data analysis.

A beauty of using Pandoc is you can translate to Word for people who insist on that for review / edit / comments.

One tip: on OS X I use Skim [2] as the PDF previewer. It is not a particularly great or special PDF viewer except for one thing: it live-updates the PDF without shifting the position of the page, even if you are zoomed in etc. This means you can work on a section iteratively and watch it update live as you work, which is pretty handy for proofreading what you are writing.

[1] https://bit.ly/2XzTpSy

[2] https://skim-app.sourceforge.io/


I use something very similar for mathematical homework and notes: MacVim, with a Makefile that runs Pandoc with the Eisvogel template.

I also have a script that runs fswatch to run make on save.

Didn't know about virtualedit, though: tables are going to be so much easier now.


Somewhat related. I highly suggest the Goyo plugin for Vim if you want distraction free writing.

https://github.com/junegunn/goyo.vim


This reminds me of my own project to use pandoc for generating blog posts https://outfloor.org/


Thanks for sharing. Your book sounds very interesting to me. I like that it targets generating real ARM assembly.

I've signed up for updates, looking forward to the release!


Well done. Thanks for sharing your process! Looking forward to the completed book.


Here's a result [1] from a system I've put together, primarily using AsciiDoc(tor) and PO4A[3], to allow us to write a source document then translate it into multiple languages. It produces HTML and PDF, but ePUB is an option too.

Using AsciiDoc rather than Markdown has several benefits. The language supports many common book features, especially for technical books, like those "! Warning here" callouts, cited quotes, captioned figures/tables/codeblocks, internal links, I think even an index. It's also a lot more stable; I'm not concerned that there will be significant syntax changes in 5 years time. The user manual [2] is the quickest way to see what AsciiDoc can do.

PO4A is an adaptation of GNU gettext for use on prose. PO4A's output can feed into a typical translation workflow: distributing the files, or using online translation services. It mostly supports AsciiDoc, though there are some bugs, and outputting a PO file directly from AsciiDoctor (with a plugin) might be better, since PO4A parses AsciiDoc itself.

The code is at [4]. It's in slow development when necessary for new documents; I don't particularly intend to polish it for release or wider use.

KiCAD's documentation [5] was the best example of something similar (AsciiDoc + PO4A) to what I've put together.

The missing pieces, which are closely related, are translatable and flexible diagrams. AsciiDoctor supports plenty of diagram tools, but none of them can do this. For example, the diagram at [6] is an SVG, which (since it's XML) can be translated using PO4A. However, in French the longer text spills out of the boxes. The previous diagram is an image, for this reason.

Is there an open-format (preferably open source) diagramming tool, which supports wrapping text, and even resizing "too long" text? I would be very interested!

[1] https://docs.gbif.org/collections-idea-paper/ or (in progress) https://docs.gbif.org/effective-nodes-guidance/1.0/

[2] https://asciidoctor.org/docs/user-manual/

[3] https://po4a.org/

[4] https://github.com/gbif/gbif-asciidoctor-toolkit/

[5] https://gitlab.com/kicad/services/kicad-doc

[6] https://docs.gbif.org/effective-nodes-guidance/1.0/en/#box-e...


Books are not files though!

Pandoc is great. make and vim are great too, but as you can see these tools will produce PDF files, HTML files, text files, markdown files and a lot of jargon that the readers simply aren’t interested in. I mean normal readers here and not tech folks holed up inside a terminal with a homebrew theme.



