I’m always interested in reading articles like this, as I like to see the setups that people come up with to produce books and documents. I didn’t know about set virtualedit=all in vim!
If you learn how to extend Pandoc with your own filters, which you can write in several languages, there is no limit to what you can do. Here’s the description, published in the sadly defunct Linux Journal, of the system I created to help me write a book about gnuplot:
https://lee-phillips.org/panflute-gnuplot/
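Pandoc runs a filter as a separate program. It passes the document's abstract syntax tree to the filter as JSON on stdin and reads the modified tree back from stdout, which is why filters can be written in many languages. The sketch below uses only Python's standard library rather than panflute, which the linked article uses. Its transformation, upper-casing the text of level-1 headers, is a hypothetical example chosen for brevity:

```python
#!/usr/bin/env python3
"""Minimal Pandoc JSON filter (hypothetical example): upper-cases the text
of top-level level-1 headers. Pandoc feeds the AST as JSON on stdin and
reads the modified AST from stdout."""
import json
import sys


def upper_strings(node):
    """Recursively upper-case every Str element below `node`."""
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {k: upper_strings(v) for k, v in node.items()}
    if isinstance(node, list):
        return [upper_strings(item) for item in node]
    return node  # plain strings and numbers are left alone


def transform(doc):
    """Modify level-1 headers in the document's top-level block list."""
    for block in doc["blocks"]:
        # A Header's contents are [level, attr, inlines].
        if block.get("t") == "Header" and block["c"][0] == 1:
            block["c"][2] = upper_strings(block["c"][2])
    return doc


if __name__ == "__main__":
    json.dump(transform(json.load(sys.stdin)), sys.stdout)
```

Saved as an executable file, say `upper_h1.py` (a hypothetical name), it would be invoked with `pandoc --filter ./upper_h1.py book.md -o book.pdf`.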
Everything in that link can be done with InDesign, given the data. Doing the same with console or alternative applications would mean hacking the InDesign app or finding some IFTTT-style automation to generate each chart when needed, save it as a high-res image, and reference it as a link in your console layout document. In the end, it would all still have to be compiled as images into something (it might as well be InDesign), so why not lay out the book in InDesign from the start? You could write the book in a text document with some tagged Markdown for rules, link the text into connected, flowed, styled text boxes that have those rules assigned to them, and generate all the charts and sheets you need. Visual communication isn’t a strong point of code interfaces.
I’m not sure I understand your comment, but I believe InDesign is a proprietary, closed-source product, probably driven mainly through a GUI. My goal was to write my book in vim. All I need to do is type, and the book comes out, including a visual index of all the plots in the book. Every link in the chain, and every tool I used, is open source (and free). The result is exactly what I want. To each his own, but the project, described in my article, is to create an interface for me as an author. That interface is typing in vim, using a set of tags I created for the purpose.
What does InDesign have to do with anything? This article isn't talking about using Windows or Mac software, and especially not proprietary Adobe software.
Also, if he's writing a book, why would he want to save anything as a "high res image" (implying raster graphics)?
I know this is tangential, but I would love for someone to talk about writing a more visual type of book, full of images, tables and charts for the business world.
A table like the one in the first screenshot of this post works well because the author is not repeatedly iterating on it, there's very little text and information flows top-to-bottom very neatly. That's great, but it's also extremely basic.
Take a look at something like https://www.jpmorgan.com/jpmpdf/1320605428574.pdf and imagine writing that. How do you lay things out on a page? How do you make content fit a layout? There's no grid.
The reality is people use PowerPoint to do that, but PowerPoint is a slide authoring tool that assumes you have a few bullets, maybe one or two images per slide.
Dense presentations make its shortcomings obvious and quite painful.
It boggles the mind that, with all of the resources dumped into CSS/JS and web development in general, nobody has leveraged that experience to build a 21st-century authoring tool: one with version control, and with raw data, content, formatting, and the final PDF output kept clearly separate yet still linked.
Using PowerPoint, for every slide the author chose (potentially) a different PowerPoint template (2×1 columns, 2×2, etc.). They have complete freedom to "break" the structure, such as with callouts pointing to the "other" column or images extending beyond the margins.
An automatic template removes this flexibility but allows scripting, or rebuilding the document with different text/data. That's the compromise.
Remark.js achieves some of the most basic parts of this, but it would need some fiddling to add CSS grid support and/or default templates: https://remarkjs.com/ (Ugly as it is, http://mobmad.github.io/js-tdd-erfaringer/ shows some of the structure that's possible with Remark.js.)
I'm not so sure... I make them pretty much daily, and we print them and call them "books".
I'm not saying you shouldn't be able to tweak them manually, but there's got to be a more ergonomic language for drafting pages than literally dragging objects pixel by pixel, especially when most of the content comes in four forms: tables pasted in from Excel, charts pasted in from Excel, bullet lists, and simple graphics around text like circles and squares.
Learning LaTeX and TikZ helps with this. It looks like a presentation, so the LaTeX beamer package with some custom templates would fit. The downside is that LaTeX and TikZ have a bit of a learning curve, but it's worth it in the long run.
I assume organisations that need such complex layout also have a budget to pull together immersive HTML pages, infographic design or comprehensive reports. For instance the WHO has some pretty complex and visually pleasing reports; they seem to be using InDesign and I'm sure they use actual designers and researchers to produce them.
I like the idea of using open-source tools to create books and documentation because you can incorporate the process into a workflow that pulls in actual code or on-the-fly generated graphics. I don't own any commercial software, so I don't know how feasible that process would be with something like InDesign.
> I assume organisations that need such complex layout also have a budget to pull together immersive HTML pages
You'd be surprised
> For instance the WHO has some pretty complex and visually pleasing reports; they seem to be using InDesign and I'm sure they use actual designers and researchers to produce them.
That workflow works for once-a-quarter, or maybe once-a-year books. When you need to crank out 3-4 books a week and go through 50 versions in 10 days, InDesign is just too slow, so we resort to PowerPoint.
It's even possible to replace (Xe)LaTeX with WeasyPrint¹, a Python HTML-to-PDF converter. It supports two-column layouts via CSS, automatic hyphenation via CSS, CSS page counters, and embedded SVGs. I just needed an HTML header with CSS in the markdown file.
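A minimal sketch of that setup, assuming Pandoc's HTML output is what WeasyPrint renders: a Python helper that prepends a `<style>` block to the Markdown, relying on Pandoc passing raw HTML through to HTML output. The page size, margins, and helper name are assumptions; `column-count`, `hyphens: auto`, and the `counter(page)`/`counter(pages)` page counters are standard CSS that WeasyPrint supports:

```python
"""Prepend print CSS to a Markdown document (hypothetical helper).
Page size, margins and column gap are assumptions."""

PRINT_CSS = """
@page {
  size: A4;
  margin: 2cm;
  @bottom-center { content: counter(page) " / " counter(pages); }
}
body {
  column-count: 2;
  column-gap: 1cm;
  text-align: justify;
  hyphens: auto;  /* needs a document language, e.g. lang metadata */
}
"""


def with_print_css(markdown: str, css: str = PRINT_CSS) -> str:
    """Return the Markdown with a raw HTML <style> block in front of it."""
    return f"<style>{css}</style>\n\n{markdown}"
```

The result could then be rendered with something like `pandoc paper.md --metadata lang=en -o paper.pdf --pdf-engine=weasyprint`. The `lang` value gives the hyphenation a language to work with.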
This is a very interesting (open source) project that I didn’t know about; thank you for mentioning it.
But it doesn’t replace LaTeX, as it doesn’t produce the same results. A glance at the sample documents reveals the ugly typography resulting from the word-processing layout strategy employed in web browsers. This is confirmed in the documentation. So this could be useful if you have an existing set of HTML pages that you need to convert to PDFs, but, if you’re starting a project where you want to produce both HTML and PDF, this should not be part of the solution.
It looks nice for graphics-heavy documents, but the quality of the typographical output doesn't come close to LaTeX with microtype. I do wish that LaTeX had something similar to CSS, however. The separation of markup and styling makes the web easier to use for complex layouts, which are not generally TeX's strong suit.
The first things that jump out are the large and uneven gaps between words and the “color” variations among paragraphs. What I mean by the “word-processing layout strategy” is the algorithm where, when you run out of space on a line, you simply break the line at the end of the previous word, fill up the space (for justified text) by expanding the spaces between words, and begin the next line. When you get to the end of the paragraph you go on to the next one. The TeX layout engine, in contrast, makes several passes over each paragraph, adjusting the line breaking (including hyphenation) in order to optimize its appearance (which includes such things as trying to avoid successive hyphenated lines); then, when the page is set, it goes over the entire page to try to equalize the density, or color, among paragraphs.
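The difference between the two strategies can be sketched in a few lines of Python. This is a deliberately simplified illustration, not TeX's Knuth-Plass algorithm. It assumes monospaced characters, no hyphenation or stretchable spaces, and a cost equal to the squared leftover space on every line except the last. It only shows why weighing the whole paragraph can beat filling each line greedily:

```python
"""Greedy ("word-processor") line breaking vs. a minimum-raggedness
dynamic program. Simplified: fixed-width characters, no hyphenation,
no stretchable glue; a line's cost is its squared leftover space and
the last line is free."""


def greedy(words, width):
    """Fill each line as far as possible, then break."""
    lines, current = [], []
    for word in words:
        candidate = " ".join(current + [word])
        if current and len(candidate) > width:
            lines.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


def optimal(words, width):
    """Choose breaks that minimize total squared slack over the paragraph."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: min cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: index where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1  # words i..j plus separating spaces
            if length > width and j > i:
                break  # longer lines only get worse
            slack = width - length
            cost = 0 if j == n - 1 else slack ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

With `words = "aaa bb cc ddddd".split()` and a width of 6, `greedy` returns `['aaa bb', 'cc', 'ddddd']`, leaving a nearly empty middle line. `optimal` returns `['aaa', 'bb cc', 'ddddd']`, spreading the slack more evenly.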
Ligatures don't make good typesetting. In fact, I'd expect any text to have ligatures enabled by default; that's nothing to be proud of.
Typesetting is about how the words are placed in the given space: how they're broken up, how the spacing is handled so that gaps don't line up from one line to the next, how punctuation and figures are managed, and so on. Browsers do none of that, and one shouldn't expect them to, because that's not their job. A browser's job is to present content fast, not to spend minutes working out how to lay it out as beautifully as possible; that's what TeX is for. TeX and a browser's rendering engine are different tools for different jobs, and expecting one to achieve the same result as the other isn't realistic at present.
The pipeline approach here is interesting, but it seems you'd be on your own for filtering out changes in the build directory, etc. I typically use watchexec [1] for this.
watchexec make
By default, watchexec will filter out changes in files based on `.gitignore`.
I personally use fswatch for this. The invocation is probably something like this:
fswatch -o **/*.md | xargs -n1 -I{} make
I invoke a variant of this from my Makefile with a phony watch target. I think the main change is that I also echo a bell character, to provide some feedback.
Fun tip: Preview will automatically reload PDFs on disk, though with some limitations that I work around by waiting for the bell.
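For anyone without fswatch, watchexec, or `entr`, the same rebuild-and-beep loop can be approximated in Python by polling modification times. This is a hedged sketch. It polls instead of using OS change notifications, and the `**/*.md` pattern, the half-second interval, and the function names are assumptions:

```python
"""Poll-based rebuild loop (hypothetical stand-in for fswatch/entr):
rerun `make` whenever a watched file is added, removed or modified,
then ring the terminal bell."""
import subprocess
import sys
import time
from pathlib import Path


def snapshot(paths):
    """Map each existing file to its modification time in nanoseconds."""
    return {p: p.stat().st_mtime_ns for p in paths if p.is_file()}


def changed(before, after):
    """Files added, removed, or modified between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}


def watch(pattern="**/*.md", interval=0.5, root="."):
    base = Path(root)
    previous = snapshot(base.glob(pattern))
    while True:
        time.sleep(interval)
        current = snapshot(base.glob(pattern))
        if changed(previous, current):
            subprocess.run(["make"], cwd=root)
            sys.stdout.write("\a")  # bell: feedback that the build finished
            sys.stdout.flush()
        previous = current


if __name__ == "__main__":
    watch()
```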
Similar story here. I wrote and self-published a novel, both for e-readers and paperback, using only open-source tools, mainly around Pandoc. I wrote some more details here: https://gabrielgambetta.com/tgl_open_source.html
SVG is part of the standard, but not well supported by all EPUB reading systems. Some will fail to display SVGs at all; others will show them as small, non-scalable images. Apple's iBooks reader is one of the better ones in that regard.
Pretty cool. I'm using a similar setup that gives a real-time preview of every change by means of the `entr`[1] command, triggered whenever I save the Markdown file.
I was building a new API recently, and was looking for a good documentation solution.
The commercial cloud-based solutions (GitLab, Confluence, et al.) are pretty good, but you have to keep paying or your documentation disappears. Self-hosted wiki or documentation solutions were also out, due to the pain of migrating content in and out.
We ended up with a very simple solution of Markdown + CSS + Pandoc + make. Pandoc takes the CSS and MD files as input, and outputs HTML.
The MD files live in the API repository, and deployment has been set up so that the latest documentation is deployed automatically with each API update.
I did have a look at Swagger, but it felt way too bloated and complex for what we wanted. With Markdown we know that even in 10 years' time, when services like Swagger are long gone, it'll be possible to view Markdown files. Also, there is barely any learning curve with Markdown.
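The Markdown + CSS + Pandoc + make pipeline described above can also be expressed as a small Python script, including make's core rule of rebuilding only outputs that are older than their sources. This is a sketch under assumptions: the `docs/` and `public/` directories and `style.css` are hypothetical names, and `--css` only links the stylesheet rather than embedding it:

```python
"""Render docs/*.md to public/*.html with a shared stylesheet, skipping
pages whose HTML is already newer than the Markdown (the check make does).
Directory and stylesheet names are hypothetical."""
import subprocess
from pathlib import Path


def pandoc_html_cmd(source: Path, css: str, target: Path) -> list[str]:
    # --standalone emits a full document with <head>, where the --css link goes.
    # The css path is written into the HTML as-is, so it is relative to the page.
    return ["pandoc", "--standalone", f"--css={css}", "-o", str(target), str(source)]


def stale(source: Path, target: Path) -> bool:
    """True when target is missing or older than its source."""
    return not target.exists() or target.stat().st_mtime_ns < source.stat().st_mtime_ns


def build(src_dir="docs", out_dir="public", css="style.css"):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for md in sorted(Path(src_dir).glob("*.md")):
        html = out / md.with_suffix(".html").name
        if stale(md, html):
            subprocess.run(pandoc_html_cmd(md, css, html), check=True)


if __name__ == "__main__":
    build()
```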
Nice, and thanks for sharing your setup. The footer is very informative, but I use GitHub-style Markdown, so I need to check whether there's a workaround. For EPUB customization, this article [0] might help. Good luck with your book.
Here's how I generate PDFs with Pandoc + XeLaTeX [1]. I use gvim as my editor and have mapped a key (which then executes a shell script) to generate the book.
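A build script along those lines might look like the sketch below, written in Python rather than shell. The `chapters/` directory, the font, the script name, and the key mapping are hypothetical. Pandoc infers the output format from the file extension, so the same script can also produce `.docx` for reviewers or `.html`. XeLaTeX is selected only for PDF:

```python
"""Hypothetical build.py for a gvim mapping such as
    :nnoremap <F5> :!python3 build.py<CR>
Concatenates chapters in filename order; Pandoc infers the output
format from the extension (.pdf, .docx, .html, ...)."""
import subprocess
import sys
from pathlib import Path


def pandoc_cmd(chapters, output, mainfont=None):
    """Build the pandoc argument list for the given chapters and output file."""
    cmd = ["pandoc", *map(str, chapters), "-o", str(output)]
    if Path(output).suffix == ".pdf":
        # XeLaTeX reads UTF-8 input directly and can use system fonts.
        cmd.append("--pdf-engine=xelatex")
        if mainfont:
            cmd += ["-V", f"mainfont={mainfont}"]
    return cmd


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "book.pdf"
    chapters = sorted(Path("chapters").glob("*.md"))
    if not chapters:
        sys.exit("no chapters found")  # otherwise pandoc would wait on stdin
    subprocess.run(pandoc_cmd(chapters, target, mainfont="DejaVu Serif"), check=True)
```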
I did this too, although the makefile presented in the article is much cleaner than mine was. Definitely recommend XeLaTeX. You're not going to get very far without unicode. I had to drop down to LaTeX often to control formatting, but Pandoc helpfully lets you do that.
>It allows to move the cursor past the last character. If you insert a new character there, it is automatically padded with spaces. It is easier to see it than to explain it:
>My first programming environment was Turbo Pascal, and this is exactly how the cursor works there, which I grew accustomed to.
Holy shit! What a rush of memories reading that unlocked :)
1) pandoc is awesome.
2) There are integrated development environments that allow you to write in Markdown and output to PDF, HTML, and Word with the flick of a switch. RStudio with knitr, bookdown, and R Markdown has some nice functionality. Plus you can create graphs and drawings and embed them in the Rmd (R Markdown) text.
3) There is an earlier post on HN from Gilles Castel on how to write text quickly with the UltiSnips plugin. It's very much a game changer in how I use vim for anything text related.
Can you refer to figures and have the reference rendered? For instance, would a piece of text referring to a figure and the figure’s label both be rendered as “Figure 2.8”, regardless of paragraph edits and figures inserted or deleted before it?
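In Pandoc this is usually handled by the pandoc-crossref filter. A figure gets an attribute such as `{#fig:label}`, and the text refers to it as `@fig:label`, which the filter replaces with a number. The prefix text is configurable. The toy sketch below only illustrates the mechanism on raw Markdown with simplified regular expressions. It is not how pandoc-crossref is implemented, and the "Figure N" format is an assumption:

```python
"""Toy figure cross-referencing on raw Markdown: number figures in order of
appearance and replace each @fig:label with "Figure N". The regexes are
simplified; pandoc-crossref does this properly on Pandoc's AST."""
import re

# An image followed by an id attribute, e.g. ![Sine](sine.svg){#fig:sine}
FIG_DEF = re.compile(r"!\[[^\]]*\]\([^)]*\)\{#fig:([\w-]+)\}")
# A reference in running text, e.g. "see @fig:sine"
FIG_REF = re.compile(r"@fig:([\w-]+)")


def number_figures(text: str) -> str:
    """Replace every @fig:label with its figure number; unknown labels raise KeyError."""
    numbers = {}
    for match in FIG_DEF.finditer(text):
        # First definition wins; numbering follows document order.
        numbers.setdefault(match.group(1), len(numbers) + 1)

    def resolve(match):
        label = match.group(1)
        if label not in numbers:
            raise KeyError(f"undefined figure label: {label}")
        return f"Figure {numbers[label]}"

    return FIG_REF.sub(resolve, text)
```

Numbers are reassigned from the order of the definitions on every build, so inserting or deleting a figure renumbers every reference to the figures after it.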
Nice article. I'm all for it! I use Pandoc and a Makefile as well, except that I use Emacs, and Inkscape for SVG graphics. IMHO this is the way to produce documents in the 21st century.
Pandoc and make, totally. I have a personal workflow for all my academic articles, which is expanding to books. It involves chapters stored in individual Markdown files on GitHub, Pandoc to build them with a Makefile tying it all together[1], and Zotero spitting out CSL JSON bibliographies, etc.
Honestly, though, I find writing in Markdown in vim/emacs to be really unergonomic. The unit of writing in code is the line (or the function or the block or the s-expression or whatever depending on language and task---something comprehensible to vim and emacs anyway); but with prose in Markdown the unit of writing is much less defined---sometimes sentence, sometimes paragraph, sometimes clause... and the movements just don't work for me there.
So I just do the writing in Sublime Text. Seems to work for me.
[1] But I'm not nearly good enough with make to do complicated things like back up or build every file in a directory. I hacked together my own backup utility @ https://github.com/paultopia/writingBackup
Thanks for sharing this workflow and also for writing this book! I sometimes think I would like to write a book about microcontroller basics (which I've collected knowledge about from countless blogs and white papers), but I know it's a huge project!
Also, it's cool that you can run draw.io yourself! I've used yEd for documentation, but this looks nice and capable.
I used LiterateMarkdown [1] to write a PhD thesis in Markdown with reasonable success. Its main feature is being able to read Jupyter notebooks and include computations and plots in R, Python, and Groovy inline, so it's pretty handy for a thesis with lots of data analysis.
A beauty of using Pandoc is you can translate to Word for people who insist on that for review / edit / comments.
One tip: on OSX I use Skim as the PDF previewer. It is not a particularly great or special PDF viewer except for one thing: it does live update of the PDF without shifting the position of the page, even if you are zoomed in etc. This means you can work on a section iteratively and watch it update live as you work, which is pretty handy for proof-reading what you are writing.
Here's a result [1] from a system I've put together, primarily using AsciiDoc(tor) and PO4A[3], to allow us to write a source document then translate it into multiple languages. It produces HTML and PDF, but ePUB is an option too.
Using AsciiDoc rather than Markdown has several benefits. The language supports many common book features, especially for technical books, like those "! Warning here" callouts, cited quotes, captioned figures/tables/codeblocks, internal links, I think even an index. It's also a lot more stable; I'm not concerned that there will be significant syntax changes in 5 years' time. The user manual [2] is the quickest way to see what AsciiDoc can do.
PO4A is an adaptation of GNU gettext for prose. PO4A's output can feed into a typical translation workflow: distributing the files, or using online translation services. It mostly supports AsciiDoc, though there are some bugs, and outputting a PO file directly from AsciiDoctor (with a plugin) might be better, since PO4A parses AsciiDoc itself.
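The gettext idea that PO4A applies to prose can be illustrated with a toy version. The sketch splits the source into paragraphs, emits each as a `msgid` in a template, and rebuilds the document from a catalogue of translations. It shows the concept only. PO4A's real AsciiDoc handling (titles, lists, code blocks, attributes) is far more careful, and the function names are hypothetical:

```python
"""Toy gettext-for-prose: paragraphs become msgids in a PO template, and a
msgid -> msgstr catalogue rebuilds the translated document. A real template
would also start with a header entry (msgid "") declaring the charset."""


def paragraphs(text: str) -> list[str]:
    """Split on blank lines, dropping empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def po_escape(s: str) -> str:
    """Escape backslashes, double quotes and newlines for a PO string."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_pot(text: str) -> str:
    entries, seen = [], set()
    for para in paragraphs(text):
        if para in seen:  # a catalogue holds each msgid once
            continue
        seen.add(para)
        entries.append(f'msgid "{po_escape(para)}"\nmsgstr ""\n')
    return "\n".join(entries)


def translate(text: str, catalog: dict[str, str]) -> str:
    # Untranslated (missing or empty) entries fall back to the source text.
    return "\n\n".join(catalog.get(p) or p for p in paragraphs(text))
```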
The code is at [4]. It's in slow development when necessary for new documents; I don't particularly intend to polish it for release or wider use.
KiCAD's documentation was the best example of something similar (AsciiDoc + PO4A) to what I've put together.
The missing pieces, which are closely related, are translatable and flexible diagrams. AsciiDoctor supports plenty of diagram tools, but none of them can do this. For example, the diagram at [6] is an SVG, which (since it's XML) can be translated using PO4A. However, in French the longer text spills out of the boxes. The previous diagram is an image, for this reason.
Is there an open-format (preferably open source) diagramming tool, which supports wrapping text, and even resizing "too long" text? I would be very interested!
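This doesn't answer the tooling question, but the wrapping-and-shrinking behaviour being asked for can be sketched in Python. The sketch wraps a label to the box width, steps the font size down until the lines fit, and then emits SVG `<text>`/`<tspan>` elements. Glyph widths are estimated with an assumed average of 0.6 em, so the fit is only approximate. Accurate fitting would need real font metrics:

```python
"""Fit a (possibly translated) label into a fixed-size box by wrapping it
and, if necessary, shrinking the font, then emit SVG <text>/<tspan>.
Widths use an assumed average glyph width, not real font metrics."""
import textwrap
from xml.sax.saxutils import escape

AVG_GLYPH_EM = 0.6    # assumption: rough average for a sans-serif font
LINE_HEIGHT_EM = 1.2  # assumption: line spacing relative to font size


def fit_label(label, box_w, box_h, max_size=14.0, min_size=6.0):
    """Return (font_size, lines) for the largest size that fits, else min_size."""
    size = max_size
    while True:
        chars_per_line = max(1, int(box_w / (AVG_GLYPH_EM * size)))
        lines = textwrap.wrap(label, width=chars_per_line)
        if len(lines) * LINE_HEIGHT_EM * size <= box_h or size <= min_size:
            return size, lines
        size = max(min_size, size - 0.5)


def svg_label(label, x, y, box_w, box_h):
    """SVG text centred horizontally in the box whose top-left corner is (x, y)."""
    size, lines = fit_label(label, box_w, box_h)
    spans = "".join(
        # First baseline one font size below the top; later lines one line-height apart.
        f'<tspan x="{x + box_w / 2}" dy="{LINE_HEIGHT_EM * size if i else size}">{escape(line)}</tspan>'
        for i, line in enumerate(lines)
    )
    return f'<text y="{y}" font-size="{size}" text-anchor="middle">{spans}</text>'
```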
Pandoc is great. make and vim are great too, but as you can see, these tools will produce PDF files, HTML files, text files, Markdown files, and a lot of jargon that readers simply aren’t interested in. I mean normal readers here, not tech folks holed up inside a terminal with a homebrew theme.