1. It's a tool built in large part by folks involved in the humanities. John, of course, is a professor of philosophy. (We could stop there, since his contributions so outnumber all the others, but I'll go on...) I'm an assistant prof. of English. It has large contributions from the curator of medieval manuscripts at the British Library, and contributions from a historian as well. And those are just the ones that come to mind -- I'm sure there are others. It's cool to have a tool used by many folks in the humanities, but also in large part driven by their needs.
2. John is a great project lead, and it's an amazingly open project. If you're interested in getting started with haskell, the filters API allows you to really adapt it to your needs. And if you're interested in adding another reader or writer, there's a good chance it can be added.
For myself, I started out using the python docx library, found it didn't handle enough stuff, wrote one myself (covering mainly my conversion needs, not creation) and had it output JSON that pandoc could read. When that worked, I ported it to haskell, and posted it to the mailing list. Within a few weeks, the docx reader was part of the program. It's been a great experience -- both in using haskell, and in playing a significant role in a program I use and love.
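The JSON pipeline that made that porting path possible is easy to poke at directly. Here's a minimal sketch in Python (stdlib only, not the real pandocfilters library, and `strong_for_emph` is my own name for it): a filter that walks pandoc's JSON AST and turns every Emph node into Strong:

```python
def strong_for_emph(node):
    """Recursively rewrite a pandoc JSON AST node, turning Emph into Strong.

    Pandoc's JSON AST represents nodes as dicts with a "t" (tag) key and,
    for most nodes, a "c" (contents) key.
    """
    if isinstance(node, dict):
        if node.get("t") == "Emph":
            return {"t": "Strong", "c": strong_for_emph(node["c"])}
        return {key: strong_for_emph(value) for key, value in node.items()}
    if isinstance(node, list):
        return [strong_for_emph(item) for item in node]
    return node

# Wired up as a filter, a script would read the AST from stdin and write
# the transformed AST back to stdout, e.g.:
#   pandoc -t json doc.md | python filter.py | pandoc -f json -o out.html
# with the script body being roughly:
#   json.dump(strong_for_emph(json.load(sys.stdin)), sys.stdout)
```

In practice you'd use the pandocfilters or panflute libraries (or a Haskell filter) rather than walking the tree by hand, but the raw JSON is this approachable.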
Pandoc has made the recent push for reproducible research across the sciences possible (via Rmarkdown/knitr and IPython/Jupyter) --- it's an essential part of solving a pretty big epistemological problem!
(For me, it's also a large part of solving my own productivity problems.)
It's also driven markdown itself to become more feature-rich and internally consistent. I'm extremely appreciative of JGM's efforts to produce a unified markdown spec[^1].
Amazing piece of software. It's not often that a tool dominates a use case so thoroughly that there are virtually no alternatives, simply because there's no need for any. Thank you, John MacFarlane :)
I had a problem converting embedded images and fonts from Markdown to EPUB; this script helps by piggy-backing on the --epub-embed-font option to include images:
#!/bin/bash
FILENAME="$(cat title.txt).epub"
CSS=style.css
IMAGES_DIR=images
# Ensure images get embedded into the document: pull every url(...) that
# mentions an image out of the stylesheet and pass it as an embedded "font".
for url in $(grep image "$CSS" | grep url | sed 's/.*url( *\(.*\) *).*/\1/g'); do
    EMBED="$EMBED --epub-embed-font=$IMAGES_DIR/$url"
done
# Generate the ePub, including fonts and images.
pandoc \
    --smart \
    --epub-cover-image=cover/cover.png \
    --epub-metadata=metadata.xml \
    --epub-stylesheet="$CSS" \
    --epub-embed-font=fonts/MyUnderwood.ttf \
    $EMBED \
    -t epub -o "$FILENAME" output/*.vars
The .vars files are chapters that have had values substituted for their variable names, generated like so:
for i in chapter/*.md; do
    out="$OUTDIR/$(basename "$i")"
    # Prepend the variables block, then run the file through pandoc using
    # itself as the template, which substitutes the $name$ references.
    cat variables.yaml "$i" > "$out"
    pandoc "$out" --template "$out" > "$out.vars"
    # ";;" stands in for literal dollar signs; restore them before the
    # final conversion to ConTeXt.
    sed -e 's/;;/\$/g' "$out.vars" |
        pandoc --chapters -t context -o "$OUTDIR/$(basename "$i" .md).tex"
done
A handy trick to avoid duplicating names, places, or really any text within the content.
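To make the trick concrete, here's a sketch with a hypothetical variable `hero` (file names and values are illustrative). After concatenating the YAML block onto the chapter, running the file through pandoc with itself as the template replaces each `$hero$` reference with the value from the metadata:

```
# variables.yaml
---
hero: Aurelia Finch
---

# chapter/01.md body text
$hero$ walked into the room.

# pandoc 01.md --template 01.md  =>  "Aurelia Finch walked into the room."
```

Rename the hero once in variables.yaml and every chapter picks up the change.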
Thanks to Pandoc, I never need to use Microsoft Word at work. If someone wants me to upload some documentation to Sharepoint, I just write it in Markdown and convert to docx with Pandoc. Wonderful piece of software.
What an amazing project! I use it all the time, for converting between textile (redmine) and markdown (everything else), and also for extracting my company's proposals from Google Drive and converting them to LaTeX and PDF with our customized template. The code for this is all here: https://github.com/dergachev/gdocs-export
I love the markdown to PDF converter using embedded LaTeX for math expressions. So much easier than writing pure LaTeX and you still get a beautiful result.
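A minimal example of that workflow (file names are illustrative): math between dollar signs in the Markdown source passes straight through to LaTeX when targeting PDF:

```
Euler's identity, $e^{i\pi} + 1 = 0$, set inline.

$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$
```

Converting with `pandoc notes.md -o notes.pdf` gives you TeX-quality equation typesetting without writing a single preamble line.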
I have started using pandoc again recently since its org support has gotten a lot better in recent months. Using Emacs, it is not a very hard decision to use org mode over markdown or rst.
With org support in pandoc, I no longer have to start up Emacs to generate HTML for my website, so doing that in batches just got a lot easier.
Pandoc is one of those things that I only recently discovered and totally fell in love with: I wrote a bunch of code in literate Haskell for work, so that I could have the logic right, and then ported it to JS to reduce the bus factor of the project. When I discovered pandoc I just ran it on my literate Haskell and the resulting PDF was so beautiful I had to print it out and tack it up on a cubicle wall, like a "this is what they pay me for" type of reminder.
I'm working on some mechanical engineering software that produces a report that is heavy in tables and equations. Today we generate HTML for display and have an option to output the same report in PDF form. We use htmldoc to do the html->pdf conversion. htmldoc is unbelievably fast, but it's also quite limited: it doesn't know about Unicode or CSS.
We've tried other more capable converters but they are an order of magnitude slower than htmldoc. I guess Unicode and CSS support is expensive. htmldoc is also better than anything else we've tried at not putting page breaks in the middle of a table.
Would you say this type of report generation is something that Pandoc would be suited for?
I'm a big fan of Pandoc. However, if you have a lot of tables in your document, Asciidoctor is a better choice for producing PDFs.
Asciidoctor has a clearer, superior table-formatting syntax and, more importantly, it can work with CSV files directly. This way you don't have to copy-paste data into your reports.
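For instance, an Asciidoctor table can pull its rows straight from a CSV file (`data.csv` is a hypothetical filename):

```
[%header,format=csv]
|===
include::data.csv[]
|===
```

Regenerate the CSV and the table in the rendered PDF updates with it, no copy-pasting involved.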
I wish pandoc were better at converting Asciidoctor and not just AsciiDoc. Asciidoctor and asciidoctor.js are so much nicer than all those markdown flavors it's not even funny; I wish I'd found them earlier so I wouldn't need to work with older markdown files or keep converting them.
I agree that copy-pasting CSV files is undesirable, but there are several Pandoc filters, in both Haskell and Python, that let you embed CSVs as tables without much hassle.
So I tried running my file through Pandoc and it failed, so that's kind of a bummer.
Apparently U+03C1 (ρ) is not set up for use with LaTeX, and that seems to be a prerequisite for PDF output. It helpfully suggests I try with --latex-engine=xelatex, and that fails after a while because xetex.def isn't in the repositories anymore.
I'm not familiar with your exact distro/installation situation, but at least on Debian/Ubuntu, TeX and friends are split into multiple packages, where many binaries (including latex/lualatex/xelatex) are in one package (texlive-binaries), but the supporting files/packages for non-basic uses are in another package (texlive-luatex and texlive-xetex).
As I understand it (and I just started using Pandoc today), PDF output requires TeX (or maybe it's LaTeX) from MiKTeX, which I installed. By default, MiKTeX downloads dependencies as they are needed. For me, it dies when it tries to get xetex.def because it's not on the server.
MiKTeX comes with an admin tool that has an option to synchronize repositories which supposedly fixes lots of problems, but apparently not this one.
For anybody that reads this and decides to get Pandoc and MiKTeX, I suggest getting the version of MiKTeX that bundles all of the dependencies. In the days of terabyte disks, I'm not sure it makes sense to pull down dependencies a few kilobytes at a time, especially if packages periodically go missing from the servers.
[Side-by-side comparison of the HTML version and the generated PDF version.]
It feels redundant to say this with all the others expressing the same thoughts, but thank you thank you thank you John et al for Pandoc. It's been life-changing since running into it a few years back.
I do a lot of work researching older and historical documents. Some of these are available only as scans (image-heavy PDFs) or rotten OCR transcriptions. Occasionally I get lucky and find a solid ASCII source.
A recent case in point, Dugald Stewart's 1793 "Account of the Life and Writings of Adam Smith", which I'd found in ASCII format.
A couple of hours (mostly of finessing the markup) with pandoc, and I had a 72-page PDF, ePub, and HTML versions, the former two posted online.
To me it's the perfect tool to generate user docs and manuals for software and other CLI tools. My best use cases are markdown (GitHub-flavoured) to man pages, markdown to HTML and PDF, etc.
Happy birthday! I use it both for outputting PDF and, when encountering very long HTML pages without styling (quite common around these parts), for outputting ePub.
I know nothing about French typography... having said that...
pandoc allows you to use LaTeX templates other than the ones it ships with. So you can take the default template, add the overrides necessary for French typography, and voilà! (hopefully).
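Before writing any template overrides, a minimal starting point (a sketch, not a complete French setup): in recent pandoc versions the default LaTeX template reacts to a `lang` metadata variable, pulling in babel or polyglossia so French hyphenation and punctuation-spacing rules apply:

```
---
lang: fr
title: Mon document
---
```

If you do need deeper changes, `pandoc -D latex` prints the default LaTeX template, giving you a copy to customize and pass back via --template.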
In your document? I find that pandoc + French typography = love, since it uses LaTeX under the hood, and LaTeX has pretty decent rules for French typography (better than Microsoft Word's, IMHO).