1. It's a tool built in large part by folks involved in the humanities. John, of course, is a professor of philosophy. (We could stop there, since his contributions so outnumber all the others, but I'll go on...) I'm an assistant prof. of English. It has large contributions from the curator of medieval manuscripts at the British Library, and contributions from a historian as well. And those are just the ones that come to mind -- I'm sure there are others. It's cool to have a tool used by many folks in the humanities, but also in large part driven by their needs.
2. John is a great project lead, and it's an amazingly open project. If you're interested in getting started with haskell, the filters API allows you to really adapt it to your needs. And if you're interested in adding another reader or writer, there's a good chance it can be added.
For myself, I started out using the python docx library, found it didn't handle enough stuff, wrote one myself (covering mainly my conversion needs, not creation) and had it output JSON that pandoc could read. When that worked, I ported it to haskell, and posted it to the mailing list. Within a few weeks, the docx reader was part of the program. It's been a great experience -- both in using haskell, and in playing a significant role in a program I use and love.
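The JSON pipeline that made that porting path possible is easy to poke at directly. Here's a minimal sketch in Python (stdlib only, not the real pandocfilters library, and `strong_for_emph` is my own name for it): a filter that walks pandoc's JSON AST and turns every Emph node into Strong:

```python
def strong_for_emph(node):
    """Recursively rewrite a pandoc JSON AST node, turning Emph into Strong.

    Pandoc's JSON AST represents nodes as dicts with a "t" (tag) key and,
    for most nodes, a "c" (contents) key.
    """
    if isinstance(node, dict):
        if node.get("t") == "Emph":
            return {"t": "Strong", "c": strong_for_emph(node["c"])}
        return {key: strong_for_emph(value) for key, value in node.items()}
    if isinstance(node, list):
        return [strong_for_emph(item) for item in node]
    return node

# Wired up as a filter, a script would read the AST from stdin and write
# the transformed AST back to stdout, e.g.:
#   pandoc -t json doc.md | python filter.py | pandoc -f json -o out.html
# with the script body being roughly:
#   json.dump(strong_for_emph(json.load(sys.stdin)), sys.stdout)
```

In practice you'd use the pandocfilters or panflute libraries (or a Haskell filter) rather than walking the tree by hand, but the raw JSON is this approachable.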
Pandoc has made the recent push for reproducible research across the sciences possible (via Rmarkdown/knitr and IPython/Jupyter) --- it's an essential part of solving a pretty big epistemological problem!
(For me, it's also a large part of solving my own productivity problems.)
It's also driven markdown itself to become more feature-rich and internally consistent. I'm extremely appreciative of JGM's efforts to produce a unified markdown spec[^1].
Amazing piece of software. It's not often that a tool dominates a use case so thoroughly that there are virtually no alternatives, simply because there's no need for any. Thank you, John MacFarlane :)
I had a problem converting embedded images and fonts from Markdown to EPUB; this script helps by piggy-backing on the --epub-embed-font option to include images:
#!/bin/bash
FILENAME="$(cat title.txt).epub"
CSS=style.css
IMAGES_DIR=images
# Ensure images get embedded into the document: pull every url(...) that
# mentions an image out of the stylesheet and pass it as an embedded "font".
for url in $(grep image "$CSS" | grep url | sed 's/.*url( *\(.*\) *).*/\1/g'); do
    EMBED="$EMBED --epub-embed-font=$IMAGES_DIR/$url"
done
# Generate the ePub, including fonts and images.
pandoc \
    --smart \
    --epub-cover-image=cover/cover.png \
    --epub-metadata=metadata.xml \
    --epub-stylesheet="$CSS" \
    --epub-embed-font=fonts/MyUnderwood.ttf \
    $EMBED \
    -t epub -o "$FILENAME" output/*.vars
The .vars files are chapters that have had values substituted for their variable names, generated like so:
for i in chapter/*.md; do
    out="$OUTDIR/$(basename "$i")"
    # Prepend the variables block, then run the file through pandoc using
    # itself as the template, which substitutes the $name$ references.
    cat variables.yaml "$i" > "$out"
    pandoc "$out" --template "$out" > "$out.vars"
    # ";;" stands in for literal dollar signs; restore them before the
    # final conversion to ConTeXt.
    sed -e 's/;;/\$/g' "$out.vars" |
        pandoc --chapters -t context -o "$OUTDIR/$(basename "$i" .md).tex"
done
A handy trick to avoid duplicating names, places, or really any text within the content.
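To make the trick concrete, here's a sketch with a hypothetical variable `hero` (file names and values are illustrative). After concatenating the YAML block onto the chapter, running the file through pandoc with itself as the template replaces each `$hero$` reference with the value from the metadata:

```
# variables.yaml
---
hero: Aurelia Finch
---

# chapter/01.md body text
$hero$ walked into the room.

# pandoc 01.md --template 01.md  =>  "Aurelia Finch walked into the room."
```

Rename the hero once in variables.yaml and every chapter picks up the change.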
Thanks to Pandoc, I never need to use Microsoft Word at work. If someone wants me to upload some documentation to Sharepoint, I just write it in Markdown and convert to docx with Pandoc. Wonderful piece of software.
What an amazing project! I use it all the time, for converting between textile (redmine) and markdown (everything else), and also for extracting my company's proposals from Google Drive and converting them to LaTeX and PDF with our customized template. The code for this is all here: https://github.com/dergachev/gdocs-export
I love the markdown to PDF converter using embedded LaTeX for math expressions. So much easier than writing pure LaTeX and you still get a beautiful result.
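A minimal example of that workflow (file names are illustrative): math between dollar signs in the Markdown source passes straight through to LaTeX when targeting PDF:

```
Euler's identity, $e^{i\pi} + 1 = 0$, set inline.

$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$
```

Converting with `pandoc notes.md -o notes.pdf` gives you TeX-quality equation typesetting without writing a single preamble line.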
I have started using pandoc again recently since its org support has gotten a lot better in recent months. Using Emacs, it is not a very hard decision to use org mode over markdown or rst.
With org support in pandoc, I no longer have to start up Emacs to generate HTML for my website, so doing that in batches just got a lot easier.
Pandoc is one of those things that I only recently discovered and totally fell in love with: I wrote a bunch of code in literate Haskell for work, so that I could have the logic right, and then ported it to JS to reduce the bus factor of the project. When I discovered pandoc I just ran it on my literate Haskell and the resulting PDF was so beautiful I had to print it out and tack it up on a cubicle wall, like a "this is what they pay me for" type of reminder.
I'm working on some mechanical engineering software that produces a report that is heavy in tables and equations. Today we generate HTML for display and have an option to output the same report in PDF form. We use htmldoc to do the html->pdf conversion. htmldoc is unbelievably fast, but it's also quite limited: it doesn't know about Unicode or CSS.
We've tried other more capable converters but they are an order of magnitude slower than htmldoc. I guess Unicode and CSS support is expensive. htmldoc is also better than anything else we've tried at not putting page breaks in the middle of a table.
Would you say this type of report generation is something that Pandoc would be suited for?
I'm a big fan of Pandoc. However, if you have a lot of tables in your document, Asciidoctor is a better choice for producing PDFs.
Asciidoctor has a clearer, superior table-formatting syntax and, more importantly, it can work with CSV files directly. This way you don't have to copy-paste data into your reports.
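For instance, an Asciidoctor table can pull its rows straight from a CSV file (`data.csv` is a hypothetical filename):

```
[%header,format=csv]
|===
include::data.csv[]
|===
```

Regenerate the CSV and the table in the rendered PDF updates with it, no copy-pasting involved.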
I wish pandoc were better at converting Asciidoctor and not just AsciiDoc. Asciidoctor and asciidoctor.js are so much nicer than all those markdown flavors it's not even funny; I wish I'd found them earlier so I wouldn't need to work with older markdown files or keep converting them.
I agree that copy-pasting CSV files is undesirable, but there are several Pandoc filters, in both Haskell and Python, that let you embed CSVs as tables without much hassle.
So I tried running my file through Pandoc and it failed, so that's kind of a bummer.
Apparently U+03C1 (ρ) is not set up for use with LaTeX, and that seems to be a prerequisite for PDF output. It helpfully suggests I try with --latex-engine=xelatex, and that fails after a while because xetex.def isn't in the repositories anymore.
I'm not familiar with your exact distro/installation situation, but at least on Debian/Ubuntu, TeX and friends are split into multiple packages, where many binaries (including latex/lualatex/xelatex) are in one package (texlive-binaries), but the supporting files/packages for non-basic uses are in another package (texlive-luatex and texlive-xetex).
As I understand it (and I just started using Pandoc today), PDF output requires TeX (or maybe it's LaTeX) from MiKTeX, which I installed. By default, MiKTeX downloads dependencies as they are needed. For me, it dies when it tries to get xetex.def because it's not on the server.
MiKTeX comes with an admin tool that has an option to synchronize repositories which supposedly fixes lots of problems, but apparently not this one.
For anybody that reads this and decides to get Pandoc and MiKTeX, I suggest getting the version of MiKTeX that bundles all of the dependencies. In the days of terabyte disks, I'm not sure it makes sense to pull down dependencies a few kilobytes at a time, especially if packages periodically go missing from the servers.
[Side-by-side comparison of the HTML version and the generated PDF version.]
It feels redundant to say this with all the others expressing the same thoughts, but thank you thank you thank you John et al for Pandoc. It's been life-changing since running into it a few years back.
I do a lot of work researching older and historical documents. Some of these are available only as scans (image-heavy PDFs) or rotten OCR transcriptions. Occasionally I get lucky and find a solid ASCII source.
A recent case in point, Dugald Stewart's 1793 "Account of the Life and Writings of Adam Smith", which I'd found in ASCII format.
A couple of hours (mostly of finessing the markup) with pandoc, and I had a 72-page PDF, ePub, and HTML versions, the former two posted online.
To me it's the perfect tool to generate user docs and manuals for software and other CLI tools. My best use cases are markdown (GitHub-flavoured) to man pages, markdown to HTML and PDF, etc.
Happy birthday! I use it both for outputting PDF and, when encountering very long HTML pages without styling (quite common around these parts), for outputting ePub.
I know nothing about French typography... having said that...
pandoc allows you to use LaTeX templates other than the ones it ships with. So you can take the default template, add the overrides necessary for French typography, and voilà! (hopefully).
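Before writing any template overrides, a minimal starting point (a sketch, not a complete French setup): in recent pandoc versions the default LaTeX template reacts to a `lang` metadata variable, pulling in babel or polyglossia so French hyphenation and punctuation-spacing rules apply:

```
---
lang: fr
title: Mon document
---
```

If you do need deeper changes, `pandoc -D latex` prints the default LaTeX template, giving you a copy to customize and pass back via --template.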
In your document? I find that pandoc + French typography = love, since it uses LaTeX under the hood, and LaTeX has pretty decent rules for French typography (better than Microsoft Word's, IMHO).