I'm working on some mechanical engineering software that produces a report that is heavy in tables and equations. Today we generate HTML for display and have an option to output the same report as a PDF. We use htmldoc to do the HTML-to-PDF conversion. htmldoc is unbelievably fast, but it's also quite limited: it doesn't know about Unicode or CSS.
We've tried other more capable converters but they are an order of magnitude slower than htmldoc. I guess Unicode and CSS support is expensive. htmldoc is also better than anything else we've tried at not putting page breaks in the middle of a table.
Would you say this type of report generation is something that Pandoc would be suited for?
I'm a big fan of Pandoc. However, if you have a lot of tables in your document, Asciidoctor is a better choice for producing PDFs.
Asciidoctor has a clearer table syntax and, more importantly, it can read tables directly from CSV files. This way you don't have to copy and paste data into your reports.
I wish Pandoc were better at converting Asciidoctor and not just AsciiDoc. Asciidoctor and Asciidoctor.js are so much nicer than all those Markdown flavors it's not even funny. I wish I had found them earlier so I wouldn't have to work with older Markdown files or convert them.
I agree that copy-pasting CSV files is undesirable, but there are several Pandoc filters, both in Haskell and Python, that allow you to embed CSVs into tables without much hassle.
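To give a rough idea of what such a preprocessing step does, here is a minimal sketch in Python that turns CSV text into a Markdown pipe table, which Pandoc can read. It is not any of the existing filters, just an illustration; the function name `csv_to_pipe_table` and the sample data are made up for this example, and it assumes the first CSV row is the header.

```python
import csv
import io


def csv_to_pipe_table(csv_text):
    """Convert CSV text (first row treated as the header) to a Markdown pipe table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def fmt(cells):
        # Escape literal pipes so they do not split a cell in two.
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    # One "---" separator cell per header column.
    separator = "|" + "|".join(" --- " for _ in header) + "|"
    return "\n".join([fmt(header), separator] + [fmt(r) for r in body])


if __name__ == "__main__":
    # Hypothetical sample data, only for illustration.
    print(csv_to_pipe_table("load_kN,stress_MPa\n10,2.5\n20,5.0\n"))
```

In a real report you would read the CSV from a file and splice the result into the Markdown source before handing it to Pandoc; a proper filter does the same thing on Pandoc's document tree instead of on text.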
So I tried running my file through Pandoc and it failed, so that's kind of a bummer.
Apparently U+3C1 is not set up for use with LaTeX and that seems to be a prerequisite for pdf. It helpfully suggests I try with --latex-engine=xelatex and that fails after a while because xetex.def isn't in the repositories anymore.
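For anyone scripting this, a small sketch of building the Pandoc command line from Python follows. The helper name `pandoc_pdf_command` and the file names are made up for illustration. One assumption to flag: Pandoc 2.0 and later spell the option `--pdf-engine`, while older versions use `--latex-engine` as in the message quoted above, so the sketch lets you pick either.

```python
import subprocess


def pandoc_pdf_command(source, target, engine="xelatex", legacy_flag=False):
    """Build the argument list for a Pandoc PDF conversion.

    legacy_flag=True uses the older --latex-engine spelling;
    otherwise the newer --pdf-engine spelling is used.
    """
    flag = "--latex-engine" if legacy_flag else "--pdf-engine"
    return ["pandoc", source, "-o", target, f"{flag}={engine}"]


if __name__ == "__main__":
    # Hypothetical file names; requires pandoc and a TeX installation on PATH.
    cmd = pandoc_pdf_command("report.html", "report.pdf")
    subprocess.run(cmd, check=True)
```

This only assembles and runs the command; it does nothing about missing TeX packages, which still have to be installed separately.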
I'm not familiar with your exact distro/installation situation, but at least on Debian/Ubuntu, TeX and friends are split into multiple packages, where many binaries (including latex/lualatex/xelatex) are in one package (texlive-binaries), but the supporting files/packages for non-basic uses are in another package (texlive-luatex and texlive-xetex).
As I understand it (and I just started using Pandoc today), PDF output requires TeX (or maybe it's LaTeX), which I got by installing MiKTeX. By default, MiKTeX downloads dependencies as they are needed. For me, it dies when it tries to get xelatex.def because it's not on the server.
MiKTeX comes with an admin tool that has an option to synchronize repositories which supposedly fixes lots of problems, but apparently not this one.
For anybody that reads this and decides to get Pandoc and MiKTeX, I suggest getting the version of MiKTeX that bundles all of the dependencies. In the days of terabyte disks, I'm not sure it makes sense to pull down dependencies a few kilobytes at a time, especially if packages periodically go missing from the servers.
This is a side-by-side comparison of the HTML version and the generated PDF version: