QuestPDF: Modern .NET library for PDF document generation

petilon · on Jan 18, 2023

The hard part of PDF generation is support for complex scripts (Arabic, Indian languages, etc.), including embedding a font subset. On Windows this is usually accomplished using the Uniscribe library (which is not available on Linux). QuestPDF appears to be using HarfBuzz for this purpose. If that works well then this is a winner!

xoac · on Jan 18, 2023

HarfBuzz is the gold standard.

petilon · on Jan 19, 2023

Gold standard? Even though serious bugs are not fixed [1] because "the code is too fragile to touch at this point"? Looks like Android uses HarfBuzz, if so it can't be that bad.

[1] https://github.com/harfbuzz/harfbuzz/issues/2814

aliswe · on Jan 18, 2023

Just a random fact: Harf means "letter" in Arabic.

space_ghost · on Jan 18, 2023

Oh man, .NET needed this. Did some PDF work for a .NET project last year and found the ecosystem to be somewhat light on PDF support. There's a few commercial options, but they're pricey.

ejb999 · on Jan 18, 2023

I've used this one with great success (and it's free): https://github.com/tuespetre/TuesPechkin

It's basically a wrapper for wkhtmltopdf, but I develop an app that has probably generated a million +/- invoices/statements over the past 5 years with it, and it's been rock solid for me. Was a bit of a bear to get it working the first time (not a ton of documentation that I could find at the time), but once working, it was easy to add/change new documents/layouts.

As it uses wkhtmltopdf under the covers, it is an HTML->PDF tool, but I prefer that, at least for my use case.

Not sure there is a dotnet-core version, so that might be a problem for some.

hbcondo714 · on Jan 18, 2023

I've been using Rotativa[1] for URL to PDF generation which is also a wrapper for wkhtmltopdf. They have a dotnet-core[2] version and also a SaaS[3] but it's worth mentioning that Azure PaaS supports wkhtmltopdf[4] so I just self-host.

Looking at QuestPDF's API docs, it doesn't look like they support URL / HTML to PDF generation. I think this would be a great addition especially given the age and issues with Rotativa and TuesPechkin on their public repos.

[1] https://github.com/webgio/Rotativa

[2] https://github.com/webgio/Rotativa.AspNetCore

[3] https://rotativa.io/

[4] https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandb...

yyyk · on Jan 19, 2023

The various wkhtmltopdf wrappers (there are many for .NET) work, but note wkhtmltopdf itself (based on Qt 4/5) is abandoned and the general response for issues was always 'just hack the HTML until it sorta works'.

dukeofharen · on Jan 19, 2023

We've been using [Gotenberg](https://gotenberg.dev/) for a few years and so far it works really nicely.

majkinetor · on Jan 19, 2023

Can you say more about your typical setup?

pathartl · on Jan 18, 2023

We've been on iTextSharp 5 for a decade for this reason.

sixtram · on Jan 19, 2023

We're using Puppeteer with Chrome. It's easy to test, as basically it runs Chrome's Print to PDF. Difficult to work with headers and footers and page breaks can be tricky, but it could work for a lot of layouts.

mattferderer · on Jan 18, 2023

This looks awesome! The .NET foundation I hope supports this ASAP - https://old.dotnetfoundation.org/projects/

I don't know if it's due to "partnerships" but I never could understand why Microsoft didn't do better at supporting .NET Word & PDF tooling since .NET Core came out. The older versions I know at least had support for Word docs. Creating documents is a huge foundation of their company.

password4321 · on Jan 18, 2023

Careful consideration should be taken before joining the .NET Foundation.

How the .NET Foundation kerfuffle became a brouhaha

2021-10-08 https://news.ycombinator.com/item?id=28794352#28795511

> the project had now been silently moved to GitHub Enterprise (likely in the short window @dnfadmin had owner access). The author states that projects in GitHub Enterprise can be entirely controlled by the owner of the account (the .NET Foundation). This transfer happened silently.

matchagaucho · on Jan 18, 2023

The Adobe vs Microsoft competitive relationship is indeed a puzzle.

Some features, like eSignature, there's more co-opetition and partnering.

bilekas · on Jan 19, 2023

The eSig is a key component. Need to be extended outside of AdoC.

dustymcp · on Jan 18, 2023

I think a lot of people have moved on and are using services or Puppeteer to generate their PDFs. I know we did, since we couldn't find a library that worked properly for our use case.

paranoidrobot · on Jan 18, 2023

The tl;dr of using Puppeteer for this is "We run Chrome in a headless mode, load your page, and then print to PDF with it".

It makes me nervous having Chrome running on the server, even inside a container without root. Doubly so if the user is able to control any portion of the page being run by Chrome.

whitexn--g28h · on Jan 19, 2023

We used AWS Lambdas to execute the PDF renders and upload the result to S3 using a signed URL passed in from the request. Complete insulation from our own application process: all of the data is passed into the request, so the worst the user can do is add a malicious file to our S3 bucket.

paranoidrobot · on Jan 19, 2023

Are you using Puppeteer for the PDF Rendering?

Chrome is pretty horrific in terms of memory usage. How are you handling the startup time + memory usage? (with their associated costs)

mrwizrd · on Jan 18, 2023

This looks great. I am glad to still see good work being done in this space.

I had used https://gotenberg.dev/ on AWS in the past. Many of the options available at the time weren't usable in Azure outside of a VM due to needing to make use of GDI interfaces that were disabled for security reasons. Interested to see how it compares to that and the other options being floated at the time, like Puppeteer.

jiggawatts · on Jan 18, 2023

Containers can use the full Windows Server base image, which includes the GDI+ libraries.

I used this as a trick for making Crystal Reports work.

mwcampbell · on Jan 18, 2023

I don't see any support for tagged PDF output. That's important for accessibility, particularly for screen reader users.

nateb2022 · on Jan 18, 2023

Good point! There's an open issue regarding that, and it seems to be due to the fact that under the hood, QuestPDF uses Skia, which itself lacks support for tagged PDFs: https://github.com/QuestPDF/QuestPDF/issues/193

qwertox · on Jan 18, 2023

This could be a SkiaSharp limitation. This thread made me interested in Skia and I started looking around their site and did a quick search for "tagged PDF" on their Milestone Release Notes.

If they mean the same thing by tagged PDF as what is being discussed in this thread, that page lists "Add new APIs to add attributes to document structure node when creating a tagged PDF.", which could be from a milestone as old as 2020 [0]

[0] https://skia.org/docs/user/release/release_notes/#milestone-...

edragoev · on Jan 18, 2023

The .NET version of PDFjet supports tagged PDFs:

https://github.com/edragoev1/pdfjet

so do the Java, Swift and Go versions.

While the docs are somewhat sparse, Example_45 shows how to create a PDF/UA-compliant PDF.

BaRRaKID · on Jan 19, 2023

Are people really OK with writing code in C# for data presentation? How is it fine having to compile code when you want to change a font colour for example?

And just from looking at the examples you can see how quickly it becomes messy. It looks like something that would work fine for very simple document structures but would get messy really fast once you add any level of complexity. Imagine having to write code for a document with multiple levels of headings, tables and images with captions, parts of text that need emphasis, etc.

From my experience the best solution for generating PDFs in .NET are, as mentioned here before, wkhtmltopdf wrappers. Take a Razor view, add some document properties (page size, margins, headers, footers, etc) in code, and output to a PDF.

I would even argue that on desktop using Office APIs to populate word documents and output them to PDF is more efficient than this.

MarcinZiabek · on Jan 19, 2023

The concept of writing the presentation layer in the actual programming language isn't new. Flutter developed by Google is a great example.

It all depends on how you structure your code. Of course, you can create huge chunks of HTML+CSS that cover your entire document. Or, to improve maintainability, you can split that HTML into multiple components and compose them together.

Very similar rules can be applied to this library, where you split your code into smaller parts by using properly named methods. That gives you a better understanding of the structure and ways to traverse the implementation.

Furthermore, using a programming language gives you many benefits. You can rely on all language features like loops, conditions, methods, formatting, recursion, etc. You have access to IntelliSense, static analysis and refactoring tools.

In terms of changing the layout content - it would be equally difficult when using any markup language. After all, you don't want to parse a markup file every time you generate a PDF - for performance reasons.

HTML may be a good choice for relatively simple documents, where you don't care that much about splitting content between pages, etc. However, you will still fight with performance problems related to running an entire web browser.

gibsonf1 · on Jan 18, 2023

We've been really happy with https://pdfbox.apache.org/ in production, although not .net of course.

Scarbutt · on Jan 18, 2023

You have to add table support yourself.

gibsonf1 · on Jan 19, 2023

That's true, but to do that well given what's out there, you have to do it yourself.

majkinetor · on Jan 19, 2023

I have recently started using this in production instead of Dink2PDF, generating 30K PDFs daily (in tens of minutes, with multiple threads). It works great, highly recommended.

Dink2PDF was crashing monthly in this scenario due to an internal unmanaged memory problem, so we had to replace it. Not to mention, HTML to PDF libraries are insecure, and dink is no exception - you can execute arbitrary code on the server. Not to mention that you need to have a full browser engine in your app...

yyyk · on Jan 19, 2023

Crashing is a problem with wrappers for wkhtmltopdf library when they run concurrently (you need the right locks, some wrappers have this but not dink apparently?). However, process-based wkhtmltopdf wrappers are also available and don't have this problem at all (e.g. nreco. It's also not too difficult to create a wrapper for oneself).

majkinetor · on Jan 19, 2023

We had a very funky case: the crash happened only a couple of times per month, with a web app doing daily work via a REST API and producing a very large collection of PDFs on a single API request, each having from a couple to 5k pages. It was very hard to troubleshoot and pinpoint the memory corruption problem (it took us several months; one doesn't expect this in a C# app). Finally we switched to QuestPDF. Surprisingly, performance was exactly the same as with dink; I see a 50% speed-up on textual PDFs in the recent changelog, so this might be a game changer (TBH performance is great even now).

BTW, we used dink for half a decade in public web apps with millions of users. Dink was used to create PDF reports and we never had a crash in that concurrent scenario. However, when we started doing typical non-web multithreading, this started to happen.

ripberge · on Jan 18, 2023

Does this allow you to use existing PDFs as "templates"? We do that a lot with PDFs. It allows end users to design in Adobe Acrobat and upload to our product. We can then inject dynamic data into placeholders at runtime. We do this for text and images.

nateb2022 · on Jan 18, 2023

Not currently; however, there's an open issue regarding this very topic: https://github.com/QuestPDF/QuestPDF/issues/283

phpdave11 · on Jan 18, 2023

It shouldn't be too difficult to add support for this. I authored a Go library which adds support for importing PDFs into a new PDF generator (either gofpdf or gopdf). It is around 2,500 lines of code: https://github.com/phpdave11/gofpdi

ahmedatia · on Jan 19, 2023

Not at the moment. We are currently using RadPdfProcessing [1] from Telerik ($) to do so. It is a processing library that allows creation, import and export of PDF documents from code.

[1] https://docs.telerik.com/devtools/document-processing/librar...

PYTHONDJANGO · on Jan 19, 2023

What are you using to do that? Thanks!

phonon · on Jan 18, 2023

Do you mean Acroforms?

lazyeye · on Jan 18, 2023

Why you would use anything other than an HTML -> PDF rendering engine is beyond me.

mattferderer · on Jan 18, 2023

Because styling semi-complicated PDFs with CSS is a layer of hell right above e-mails & old browsers. I say this as someone who enjoys CSS & in the .NET world has used this method over things like Crystal Reports (even before they dropped their .NET support).

lazyeye · on Jan 18, 2023

I haven't found this at all. I've been rendering very complex HTML to PDF (complex SVG charts, headers, footers, etc.) and it's been fine. Just a matter of getting the element heights/widths correct. Once you've got the basic page template done, it's not much effort at all to tweak as required.

renaudl_ · on Jan 18, 2023

Have you heard of Paged.js?

trynewideas · on Jan 18, 2023

There's a good matrix of feature support at https://print-css.rocks/lessons for all the things HTML-to-PDF engines can (and can't) do.

The CSS3 Paged Media spec was born broken on some fundamental things like counter resets, then effectively abandoned in 2013, so some complex print-specific requirements like fully customizable page numbering just don't happen without additional tooling. Accessible tagged PDFs are still a struggle, and I think only Weasyprint readily supports them among free or open-source options (and only since around September).

lazyeye · on Jan 18, 2023

Here's some PHP code to handle page numbering :-

<?php

$pagenum = 1;

$pagenum++;

print("Page: $pagenum");

?>

recursive · on Jan 19, 2023

That prints "Page: 2". It doesn't appear to do anything related to pagination or layout.

lazyeye · on Jan 19, 2023

Well obviously you'd have that code on each page. So that's the incredibly complex and challenging page numbering problem solved.

As far as pagination is concerned I wrap each page in this tag

   <page size="A4">  </page>

with this css:-

    page[size = 'A4'] {
        height: 29.7cm;
        width: 21cm;
        page-break-before: always;
    }

    page[size = 'A4'][layout = 'landscape'] {
        height: 21cm;
        width: 29.7cm;
    }

So that's the incredibly complex and challenging problem of pagination solved.

And fortunately html does layout too.

recursive · on Jan 19, 2023

You're just pushing complexity around. Now you have to figure out how much content you can put on each page before you have to make a new one. The reason it seemed so easy is that you just deferred the hard part.

lazyeye · on Jan 19, 2023

Nope. What you call the "hard part" is

if (rowcount > 100) newpage();

But let's use an entirely new, esoteric PDF-specific layout language instead, and we won't call that the "hard part".

Actually to be fair it obviously depends on the situation. If you are churning out large volumes of PDFs then it might make sense to get the efficiency with a PDF-specific language. But HTML -> PDF definitely has its place too and is not as hard to work with as people are claiming.

recursive · on Jan 19, 2023

You did it again. What's a row? Are you telling me that 100 of them will fit on any page, regardless of font size or margins?

lazyeye · on Jan 19, 2023

Why would you be randomly changing the font-size or margin? Surely that would be known in advance?

recursive · on Jan 19, 2023

Sometimes they might be. But even if you know the font size of the headers, the contents, and the bulleted lists, you still won't know where the paragraphs wrap, unless you also require a fixed-width font, and don't allow any of the word lengths to change. But then we probably wouldn't be calling it a "modern library for PDF generation".

magnat · on Jan 18, 2023

For a proper page numbering, consistent word wrapping, pixel-perfect font rendering and document outline support.

lazyeye · on Jan 18, 2023

You can't do page numbering in HTML? Seriously? I haven't had an issue with any of these things rendering HTML to PDF.

trynewideas · on Jan 18, 2023

The Paged Media spec on counters and counter-resets paints implementations into a corner. They can't both comply with the spec and implement page count resets on page breaks. This has been a known issue with the spec since 2013[1][2] and been a thorn in implementations since.[3]

1: https://www.w3.org/Style/CSS/Tracker/issues/334

2: https://github.com/w3c/csswg-drafts/issues/4760

3: https://github.com/Kozea/WeasyPrint/issues/93#issuecomment-4...

lazyeye · on Jan 18, 2023

Here's some PHP code to handle page numbering (page breaks are no problem at all rendering html to pdf) :-

<?php

$pagenum = 1;

$pagenum++;

print("Page: $pagenum");

?>

criddell · on Jan 19, 2023

I’ve had trouble with html -> pdf when it comes to tables. Most of the packages out there don’t have a way to say “if only the first few rows of the table will render before a page-break, then put the page-break before the table”. Or getting a table to not break at row X when that row has a cell that spans more than one row. Or getting table headings to repeat on the next page when a page-break mid-table is unavoidable.

These are all things that good page layout software can deal with easily.
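
To make the first and third rules concrete, here is a minimal sketch of how a layout engine might decide where a table starts and how it repeats its heading. Everything in it is an assumption for illustration: heights are counted in whole "row slots", the heading takes one slot, and `layout_table` and `min_rows` are made-up names, not any library's API. It ignores row spans entirely.

```python
def layout_table(rows, header, space_left, page_capacity, min_rows=3):
    """Split a table across pages, measured in row slots.

    space_left    -- slots remaining on the current page
    page_capacity -- slots available on a fresh page
    min_rows      -- fewest body rows allowed under the heading before a break
    Returns one list of lines per page. An empty first list means the whole
    table was pushed to the next page.
    """
    if page_capacity < 2:
        raise ValueError("a page must fit the heading plus at least one row")
    if min_rows < 1:
        raise ValueError("min_rows must be at least 1")

    pages = []
    fits_now = max(space_left - 1, 0)  # body rows that fit under the heading
    if fits_now < min(min_rows, len(rows)):
        # Too few rows would fit: put the page break before the table.
        pages.append([])
        space = page_capacity
    else:
        space = space_left

    remaining = list(rows)
    while remaining:
        take = space - 1  # one slot is reserved for the repeated heading
        pages.append([header] + remaining[:take])
        remaining = remaining[take:]
        space = page_capacity
    return pages
```

For example, with five rows, three slots left on the current page and four slots per fresh page, only two rows would fit under the heading, so the sketch leaves the current page alone and starts the table on the next one, repeating the heading on every page the table touches.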

HTML to pdf is also pretty slow and unreliable when you want a table of contents or an index with page numbers. On a fast machine, a 200 page PDF can take several minutes to generate. PrinceXML is the only software I’ve tried that does a good job of it. For very simple documents (no CSS, limited Unicode) HTMLDoc is pretty good and very fast.

lazyeye · on Jan 19, 2023

I agree that with HTML you have to manage the page break within the code, but honestly it's not that complicated to insert a page break based on row count.

paranoidrobot · on Jan 19, 2023

Row count is not the only condition for how much can fit on a page.

Content matters for row height, as do fonts, styling, etc. Footers, too. The number of footnotes (thus the height of the footer) can depend on the page content, too.

Maybe there's an image in there, or some sort of specially styled content that makes the line height larger than normal.

You have to be able to render the row to know the final dimensions, to be able to make a call whether it should go on that page or the next. If you don't, then you end up with one page actually rendering over into a second page.
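
A small sketch of that difference, assuming each row has already been rendered and measured (the heights below are invented numbers in arbitrary units, and `paginate` is a made-up helper, not part of any library):

```python
def paginate(row_heights, page_height, footer_height=0):
    """Group row indexes into pages using measured heights, not row counts."""
    usable = page_height - footer_height
    pages, current, used = [], [], 0
    for index, height in enumerate(row_heights):
        if height > usable:
            raise ValueError(f"row {index} is taller than a whole page")
        if used + height > usable:
            # This row would spill over the page bottom: start a new page.
            pages.append(current)
            current, used = [], 0
        current.append(index)
        used += height
    if current:
        pages.append(current)
    return pages


# Six rows; rows 2 and 4 are tall (say, an image or wrapped text).
heights = [10, 10, 30, 10, 50, 10]
by_height = paginate(heights, page_height=60, footer_height=10)

# A naive "3 rows per page" rule, for comparison.
by_count = [list(range(i, min(i + 3, len(heights)))) for i in range(0, len(heights), 3)]
overflow = [sum(heights[i] for i in page) > 50 for page in by_count]
```

With these numbers the height-based version needs four pages, while the fixed count of three rows per page produces a second page that is 70 units tall in a 50-unit area, which is exactly the "one page rendering over into a second page" failure described above.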

password4321 · on Jan 18, 2023

Is there an open source pure-.NET library that implements this?

I've heard good things about https://github.com/Kozea/WeasyPrint in Python.

kodt · on Jan 18, 2023

Filling a fillable PDF programmatically?

xupybd · on Jan 18, 2023

I still find jasper reports the best in pdf generation. Jasper studio gives you okay design tools. Much better than hand coding. Jasper server means integration is as simple as a rest interface. The community edition seems to do everything I need.

yupis · on Jan 18, 2023

Smart people of HN, why can't I directly edit a PDF text file and change some letters?

KMag · on Jan 18, 2023

A PDF file is a program for a virtual machine that draws characters. For instance, I believe fonts in PDF work like PostScript fonts, where (for left-to-right languages) each glyph in the font is actually a bytecode function that starts with the brush in the lower-left corner of where the glyph is to be drawn, draws the glyph, and leaves the brush at the lower-left corner of where the next glyph is to be drawn. I think it's somewhat similar to turtle graphics, if you're familiar with Logo programming or G-code if you've ever hand-coded a CNC mill. (PostScript is text instead of bytecode. PDF is an odd mix of a binary and text format, which helps explain why it has had so many parsing security vulnerabilities over the years.)

For common cases, it may be possible to basically decompile the PDF, modify the text, and re-flow the text, and re-compile to bytecode. However, it's very complicated to do in the general case. (Note that in HTML, the browser determines how to best layout the text, but with PDF, the PDF generator makes the layout decisions.)

Also, many PDF renderers will "compress" fonts by lazily building up an embedded font as glyphs are used in the document. These typically will assign "a" to the first glyph used "b" to the second, etc., so if you decompile "This is some text", you'll see "abcd cd defg hgih". Some PDF generators will helpfully annotate the generated text with "backing text" metadata to help screen readers/copying-to-clipboard, but it's far from universal. So, you might need a database of hashes of all of the bytecode functions in a large number of fonts and/or some image-to-text software in order to reliably decompile the PDF.
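
The renumbering described in that paragraph can be reproduced with a toy model. This is a sketch only: real subset fonts assign glyph IDs rather than letters, and the function names here are made up. The `to_unicode` mapping plays the role of the "backing text" metadata (in real PDFs this is typically carried as a ToUnicode CMap).

```python
import string


def subset_encode(text):
    """Assign codes 'a', 'b', 'c', ... to characters in order of first use.

    Toy model of a lazily built subset font. Spaces are left alone, and the
    toy is limited to 26 distinct characters.
    """
    glyph_codes = {}  # real character -> code used in the content stream
    encoded = []
    for ch in text:
        if ch == " ":
            encoded.append(ch)
            continue
        if ch not in glyph_codes:
            glyph_codes[ch] = string.ascii_lowercase[len(glyph_codes)]
        encoded.append(glyph_codes[ch])
    # The reverse mapping is what a well-behaved generator would embed.
    to_unicode = {code: ch for ch, code in glyph_codes.items()}
    return "".join(encoded), to_unicode


def decode(encoded, to_unicode):
    """Recover the original text; without to_unicode this is impossible."""
    return "".join(to_unicode.get(code, code) for code in encoded)


encoded, mapping = subset_encode("This is some text")
print(encoded)                   # the codes a naive extractor would see
print(decode(encoded, mapping))  # the text recovered via the mapping
```

Without the mapping, an extractor that trusts the codes sees only the first printed line, which is the gibberish case.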

If you're unable to copy text out of a PDF or you get gibberish when you copy text from the PDF, it's likely because the PDF lacks this "backing text" metadata (and in the gibberish case, likely a compressed embedded font). Some scanners will helpfully perform OCR to add this backing text metadata to the generated PDF.

Source: I did a small amount of work related to PDF analysis in Google's web search indexing pipeline over a decade ago. Most of my work was related to figuring out how JavaScript altered web page text, but I did learn just enough about PDF to be dangerous. At the time, Yahoo was Google's biggest competitor, and tons of their indexed PDFs had preview text that was this compressed font "abcd cd de..." garbage. Yahoo obviously naively decompiled the PDF and just trusted that "a" in the embedded font was a bytecode function that drew the glyph "a".

kccqzy · on Jan 19, 2023

Maybe the letter you intend to add is not part of the subsetted font. Font subsetting is extremely common.

Maybe by coincidence all the letters are present. Then you'd have to deal with manually adjusting the spaces and reflowing the text. Reflowing the text can be done, but it is cumbersome. It's akin to fixing a bug in a program not by changing the source and recompiling, but by binary patching.

In contrast, it's much easier to delete some letters in the PDF and keep everything else in the same place. In fact I've had obvious PDFs that have a copyright notice on every page. Deleting that can be done with qpdf and just vim (basically deleting the Tj or TJ operators).

This is fascinating. I recommend you read the PDF specification.

bazoom42 · on Jan 18, 2023

You can, using a tool like Adobe Acrobat. But a PDF is a fixed layout, where each line of text is a positioned box. So editing text will not cause reflow across lines.

Scarbutt · on Jan 18, 2023

Not knowing anything about PDF generation (but will need to soon), what can these libraries do that you can't do with something like a puppeteer web service and create PDFs with HTML/CSS?

pathartl · on Jan 18, 2023

Using HTML/CSS for PDFs really just isn't a good idea in my experience. It makes layout extremely cumbersome. If you just need to spit some data out onto a page, sure it works I guess. However, doing more complex page layout with an actual design element often introduces scenarios where a markup language just can't work.

px1999 · on Jan 18, 2023

Scale/performance. The interface is also straightforward to use. Puppeteer or any nonembedded process is just unnecessary hassle/overhead in a lot of cases.

bayesian_horse · on Jan 18, 2023

It's an overhead but not a big one, at least for web applications, especially if they run as containers anyway. And then it really scales like crazy. Yes, this pdf generator may be faster at what it does, but a headless browser with paged media polyfill can do a lot more than this and uses html+css which are widely used standards.

px1999 · on Jan 18, 2023

Sure, but as others have said, how do you get column headers appearing on each page, put metadata into your documents, make elements properly selectable, etc.?

"Just run it as a container" is a bit of an industry cop-out for making stuff unnecessarily complex.

bayesian_horse · on Jan 19, 2023

THEAD is supposed to do that already. You can use hyperlinks. You can postprocess pdf files.

ficklepickle · on Jan 18, 2023

I put puppeteer into a serverless function and it worked well enough for low tens of thousands of PDFs a day. It's not fast, nor efficient, but it was reliable and surprisingly cheap. It was a definite improvement over the existing solution which was a terrible proprietary black box that was occasionally returning the wrong invoice, but that is not saying much. It was an easy drop-in replacement because we were already generating invoices in HTML, so we just sent them to the new PDF service instead.

Something like this is likely much more efficient than launching a whole browser for each PDF.

bigtex · on Jan 18, 2023

Tables over multiple pages is a major problem. It just doesn’t work with the popular htmlpdf tool that everyone uses to power their tools. That is the use case I am interested in.

msk-lywenn · on Jan 18, 2023

What more does this do that PdfSharp doesn't?

marpstar · on Jan 19, 2023

I see a Previewer utility of some sort: https://www.questpdf.com/document-previewer

I've used PdfSharp in the past to generate product spec sheets in bulk. It worked fine. This one seems more focused on .NET 6 and modernity.

wackget · on Jan 18, 2023

>> You are 250 lines of C# code away from creating a fully functional PDF invoice implementation.

As a web developer this hurt to read. This is a task which is just crying out for a markup language and a stylesheet, not hundreds of lines of declarative C# code.

Even the "complex example" in their documentation looks like the most basic of web pages.

pathartl · on Jan 18, 2023

Coming from the web dev space into backend on a project that heavily relies on PDF generation, I would say that something like a PDF often cannot be expressed with just markup and a stylesheet. There's a large difference between something like the web (which must be expressed with some fluidity of layout) and a very static document like a PDF. Page breaks, readability, print supply, watermarks, paging, etc. all have to be considered.

aidos · on Jan 18, 2023

I feel like all of that can be done in markup.

kgwxd · on Jan 18, 2023

I've worked with PDF markup tools built on libraries like this for 20 years, both third-party and in-house custom. It usually takes 10 minutes to find out the markup doesn't support what is required for the task. With third-party tools you have to find a hack or drop it altogether. In-house you can maybe add something in, but you'll have to do it fast, and if you can't break it down into a general-purpose feature (which you probably can't, because the fundamental philosophy of your "easy" markup language wasn't designed with anything like this in mind), you'll just have to uglify the markup language even more or, again, drop it altogether.

Code is the only sane way.

aidos · on Jan 18, 2023

Well sure, as ever, it depends on your use case.

PDF is an insanely complex spec (I’ve spent more time reading it than most because I need to know bits of it for my job and I just generally find it fascinating). But a lot of devs just need to put some content on the screen to match a template they were given. In my experience, a complete enough markup language allows you to bang out and maintain those templates better than code.

I know it doesn’t suit every need, but it’s just a way of representing the data so it’s closer to the final output than imperative code is. Definitely take your point though about the limitations becoming dealbreakers.

MarcinZiabek · on Jan 19, 2023

What if the code resembles the markup language in terms of readability, but still gives you access to more advanced features? Surely, there is space for various approaches, it all depends on your task and requirements

layer8 · on Jan 18, 2023

It can, but it’ll become something complex and Turing-complete like LaTeX.

styx31 · on Jan 18, 2023

Webpages and PDFs (paged documents) are fundamentally different: you won't be able to easily support headers and footers, page breaks and orphans on a webpage. You can create basic invoices on webpages, but anything more complex (and by that I mean any serious Word document) will require you to twist HTML. Try to get column headers to repeat on each printed page of an HTML page.

aidos · on Jan 18, 2023

The markup doesn’t need to be html - and would be better not to be. The point is more that templating languages are great for formatting data as markup and markup is great for driving layout. With this library as a backend you can make something super usable.

SigmundA · on Jan 18, 2023

I believe browsers have been repeating table headers on printed output for some time.

Paged media CSS is designed for this, although most browsers don't fully support it; PrinceXML is the go-to for full paged media support.

IMO they are not fundamentally different: they are both document formats. PDF just has a fixed paged rendering layout baked in, while HTML can flow and adjust to the rendering target. The main issue is the lack of full print CSS support in HTML rendering engines.

https://www.w3.org/TR/css-page-3/

https://www.princexml.com

styx31 · on Jan 18, 2023

You are right about the thead repeatable header.

Still, to switch back to the previous point, it seems it's more a divergence between using markup or code to design a document. Both have valid usage and benefits depending on your case.

In my case and my apps, I often need to handle complex conditions that fit better, IMO, in procedural code (complex invoices and agreements). In other cases (reports), I prefer to use a markup language.

SigmundA · on Jan 18, 2023

There are a lot of procedural tools for generating HTML, lots, if modern browsers fully supported print CSS then you could use them for complex PDF generation, or direct printing, either client side or on the server headless.

If your app is a web app this is a no brainer, the users browser could simply do the print or PDF conversion as needed.

I do see a use for more direct libraries in native apps, although if every native client had a browser control with full print CSS support even then it might not be such an issue.

Scarbutt · on Jan 18, 2023

> If your app is a web app this is a no brainer, the users browser could simply do the print or PDF conversion as needed.

That's arguable. IME most would prefer to just get the PDF file with one click (also a better UX) than to deal with additional browser dialogs. Not everyone knows how to do print-to-PDF or even knows it exists.

Or do you mean browsers expose print-to-pdf functionality as an API?

SigmundA · on Jan 18, 2023

Hitting print in the browser or calling window.print() if you want to force the dialog.

If you serve a PDF you still need to hit print or use a dialog to save; you can use a headless browser server-side to serve that if needed.

I do think browsers could use better print APIs, but you're not getting around that with server-side PDFs unless the server prints directly to on-site printers or something.

MarcinZiabek · on Jan 19, 2023

I am not sure it is a good idea to think about webpage and PDF content as the same. After all, they serve different purposes and their layout should be optimized for the use case.

lazyeye · on Jan 18, 2023

None of these things are difficult at all with html. Plus you have the benefit of having the document viewable in a web browser too. You use the exact same html layout for both with specific css (heights, widths mainly) for each.

bob1029 · on Jan 18, 2023

We do a lot of dynamic report gen PDFs and this is something we'd prefer.

Right now, we basically emulate this technique w/ HTML->PDF. We build chunks of report HTML with various string interpolation methods and then compose those to obtain our final HTML output.

Raw, declarative HTML is nice if you don't have an undefined # of things to describe with it. When you are looping and projecting domain types into a report, things get a lot trickier.
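
To make the "build chunks, then compose" approach concrete, here is a minimal sketch in Python rather than C#. The function names and the item fields (`name`, `qty`, `price`) are made up for illustration and are not from any real report; the point is only that each fragment is a small function and the loop over domain objects lives in ordinary code.

```python
from html import escape


def render_row(item):
    # One fragment per domain object; escape anything that came from data.
    return (
        f"<tr><td>{escape(item['name'])}</td>"
        f"<td>{item['qty']}</td>"
        f"<td>{item['qty'] * item['price']:.2f}</td></tr>"
    )


def render_table(items):
    # The number of rows is unknown up front, so the loop lives in code.
    rows = "".join(render_row(item) for item in items)
    return f"<table><tbody>{rows}</tbody></table>"


def render_report(title, items):
    # Compose the fragments into the final document handed to the HTML->PDF step.
    total = sum(item["qty"] * item["price"] for item in items)
    return (
        f"<html><body><h1>{escape(title)}</h1>"
        f"{render_table(items)}"
        f"<p>Total: {total:.2f}</p></body></html>"
    )
```

The resulting string would then go to whatever HTML->PDF converter you already use; that step is left out here on purpose.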

amithegde · on Jan 18, 2023

I used https://github.com/Antaris/RazorEngine to generate all sorts of complex HTML, email body etc. back in the day. Since it follows razor syntax, loops etc. work well

bob1029 · on Jan 18, 2023

We actually used this exact library at one point, but it fell out of favor for some reason I cannot recall.

jaywalk · on Jan 18, 2023

There are PDF generators that work just like that. As a web developer who uses C# on the backend, QuestPDF is exactly what I want.

naasking · on Jan 18, 2023

Will those basic web pages be less than 250 lines for the equivalent look? I'm skeptical.

MarcinZiabek · on Jan 19, 2023

There are many good reasons for choosing a programming language over a markup language. C# has countless features, both functional and syntactic: conditions, loops, methods, formatting, iteration, recursion, etc. Additionally, each of those features is well supported by all major IDEs. Writing your presentation layer in a proper programming language not only builds on your existing skills but also gives you access to tools such as code completion and IntelliSense. Moreover, using a fluent API helps keep the code concise and easy to change.
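
As a rough illustration of the idea (in Python, and explicitly not QuestPDF's actual API), here is a tiny fluent builder where conditions and loops are ordinary code. The `Column` class and its `text`, `when`, and `each` methods are hypothetical names invented for this sketch; it only collects lines of text instead of producing a PDF.

```python
class Column:
    """Hypothetical fluent container that collects text lines (not QuestPDF's API)."""

    def __init__(self):
        self.lines = []

    def text(self, value):
        self.lines.append(value)
        return self  # returning self is what allows call chaining

    def when(self, condition, build):
        # Conditional content: only run the builder if the condition holds.
        if condition:
            build(self)
        return self

    def each(self, items, build):
        # Repeated content: run the builder once per item.
        for item in items:
            build(self, item)
        return self


def invoice(customer, items, discount=0.0):
    subtotal = sum(price for _, price in items)
    return (
        Column()
        .text(f"Invoice for {customer}")
        .each(items, lambda col, item: col.text(f"{item[0]}: {item[1]:.2f}"))
        .when(discount > 0, lambda col: col.text(f"Discount: -{subtotal * discount:.2f}"))
        .text(f"Total: {subtotal * (1 - discount):.2f}")
        .lines
    )
```

Calling `invoice("Acme", [("Widget", 10.0)], discount=0.1)` adds the discount line, while omitting `discount` leaves it out, with no template syntax involved.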

At the end of the day, it all depends on how you use the technology, doesn't it?

password4321 · on Jan 18, 2023

https://docs.aspose.com/pdf/net/working-with-xml/

Starting at $3600 for use on a web server.

aidos · on Jan 18, 2023

Not familiar with .Net but I’d imagine this would probably be fairly easy to build on top of this library (and I agree, xml is often a much better way to generate reports).

I’ve done something similar but in Python and generating Excel documents. I use jinja for templating to create the xml and then parse that and convert to commands that drive the library that creates the final document.
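
A stripped-down sketch of that pipeline, using only the standard library: plain f-strings stand in for jinja here, and the `<sheet>`/`<row>`/`<cell>` element names and the `add_sheet`/`write` command tuples are invented for illustration. In the real thing the commands would be replayed against whichever spreadsheet library you use.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_xml(rows):
    # Templating step (jinja in practice): data in, XML markup out.
    body = "".join(
        "<row>" + "".join(f"<cell>{escape(str(v))}</cell>" for v in row) + "</row>"
        for row in rows
    )
    return f'<sheet name="Report">{body}</sheet>'


def to_commands(xml_text):
    # Parsing step: walk the XML and turn it into library-agnostic commands.
    root = ET.fromstring(xml_text)
    commands = [("add_sheet", root.get("name"))]
    for r, row in enumerate(root.findall("row")):
        for c, cell in enumerate(row.findall("cell")):
            commands.append(("write", r, c, cell.text or ""))
    return commands
```

Keeping the commands as plain tuples means the layout markup never needs to know which document library sits underneath.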

Genmutant · on Jan 19, 2023

You can use XSLT to XSL-FO if you want that. I haven't found it very nice to use.

wvenable · on Jan 18, 2023

But then you need a template language to generate the markup from the data.

aidos · on Jan 18, 2023

That's a well-trodden path in most languages. A cursory search surfaced this library, which looks like it would probably do the job:

https://github.com/scriban/scriban

wvenable · on Jan 18, 2023

A programming language referencing a template language library for processing a markup language to generate another markup language (PDF) sounds just about right.

aidos · on Jan 18, 2023

Nitpick, but it’s a stretch to classify PDF as a markup language. A PDF is a graph of nodes that can encapsulate myriad different types of data, including things that are probably even Turing complete, like fonts. Even the graphics streams inside PDFs aren’t markup.

We build abstractions for a reason. I think we can all agree that templating markup for layouts has been a reasonable success story of the web generation.

Charlie_26 · on Jan 19, 2023

I’m planning on using Forms9Patch to get this working for a Xamarin project.

isralcduke · on Jan 19, 2023

Does it help create tagged PDFs? Tagged PDFs are necessary for accessibility.

aargh_aargh · on Jan 19, 2023

How would I go about using this from PowerShell?

juki · on Jan 19, 2023

You could just build the example in C#, grab all the required dlls from it and then load them in PowerShell with `Add-Type -Path '.\QuestPDF.dll'`.

Unfortunately it looks like this uses extension methods for everything, and those are a pain to use in PowerShell. You'll probably want to write the PDF creation bits as a C# cmdlet instead.

majkinetor · on Jan 19, 2023

You can even compile C# code in it, so it could be a direct copy-paste with some boilerplate.

renaudl_ · on Jan 18, 2023

Why not just use an API like doppio.sh?

paranoidrobot · on Jan 18, 2023

Because it's an external paid service.

Some people don't want, or can't use, external services like this. Security, Privacy, availability, cost... plenty of reasons.

A library works offline, too.

hacknewslogin · on Jan 18, 2023

Very cool, I'm looking to learn HTML, CSS, C and someday forth. Does anyone know if there's anything like this for those languages?

cinntaile · on Jan 18, 2023

I think you're in the wrong thread?

hacknewslogin · on Jan 18, 2023

This is for making PDFs using C# code, right? And you can preview it while you work? I was wondering if that is available for other programming languages.

xupybd · on Jan 18, 2023

https://en.m.wikipedia.org/wiki/JasperReports

https://en.m.wikipedia.org/wiki/Crystal_Reports

https://pypi.org/project/pdf-reports/

hacknewslogin · on Jan 19, 2023

Thanks!

zzo38computer · on Jan 18, 2023

Since you mention Forth, I might mention that PostScript is another stack-based programming language (different from Forth, although there are some similarities), which can be used to make PDF output. You can also load additional PostScript code into your file to add procedures for formatting, so that you do not need to write them all yourself.

KRAKRISMOTT · on Jan 18, 2023

>and someday forth

PostScript is eternal

yodon · on Jan 18, 2023

See Poe's Law [0]

[0]https://en.wikipedia.org/wiki/Poe%27s_law

px1999 · on Jan 18, 2023

Oh, I get it! You're saying that .NET isn't a good language. Haha, great one.

imafish · on Jan 18, 2023

Is you Bot?

vxNsr · on Jan 19, 2023

no he's a relatively new account trying to farm some karma. he's still learning

hacknewslogin · on Jan 23, 2023

I don't care about karma at all, I really was just trying to ask a question. I'm interested in learning HTML and CSS. I'm also interested in creating PDFs. My question was: since this looks like it's for making PDFs using C# code, and you can preview it while you work, is that available for other programming languages? Besides HTML and CSS, someday I want to learn C and Forth. I thought this would be a practical way to get familiar with programming languages.

hacknewslogin · on Jan 18, 2023

No, this is patrick!

BlueNorth · on Jan 19, 2023

We must, one day, realize that .NET will disappear when Microsoft stops supporting it. All work done with this framework is only a prison for future developers. This must end.

atraac · on Jan 19, 2023

First of all, what does it have to do with this particular library?

Second, get a grip. .NET has been open source for a while, it's getting more popular again, JetBrains entered the stage with a very competitive IDE, and it's easier than ever to use .NET via the CLI. The days of having to use Visual Studio, or any Microsoft product really, are long gone. I don't like Microsoft either, but no one's forcing you to use any of their products anymore.