
By the spec, yes. Some PDF readers will parse it anyway, some will not. In my experience, how malformed the xref table can be before things go wrong depends on the renderer. Edge's old PDF reader (the one before Acrobat and after PDFium), for example, seemed to tolerate just about anything, falling back to the latest version of each object if the xref table was broken. There are also other mistakes you can make. For example, the xref table requires carriage returns (each entry in the table is supposed to be an exact number of bytes), but some PDF readers will still interpret the xref table even if the carriage returns are missing.
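To make the "fall back to the latest version of objects" idea concrete, here is a minimal Python sketch of how a lenient reader might rebuild object offsets when the xref table is unusable. This is an assumption about the general technique, not a description of what Edge's reader actually did. The name `rebuild_offsets` and the sample bytes are hypothetical, and a naive scan like this can be fooled by "N G obj" sequences that happen to appear inside stream data.

```python
import re

# Matches an indirect object header such as b"12 0 obj".
# The lookbehind avoids starting in the middle of a longer number.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")

def rebuild_offsets(data: bytes) -> dict[int, tuple[int, int]]:
    """Map object number -> (generation, byte offset).

    Later definitions overwrite earlier ones, so the last copy of
    each object in the file wins (hypothetical recovery strategy).
    """
    offsets: dict[int, tuple[int, int]] = {}
    for m in OBJ_HEADER.finditer(data):
        offsets[int(m.group(1))] = (int(m.group(2)), m.start())
    return offsets

# Hypothetical file body: object 1 is defined twice, as after an
# incremental update; there is no xref table at all.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /A 1 >>\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\n"
    b"1 0 obj\n<< /A 2 >>\nendobj\n"
)
table = rebuild_offsets(sample)
```

Here `table[1]` points at the second definition of object 1, the one appended later in the file.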


As I understand it, the xref entries don’t require a carriage return, but they require a fixed line length. If you don’t want to use a CR, you can pad with a space.

So CR/LF, space/LF, and space/CR are all valid endings.
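As a quick check of the fixed-length point, here is a small Python sketch that builds an in-use xref entry with each of the three endings and confirms that every one comes out at 20 bytes. The helper name `xref_entry` is made up for illustration, and it only handles in-use (`n`) entries.

```python
# The three 2-byte endings that keep an entry at a fixed length.
VALID_EOLS = (b"\r\n", b" \n", b" \r")

def xref_entry(offset: int, generation: int, eol: bytes = b"\r\n") -> bytes:
    """Build one in-use xref entry: 10-digit offset, 5-digit generation, 'n', EOL."""
    if eol not in VALID_EOLS:
        raise ValueError("EOL must be CR LF, SP LF, or SP CR")
    if not (0 <= offset <= 9_999_999_999 and 0 <= generation <= 99_999):
        raise ValueError("offset or generation does not fit the fixed-width fields")
    return b"%010d %05d n" % (offset, generation) + eol

entries = [xref_entry(17, 0, eol) for eol in VALID_EOLS]
lengths = [len(e) for e in entries]
```

All three values in `lengths` are 20, while the same entry ending in a bare LF would be only 19 bytes, which is why the padding space is needed when the CR is dropped.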


Yep:[1]

> The byte offset in the decoded stream shall be a 10-digit number, padded with leading zeros if necessary, giving the number of bytes from the beginning of the file to the beginning of the object. It shall be separated from the generation number by a single SPACE. The generation number shall be a 5-digit number, also padded with leading zeros if necessary. Following the generation number shall be a single SPACE, the keyword n, and a 2-character end-of-line sequence consisting of one of the following: SP CR, SP LF, or CR LF. Thus, the overall length of the entry shall always be exactly 20 bytes

This is interesting. I've never actually seen anything other than CRLF in practice, even inside PDF files that were otherwise LF-only.

[1]: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard... page 41
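For anyone who wants to test files against the quoted rule, here is a rough Python sketch of a strict entry check. It assumes only the in-use `n` form described in the quote, and `parse_entry` is a hypothetical name, not part of any PDF library.

```python
import re

# Strict form from the quoted passage: 10 digits, SP, 5 digits, SP, "n",
# then a 2-character EOL (SP CR, SP LF, or CR LF). Total: 20 bytes.
ENTRY = re.compile(rb"(\d{10}) (\d{5}) n( \r| \n|\r\n)")

def parse_entry(raw: bytes):
    """Return (offset, generation) for a strictly valid entry, else None."""
    m = ENTRY.fullmatch(raw)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

strict = parse_entry(b"0000000017 00000 n\r\n")  # CR LF ending
padded = parse_entry(b"0000000017 00000 n \n")   # SP LF ending
bare_lf = parse_entry(b"0000000017 00000 n\n")   # 19 bytes, not valid by the spec
```

The first two parse to `(17, 0)`. The bare-LF entry returns `None` here, although, as noted upthread, some readers will accept it anyway.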



