Hacker News

...What are you talking about? HTML files are readable on basically every platform, even more so because they are fundamentally text files (unlike PDFs, which are binaries). PDFs need special software; HTML can be read on the command line. Likewise, HTML is dead simple to edit and annotate.

Seriously, name a single device that has PDF support that doesn't allow you to view HTML.

I think you're conflating "HTML" and "things stored on a server", because all of your objections apply to PDFs stored on a server. The ability to save and annotate PDFs is not an inherent feature of the file format; those tools exist because the format is such a PITA to interact with that specialized programs have to be written. HTML can be saved just as easily, and usually is (on archive.org).
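
To make the "readable on the command line" point concrete, here is a minimal sketch that dumps the visible text of an HTML document using only Python's standard library. The names `TextExtractor` and `html_to_text` are made up for illustration, and script-heavy pages would likely need more care than this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup: str) -> str:
    """Return the text fragments of `markup`, one per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    )
    print(html_to_text(sample))
```

Nothing comparable works on a PDF without a dedicated parsing library, which is the asymmetry being argued here.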



I just tested saving https://browse.arxiv.org/html/2312.12451v1 to disk using Chrome, transferring it to my Android phone, and opening it on the phone. Results:

1. Saving as "Webpage, Single File" (.mhtml): Neither Firefox nor Chrome even showed up in the list of available apps to open it.

2. Saving as "Webpage, Complete": Opened in Chrome but images were broken. Also very difficult to open with the default file browser because it uses a flat folder view and the sidecar folder pollutes the file list.

I was hoping this would work; perhaps you will have different findings. I agree that HTML is the superior format in theory, but usability in practice is often lacking. I'm resigned to using both depending on context.
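
One possible workaround, assuming the broken images come from the page losing access to its sidecar folder (I have not verified that this is the cause on Android), is to inline the images as base64 `data:` URIs so that the saved page is a single plain `.html` file. A rough sketch follows; `inline_images` is a hypothetical helper, it only handles double-quoted `src` attributes on `<img>` tags, and it ignores CSS, fonts and scripts.

```python
import base64
import mimetypes
import re
from pathlib import Path
from urllib.parse import unquote

# Matches the src attribute of an <img> tag (double-quoted values only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(markup: str, base_dir: Path) -> str:
    """Replace local image references with base64 data URIs."""

    def replace(match):
        src = match.group(2)
        # Leave remote and already-inlined images untouched.
        if src.startswith(("data:", "http:", "https:")):
            return match.group(0)
        # Saved pages may percent-encode the sidecar folder name.
        path = base_dir / unquote(src)
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(replace, markup)
```

You would read the saved page, pass its text along with the directory it was saved in, and write the result out as a new file. The trade-off is size: base64 inflates each image by roughly a third.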


Yes, that's the kind of issue I was talking about. I wish it were otherwise. As a nearby comment pointed out, EPUB is a potential solution (and I wish arXiv embraced it, though I don't know their other requirements or EPUB's accessibility features). It's essentially packaged HTML.
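
To show what "packaged HTML" means in practice: an EPUB is a ZIP archive whose first entry is an uncompressed `mimetype` file, with the content stored as XHTML files inside. The sketch below builds a deliberately incomplete container in memory and lists the XHTML it holds. It omits `META-INF/container.xml` and the package document, so it is not a valid EPUB, and the function names and the `OEBPS/chapter1.xhtml` path are illustrative choices only.

```python
import io
import zipfile


def build_minimal_container(chapter_xhtml: str) -> bytes:
    """Build an EPUB-like ZIP in memory (NOT a complete, valid EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The 'mimetype' entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # The actual content is ordinary (X)HTML.
        zf.writestr("OEBPS/chapter1.xhtml", chapter_xhtml,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def list_html_entries(container: bytes) -> list:
    """Return the names of the HTML/XHTML files inside the archive."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return [name for name in zf.namelist()
                if name.endswith((".xhtml", ".html"))]


if __name__ == "__main__":
    data = build_minimal_container("<html><body><p>Hello</p></body></html>")
    print(list_html_entries(data))
```

The same `zipfile` calls will open a real `.epub`, so the HTML inside stays reachable with generic tools even if dedicated readers disappear.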


Of course, they’re “just text files” only in theory… but theory and practice diverge very very often.


How do I save an HTML document locally, and annotate it, in an easily shareable form, and in a form that is stable - i.e., in a way that will be readable and usable in 20-50 years?


Basically any HTML document from 20-30 years ago (can't go any further because it didn't exist 50 years ago) will be completely readable and usable. The only issue is people creating content (not styling) in formats besides HTML.

As far as annotations go, you can use the native <ruby>[1] tag, or strikethrough, but if you mean "literally drawing on the text" then, yeah, you're looking for an image format at that point (which is fundamentally what PDF is), but we shouldn't default to storing text in image formats just because of one specific use case. (Also, as I said above, the only reason tools exist to easily do that in PDFs is that everyone insists on using a format that's hard to edit.)

Also, note that the context I was responding to was US legal documents, not something more presentation-heavy.

[1] https://twitter.com/antumbral/status/1730829756013375875
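
As a rough illustration of annotating with native tags, the sketch below wraps the first occurrence of a phrase in a `<ruby>` element with the note in `<rt>`. `annotate` is a hypothetical helper, and it does a plain string replacement, so it could also hit a match inside a tag or attribute. A real tool would walk the parsed document instead.

```python
import html


def annotate(markup: str, phrase: str, note: str) -> str:
    """Wrap the first occurrence of `phrase` in a <ruby> annotation."""
    escaped = html.escape(phrase)
    ruby = f"<ruby>{escaped}<rt>{html.escape(note)}</rt></ruby>"
    # Replace only the first match; the rest of the markup is left as-is.
    return markup.replace(escaped, ruby, 1)


if __name__ == "__main__":
    print(annotate("<p>The court held X.</p>", "held", "see p. 4"))
```

The result is still an ordinary text file, so the annotation can be diffed, shared, and edited with any editor.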


You say it as if pdf is somehow better. To begin with it's a proprietary format. If Adobe goes bankrupt or obscure tomorrow, pdf will go out of use as a failed technology.


> it's a proprietary format. If Adobe goes bankrupt or obscure tomorrow, pdf will go out of use as a failed technology.

It's an ISO standard with a very large ecosystem outside Adobe. Many users and businesses I know don't use Adobe at all.


They will use it, like COBOL. But are COBOL programs usable on your machine?



