A transpiler that gets confused when optional tags are missing (a feature explic...

still_grokking · on Jan 9, 2021

Yeah, fine. The tool is broken. Right.

But YOU still have an issue.

The point is: there are a lot of broken tools out there, and you can't know which of them will be used in the future. Just save your future self and your colleagues a lot of headaches by not testing the spec compliance of every tool you'll probably use at some point.

susam · on Jan 10, 2021

> But YOU still have an issue.

I disagree.

> There are a lot of broken tools out there

A tool that incorrectly handles optional tags may handle other parts of the spec incorrectly too. Such a tool may produce incorrect results even for perfectly well-written HTML. There is no way to know what it takes to make all the broken parsers out there happy.

I know you made a point about ETL tools[1] where XML parsers are used to parse HTML, but there is no way to cater to such absurd use cases anyway. Using an XML parser to parse HTML5 is not going to work correctly even if you do retain the optional tags, because it would fail on HTML5 void elements that have no closing tags, such as <meta>, <link>, <img>, etc., and on boolean attributes written without a value, like <input disabled>, <input required>, etc. Web developers from all around the world are not going to start writing self-closing <img /> tags just because these broken ETL tools have decided to use an XML parser to parse HTML5.
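A minimal sketch of that failure, using Python's standard-library `xml.etree.ElementTree` as a stand-in XML parser (an illustrative choice, not one of the ETL tools mentioned above). Every snippet keeps all the optional tags, yet the XML parser still rejects the void element and the valueless boolean attribute; only the hand-rewritten XML-style form gets through.

```python
import xml.etree.ElementTree as ET

# Each snippet keeps every optional tag (<html>, <head>, <body>).
SNIPPETS = {
    # Legal HTML5 syntax: <meta> is a void element and has no closing tag.
    "void element": "<html><head><meta charset='utf-8'></head><body></body></html>",
    # Legal HTML5 syntax: a boolean attribute written without a value.
    "boolean attribute": "<html><body><input disabled></body></html>",
    # The same content rewritten in an XML-compatible form.
    "xml-style": "<html><head><meta charset='utf-8'/></head>"
                 "<body><input disabled='disabled'/></body></html>",
}


def parses_as_xml(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False if it rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


for name, markup in SNIPPETS.items():
    print(f"{name}: {parses_as_xml(markup)}")
```

Running it prints False for the first two snippets and True for the third: keeping the optional tags does not help an XML parser with HTML5 syntax.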

There are plenty of good HTML5 parsers out there for almost every mainstream programming language. Just use them.

[1] https://news.ycombinator.com/item?id=25708209

still_grokking · on Jan 10, 2021

Most of the "plenty of good HTML5 parsers out there" are broken. No wonder, as the spec is nuts. (It took years before there was even a correctly working validator.)

Also I was explicitly talking about XML compatible HTML. It's called so because it's XML compatible.

Btw, have you ever seen HTML in the web browser dev tools? Guess why it always shows the "optional" tags. ;-)

susam · on Jan 10, 2021

> Most of the "plenty of good HTML5 parsers out there" are broken.

Can you name a few popular and widely used HTML5 parsers that are broken and tell us what the bugs in those parsers are? I would be surprised if you could name even two popular parsers that cannot handle optional tags correctly as required by the spec.

> Also I was explicitly talking about XML compatible HTML.

There is no such thing as XML-compatible HTML (unless you mean XHTML, which we are not discussing here). Maybe you mean XML-serialized HTML5; I can only guess, since the terminology you are using is vague. In any case, HTML5 syntax by itself is incompatible with XML, as I mentioned in my previous comment: void elements such as <meta> have no closing tag, and boolean attributes may be written without a value. XML-serialized HTML5, however, is compatible with XML by definition, and in that case one would use an XML parser, not an HTML5 parser. More importantly, you can safely omit the optional tags and still convert your HTML5 document into an XML-serialized HTML5 document without any issues whatsoever. anjbe explained this to you here: https://news.ycombinator.com/item?id=25706163. He is absolutely right.

> Btw, have you ever seen HTML in the web browser dev tools? Guess why it always shows the "optional" tags. ;-)

You see all the elements there because the dev tools show the entire DOM, not the source markup. The browser automatically creates the elements when their optional tags are not explicitly present in the HTML. This is all spelled out very clearly in the spec. Any HTML5 parser worth its name follows the spec. I am not sure what your point is here.

See https://html.spec.whatwg.org/multipage/syntax.html#optional-... for details, especially:

"Omitting an element's start tag in the situations described below does not mean the element is not present; it is implied, but it is still there. For example, an HTML document always has a root html element, even if the string <html> doesn't appear anywhere in the markup."

I hope that explains why you always see the elements for the optional tags in a web browser's developer tools.
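To illustrate the "implied, but still there" idea, here is a deliberately tiny Python sketch built on the standard library's `html.parser` tokenizer. It is a toy, not the WHATWG tree-construction algorithm: `ToyTreeBuilder`, `Node`, and the tag sets are hypothetical names for this example, attributes are dropped, and only two rules are modeled (html/head/body always exist, and a new `<p>` closes an open `<p>`).

```python
from html import escape
from html.parser import HTMLParser

# Toy rules (a small subset of the real tree-construction algorithm):
HEAD_TAGS = {"title", "meta", "link", "style", "script", "base"}  # may sit in <head>
VOID_TAGS = {"meta", "link", "base", "br", "img", "input", "hr"}  # never have children
IMPLIED = {"html", "head", "body"}  # always created, whether or not the tags appear


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # child Nodes and text strings

    def serialize(self):
        if self.tag in VOID_TAGS:
            return f"<{self.tag}>"
        inner = "".join(
            escape(c, quote=False) if isinstance(c, str) else c.serialize()
            for c in self.children
        )
        return f"<{self.tag}>{inner}</{self.tag}>"


class ToyTreeBuilder(HTMLParser):
    """Builds a tree in which html, head and body always exist (attributes are dropped)."""

    def __init__(self):
        super().__init__()
        self.html, self.head, self.body = Node("html"), Node("head"), Node("body")
        self.html.children = [self.head, self.body]
        self.stack = [self.head]  # open elements; stack[-1] is the insertion point
        self.in_body = False

    def _enter_body(self):
        if not self.in_body:
            self.in_body = True
            self.stack = [self.body]

    def handle_starttag(self, tag, attrs):
        if tag in IMPLIED:
            # Explicit html/head/body tags are accepted but add no new elements.
            if tag == "body":
                self._enter_body()
            return
        if tag == "p":
            self.handle_endtag("p")  # toy version of "a new <p> closes an open <p>"
        if tag not in HEAD_TAGS:
            self._enter_body()  # first non-head content implies <body>
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in IMPLIED:
            return  # implied elements stay open until the end of input
        # Close the nearest open element with this name, plus anything inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if not self.in_body and len(self.stack) == 1:
            if not data.strip():
                return  # toy rule: drop whitespace between head elements
            self._enter_body()  # text outside any head element implies <body>
        self.stack[-1].children.append(data)
```

Feeding it `<!DOCTYPE html><title>Demo</title><p>One<p>Two` and feeding it the fully tagged version with explicit `<html>`, `<head>`, `<body>`, and `</p>` should yield the same tree, serialized as `<html><head><title>Demo</title></head><body><p>One</p><p>Two</p></body></html>`. A spec-compliant parser applies many more rules, but the principle is the same: the omitted tags still produce elements.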