WhatWG: Proposal – Update XPath to at least v2.0

lucideer · on Oct 13, 2020

Most impactful part of this imo is:

> Chrome is not interested in this. The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them

> -- https://github.com/whatwg/dom/issues/903#issuecomment-707748...

Given Chrome's status as the new IE6 in terms of market share and outsized influence over the technological direction of the web, there's a real risk of moves like this being unilateral.

On the other hand, the last two comments re: libxml being their primary concern do give some hope.

pete_b · on Oct 13, 2020

Agreed. I find it rather alarming that an influential member of the Chrome team is talking like this. All our e2e test suites are built around Chrome and XPath because of its expanded abilities over CSS

sergeykish · on Oct 13, 2020

Could you please share your use case? Lets drive adoption.

Spivak · on Oct 13, 2020

I really don't understand this opinion. Like, I mean, I get the frustration from the perspective of a web developer that wants to use a particular feature but are different browsers not allowed to be different?

Like if Firefox didn't want to implement WebUSB, Safari didn't want to implement WebPush, or Lynx didn't want to implement Canvas is that outrageous?

lucideer · on Oct 13, 2020

> are different browsers not allowed to be different?

The web (and open standardisation in general) has pioneered an ecosystem where the primary differentiation between browser is in user-facing UX & features (and ancillary factors such as performance, etc.), rather than developer-facing web-tech support.

This is quite different to a lot of other commercial "competitive" spaces as it substitutes vendor lock-in on patents & trade-secrets for actual innovation in the user-facing space. It's not all rosy: competing browsers still stray from this on the regular, but the ideal is one of the primary selling points of the web as a platform.

Browsers differentiating themselves on user features while maintaining cross-competitor consistency on web standards is the dream that differentiates the web, so seeing its erosion is something to call out.

> Like if Firefox didn't want to implement WebUSB, Safari didn't want to implement WebPush, or Lynx didn't want to implement Canvas is that outrageous?

What's particularly different here is that this isn't about the addition of a feature. The ticket opened is about adding XPath2 support but the quoted line is about removing existing XML support.

This may sound a bit like I'm supporting Microsoft's old "don't break the web" adage, but the big difference here is MS was reluctant to remove features competitors didn't have for fear of breaking IE-only websites (that had relied on them due to IE's dominance). This is about Chrome removing standardised features that browsers, servers, and applications of all varieties have supported interoperably for decades.

sergeykish · on Oct 13, 2020

I believe it is about rewriting implementation, not removing API [0].

> Deprecate, and consider removing, XSLT

> The consensus last time we considered this was that xml and xslt are too important for enterprise and we cannot remove them from the platform. Closing this bug to match that reality. We'll open a new bug if we ever decide to do this. [1] (Feb 22, 2019)

[0] https://news.ycombinator.com/item?id=24767500

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=514995

lucideer · on Oct 13, 2020

You may be right, and later comments in the issue thread seem to row back on the original comment and hint at that. Here's hoping...

gnagatomo · on Oct 13, 2020

If I remember correctly, Mozilla didn't want to support video DRM but ended up adding it to Firefox[0] in fear of losing marketshare because Netflix required DRM video playback[1].

Today's browsers are just trying to keep up with whatever Chrome decides to adopt.

[0]: https://blog.mozilla.org/blog/2015/05/12/update-on-digital-r... [1]: https://www.engadget.com/2014-05-14-mozilla-bends-on-drm.htm...

barumi · on Oct 13, 2020

> (...) but are different browsers not allowed to be different?

Allowed? Sure, why not?

Desirable? Hell, no.

Think about it. If you are developing a web application and you need ensure it runs on all supported platforms then you either:

a) use standard APIs that are provided by all platforms,

b) use platform-specific APIs and watch the number of platform-specific tests to grow exponentially along with development and maintenance effort,

c) drop platforms.

Suffice to say, option a) is far more desirable.

toyg · on Oct 13, 2020

Chrome is not "a different browser", it's the dominant browser. Google worked hard to achieve this state of things, and they now have a clear responsibility in terms of steering web standards. With great market share comes great responsibility.

dmitriid · on Oct 13, 2020

> it does seem that about 1-2% of page views end up using XPath

And Chrome is not interested.

And yet, when they release a standard all other browsers object to, they justify that... because it's used by 0.8% page loads (exclusively on Google properties, implemented exclusively by Google devs) [1]

And yet, when other browsers consider standards harmful, [2] Chrome just ships them [3]

[1] https://twitter.com/justboriss/status/1220428902071447552

[2] https://mozilla.github.io/standards-positions/

[3] https://www.chromestatus.com/features

spankalee · on Oct 13, 2020

You continue to grind your axe, but they only thing I was saying there was that backwards should be considered when potentially changing an API because the feature is used in the wild.

But sure, try to weaponize anything related to web components at every opportunity.

dmitriid · on Oct 13, 2020

> they only thing I was saying there was that backwards should be considered when potentially changing an API

The only thing that was said there: "we released this API into the world against multiple objections by multiple parties. This feature is used exclusively by our own developers and almost exclusively on our own properties. It's used on a whopping 0.8% pageviews (once again, almost exclusively on our properties). So now we will not remove it".

> But sure, try to weaponize anything related to web components at every opportunity.

No. That was a very recent example, and that example had a number of pageviews in it, so it serves as an interesting comparison of approaches.

0.8% pageviews by Google's own devs for a standard Google rammed through despite objections? Oh, it's good, ain't gonna remove it. 1-2% pageviews by people outside Google for a standard Google had no say in? Oh, not gonna do anything about it.

And if you paid a grain of attention, I also have this to say about Google's hypocrisy (with links to Mozilla's stance on standards and to Chromes feature list page):

--- quote ---

when other browsers consider standards harmful, Chrome just ships them

--- end quote ---

Which is a fact of life regardless of my feelings towards particular standards.

I can add another link, of course: Web API counts across browsers [1] It's so nice to see Chrome shipping in total over 1000 more APIs than competition, of them many considered harmful, and including over 600 browser-specific APIs. Because it's all for the greater good.

[1] https://web-confluence.appspot.com/#!/confluence

sergeykish · on Oct 13, 2020

Thank you for web-confluence, I've got similar question this May, researched with BrowserStack too [1].

I like XML, XHTML, XPath, XSLT. But I don't understand your argument. Maybe it would look better with a list of harmful Chrome only APIs standardized by WHATWG. Even then it is not a reason to follow bad example.

[1] http://dom-report.herokuapp.com/

dmitriid · on Oct 13, 2020

The web is not just WHATWG. Chrome rams multiple standards through several standards bodies. Scroll down in Mozilla's standards positions [1]. Then search for the harmful standards in Chrome features: [2]

You'll see most of them already released publicly. And people like @spankalee will call you a hater when you start calling Google out. And then clueless web devs will complain how "Safari is holding the web back" even though Safari and Mozilla can barely hold back the floodgates of poorly specified standards that are being pushed through at neck-breaking speed.

[1] https://mozilla.github.io/standards-positions/

[2] https://chromestatus.com/features

sergeykish · on Oct 13, 2020

The original issue is WHATWG, conversation should stick to WHATWG first:

https://wicg.github.io/webusb/

https://webbluetoothcg.github.io/web-bluetooth/

https://www.w3.org/2012/nfc/web-api/

https://www.w3.org/TR/html-imports/ # that one scrapped

That is not WHATWG.

WebReflection shows a great example of civilized discourse [1]. If excuse not to extend feature is low usage we'd better provide competing arguments how it is useful. If problem is usability we should address it first [2].

[1] https://github.com/whatwg/dom/issues/903#issuecomment-707828...

[2] https://news.ycombinator.com/item?id=24766845

spankalee · on Oct 13, 2020

I'm not in a position to remove or not remove an API - I just use it, and the only thing I'm saying there is to _consider_ backwards compatibility.

If the feature is changed, change the name so existing sites don't break. You're extrapolating far too much from that, as did Rich Harris and other persistent web component haters.

dmitriid · on Oct 13, 2020

> I'm not in a position to remove or not remove an API - I just use it, and the only thing I'm saying there is to _consider_ backwards compatibility.

There is no backwards compatibility for a feature that was released just a few days ago and that had been behind a flag prior to the release.

> If the feature is changed, change the name so existing sites don't break. You're extrapolating far too much from that

There's nothing to extrapolate. These are facts.

----

Also, note: "Rich Harris and other persistent web component haters". This is the continuous incessant vitriol that spews out of Google. Coupled with Chrome playing increasingly dirty, no wonder people feel resentment towards Google and its representatives.

bryanrasmussen · on Oct 13, 2020

I've decided to put this here rather than the WhatWG proposal as focusing on the Chrome statement overly much starts to see a derail.

When a Chrome representative says " The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them" I am not particularly surprised, if I were them I think I would like to get rid of these technologies too, based just on my feeling for how much they are used any more (assuming Chrome has stats and these stats bear our my feeling of low usage).

This of course also makes me sad in that I have quite a bit of experience with these technologies and their remove establishes their irrelevance in the present day ( or argues strongly for their increasing irrelevance, too strong a word choice might invite complaint). Believe me I would love if XPath was improved because maybe I might see ads for developers with advanced XPath again and I could increase my rates.

However I don't think it has ever been stated before that this is what Chrome would like. As such I think it falls under the rubric "Things everybody knows but nobody says", which generally nobody says these things because nobody wants to go through the onerous work of dealing with the implications. But as it has been said I start to wonder what those implications are/would be.

SVG has already been mentioned in the linked thread, I don't know if that is actually a problem because I don't know if SVG has been implemented using libxml in Chrome, I could totally see a point not to implement SVG with that - but then again I could see that if you have libxml and you need to implement the SVG DOM, maybe you use libxml to do it - so does getting rid of libxml impact SVG in Chrome?

Obviously applications working with RSS and other associated feed formats in the browser would probably stop to work. Of course people could write applications for these formats on the server, but it certainly seems a setback for RSS.

The same thing applies for RDF, and linked data applications running client side. Not many I know but hey, a nail in the coffin as it were.

MathML - which has never been implemented by Chrome has client side implementations, for example https://pshihn.github.io/math-ml/ I wonder if these would continue working, I would guess probably not.

What about XHTML, is anything there part of the Chrome XML stack - for example DTDs?

I can think of a few other things, but this seems like a reasonable start to think of what the implications would be.

brian_herman__ · on Oct 13, 2020

I'd like to eventually remove and deprecate google chrome. But unfortunately it is the 900lb gorilla in the room.

bawolff · on Oct 13, 2020

Everyone is mad at the chrome person, but honestly, all they are saying is they don't want the feature bloat of extended support for a super complex standard, that isn't very popular despite existing for 21 years and involves a library they want to deprecate.

Seems like a very reasonable no to me. You don't get good software by saying yes to every feature idea.

shawnz · on Oct 13, 2020

If the "no" argument here was what you are saying, that they don't want the feature bloat of XPath, then that would be reasonable. But they are actually arguing that they don't want the feature bloat of XML capability, and XPath doesn't need to have any dependency on XML capability.

So they haven't really addressed the request itself, and they are being extremely dismissive about any suggestion that they might be interpreting things in an unfair light. I think that is why people are frustrated.

I don't even want this feature and I am frustrated just by reading the linked thread.

masklinn · on Oct 13, 2020

> XPath doesn't need to have any dependency on XML capability.

That's an important point, programmatically applying XPath to HTML can be super convenient, while basic CSS selectors are superior, XPath is way better for non-trivial selections, and because CSS was designed in a rather ad-hoc manner it "scales" very badly as new features get grafted on.

lucideer · on Oct 13, 2020

> a super complex standard

This... is actually subjective.

XML is a relatively simple standard; the complexity is emergent rather than inherent to its definition.

Take for example an oft-cited security issue with xml: xxe. This results from xml entity referencing supporting filesystem access. But there's nothing inherently "complex" about that from a language/syntax definition perspective, filesystem access is just an inherent danger regardless of complexity.

That's not to say XML is as simple as it could be (everything has its caveats and edge-cases: null-default attribute namespaces is a weird one that comes to mind), but in general "strict" and limited language syntaxes tend to be much less complex than lax syntaxes: e.g. HTML or YAML, which have endless depths of gotchas with ambiguous or unintuitive parsing behaviours.

> that isn't very popular

Ha!

sergeykish · on Oct 13, 2020

> The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them, or ...

It sounds scary. I hope he meant core changes, not API

> ... or at least replace them with something that generates less security bugs. Increasing the capabilities of XML in the browser runs counter to that goal.

> By "XML parts of our pipeline" I mean "everything implemented using libxml and libxslt".

I have one example. Have you known HTML parser is faster than XML [1]? Yes, awfully bloated HTML parser [2] is faster.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1481080

[2] https://html.spec.whatwg.org/multipage/parsing.html

dmitriid · on Oct 13, 2020

> Everyone is mad at the chrome person, but honestly, all they are saying

All they are saying is that Chrome team is hypocritical to the extreme. See https://news.ycombinator.com/item?id=24766554

rhacker · on Oct 13, 2020

Wasn't the whole point of WHATG to not focus on the browser / specific implementations, but on the standards? I mean it was a ploy to dethrone IE. Just because IE has been dethroned, doesn't mean the purpose of the group should stagnate. I think at this point they feel justified because they defunded Mozilla. The FEDS absolutely need to break up Google.

oefrha · on Oct 13, 2020

W3C focused on design by committee, de jure standards. WHATWG focuses on standardizing de facto standards. W3C’s HTML effort failed and now its HTML5 spec redirects to WHATWG’s HTML5 spec. Maybe I misunderstood your comment, but sounds like you got it backwards.

bawolff · on Oct 13, 2020

On the contrary, i think the point of whatwg was to focus on reality and make descriptive standards instead of making prescriptive standards like the w3c that nobody implemented.

Pushing xpath (or anything else) despite vendors not wanting it is a step in the opposite direction.

barumi · on Oct 13, 2020

What's the point of a standard if no one adopts it?

morelisp · on Oct 13, 2020

Conversely, what is the point of a standard that just documents what 2 out of 3 multi-billion companies want to do anyway?

chkuendig · on Oct 13, 2020

It ensures that they do it the same way

morelisp · on Oct 13, 2020

Unless they disagree. Then they “standardize” both ways and the rest of us just have to cope.

bawolff · on Oct 13, 2020

What do you think standards are?

Documenting the commonalities between implementations in order to gain interopability, is precisely the point of a standard.

barumi · on Oct 13, 2020

> Conversely, what is the point of a standard that just documents what 2 out of 3 multi-billion companies want to do anyway?

Because it provides a fixed target that can be targeted with confidence by both developers and also the makers of any alternative browser (of which about 4 are still listed in global browser market share statistics).

Those who complain that right now the browser market is dominated by a couple of players seem to be entirely oblivious to history.

shock · on Oct 13, 2020

> I can tell this is not going to be a productive conversation, as folks are intent on playing word games to try and pretend Chrome has a different stance than we do. As such, I won't be participating in this thread further. I think I've made our position clear. --user domenic (from Google/Chrome, I presume)

So, a productive conversation is one in which people agree with the position of the Chrome team :-/

oefrha · on Oct 13, 2020

No, “we don’t support X because Y”, “Y can be interpreted to mean Z and Z does not conflict with X so surely you actually support X” is not a productive conversation.

shawnz · on Oct 13, 2020

That's not an accurate summary of the argument. XPath doesn't need to have any dependencies on the technologies they are trying to deprecate, like the Chrome team member is implying. So when they said "Y" they really did mean "Z", and the difference is relevant to the point.

oefrha · on Oct 13, 2020

Even if they did mean Z, "we want to drop this entire thing" to "we want to implement this whole thing and then some from scratch" is quite a leap. Looping back to gp's comment, there's a subtle difference between disagreeing and putting words into someone else's mouth.

scottfr · on Oct 13, 2020

While we are on the topic of XPath improvements, I would love to see a built-in XPath syntax to pierce the shadow DOM of Web components.

It's an important need for the automation and testing use cases. Without it, targeting an element within a web component simply cannot be done solely with a single selector.

spankalee · on Oct 13, 2020

Several vendors have objected to that when it's been brought up because it breaks encapsulation.

You can write such a utility in JavaScript in just a few lines though.

sam_lowry_ · on Oct 13, 2020

XPath 2.0 is a complex language able to crash you computer or mine bitcoin.

Keep XPath 1.0.

formerly_proven · on Oct 13, 2020

I'm probably missing something, but XPath 2.0 doesn't strike me as trivially Turing complete. Loops are bounded (either over range expressions or a set of nodes) and it can't define functions, so it doesn't have recursion, so evaluating any XPath 2.0 expression always halts, so XPath 2.0 can't be Turing complete.

chalst · on Oct 13, 2020

You don't need Turing completeness to mine.

phkahler · on Oct 13, 2020

>> Keep XPath 1.0.

But let's backport a few features from v2 and call it v1.1 it'll be just like all the OpenGL versions.

masklinn · on Oct 13, 2020

The non-garbage parts of XPath 2.0 are in the additional expanded function library, which largely come from exslt.

You don't really need to change the version because the language itself doesn't change.

bryanrasmussen · on Oct 13, 2020

I do think having a query language with its own for and if semantics in it would be adding unnecessary complexity to the browser tech stack - but hey - imagine the big bucks I could be pulling in as a consultant if recruiters started having to get guys with 10+ years experience with XPath 2.0 and JavaScript!

Won't someone please think of my financial needs!

admax88q · on Oct 13, 2020

It would have made sense to use XPath for CSS selectors, or at least make the CSS selectors a syntax compatible subset if you wanted to them piecewise in functionality like they currently have done.

bryanrasmussen · on Oct 13, 2020

Well, I believe that Håkon Wium Lie would never have let that happen, but hey could be misreading his stance on things.

masklinn · on Oct 13, 2020

The main reason why it wouldn't have happened is CSS selectors predate XPath by a few years. CSS was first proposed in 1994 and the CSS1 spec was released in 1996, I don't know when XPath was originally proposed but the first public draft was in late 1998 and the release was in 1999.

CSS 2 actually predates XPath 1.0.

XPath would also have needed more work to replace CSS selectors, aside from being a bigger performance concern (through being more capable and not working in a strictly top-down manner, meaning you can easily get very inefficient selectors) it lacks facilities which are quite critical to CSS selectors like the shortcut id and class selectors as well as priority.

In fact talking about class selectors, those are absolute hell to replicate in XPath 1.0 if you don't have extensions to lean on. To replicate the humble `.foo` you need something along the lines of

    //*[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]

And don't miss the spaces around the name of the class you're looking for, they're quite critical. Good fucking luck if you need to combine multiple classes.

exslt/xpath 2.0 have `tokenize` which make it much more convenient although IIRC the way it's used is weird, I think it's

    //*[tokenize(@class) = 'foo']

because "=" on a nodeset is really a containment operation? Not sure. There's also `matches` but that's error-prone because classes tend to be caterpillar-separated, and your friendly neighborhood `\b` will match those so you need to mess around with `(^|\s+)` bullshit instead.

And finally I believe xpath 3.1 has a straightforward "contains-token" which does what the CSS "~=" operator does.

XPath 3.1 was released in 2017. "~=" was part of CSS2 (CSS1 didn't have "arbitrary" attribute selection, only classes and ids).

sergeykish · on Oct 13, 2020

XPath works with XML, class is a mini language. We had same problem in JavaScript until classList

    $(element).attr("class").split(/\s+/)

It is side effect of one attribute name per element

    <a class=foo class=bar>
    //*[@class = 'foo']
    //*[@class = 'bar']

or having attributes at all

    <a>
      <class>foo</class>
      <class>bar</class>
    </a>
    //*[class = 'foo']
    //*[class = 'bar']

Of course `class` access would be optimized like it is today.

masklinn · on Oct 13, 2020

> XPath works with XML, class is mini language.

So? Also class is just a token list.

> We had same problem in JavaScript until classList

1. it's much less common to select using javascript than it would be using an xpath selector, hence the issue.

2. because JS is a full-blown programming language, splitting the class into a list is not difficult, and can furthermore be trivially factored into a helper function, or a set thereof (as class modification would really be what you'd want to do)

> It is side effect of one attribute name per element

It's a side-effect of token lists not being much of a use-case at the time for XPath, despite having been a CSS use-case for 3 years at that point.

As my comment notes, XPath 3.1 literally has a contains-token function, that function is compatible with XPath 1.0. If you have an XML processor which allows extension (which would be the case for more or less all of them outside browsers), you can trivially implement your own, or an even more specialised `has-class` function.

But contains-token was added in 2017, not in 1999, to say nothing of 1996.

sergeykish · on Oct 13, 2020

It looks strange to me, language designed to work with trees so turn everything to tree. No need to change serialization, we know format, just parse it — class, URL, CSS. That's what we do in JavaScript [1], [2], [3].

    location.host === 'example.com'
    span.style.color === 'blue'
    datet.getFullYear() === 2020

    //a[href/host = 'example.com']
    //span[xstyle/color = 'blue']
    //date[datetime/year = '2020']

Just like your proposal but node knows how to parse itself

    //a[url-host(@href) = 'example.com']
    //span[css-color(@style) = 'blue']
    //time[datetime-year(@datetime) = '2020']

And it is actual syntax, I've checked on

    <a><href><host>example.com</host></href></a>
    <span><xstyle><color>blue</color></style></span>
    <date><datetime><year>2020</year></datetime></date>

* <style> is CDATA, emulated with <xstyle>

[1] https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenLis...

[2] https://developer.mozilla.org/en-US/docs/Web/API/URL

[3] https://developer.mozilla.org/en-US/docs/Web/API/CSS_Object_...

admax88q · on Oct 13, 2020

Yeah maybe not the best idea, XPath has a pretty different use case than CSS. But it does seem a shame that more complex "nth-child" selectors do not just use something more flexible/programmable like xpath.

As it stands CSS keeps adding one-off selectors such as nth-child, only-child, only-of-type. Such is the feature creep of web standards as they are, a continual addition of one-off APIs for specific use cases rather than a robust orthogonal programming platform that people can build anything on top of, like WebCrypto instead of adding integers.

bryanrasmussen · on Oct 13, 2020

well, I know XPath 1.0 was released in 1999 because I was doing stuff with it.

CSS 2 as I remember was some years after that, and googling it seems to be August 2002.

At any rate in this example

//*[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]

I don't think the interest in using XPath is at all related to being able to check what class an element has, and in fact the linked discussion does not have anything to do with that, so I mean yeah, that's pretty bad but not anything anyone is asking to do in this scenario.

masklinn · on Oct 13, 2020

> CSS 2 as I remember was some years after that, and googling it seems to be August 2002.

You're thinking about CSS 2 revision 1 aka CSS 2.1

> I don't think the interest in using XPath is at all related to being able to check what class an element has

The original comment in the thread was

> It would have made sense to use XPath for CSS selectors

Being able to check what class an element has would be absolutely critical to that use-case.

bryanrasmussen · on Oct 13, 2020

ah yeah, forgot we were in thread and was thinking about the post itself.

bawolff · on Oct 13, 2020

Well im glad they didn't

CSS is a beautifly intuitive query language. XPath is an ugly non-intuitive syntax. I suspect that's part of the reason that it didn't take off (ux of apis matter)

vbezhenar · on Oct 13, 2020

At this point I think that those APIs should be implemented with high-performance JS or Wasm and browsers should provide just enough API entry points to allow for efficient implementation.

chrismorgan · on Oct 13, 2020

You could already do this by trawling the DOM and constructing whatever data structures you desire, and updating them by watching for modifications to the DOM with a MutationObserver. But performance would be somewhere between poor and execrable, and memory usage would skyrocket. Note also that this is doing batch updating, so you’ll be querying a potentially out-of-date DOM this way. I state confidently that there will never be a good way of doing this, because it fundamentally requires putting something that is unavoidably slow and memory-heavy onto the critical path (especially if you want it to operate synchronously rather than in asynchronous batch mode). No efficient implementation will ever be possible, given the design of the web.

coding123 · on Oct 13, 2020

The chrome guy is being a punk.

xyzzy_plugh · on Oct 13, 2020

> domenic commented 2 hours ago

> Chrome is not interested in this. The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them, or at least replace them with something that generates less security bugs. Increasing the capabilities of XML in the browser runs counter to that goal.

Huh? Is he saying they want to remove XML support? Doesn't this include... the DOM?

This proposal seems reasonable but I don't understand this response by someone representing Chrome. Seems like this guy woke up on the wrong side of the bed this morning.

masklinn · on Oct 13, 2020

> Huh? Is he saying they want to remove XML support? Doesn't this include... the DOM?

The XML DOM is largely separate from the HTML DOM, and so are the corresponding parsers.

bawolff · on Oct 13, 2020

The html dom isn't really xml.

timgilbert · on Oct 13, 2020

That's true, although as someone commented in the github thread, SVG diagrams are very much XML, with a namespace and everything.

recursive · on Oct 13, 2020

No. HTML is not XML, other than SVG and co.

jarym · on Oct 13, 2020

> The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them, or at least replace them with something that generates less security bugs. Increasing the capabilities of XML in the browser runs counter to that goal.

Bit of a weird comment from Chrome. 'XML' doesn't generate 'security bugs' - sloppy development practices do. If they want to generate less security bugs they should be more aggressive with their testing/fuzzing in the parts of code that deal with XML.

masklinn · on Oct 13, 2020

> Bit of a weird comment from Chrome. 'XML' doesn't generate 'security bugs' - sloppy development practices do.

That's… only true in the sense that C doesn't generate security bugs either. There are environments, systems and standards which are much more bug-prone than others, for various reasons.

And XML is one of them. Not only does it literally specify and embed security issues (so you have to break the standard to fix some of that stuff), it is a pretty complex standard, and because it is a somewhat inefficient format implementors are more likely to need to optimise the pipeline, which is difficult, and leads to more security risks.

By all measures, XML is very error prone and its history is littered in security bugs in way few formats even compete. It's not the worst (I've heard a lot of bad things about ASN.1, which is made worse by ASN.1 generally having to be parsed in security contexts) but as a shortcut "XML generates security bugs" is completely fair.

rleigh · on Oct 13, 2020

There is also the matter of implementation quality and robustness, and the future of XML as a supported technology.

XML technology is over 20 years old, and mainstream support for it has been slowly declining. If you're writing in any language other than Java, you don't have any good choices if you want to use more than the basic SAX/DOM APIs, and all of the available libraries have significant drawbacks if you want to do anything more than the bare essentials. I can fully understand why projects want to drop their existing XML dependencies where possible, and shy away from picking up new ones.

If you're writing a product using C or C++, then you've got few choices: Apache Xerces-C and Xalan-C for XML and XSLT/XPATH, respectively. Or libxml/libxslt. Neither of these is a good choice (for the record, I'm involved with both Apache projects on an ad-hoc basis). Or QtXml/QtXmlPatterns. There are others but they aren't worth writing about. The Apache projects had pretty comprehensive support for the earlier XML and XSLT/XPATH standards, and matched the Java implementations pretty closely. But with the waning interest in XML, and the huge amounts of technical debt involved, both projects have been in maintenance mode for the last decade. No one wants to break compatibility with significant rework.

Then there's the GNOME-derived libxml/libxslt. A little more modern but C-only, and lacking most of the features provided by the Apache implementation. OK if a simple subset is sufficient, but otherwise not viable.

The Qt libraries are good, but it's a very heavy dependency and is again a subset of the Apache functionality, and is XSLT2 only.

When it comes to adding in XPATH 2.0, there just isn't the maintainer interest available, primarily because there's no corporate interest in taking any of these projects forward in any meaningful way. XML is no longer cool or interesting, and as such projects are not willing to spend time and money on maintaining these codebases. They are all, to varying degrees, in maintenance mode.

I previously worked on a large project that for various reasons, including jumping on the bandwagon of the moment, heavily opted into using XML, XSLT and using as many obscure features as possible. Being Java based, it had access to jars for doing pretty much any XML processing it liked. If the core functionality provided by the JDK isn't enough, you've got Saxon and other implementations readily to hand. I had to port a subset of the functionality to C++, and the only possibilities were Xerces+Xalan. libxml/libxslt simply didn't have the featureset for all of the obscure XML processing features needed, or all the EXSLT functions. Needing to depend upon a pair of obsolete, essentially unmaintained and difficult to use libraries is why I ended up involved in their maintenance. I later added Qt support as an alternative, but at the expense of having to disable a huge amount of core functionality. I'd like nothing more than to dump the lot of them.

It's also worth pointing out that the root of the problem here is the absurd complexity of all of the (many) XML standards and APIs. A fairly simple concept which has so much complex stuff layered upon and grafted onto it that it's not really much of a wonder that it's in such of a poor state. It should have been kept simple.

barumi · on Oct 13, 2020

> Bit of a weird comment from Chrome. 'XML' doesn't generate 'security bugs'

It seems you failed to either read or understand the comment, and meanwhile were also too eager to post snarky comments.

If you pay attention to the argument that was actually made, the commenter mentioned specifically "everything implemented using libxml and libxslt".

Meanwhile, here's a list of libxml security issues:

* https://www.cvedetails.com/vulnerability-list/vendor_id-1962...

Here's a list of security issues affecting libxslt:

* https://www.cvedetails.com/vulnerability-list/vendor_id-1962...

OP's point is very clear: if they could remove support for this feature, any of this vulnerabilities, past, present and future, would vanish.

sergeykish · on Oct 13, 2020

Isn't it backwards? WHATWG — there should be two implementations first.

Edit: we would not like if Chrome only feature was standardized. Lets have dialog, not silent downvotes.

XPath is great, for example

    $x('//text()[last()]')

finds last Text node. It is possible to query by text content

    $x("//a[text() = 'parent']")

finds parent links on the page. Other the years CSS adopted many selector, recent addition - :has(direct children) not implemented yet

    $("//a[img]')

works. I would like to hear other examples.

What XPath needs is modern interface like CSS has. It is easy to shim [1]

    document.queryXPathAll

That would be much better proposal

[1] https://developer.mozilla.org/en-US/docs/Web/CSS/:has

[2] http://sergeykish.com/web-api-element-prototype-queryxpathal...