Fetches http://Slashdot.org (`iwr ...`, where `iwr` is the built-in alias for `Invoke-WebRequest`), parses the response as an HTML page, extracts all the anchor tags (`.Links`), and selects the href of each anchor (`select href`). Since that's the end of the pipeline, the hrefs are returned and printed as an array of strings, one href per line.
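For reference, the pipeline being described presumably looks something like this (a sketch, not necessarily the exact upthread invocation):

```powershell
# Sketch only: parse the page, take the parsed anchor elements (.Links),
# keep just their href property. `select` is the alias for Select-Object.
(iwr http://slashdot.org).Links | select href
```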
Well, with just grep and curl, it'd be something like:
curl -s http://slashdot.org/ | grep -o "href=[\"'][^\"']*" | sed -e "s/href=[\"']//"
But presumably this is being discounted due to the lack of HTML parsing, so it's not the same as the PowerShell example. One somewhat ugly method, then, would be to use the HTML-XML utilities (html-xml-utils) published by the W3C and available in most package managers:
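Roughly the following (an untested sketch: `hxnormalize` and `hxpipe` ship with html-xml-utils; the sed glue at the end is illustrative, not canonical):

```sh
# Untested sketch: hxnormalize -x coerces the page into well-formed markup,
# then hxpipe flattens it into a line-per-token stream in which every
# attribute gets its own line, e.g. "Ahref CDATA //slashdot.org/faq",
# emitted just before the open-tag line ("(a") of its element.
curl -s http://slashdot.org/ \
  | hxnormalize -x \
  | hxpipe \
  | sed -n 's/^Ahref CDATA //p'   # keep only the href values
```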
Yes, a naive grep will pick up variables named `href` in JavaScript, the text content of a div that contains "href", an `href` attribute on a non-anchor element, and so on. So a utility that specifically parses HTML is necessary, but not sufficient.
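To make those failure modes concrete, here's a hypothetical fragment in which every line matches the naive grep from above, yet none of them is an anchor's link:

```sh
# None of these is an anchor's href, but all three match the naive pattern.
printf '%s\n' \
  '<script>var href="not-a-link";</script>' \
  '<div>href="just text"</div>' \
  '<link rel="stylesheet" href="style.css">' \
  | grep -o "href=[\"'][^\"']*"
```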
I'm not sure how robust your hxpipe example is against those.
It's a shell script. If you're architecting more than that in your shell script, I'd suggest you shouldn't be doing this in a terminal in the first place. And if you're just trying to follow links in a quick shell script, why not do it with wget directly, rather than build up this intermediate list of strings?
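For that quick-script case, the wget version is about as short as it gets (a sketch; note that by default wget's recursion won't leave the starting host):

```sh
# Fetch the page and follow every link on it one hop deep;
# no intermediate list of strings needed.
wget --recursive --level=1 http://slashdot.org/
```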
But since you asked: hxpipe assumes that an href on a non-anchor tag is an error and represents it as an `Ahref` anyway... which isn't too bad an assumption to make, tbf. The other situations are dealt with (text content, javascript).
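That's easy to check by feeding it a fragment with an href on a non-anchor element (hypothetical input; the exact output varies a little by version):

```sh
# Both hrefs come out as "Ahref CDATA ..." lines; only the open-tag line
# that follows each ("(link" vs "(a") says which element the attribute
# belonged to.
printf '%s' '<link rel="stylesheet" href="style.css"><a href="/page">x</a>' \
  | hxnormalize -x \
  | hxpipe
```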