Not allowing the CFAA to be (ab)used to attempt to make scraping illegal makes sense.
However, how is it reasonable to force a web site to serve its contents to a third-party company, without being allowed to make a decision whether to serve it or not? Serving the web site costs money, and the scraper surely isn't going to generate ad income...
Ugh, yeah, the more I think about this ruling, the less I like it.
It's actually pretty insane to force a site to serve content. I think both parties are in the wrong here - HiQ for assuming they're entitled to receive a response from LinkedIn's webservers, and LinkedIn for abusing the CFAA to try to deny service rather than figure out a technical solution to their business problem.
In my view:
* The data is public, and free of copyright. If you're a scraper and can get it, you haven't done anything wrong.
* The servers serving the data are still under LinkedIn's control, and they have no obligation or public duty to always serve that content. They could just as well block you based on your IP or other characteristics. If they want to discriminate and try to only let Google's scrapers access the data - what's wrong with that? Scraper brand is not a protected class. Tough taters if your business model "depends" on your ability to successfully make requests to another uninvolved company's webservers.
If I were the judge, I'd throw this out and let LinkedIn/HiQ duke it out themselves - they deserve each other.
I would argue that, in the spirit of net neutrality, you either serve your site (the public-facing part) to everyone equally or to no one.
Hosting costs money, servers cost money... but maybe create a public-facing API that is way cheaper and easier to use than scraping your website? I see the ruling in a positive light, in that it might promote more open and structured access to public-facing data.
That was the case, hence the reference to the "spirit" of net neutrality.
Public-facing internet sites, in my opinion, should be treated in the same way as public space: anyone should be free to read, and write down in their notepad, whatever is there, in the same way as anyone else.
Scraping a public-facing website is, in my opinion, a huge waste of resources. It would be cheaper (in total) to build an API that can serve the data than to build a good scraper.
Net neutrality is more about nondiscrimination in routing content from a provider to a user, rather than forcing content providers to serve everyone regardless of conduct. It's entirely reasonable for a site to discriminate who they wish to allow to access their data (whether technically their copyright or data they caretake).
That being said, if you provide data to the public, you don't get to invoke the CFAA to plug the holes your content discrimination code doesn't fill.
Anyone is free to put up a paywall and deny access to people who don't pay.
But LinkedIn is apparently happy to let Googlebot and bingbot scrape public profiles. If they want to do that, they can't argue that their policy is to block bots who don't click on ads. Discriminating Googlebot from other visitors is probably a violation of Google policies, too. They can't have their cake and eat it at the same time.
From reading the opinion, I think the argument goes something like this:
> First, LinkedIn does not contest hiQ’s evidence that contracts exist between hiQ and some customers, including eBay, Capital One, and GoDaddy
> Second, hiQ will likely be able to establish that LinkedIn knew of hiQ’s scraping activity and products for some time. LinkedIn began sending representatives to hiQ’s Elevate conferences in October 2015
> Third, LinkedIn’s threats to invoke the CFAA and implementation of technical measures selectively to ban hiQ bots could well constitute “intentional acts designed to induce a breach or disruption” of hiQ’s contractual relationships with third parties.
> Fourth, the contractual relationships between hiQ and third parties have been disrupted and “now hang[] in the balance.” Without access to LinkedIn data, hiQ will likely be unable to deliver its services to its existing customers as promised.
> Last, hiQ is harmed by the disruption to its existing contracts and interference with its pending contracts. Without the revenue from sale of its products, hiQ will likely go out of business.
> LinkedIn does not specifically challenge hiQ’s ability to make out any of these elements of a tortious interference claim. Instead, LinkedIn maintains that it has a “legitimate business purpose” defense to any such claim. ... That contention is an affirmative justification defense for which LinkedIn bears the burden of proof.
So the real situation is that you can't go out and start blocking access you knew about, in a way that would interfere with third-party contracts, without a legitimate business reason to do so. The burden of proving the legitimacy of that business reason is on you.
edit:
TL;DR:
> "A party may not ... under the guise of competition ... induce the breach of a competitor’s contract in order to secure an economic advantage."
Be restaurant. Be on Deliveroo. Be getting low margins because of high fees.
So basically you can’t decide not to use Deliveroo any more to improve margins (“secure an economic advantage”). I mean, you can cancel Deliveroo, but only as long as you’re not “inducing a breach of their contract”. So it’s only a matter of time before Deliveroo writes a contract saying “we’re obligated to deliver food for you from said restaurant”.
Choosing not to use a middleman any more so that you can secure higher margins sounds like about the clearest example of a "legitimate business reason" imaginable. The purpose of the act is to immediately increase your margins, not to hurt Deliveroo because you don't want their competition.
That's very different from the case in question, where LinkedIn's motive for cutting off hiQ's access is to inflict damage on hiQ because they are a potential competitor.
I would imagine that if you contract with Deliveroo, they have some terms that say that you need to give notice when cancelling?
I don't know Deliveroo, but I think a better analogy would be if you suddenly, even though it is not causing you trouble, denied access to someone picking up food that you didn't contract with, with the full knowledge that the someone would be in big trouble with their customers.
IANAL, but I think you're misunderstanding "without a legitimate business reason to do so"
"Be Restaurant" blocking Deliveroo because it can't continue operating with the loss of revenue due to high fees is a legitimate business reason. "Be Restaurant" blocking Deliveroo 2: Electric Boogaloo because it doesn't like their owner, while continuing to allow Deliveroo access, would presumably be disallowed.
Also there's nothing stopping "Be Restaurant" from offering an exclusive delivery contract to Deliveroo and forcing Deliveroo 2 out, or requiring a minimum fee for all delivery services, Deliveroo and Deliveroo 2 included.
Of course, I think this is all in a very different area from a restaurant; we're talking about a service provided on the internet. I believe LinkedIn has many, many other recourses here, but, as I see it, the courts are just telling them, this aint it chief.
> What I mean is that freedom of speech is not the same as freedom of censoring.
This is at least not quite true of First Amendment law. The concept of "compelled speech" exists in US law, and is considered an unconstitutional violation of the First Amendment. Exactly what falls into that category (and whether the right of domain owners to censor user-provided content as they see fit is protected), I'm not sure, but freedom of speech in the US certainly does at least sometimes include the right not to speak.
Yes, the court was right to block LinkedIn's abuse of the CFAA. But the court was wrong to say that LinkedIn must show hiQ the same website that LinkedIn shows everyone else.
The data are certainly not free of copyright. The data can contain a user's picture, or even a small essay describing the user's job or life, though LinkedIn is not the copyright holder.
Moreover, these are personal data, and I'm not sure that the scraper has the original user's permission to collect them. In Europe, the scraper may face issues related to GDPR.
Facts can't be copyrighted, so such things as whether or not a person worked for a certain company, or went to a certain school, are unprotected, and with this ruling can be scraped, at least in the U.S. Other things common on LinkedIn, as you rightly point out, are protected, but by copyright law, not the CFAA. So a scraper acting in good faith would have to be careful about what they used if they wanted to respect copyright, but it's a separate issue from this ruling.
This is exactly right. Copyright protects creative expression, not pure fact. Famously, phone books (remember those?) are basically not copyrightable except for the ads, because they're just lists of data. Feist Publications, Inc., v. Rural Telephone Service Co., 499 U.S. 340 (1991).
I never said that facts can be copyrighted; I said that most of the things people put in their profile can be. I was responding to the claim made above that the data were not under copyright. If you just scrape name, company, and position, this is fine, but I highly doubt that they just do that. This lawsuit can have tons of side effects.
I'm not sure what "database rights" refers to specifically, but the whole matter is actually rather complicated, because the EU copyright directive has a lot of optional exceptions that member states may or may not adopt.
Most of these exceptions only apply to non-commercial use though. So they wouldn't apply in a case like hiQ.
Unfortunately, both Labour and the Tories have taken a relatively hard line in the EU copyright negotiations, so it seems unlikely that things will be relaxed very much after Brexit.
"Facts can't be copyrighted, so such things as whether or not a person worked for a certain company, or went to a certain school, are unprotected"
There's an infinite number of ways to describe a job history, without any single standard, so I don't think it makes any sense to say that a profile or resume is not copyrightable.
Isn't the issue one of being selective about who can view the content? If I, random Joe User, view the publicly available content, you have no issue. But if someone scrapes that data, then you'd want to charge them. Unless I click on the ad, the cost of using your bandwidth doesn't change based on who the viewer is. You'd want to apply fees based on the future use of the data rather than on your actual costs.
I'd assume if you weren't signing up, you'd probably look at like 10 profiles tops. A scraper is more than likely going to run through anything and everything it can grab links to (provided it doesn't leverage a very specific filtering mechanism for selecting profiles to scrape).
I could see the hit from a scraper being heavier than that of a typical user. There's also the potential that a user is going to click an ad for any number of reasons; there's no such likelihood with a scraper.
I'm not anti-scraping by any means, but I get the concerns.
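For what it's worth, the "a person looks at maybe 10 profiles, a scraper walks everything" difference is the kind of thing a site can act on technically. A rough sketch only: the thresholds, the `ProfileViewLimiter` name, and the idea of keying on a client id are all my own assumptions, not anything LinkedIn is known to do.

```python
from collections import defaultdict, deque

# Hypothetical thresholds; "10 profiles" is just the guess from the comment above.
MAX_PROFILE_VIEWS = 10
WINDOW_SECONDS = 3600.0


class ProfileViewLimiter:
    """Sliding-window counter of profile views per client."""

    def __init__(self, max_views=MAX_PROFILE_VIEWS, window=WINDOW_SECONDS):
        self.max_views = max_views
        self.window = window
        # client id (e.g. an IP address) -> timestamps of recent views
        self._views = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True if this view is within the limit, and record it."""
        views = self._views[client_id]
        # Forget views that have aged out of the window.
        while views and now - views[0] >= self.window:
            views.popleft()
        if len(views) >= self.max_views:
            return False
        views.append(now)
        return True
```

A casual visitor never notices a limit like this, while a crawler hits it almost immediately. It says nothing about whether the site is *entitled* to apply it, which is the actual argument here.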
Surely the action is "if you display stuff in public you can't segment the public".
You're not obliged to have public access.
Is there perhaps a factor here of users having an expectation that their profile is publicly accessible; so companies hosting that profile shouldn't be able to choose _secretly_ "who" can access it?
You're inconsistent, and so are the courts and most comments here. Either you want such conflicts decided by technological might, or you want the clearly expressed will of the content publisher to have binding effect.
If you consider scrapers to have some sort of right to access any public website, any technological barrier inflicts exactly the same harm as an injunction, assuming it is effective. If you allow technical blocking, it would be preferable to allow blocking-by-clearly-stated-wish, because it would save everyone the costs of the arms race. It would also make both parties' success somewhat independent of the resources they can invest into outgunning their opponents.
> However, how is it reasonable to force a web site to serve its contents to a third-party company, without being allowed to make a decision whether to serve it or not?
Your statement makes absolutely no sense. That's not how the internet works. If you serve something publicly, you don't get to cherry-pick who sees it.
Not only does it make no sense technically, it's also a hugely anti-competitive case.
It makes sense and it is how the internet works. Servers cherry pick who sees their content all the time. Scrapers are often blocked, as are entire IP address ranges. Things like Selenium server scrapers can be (approximately) detected and often are denied access.
I’m not sure about being anti-competitive. Serving a website is an action in which you open up your resources for others to access. My friend runs an open source stock market tracking website for free. He started getting hit with scrapers from big hedge funds and fintech companies a couple of months back. This costs him around $50-100 a month to serve all of these scrapers.
He and I both have similar free open source websites with donate buttons. They are rarely clicked. Ad revenue over a month for me has been ~$400 while donations over two years have totaled $20. There are about 80,000 unique visitors per month.
It is nice to think donation platforms can fund high traffic open source projects, but this is simply not the case.
In any regard, I fear the potential of this ruling limiting developers’ ability to protect their servers and making us all roll over to the big players with their hefty scrapers taking all of our data for resale.
How long are you allowed to delay results? I mean, not serving results is just delaying them forever, but that's out. Can I delay serving results longer than Chromium's default timeout?
I don’t see what legal or technical argument you’re making.
Technically, of course you can identify IP ranges owned by certain entities and restrict their access. That’s trivial, so what do you mean when you say the internet doesn’t work like that?
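To make the "that's trivial" part concrete, here is a minimal sketch using Python's standard `ipaddress` module. The ranges below are reserved documentation blocks standing in for "ranges owned by certain entities", and `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names of mine.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation ranges, not real scrapers.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(remote_addr: str) -> bool:
    """Return True if the client address falls inside a blocked range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Policy choice for this sketch: refuse anything we cannot parse.
        return True
    # An IPv4 address is never "in" an IPv6 network and vice versa.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A server would call something like `is_blocked(request_ip)` before doing any real work and return a 403 on a match.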
Legally, there’s plenty of region locked content for copyright and censorship reasons. A distributor might region lock because they don’t have distribution rights in particular regions. Are you saying distributors can’t publish free content at all because they can’t choose who sees it but would be breaking copyright law to publish to everyone? Or a site might region lock because certain content is censored in particular countries. Can you not publish anti-regime articles because a totalitarian country is on the Internet?
The entire world isn’t and shouldn’t be held hostage to the most restrictive laws that exist in the world. And the answer isn’t blocking on the requesting end because that’s technically much harder and blocks much, much more content. So what am I missing?
Edit: Forgot to include the other end of the spectrum. If I, as an individual, host my own site on my own hardware with my own connection that I pay the bandwidth for, can I deny a suspected bot network?
Of course you get to choose. You can reject requests based on their user agent, their IP address, the owner or likely geographic location of the IP address, and many other possibilities.
What are these possibilities? You only get the IP and whatever client-side information the client is _willingly_ sending to you. So if a script/user/bot/etc. says it's Firefox from 1.2.3.4, then all you know is that it's a request from 1.2.3.4 that says it's Firefox. You can ask it to run JavaScript code, but that's beyond classic web interaction, and then again you need to trust the client.
This interaction can't be made trustless, so every client can only be served based on its IP or on some convoluted, hacky exchange that is a cat-and-mouse game at best.
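A small illustration of why the User-Agent is worth so little on its own, using only the standard library. Nothing is sent over the network; the URL, the header string, and the `naive_is_browser` check are made up for the example.

```python
from urllib.request import Request

# Hypothetical URL and header value. Any script can claim to be any browser.
req = Request(
    "https://example.com/profile/123",
    headers={
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) "
                      "Gecko/20100101 Firefox/68.0"
    },
)


def claimed_browser(request: Request) -> str:
    """Return the User-Agent the client chose to send (urllib stores it as 'User-agent')."""
    return request.get_header("User-agent", "")


def naive_is_browser(user_agent: str) -> bool:
    """A naive server-side check that trusts the client-supplied string."""
    return "Firefox" in user_agent or "Chrome" in user_agent
```

Here `naive_is_browser(claimed_browser(req))` is true even though the "browser" is a few lines of Python, which is the point being made above: the server only sees the IP plus whatever the client decides to say about itself.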
LinkedIn’s public-facing content is exactly that: public. This ruling merely says accessing public content isn’t hacking, and so LinkedIn cannot use the CFAA as a discriminatory weapon to limit access to that public-facing content.
If LinkedIn wants to block access they need to do so by another means that isn’t described as hacking.
I know it is generally considered bad form to ask, but did you read much of the ruling? I feel like a lot of people on this thread are just going off of Animats' comment and haven't spent much time looking at the opinion.
I didn't read the whole thing, but skimmed through it and read what seemed to be the relevant parts of the argument. (Including the bit that talks about LinkedIn's robots.txt)
The ruling doesn't really support your claim of catastrophe and doesn't claim to pass any sort of final judgement.
The judge makes a specific point about not reading too much into him upholding the injunction saying:
>> These appeals generally provide “little guidance” because “of the limited scope of our review of the law” and “because the fully developed factual record may be materially different from that initially before the district court.”
This second part is pretty stupid. However, now that we are at this point, LinkedIn still has the ability to decide which of its information is public and which is not. By making all of its information private, it can take back control.
It's not the scraper's fault that their business model incorrectly assumed profitability through ads in a way that did not foresee compliance with future anti-scraper-discrimination laws.
It's a good point you bring up, and it may contribute to the death of ads.