
The real issue here lies somewhere between your position and GP's. What is required to trigger the CFAA?

Does accessing a page the site owner doesn't want you to access violate the CFAA, or do you need to hack through access controls?



As a real-world analogue: you can indeed be guilty of trespassing on someone's property even if you don't have to jump over any fences or pick any locks to get there. In some places, they don't even have to have a "no trespassing" sign. Simply being present on someone else's property without an invitation from them is illegal, and no, an open door does not count as an invitation.


> an open door does not count as an invitation.

But if you have someone living there who lets anyone in if they ask, it would be pretty hard to argue they are trespassing.

A user agent must ask for every page with an HTTP request. If the server responds with 200 OK, it’s pretty hard to argue that it isn’t letting (or even inviting) you in.


This doesn't cover you if you lied to get the invitation. If a robots.txt file denies some but not all user agents, setting your user agent to indicate that your request originates from a source it does not originate from is clearly a circumvention of an access control.
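As a rough illustration of the robots.txt point, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the bot names, and the URLs are all made up for the example and do not come from any real site. It only shows that the same path can be listed as allowed for one declared user agent and disallowed for another; the parser reports what the file says and enforces nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch everything,
# every other user agent is asked to stay out of /profiles/.
ROBOTS_TXT = """\
User-agent: FriendlyBot
Disallow:

User-agent: *
Disallow: /profiles/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Report whether ROBOTS_TXT permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL used only for illustration.
URL = "https://example.com/profiles/alice"
friendly = allowed("FriendlyBot", URL)   # declared agent named in the file
other = allowed("OtherScraper", URL)     # falls through to the "*" rules
```

Nothing here checks that the declared user agent is truthful, which is exactly the gap the comment above is pointing at.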


You generally can’t be charged with trespass unless you refuse to leave when told to do so.

An open door to a home is different, but unfenced property is 100% not trespass until you refuse to leave.


Trespass in criminal law usually requires notice that trespassing is prohibited, but this is usually satisfied by posting a "no trespassing" sign in a prominent area.

For example, my state's law (emphasis added):

> Whoever, without right enters or remains in or upon the dwelling house, buildings, boats or improved or enclosed land, wharf, or pier of another, or enters or remains in a school bus, as defined in section 1 of chapter 90, after having been forbidden so to do by the person who has lawful control of said premises, whether directly or by notice posted thereon, ...

The federal version of trespass (which I think applies to Indian reservations) considers merely fencing off the area to be sufficient notice that trespassing is prohibited.


That's not true at all. If you are aware that the property you are accessing is not meant for your use, you can be charged with trespassing regardless of whether you have specifically been asked to leave.

It's possible to be guilty of trespass even if you weren't aware that you weren't allowed on the land. This is negligent trespassing.


Negligence only applies in situations where a reasonable person should have known. You can only be charged with trespassing while unaware if you were so ridiculously unaware of your surroundings that any reasonable person in the same situation _would_ have known that they were trespassing.

If a public park blends into somebody's private lawn, you can't be charged with trespassing for stepping over the line.


This is a bad comparison because when scraping a site, you don't cross any borders; you just send and receive information. You can compare this to a phone call or to talking to someone.


A website or server is property, just like land is. Accessing it is no different than accessing any other piece of property. Opening a website is, for all intents and purposes, the same as crossing a border.

To take it a step further, the information on said website is also personal property, and accessing the information without permission is also trespassing. Specifically, this is called trespass to chattels [1] (trespass is most usually legally defined as trespass to person, trespass to land, and trespass to chattels). Even more specifically, this is the actual part of the CFAA that's being debated: computer trespass gets its roots from trespass to chattels. [2]

1: https://en.wikipedia.org/wiki/Trespass_to_chattels

2: https://en.wikipedia.org/wiki/Trespass_to_chattels#In_the_el...


This analogy is faulty, and Congress really needs to clarify what they meant the CFAA to protect against.

Opening a website is making a request, technically speaking. That is not equivalent to breaking into someone's home and taking information. The equivalent would be if the head of the household told you not to stand outside and ask someone inside to give you something from the house. You haven't trespassed, you're asking someone in the house to do something for you. It's on them if they do what you request or not.


A website isn't a person and the law doesn't expect them to act as such. It's a tool. Making a 'request' to a web server is more like turning the knob on a door: maybe the owner installed a lock, or maybe it just opens without there being a lock. But even if there isn't a lock, the law doesn't absolve you of trespassing against the door's owner just because the door itself didn't have the sentience to refuse your request.


But it's a door handle that is MEANT to be turned by the public at large. It's like putting a big "Order Inside" sign above the door to a restaurant and being surprised when people try to gain entry.

You also never entered the server. The server got your request and served something back to you. You did not go inside the house and read the contents of a book on the shelf; it was read aloud to you while you were still outside the house.

I'm not saying that websites shouldn't have recourse against people taking all the contents of their sites, just that the CFAA is the wrong tool.


>But it's a door handle that is MEANT to be turned by the public at large.

No it isn't. That's the crux of the case.

>It's like putting a big "Order Inside" sign above the door to a restaurant and being surprised when people try to gain entry.

It's like putting a big "order inside" sign above the door to a restaurant, and then also having a separate door in the back of the restaurant that clearly is used only by employees to go to the back office, and not being happy when non-employees keep trying to walk into the back office claiming "well there's a sign outside...".

>You also never entered the server. The server got your request and served something back to you. You did not go inside the house and read the contents of a book on the shelf; it was read aloud to you while you were still outside the house.

According to the courts, you did 'go inside the house' because the electronic signals that you sent to the server as part of the request are enough to constitute the 'physical contact' part of trespassing.

Again, trespassing isn't just about you physically having your body on someone else's property. It also can be your interaction with someone else's property (which can be land, or a door, or web servers) through the use of tools or intermediaries.


> not being happy when non-employees keep trying to walk into the back office claiming "well there's a sign outside..."

If you don't bother to put up an "Employees Only" sign on the door, you are going to have a hard time getting a trespassing charge to stick...


> You also never entered the server. The server got your request and served something back to you. You did not go inside the house and read the contents of a book on the shelf; it was read aloud to you while you were still outside the house.

By this reasoning it's impossible ever to hack anything. Even breaking password controls or cryptography is still just sending the server a request and getting something served back.


Lol really?

I'm not "on" your site when I browse there. I asked your server to send me some data and it did so.

It's the real-life equivalent of social engineering. So far it's not illegal for me to ask you things and for you to disclose them to me, even if you weren't supposed to. I'm even allowed to lie to you to persuade you to tell me things.


You didn't "ask my server". You used a tool to extract data from my server.

It's more akin to you standing just outside my property border and using a fishing pole to pull fish from a pond that is inside my property border. You're still trespassing even if your two feet aren't physically on my land.

The common legal argument (see the second link in my above comment) is that accessing a web server actually does constitute being "on" the server because you are sending signals to my server in order to interact with it, and this satisfies the "physical contact" part of trespassing.

From Wikipedia:

> The courts that imported this common law doctrine into the digital world reasoned that electrical signals traveling across networks and through proprietary servers may constitute the contact necessary to support a trespass claim.

>It's the real-life equivalent of social engineering. So far it's not illegal for me to ask you things and for you to disclose them to me, even if you weren't supposed to. I'm even allowed to lie to you to persuade you to tell me things.

This absolutely would be illegal and I'm not sure why you think otherwise. Misrepresenting yourself in order to get me to reveal private information is fraud and is illegal in pretty much every jurisdiction I can think of.


> You didn't "ask my server". You used a tool to extract data from my server.

The tool asked the server. The server replied.

> It's more akin to you standing just outside my property border and using a fishing pole to pull fish

Bullshit. Using HTTP to access public information is akin to standing outside your business and writing down the phone number in the banner. Or even reading the "No trespassing" sign.

As long as you're not violating copyright, NDAs or EULAs (and that's debatable) there should be nothing wrong with reading information that you were authorized to view.


>there should be nothing wrong with reading information that you were authorized to view.

You aren't authorized to view it. That's the entire point.

And the lack of access control does not implicitly give you authorization to view it.


When it comes to physical properties there's a huge difference between reading a banner posted in a street and entering the property to read some secret data: you have to be in different locations. That's why your analogy is completely faulty.

When it comes to PUBLIC data in a website there's no difference. How would I know I'm authorized, implicitly or explicitly, to access a website, say www.google.com? Should I phone the domain owner before accessing?

Just because you meant for something to be off limits but failed to inform anyone doesn't automatically make it off limits. "Trespassing" in a website is analogous to hacking it, using stolen credentials, using exploits and things like that.

Unless some law passes that says that someone remotely accessing a folder called /secrets/, or /inside-the-property/ or something like that is trespassing, it won't be the case.


>When it comes to physical properties there's a huge difference between reading a banner posted in a street and entering the property to read some secret data: you have to be in different locations. That's why your analogy is completely faulty.

At no point is accessing a web server similar in any manner to reading words off a banner posted in a street. You cannot use a faulty analogy of your own to describe why my analogy is faulty.

>When it comes to PUBLIC data in a website there's no difference.

Yes there is. Even for data that is public and meant to be accessed by the public, you still must access the web server. It is much more similar to walking into a publicly accessible restaurant and reading their menu; it is not similar to reading a banner on the outside of the restaurant.

>How would I know I'm authorized, implicitly or explicitly, to access a website, say www.google.com? Should I phone the domain owner before accessing?

A reasonable person knows that www.google.com is meant for public use. It is common knowledge and from whatever avenue you heard about Google, you probably gathered from context that www.google.com is somewhere you are allowed to go.

This is absolutely not the case if you randomly guess a URL like 'mycompany.intranet.io/financials/employeelist.xls'. And it certainly is not the case when you are explicitly told (such as in a robots.txt) that you are not allowed.

>Just because you meant for something to be off limits but failed to inform anyone doesn't automatically make it off limits.

It does, though. The owner of property is under no obligation to inform the public that their property isn't meant for public use. It is up to each individual person to determine whether they are allowed to use it. This is typically done by context clues and societal expectations: it would be absurd for a random member of the public to walk through someone's open front door and claim "well I was never explicitly told to not come into your house...". The person should know, based on the social convention that you don't just walk into someone else's house, that it's not allowed. This is the same for websites. There is some leeway given, such as if you saw a sign for "Open House" and simply walked into the wrong house. But it is still possible to commit an act of trespassing even if you didn't explicitly intend to: this is called negligent trespassing.

>"Trespassing" in a website is analogous to hacking it, using stolen credentials, using exploits and things like that.

No, it's not. Did you even click on the link I provided earlier regarding trespassing?

>Unless some law passes that says that someone remotely accessing a folder called /secrets/, or /inside-the-property/ or something like that is trespassing, it won't be the case.

That law already exists. It's called the CFAA, and the debate around it is what is being discussed in this post.


The "don't walk into someone else's house" rule applies to ALL houses everywhere. You are explicitly forbidden to enter a house unless explicitly authorized.

When it comes to websites, there are billions of domains on the planet, and each one has multiple internal URLs, ranging from tens to several million. You can't expect everyone to have common knowledge about every domain and link. It is beyond ridiculous to compare the two.


> You can't expect everyone to have common knowledge about every domain and link. It is beyond ridiculous to compare the two.

It's true that there's a presumption that sites that are accessible by the public are open for access to the public. But a lack of technical restriction is not an invitation. If a reasonable person would conclude that your access is not welcome, then your access is also illegal. This is the crux of why so much of security research is on precarious legal footing. If you find an unsecured MongoDB database with a name like "customer_data" and you download the contents, you are 100% breaking the law.


A better analogy: Accessing a website is like calling up a business and asking whichever employee answers for information.


> And the lack of access control does not implicitly give you authorization to view it.

I know you're trying really hard to sway opinion on HN for some reason, but I'm just going to reinforce the entire point of this thread and, assuming we're staying within the context of publicly accessible information: the Ninth Circuit Court strongly disagrees with you.

Common law torts, such as trespass to chattels, may apply. But it's not a criminal offense.


I don't know why you think this has anything to do with opinion. I'm relaying information that is available in the Wikipedia link that I provided in an earlier comment.

>but I'm just going to reinforce the entire point of this thread

That isn't the entire point of this thread, nor is it the point of the PDF posted in the OP.

>Common law torts, such as trespass to chattels, may apply. But it's not a criminal offense.

Nobody has said anything about it being a criminal offense. The relation to trespassing is literally the entire point of this thread.


> You didn't "ask my server". You used a tool to extract data from my server.

You're always using a "tool" to "extract" data from a web server, unless you're manually operating a telnet session. A web browser is such a tool, an incredibly complex and automated one. cURL is such a tool too, and so is cURL wrapped in a bash script. None of them go outside of what's allowed by the HTTP protocol[0]. And the most core assumptions of the Internet and the HTTP protocol combine into a simple rule: if it's a publicly routable server answering HTTP requests, you can issue requests and receive whatever it sends. If a server wants to discriminate, it should set up an auth scheme.

--

[0] - protocol family at this point.
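To make the "set up an auth scheme" point concrete, here is a minimal sketch of a server-side policy that answers 200 OK to anyone on some paths and 401 Unauthorized elsewhere unless a bearer token is supplied. It is a pure function rather than a running server, and the paths, the token, and the `status_for` name are all hypothetical, chosen only for illustration.

```python
import hmac
from http import HTTPStatus

# Hypothetical shared secret; a real deployment would not hard-code this.
EXPECTED_TOKEN = "s3cret-token"

# Hypothetical paths the operator intends to serve to anyone.
PUBLIC_PATHS = {"/", "/about"}


def status_for(path: str, headers: dict[str, str]) -> HTTPStatus:
    """Return the status a server with this policy would send for a GET."""
    if path in PUBLIC_PATHS:
        return HTTPStatus.OK
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {EXPECTED_TOKEN}"
    # compare_digest avoids leaking the token through timing differences.
    if hmac.compare_digest(supplied.encode(), expected.encode()):
        return HTTPStatus.OK
    return HTTPStatus.UNAUTHORIZED
```

With a policy like this, the server's answer itself distinguishes "anyone may have this" from "only token holders may have this", instead of leaving it to the client to guess.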


> You didn't "ask my server".

Yes, you did.

> You used a tool to extract data from my server.

No, that's not how the technology works.


Using fishing bait is just a "request" for a fish to bite my line so I can pull it in. It's up to the fish to respond to the 'request', right? So does that absolve me of a crime if I go fishing in someone else's pond and pull out all of their fish? Cause the fish are the ones that responded, right, so it's not my fault?

No, of course not. The technical details of how an HTTP request works are not what is relevant here. Don't be obtuse.


Personal attack noted.


No, it’s more akin to standing on the boundary, reading your posters using binoculars.


um, seems in this case the court specified that it is NOT on them - if the info is public, the website/house-people are required to return it and create no obstacles to doing so.


>A website or server is property, just like land is. Accessing it is no different than accessing any other piece of property

What if I placed a sign on my lawn which said "Please, step on the grass!"? Would it still be trespassing?

You laid out a lot of opinions there as if they were facts. They are not. These issues are complex and are still being debated at levels higher than the HN comment section.


I don't understand your comment.

>What if I placed a sign on my lawn which said "Please, step on the grass!"? Would it still be trespassing?

No. Of course not. What exactly is your question?

>You laid out a lot of opinions there as if they were facts.

I didn't lay out any opinions. I relayed information that is available from Wikipedia and other sources and rephrased it into an HN comment. None of it is opinion. If you take issue with what my comment says, you can take it up with the courts that made the decisions that gave the information I posted.


>No. Of course not. What exactly is your question?

My point was that it's hardly as clear-cut as a piece of land, and you know it. You posted a link to Wikipedia's Trespass to chattels article, which I think is funny because it exactly proves my point. From your link:

>...several companies have successfully used the tort to block certain people, usually competitors, from accessing their servers. Though courts initially endorsed a broad application of this legal theory in the electronic context, more recently other jurists have narrowed its scope. As trespass to chattels is extended further to computer networks, some fear that plaintiffs are using this cause of action to quash fair competition and to deter the exercise of free speech; consequently, critics call for the limitation of the tort to instances where the plaintiff can demonstrate actual damages.

It is not at all clear that what we're discussing here is a clear violation. It's very debatable, and the law itself was never envisioned to apply to scraping websites (because they didn't exist yet!). It also goes on to say (in the US):

>One who commits a trespass to a chattel is subject to liability to the possessor of the chattel if, but only if,

>(a) he dispossesses the other of the chattel, or

>(b) the chattel is impaired as to its condition, quality, or value, or

>(c) the possessor is deprived of the use of the chattel for a substantial time, or

>(d) bodily harm is caused to the possessor, or harm is caused to some person or thing in which the possessor has a legally protected interest.

The only clause there which even begins to help your case is the 'value' part of clause b and, again, that's very debatable.

> you can take it up with the courts that made the decisions that gave the information I posted.

Decisions made by court A get overturned by court B all of the time. We'll see where it lands, but we're not there yet (again, my point!)


Those are apples-to-oranges comparisons. With a phone call, you don't have to answer, nor say anything once you know (or don't know) who the caller is or what their intention is; you can stop whenever you want. Is it a robocall? You hang up. It's similar with talking to someone in person: if they say something, you're free to just not respond, and if they persist, it's harassment.

The main reason I see businesses being concerned about being required to serve pages to scrapers (even at a reasonable rate of download) is that there's still a cost associated with it, and more so the more scrapers access the data and regularly come back for updates. Similarly, if it is the users of a platform who have input the data and keep it updated, and they only want it presented on that platform (for whatever reasons), then what rights do they have?

Is the answer then to require another acknowledgement message like "this site uses cookies", perhaps with a required response before moving forward, so that users acknowledge "scraping isn't allowed", akin to "no trespassing" signs on properties? It seems awfully ridiculous to put the onus on 100% of users (the overwhelming majority being non-scrapers), adding friction and slowing access for billions of internet surfers. Of course, browsers could then act as a layer that auto-responds to that or pre-agrees to the rules, perhaps by reading through a site's TOS and pre-approving what you agree to. Otherwise, as the trend has been, it leads to closed platforms so that the data isn't considered public. I won't argue whether that is good or bad for the general internet, but how much value is there in a person having access to that data without having to be a user?

Or the much simpler thing is we could put the onus on businesses who are scraping or will use scraped data to not cause this mass friction.


And does robots.txt count as an access control?

What about a humans.txt that says "please don't scrape this site"?


According to the ruling, a cease and desist letter directly demanding that they not scrape the site didn't count as access control, so one would assume that humans.txt wouldn't either. It needs to be a technical prevention like a password, access token, etc.


Sounds more like an access suggestion than control to me



