
The question is whether the content is fair game (to access). Google has already proved it to be fair game and if anyone wants to argue otherwise, they would need to then argue with the most flagrant offender, Google, who has much more than just "Confidential" PDFs.

Google would be guilty of any charge that could be levied against someone for accessing data that Google actively provides.




I'm sorry, but I think this is rather ridiculous. Google's position is that it has automatically indexed everything the server said it could, and that it will remove anything on request and gives websites a way to ask for that.

Your position would have to be that you searched for obviously confidential documents, found them and downloaded them without knowing you shouldn't.


Guys, I think we got out in the weeds a little bit with the Google thing. The question is: if someone puts up a web server on the internet with no authentication and no notice that it's not open for public use, can they get me for "unauthorized access" if I download content from it? If not, what makes HTTP special - why not SQL or SMB?


The relevant question is not whether there is an explicit notice, but whether common sense suggests that you are intentionally making unauthorized accesses - as would be the case with the Google search you mentioned.

See also:

https://en.wikipedia.org/wiki/Goatse_Security#AT.26T.2FiPad_...


Common sense?

If you send a valid HTTP GET to someone's server and they respond with a 200 OK and some content, the access was not unauthorized. The HTTP protocol actually makes authorization an explicit mechanism that may be disabled or loosened at the implementor's leisure.
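To make the "explicit mechanism" point concrete, here is a minimal sketch of the decision a server makes on each GET. Everything in it is hypothetical (the `respond` function, the protected paths, the bearer tokens); it only illustrates that the 200-versus-401 choice is a policy the operator configures, not how any particular server implements it.

```python
from http import HTTPStatus


def respond(path, headers, protected_paths, valid_tokens):
    """Return the status a hypothetical server would send for a GET.

    protected_paths and valid_tokens stand in for whatever policy the
    operator has configured; both names are assumptions for this sketch.
    """
    if path not in protected_paths:
        # No authorization configured for this path: serve it.
        return HTTPStatus.OK
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme == "Bearer" and token in valid_tokens:
        return HTTPStatus.OK
    # The protocol's own way of saying "not authorized".
    return HTTPStatus.UNAUTHORIZED


if __name__ == "__main__":
    protected = {"/internal/report.pdf"}
    tokens = {"s3cret"}
    print(respond("/public/index.html", {}, protected, tokens))
    print(respond("/internal/report.pdf", {}, protected, tokens))
```

An operator who leaves `protected_paths` empty has, in protocol terms, switched the check off for every path.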


To be fair, the EFF took a position in the case I linked that suggests they might agree with you in the present Google hypothetical too:

https://www.eff.org/deeplinks/2013/07/weevs-case-flawed-begi...

Not only that, I was actually surprised to find that the New Jersey court cited a state precedent along similar lines:

http://cdn.arstechnica.net/wp-content/uploads/2014/04/weevru...

->

http://caselaw.findlaw.com/nj-superior-court/1508996.html

...though that was interpreting a state law and brought up the fact that the state law has some subtle differences from the federal CFAA (despite very similar wording, quite vague in both cases).

On the other hand, in Craigslist v. 3Taps, a district judge found that simply evading an IP ban, while otherwise accessing entirely (intentionally) public information, counts as unauthorized access under the federal law. And then there's the case of Aaron Swartz.

But anyway, even under the more permissive of the possible standards, your logic is too simplistic. What if I send an HTTP GET like this?

    GET /viewarticle.php?title=x%27%20UNION%20ALL%20SELECT%20%2A%20FROM%20%27users HTTP/1.1

It's a perfectly valid and well-formed request according to the HTTP standard, and even valid at the application level, in the sense that you technically can't rule out that an article might exist titled "x' UNION ALL SELECT * FROM 'users", and a correctly written server-side script would interpret the request simply as searching for such an article. But suppose the script isn't correct, and instead of showing an article dumps its user table. Would you say that my access to user data is authorized?

Well, I actually don't know how you'd answer the previous question, but I strongly doubt any court would answer yes. If you say no, then the implication follows that either the difficulty of constructing the dubious request, or perhaps the intent, or something else relatively wishy-washy and subjective can make the difference between authorized and unauthorized. It can't be reduced to some strict technical standard.
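For anyone who wants to see the difference between the "correct" and "incorrect" script described above, here is a rough sketch using Python's `sqlite3` with an in-memory database. The table layout, the sample rows, and the function names are all made up for illustration; only the request target comes from the example. It relies on SQLite accepting a single-quoted name in the FROM clause, which other databases may reject.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Request target from the example above.
TARGET = "/viewarticle.php?title=x%27%20UNION%20ALL%20SELECT%20%2A%20FROM%20%27users"


def make_db():
    """Build a throwaway database (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE articles (title TEXT, body TEXT)")
    db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    db.execute("INSERT INTO articles VALUES ('hello', 'first post')")
    db.execute("INSERT INTO users VALUES ('alice', 'hunter2')")
    return db


def title_param(target):
    """Percent-decode the title parameter from the request target."""
    return parse_qs(urlsplit(target).query)["title"][0]


def lookup_broken(db, title):
    # Incorrect script: the title is pasted into the SQL text,
    # so the quote characters in it change the query's structure.
    sql = "SELECT title, body FROM articles WHERE title = '" + title + "'"
    return db.execute(sql).fetchall()


def lookup_correct(db, title):
    # Correct script: the title is bound as data and is only ever
    # compared against article titles.
    sql = "SELECT title, body FROM articles WHERE title = ?"
    return db.execute(sql, (title,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    title = title_param(TARGET)
    print("broken: ", lookup_broken(db, title))
    print("correct:", lookup_correct(db, title))
```

Both functions receive the same well-formed request; only the broken one hands back rows from the users table.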


If the script is mixing title contents into executed SQL code, then I don't think there is much hope for it. This line of argument allows post facto rationalization for determining unauthorized access. To make a claim that something was unauthorized is to claim that there is some procedure that can determine whether something is authorized or not. That procedure is the thing that should actually be executed when deciding to serve a request. We are talking about cases where the written procedure says the request was authorized, but someone else claims that the actual procedure gives a different result [insert ad-hoc, post-facto rationalization here (i.e., not policy)].

This is clearly nonsense, though it may take some time for courts to figure it out.


> This line of argument allows post facto rationalization for determining unauthorized access.

Ah, so the burglar with the bump key is allowed in because the action of the lock determines criminality? "If it opens it's allowed?"

You seem to be making the same fundamental mistake many technical individuals make when they interact with things outside of their knowledge sphere - you're attempting to map a space that is foreign to you into the world you know.

The legal system is not a computer. It does not run on rigid rules. That's actually a really good thing: it allows flexibility in considering whether an action is a crime or not.

There's a spectrum to consider. It's clear on one end that a person who searches for "not for release filetype:pdf" may be looking for historical documents, and a person who attempts a SQL injection against a web application has sufficient guilty knowledge and intent.


The legal system does run on rigid rules. Yes, there is no perfect executor (subjectivity will still exist), but the rule of logic still applies. A legal system where you may be convicted of a crime on a whim is not a legal system, it is a farce.

Everyone seems to be ignoring that a 200 OK is explicit authorization, per the protocol. It would be one thing if we were talking about a protocol with no built-in authorization primitive, but we aren't. Using HTTP establishes an authorization procedure. Claiming that it may be illegal to receive responses to well-formed requests to the server requires one to make the fundamental mistake of not understanding the technical protocols that are being used to communicate.

The legal system operates on a subset of the logic involved in the technical world. Its ideas and understanding will necessarily lag the reality being created and will be subservient to the logic being established, not adversarial.

Burglary is a crime because it is an intent to commit further crime, not because a door was opened. The difference with an HTTP authorization lock is that the authorizer gets to examine every request and must run their authorization policy on every one. Arguing that the policy that was actually run was "wrong" is an admission of incompetence.

The analogous situation is where a business posts an "OPEN 24/7" sign by their open front door, but shotgun-blasts people who walk through the door.


That's a good point. 401 Unauthorized... They even used the right word.


Documents are not obviously confidential if there is an established process for removing confidential documents, but the documents still show up in a simple search.

Your position is that you viewed everything that Google thought it could publish in regard to your query. It is ridiculous that someone could be jailed as a result of clicking a link on a Google search result page.


Consider the Google search that started this:

"not for public release filetype:pdf"

That's a pretty flagrant attempt at accessing confidential documents. It isn't like someone googles "how to catch a roadrunner" and accidentally downloads confidential Acme documents. This is a full on attempt to find poorly secured documents.

Now, consider what Google does. It runs bots (that respect things like robots.txt) and then publishes links to everything that they can find.

Maybe I'm missing some subtlety, but I don't understand how these are similar. Can you explain yourself further?
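For reference, the robots.txt check mentioned above can be sketched with Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the example.com URLs are invented for illustration, and this says nothing about how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site operator might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/blog/post.html",
        "https://example.com/internal/report.pdf",
    ):
        print(url, may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

A well-behaved bot skips anything the file disallows; a document outside every `Disallow` rule gets crawled no matter what its first page says.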


That is a perfectly legitimate query. I would expect to find all manner of historical documents. Further, it does not matter what a document says. Claiming to be not for public release doesn't make it a crime to release it. The only possible exception here is for national secrets, but even then many exceptions have been made.


Good answer - thanks very much for clarifying!


Because there isn't going to be anything confidential that the search result returns. And anything you access is something that was widely available.

It'd be like googling, "Bank of America's Secret Backdoor Password to steal all its money".


It's possible that I have missed some subtleties in your argument so let me ask for a bit of clarification.

> Because there isn't going to be anything confidential that the search result returns.

Doesn't this assume that sysadmins are actually competent? And isn't there a ton of evidence that suggests that sysadmins have routinely allowed confidential data to be indexed by Google?

In that case, isn't this analogous to what would happen if I left my front door unlocked and you 'broke' in and stole my collection of Taylor Swift CDs? (I don't actually own any Taylor Swift CDs, but it makes my point easier.)

Granted, I did a shitty job of securing my valuable music collection, and Taylor Swift CDs are widely available. But fundamentally, you still came in without permission and took something that belonged to me.

Recent history has shown that you can be prosecuted for all sorts of things in cyberspace. Accessing confidential directories, downloading poorly secured files, and exploiting poorly designed APIs have all been successfully prosecuted.

I wish that we lived in a world where doing things like that would be considered a part of intellectual freedom, but the unfortunate truth is that laws are applied in such a way as to make this highly risky. The silly thing is that the state of the law actually benefits hard core criminals...



