I really don’t want my cloud storage provider to check my files for copyright violations. It’s not required legally and it’s something that I think is anti-user.

Banks don’t have to check safe deposit boxes for stolen art.

Self-storage facilities don’t have to check containers for stolen goods.

Of course, I’m against illegal stuff, but I don’t want to waste a single second defending myself from false positives in these situations. Google has no way of knowing whether I own the IP, whether I paid for a license for the material on my drive, or whether one of many other legitimate cases applies.




Why does Google Drive perform such scans?

Is it because you might share the files with someone that they have to consider anything put on Google drive as being redistributed under copyright law, and thus subject to copyright restrictions?

Or is the very act of putting something in cloud storage considered redistribution under copyright law, even if the file is never shared and you are the only user?

A while ago, I backed up a bunch of my Mom's files from a failing computer of hers onto Google Drive. I didn't think anything of it at the time. If there are some copyrighted materials on there, is Google going to suddenly terminate my account after a retroactive scan?

I think very hard about copyright and ensure that I uphold copyright in all my public works — for example, there are licensing details at the end of all my slide presentations for all images. Having to apply such a level of care to every action I perform on Google services is bonkers.


Bonkers and potentially useless.

Google does automated scans; they don't care whether something is properly attributed, falls under fair use, or is covered by a license you bought.

They also are not really known to care about fixing false bans for individuals. Sometimes they do; other times you are screwed.

They also might lock you out of all your Google services: email, storage, domains, apps you bought, videos you bought on YT, etc. Which tbh is the most bonkers thing and should not be legal.


Pretty sure the problem is some users have Google Drives with GBs/TBs of pirated movies, etc., that are shared massively.

>Is it because you might share the files with someone that they have to consider anything put on Google drive as being redistributed under copyright law, and thus subject to copyright restrictions?

Yes


This seems like a fundamental design flaw for any cloud storage service which enables sharing. It would be a problem not only for Google, but for Dropbox, etc.

There needs to be a sharp distinction between files for your own use and files that are shared with the world — as in a global setting for whole volumes to disable global sharing and thus avert the need for copyright scanning.

If by making files easy to share at any moment, the service creates a need to perform continuous copyright scanning of all drive content and then to take punitive action when it is detected, then the service is really nothing like a private hard drive. The potential for catastrophic loss, not only of the drive contents but of everything you access through Google services, is much more terrifying than the possibility of corrupting a local drive and much harder to plan for.


>Dropbox has adopted a policy of terminating the accounts of users who repeatedly infringe copyright or whose accounts are subject to multiple infringement allegations. If you repeatedly share files that infringe others’ copyrights, your account will be terminated

https://help.dropbox.com/accounts-billing/security/copyright...

> There needs to be a sharp distinction between files for your own use and files that are shared with the world

Even simpler: do not upload copyrighted materials to Dropbox, Google Drive, etc.

Use a local backup if needed, use sneakernet to transfer files.


> Even simpler: do not upload copyrighted materials to Dropbox, Google Drive, etc.

Everything is copyrighted. The comment you just wrote is automatically copyrighted. "Do not upload copyrighted materials" means you can only upload things made over a century ago (which either were made before copyright existed, or were copyrighted but their copyright expired). Want to upload your vacation photos? Too bad, they were made this year, so they are copyrighted, and will be copyrighted for several decades after you're already dead.


If our comments are copyrighted by us can HN legally refuse to remove them (like they often do)?


No, you have licensed your content to Y Combinator. It’s in the TOS:

> By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.


> Even simpler: do not upload copyrighted materials to Dropbox, Google Drive, etc.

This is more difficult than you are making it sound. Considering the way most people use their computers, the only way to avoid copyright infringement entirely in your use of cloud storage is not to use cloud storage the way you use your local drive but instead to assess every last file that you upload as if you were publishing it on a public website.

Sure, you can avoid uploading pirated movies (perhaps by never downloading any of them in the first place). But in fact, original works are typically copyrighted by default as soon as they are created, even if the copyright is not formally registered — and we depend on a patchwork quilt of implicit and explicit licenses for viewing and use of files on the internet. Have you ever saved a quote from somewhere? That was probably copyrighted; your use is probably legitimate, but a naive algorithm might not think so.


Honestly, the situation is far worse and more nuanced than that. You cannot determine whether content infringes copyright from the content alone. Copyright depends on context.

Situation 1. You buy a PDF ebook. You have legal rights to personal use of this file, but you upload it to your cloud storage and it gets flagged. The provider cannot determine whether you uploaded the PDF to facilitate piracy, because you have a pirated copy, or because you have the rights to it.

Situation 2. You hire a wedding photographer, and they supply you with photos that you do not own the right to distribute because they retain copyright. This is the same as the above situation, but personalised. Would you like your cloud provider to delete any file, including your wedding photo backups, because it matches a hash in a database?

Situation 3. Copyright fair use. Much has been written on the subject, but this is where copyright falls over in a digital age. Fair use is completely indistinguishable from piracy from the perspective of flagging files by content.


If you modified the tags a little, wouldn't that keep the song from matching the checksum of another file?


Fooling the algorithm is not a robust solution. Infringement-detection algos get better hit rates all the time (fewer false negatives, although often at the price of more false positives) — consider all the work that has gone into detecting musical duplicates even when resampled, pitch/time-shifted, rerecorded, etc.
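To make the checksum question concrete, here is a minimal sketch. It assumes the scanner compares SHA-256 digests against an exact-match blocklist, which is my assumption for illustration and not something Google has documented; the byte strings and the `sha256_hex` helper are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes for a media file: a fake tag header plus a fake audio payload.
audio_payload = b"\x00\x01\x02\x03" * 1024
original = b"TITLE=Some Song;" + audio_payload
retagged = b"TITLE=Some Song!;" + audio_payload  # one character added to the tag

original_digest = sha256_hex(original)
retagged_digest = sha256_hex(retagged)

# Assumption: the blocklist only knows the digest of the original file.
blocklist = {original_digest}

original_flagged = original_digest in blocklist
retagged_flagged = retagged_digest in blocklist

print("original flagged:", original_flagged)
print("retagged flagged:", retagged_flagged)
```

So yes, a one-byte tag change slips past an exact checksum, which is presumably why detectors move to fuzzy matching, and that is where the extra false positives come from.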

The robust solution is to treat files which are redistributed and therefore trigger copyright provisions completely differently from files which are not redistributed.

Furthermore, for the purposes of punitive action, sharing a copyrighted file with a limited whitelist of other users might reasonably be treated as less of a screwup than making a file accessible to the entire internet. And therefore it should not be frictionless to share a file with the internet.


> to assess every last file that you upload as if you were publishing it on a public website.

IMO, that's precisely what you should be doing.


All files are copyrighted. In fact, my .DS_Store files are generated by my computer against my content and belong to me.

Some 10x engineer at Google writes code that flags it; how is that my problem?


Not all files are copyrighted. For example, files which are not creative works -- including those which are mechanically created -- are not subject to copyright.


Yup, it's a common misconception that everything is copyrighted. Copyright requires creativity. An empty text file/image is not copyrightable, the pile of #include directives at the top of a C file is not copyrightable, etc. You need to create something that isn't just mechanical, regardless of whether the computer does it or you do it manually. At the point where creative decisions start to influence the process, that's where copyright comes into play.


> the pile of #include directives at the top of a C file is not copyrightable

That used to be the thought, but Google v. Oracle ended with de minimis defenses struck down, and imports still copyrightable; Google was just afforded a fair use defense for the copying.


I thought Google v. Oracle was about API structure, not imports. While I obviously don't think API structure should be copyrightable, we clearly can't extrapolate from that to a list of imports. The former is something chosen by an environment designer and needed for compatibility; the latter is just boilerplate every user of a given environment needs to write.

There is certainly creativity in API design, it's just that the right to interoperability and the utilitarian aspect should trump any copyright claim on the API itself. But there is no creativity in writing imports; you aren't making any material decisions, you're just doing something the compiler requires you to do.


To be copyrightable, you still need to have verbatim components that are copied (or 'mechanically', meaning algorithmically, transformed from verbatim components). Abstract concepts like API design still can't have copyright protections directly per se; it's a concept as a proxy of the copyright over the "declaring code" of import and export definitions. That's why I brought up how de minimis defenses were also struck down; the next direction people go is saying 'well, it's just one line to import the library, surely that's not enough'. The fair use defense afforded to Google for a very abstract interpretation of interoperability is sort of the last bastion we're left with at the moment.

But as someone who's a big fan of your work, I'd implore you to not trust myself or your knowledge on this and hit up a lawyer. You're close enough to the edge of legality with a lot of your work that I'd hate to see you stifled by a minor misunderstanding here that could have been avoided. Google v. Oracle ended better than it could have, but AFAICT still heavily complicated RE work and independent implementations. It made a lot of this murkier, and being at least internally consistent with a legal theory here could make a bad situation at least a little better by leaving you with more options.


> But as someone who's a big fan of your work, I'd implore you to not trust myself or your knowledge on this and hit up a lawyer. You're close enough to the edge of legality with a lot of your work that I'd hate to see you stifled by a minor misunderstanding here that could have been avoided.

I do not retain a lawyer personally, but I inform myself of legal opinions around this field. It's why I felt comfortable enough to write this:

https://asahilinux.org/copyright/

Ultimately though, once you stay clear of obviously problematic actions, the question of whether you're going to get in trouble boils down to whether the company you're up against is evil, for better or for worse. Given that Apple isn't going around suing jailbreakers and Hackintoshers, I'm not too worried that they'll go after us as long as we don't do anything stupid.

Conversely, I got frivolously sued by Sony for talking about a security vulnerability in the PS3... and yeah, I had to get a lawyer for that one.

In the end, once you get yourself deep enough into legal analysis around these subjects, you come to the conclusion that everyone violates copyright in little ways, all the time, and the world would grind to a halt if we stopped. The system is broken and relies on the goodwill of the people participating to not completely collapse. For example, I've previously mentioned how copying most example code you find online, e.g. in places like Stack Overflow, is a copyright violation unless you adhere strictly to the license (did you know SO content is licensed under CC-BY-SA?). Posting third party code snippets to most services, e.g. Twitter, is a copyright violation due to incompatibility between the license and the ToS requirements of the site. And so on.


As far as I remember the SCOTUS verdict was "we don't want to say if it's copyrightable or not, but if it were copyrightable it would fall under fair use".


And the appeals court said that it was copyrightable, so by not saying anything the supreme court let the appeals court ruling stand on that point.


That’s an interesting detail, thanks.


Sort of. Works mechanically derived from copyrightable works are still copyrightable.


> Even simpler: do not upload copyrighted materials to Dropbox, Google Drive, etc.

Do not store a single "1" in a file, too: https://news.ycombinator.com/item?id=30060405


> Even simpler: do not upload copyrighted materials to Dropbox, Google Drive, etc.

As the article illustrates, it’s not actually that simple. Cloud services that terminate accounts (and probably instantly delete everything, to comply with GDPR, CCPA, etc.) for perceived copyright infringements will always and necessarily suffer from a false positive rate.

We’ll likely never truly learn what this false positive rate is, but that it will always exist is reason enough to give pause to the thought that services should “just terminate their account” if they think it’s infringing on intellectual property laws.

The only good answer here is an unqualified “Use a local backup”. The terms of use for non-business cloud storage absolve the providers of all responsibility for data loss, even when they have incorrectly taken punitive action against you.


> Even simpler: do not upload copyrighted materials to Dropbox, Google Drive, etc.

It is absurd for copyright to prevent a legitimate user from storing media on a cloud filesystem.


What if you are authorized to have said copyrighted works?


How is this a problem though? So what if I pirated movies in my cloud storage?

Should my HDD manufacturers also scan for pirated movies?

Should Samsonite suitcases check for pirated physical DVDs?

Should my smart TV check for pirated movies?

Google isn’t even responsible legally, they are choosing to do this.


If copyright holders had their way, smart TVs would definitely block playing pirated media. To add to your list, why should HDMI care about encryption and stopping piracy?


From the comment that you are replying to:

> That are shared massively

If the drive is shared, then google is shipping the bits to whoever it is shared with, same as YouTube.


> Pretty sure the problem is some users have Google Drive's with GBs/TBs of pirated movies, etc, that are shared massively.

That's Google's doing and problem. It doesn't justify creeping around all users' files.


Google is not liable for this use of Google Drive in any way.

Under 17 U.S.C. § 512 (also known as the Safe Harbor provision) Google is not liable for what users of the service upload and share as long as Google complies with take-down requests from rights holders. This behavior from Google goes way beyond what is legally required.

Under the laws, Google could also scan the content for "potentially libelous" material but this also would not be legally required. Google has no legal responsibility to scan your content for possibly infringing material.


It could be in their agreements to sell movies on Google Play, YouTube, etc., i.e. a financial reason to keep the MPAA happy.


The Safe Harbor provision does not exist in some of the jurisdictions Google operates in.


This is happening in the USA as well.


I would think that all these "copyright" scans Google performs are not performed against a list of known-infringing files that Google researchers compiled themselves by monitoring pirate websites. Instead the known-infringing list would be compiled from previous takedown requests. And one of these takedown requests either explicitly or implicitly (e.g. by listing a "folder") contained a .DS_Store file that looked like many other .DS_Store files because the user had not modified the folder (display) attributes on their mac, which was then added to the known-infringing list, and which then created this mess.
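A rough sketch of how that failure mode could play out. Everything here is hypothetical: the file contents, the `build_blocklist` and `scan` helpers, and the assumption that the list is built by hashing every file named in a takedown notice.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of the file contents."""
    return hashlib.sha256(data).hexdigest()


# Placeholder for an untouched, default .DS_Store (not real Finder data).
DEFAULT_DS_STORE = b"\x00\x00\x00\x01Bud1" + b"\x00" * 64

# A folder listed in a (hypothetical) takedown notice.
takedown_folder = {
    "movie.mkv": b"bytes of a pirated release",
    ".DS_Store": DEFAULT_DS_STORE,
}


def build_blocklist(folder: dict) -> set:
    """Naively hash every file in the reported folder, metadata included."""
    return {digest(content) for content in folder.values()}


def scan(files: dict, blocklist: set) -> list:
    """Return the names of files whose digest is on the blocklist."""
    return sorted(name for name, content in files.items()
                  if digest(content) in blocklist)


# An unrelated user with an equally untouched folder.
innocent_drive = {
    "notes.txt": b"my own notes",
    ".DS_Store": DEFAULT_DS_STORE,
}

flagged = scan(innocent_drive, build_blocklist(takedown_folder))
print(flagged)  # ['.DS_Store']
```

The innocent drive gets flagged purely because its `.DS_Store` is byte-identical to one that happened to sit next to a reported file.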

But it's valid to question whether Google has to scan all files for known-infringing files in the first place. That's really where it gets tricky, legally. On the surface, they absolutely do NOT have to perform such scans under the DMCA.

But then there are provisions in 17 U.S.C. § 512 (the DMCA safe harbor section) that state for example:

"A service provider shall not be liable [...], if the service provider

(i) does not have actual knowledge that the material or an activity using the material on the system or network is infringing;

(ii) in the absence of such actual knowledge, is not aware of facts or circumstances from which infringing activity is apparent; or

(iii) upon obtaining such knowledge or awareness, acts expeditiously to remove, or disable access to, the material;"

This is very vague. It wouldn't be hard to imagine that some lawyers could show up claiming that because Google received a valid takedown notification for a specific file known to be a "pirate" release of some movie, that Google should have known or at least "been aware of facts or circumstances" that all copies of the same file in whatever user accounts are infringing (which might not be the case thanks to fair use, but lawyers and most juries would not care). If they could furthermore demonstrate that Google does already have knowledge required to locate each and every copy of a file in Google Drive accounts easily (e.g. find out through discovery that Google Drive "deduplicates" storage), then it would be game over with most juries and Google's safe harbor in the case gets denied and they are found liable.
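To illustrate the deduplication point, a minimal sketch of a content-addressed store. This is not a description of how Google Drive works; the `DedupStore` class and its method names are invented for the example.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Toy content-addressed store: identical bytes are kept once."""

    def __init__(self):
        self.blobs = {}                  # digest -> bytes, stored a single time
        self.owners = defaultdict(set)   # digest -> {(user, path), ...}

    def upload(self, user: str, path: str, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)   # keep only the first copy
        self.owners[key].add((user, path))
        return key

    def copies_of(self, key: str) -> list:
        """Every (user, path) that references the same content."""
        return sorted(self.owners.get(key, set()))


store = DedupStore()
key = store.upload("alice", "movies/film.mkv", b"same bytes")
store.upload("bob", "backup/film.mkv", b"same bytes")
store.upload("carol", "notes.txt", b"different bytes")

copies = store.copies_of(key)
print(copies)  # [('alice', 'movies/film.mkv'), ('bob', 'backup/film.mkv')]
```

If storage is organized anything like this, one takedown notice for one file hands the provider a ready-made index of every other account holding the same bytes, which is the "knowledge" a plaintiff's lawyer would point at.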

And that's only the US (DMCA) aspect of it. The German Bundesgerichtshof (Federal Court of Justice, highest court of ordinary justice) for example has found in the past[0] that service providers can be liable if they have been previously informed about copyright infringement and did not take "reasonable" steps to prevent further infringement, and that these "reasonable" steps may specifically include checking new uploads and existing files against a list of known-infringing files (or hashes thereof).

Yeah, it sucks that Google and other service providers scan files that way, even if these files are never shared, and it sucks even more when somebody makes a mistake and puts a benign file on the known-infringing list (which is something Google should then correct, apologize for, and follow up on by resetting any account flags and reinstating any accounts that got banned in the crossfire due to Google's mistake). But I can also appreciate that lawmakers and courts around the world have put Google into a situation where Google de facto (if not de jure) has to perform such scans to avoid liability.

[0] In a 2012 case where Atari sued then-filehoster RapidShare over "Alone in the Dark".


Google built the product without fully considering that problem, and now regular users are paying the price.

In any case, it shouldn't be difficult for the product to make a distinction between files openly available and files being used privately within the drive on accounts that should look very legitimate to Google. The copyright filter would make a bit more sense on openly accessible files. Even then, why is Google going so far beyond what's legally required of them?


Okay, easy fix. Check the files for copyright if you create a sharing link for them.
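Something like this, as a sketch only. The `Drive` class, the hash blocklist, and the `drive.example` URL are all made up, and a hash lookup is of course a stand-in for whatever check a real provider would run.

```python
import hashlib


class Drive:
    """Toy drive that scans at share time, not at upload time."""

    def __init__(self, blocklist):
        self.blocklist = set(blocklist)  # digests named in takedown notices
        self.files = {}                  # path -> bytes
        self.shared = set()              # paths that have a share link

    def upload(self, path: str, data: bytes) -> None:
        # Private storage: nothing is scanned here.
        self.files[path] = data

    def create_share_link(self, path: str):
        data = self.files[path]
        if hashlib.sha256(data).hexdigest() in self.blocklist:
            # Refuse the link; the file and the account are left alone.
            return None
        self.shared.add(path)
        return f"https://drive.example/share/{len(self.shared)}"


blocked = b"bytes named in a takedown notice"
drive = Drive(blocklist={hashlib.sha256(blocked).hexdigest()})
drive.upload("private/backup.bin", blocked)
drive.upload("photos/holiday.jpg", b"my own photo")

refused = drive.create_share_link("private/backup.bin")
allowed = drive.create_share_link("photos/holiday.jpg")
print(refused)  # None
print(allowed)  # https://drive.example/share/1
```

The worst outcome for a private backup is then a refused share link rather than a deleted file or a terminated account.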


> Okay, easy fix.

Assessing copyright implications is not "easy". It's a lot of difficult work that involves specialized expertise, judgment calls, and risk assessment. There are complex areas and shades of grey: derivative works, fair use, copyrighted but licensed materials, etc.

The main thing you are trying to avoid is causing harm at a level where an infringement claim is justified. There are a lot of uses which might look like infringement to an algorithm but which are completely legitimate.

> Check the files for copyright if you create a sharing link for them.

I just ran through everything on my Google Drive. Thank goodness I don't use it for much, though I do pay for extra storage. I don't have anything shared with the world, and only have a few files shared with a handful of family members.

But will this protect me? What is Google's policy with regards to scanning — do they scan only shared files, or do they scan the full drive because content might become shared?


I honestly wonder if that is what is happening. The article's author wasn’t able to replicate it, but perhaps making a public link to the file containing .DS_Store would do it.


We already know that Google is scanning all of the files in your account looking for kiddie porn, and has been for the past decade.

>a man [was] arrested on child pornography charges, after Google tipped off authorities about illegal images found in the Houston suspect's Gmail account

https://techcrunch.com/2014/08/06/why-the-gmail-scan-that-le...


I license a photo for my website, upload to google cloud and create a sharing link to share with my web designer and the contractor responsible for the website. Did I just infringe copyright?


What about fair uses? A company logo in a presentation? Some data from a public source that includes a license you intend to abide by?


> Why does Google Drive perform such scans?

A lot of warez and piracy websites used Drive to share that content, but this kind of filter can be easily avoided by saving the files as an encrypted zip, rar, or 7z file with a password.

In my opinion, applying this kind of filter to all the files you upload is pretty useless. The download traffic of a file or an account gives more information about which files are being used publicly than any other metric.


>Is it because you might share the files with someone that they have to consider anything put on Google drive as being redistributed under copyright law, and thus subject to copyright restrictions?

That is a civil matter that falls upon the copyright holder, not Google.

Google is no more guilty than the makers of VCRs were when you recorded something without permission.


As poster above says, Google has no idea if I own the rights / have a license to the articles in question.

I used to work in the music industry and had license to rip music and distribute it online from all the major labels. I don't want my cloud storage disappearing along with my Google account just because Google mistakenly thinks I'm a pirate.


> It’s not required legally

That may or may not be correct in the USA, but the world has many jurisdictions with varying laws on copyright infringement, and cloud storage providers may be liable for copyright infringement claims.

Even limiting this to the USA, we had

- Metallica vs Napster (https://en.wikipedia.org/wiki/Metallica_v._Napster,_Inc.)

- A&M Records vs Napster (https://en.wikipedia.org/wiki/A%26M_Records,_Inc._v._Napster....)

Napster lost both cases.

So, if you run a cloud provider that permits file sharing, it seems there’s a decent chance you’re liable for copyright infringement by your customers.

Also, https://www.jdsupra.com/legalnews/cloud-computing-a-brief-ov... says:

> Therefore, Canadian copyright law is currently unclear on whether cloud storage providers may be shielded from liability for copyright infringement

⇒ If I were to run a cloud provider who permits file sharing, I think my legal team would strongly advise to scan files _shared_with_others_ for copyright infringement.

(In the ‘.DS_Store’ case, Google’s system seems to have some embarrassing false positives, but that’s a different issue)


Napster was designed explicitly to enable copyright infringement.

Cloud storage is not.

A more valid comparison would have to involve the Supreme Court's ruling on VCRs, which could possibly be used for copyright infringement, but had substantial uses that were perfectly legal.

>The Court's 5–4 ruling to reverse the Ninth Circuit in favor of Sony hinged on the possibility that the technology in question had significant non-infringing uses, and that the plaintiffs were unable to prove otherwise.

https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Unive...


Google Drive checks for copyrighted files only if they are shared.


If you don’t want cloud storage to do this, you need to use cloud storage that doesn’t allow sharing between users.

If it allows that, people will misuse it for piracy, which will lead to this.


Or do what I do which is store my copyrighted materials encrypted in the cloud.


Would an e2ee storage be the answer to this?

As an end user I would be sure that my data is encrypted in-transit and at rest.

As a cloud provider I would take care of encryption and privacy promises and transparency and care about my bandwidth and storage costs.

Risks:

- bad cryptography/leaked keys. Mitigation: sound cryptography, open source model of development.

- all possible attacks from the public about potential usage of the service for CP, terrorism and other deadly crimes.

- the rest of the risks that apply to e2ee messaging as well.

// just off the top thoughts


Banks literally do have to perform money laundering checks on deposit boxes.


I don't believe this is correct. You can't launder money with a safe deposit box, so it makes no sense to have to conduct money laundering checks on one.

Standard rental agreements explicitly state that the bank does not retain a key and is unable to open the box (without destroying the lock) in the event that you lose yours.

For example: https://www.bankofamerica.com/content/documents/deposits/saf...

"The bank does not retain duplicate keys for any rented box".


Can you say more about this? I have never heard about this.

To my knowledge, my bank cannot access my safe deposit box without my key or without drilling and replacing the lock my key opens.

Am I mistaken about this?


They have their own copies of the keys.


Yes


Could it be an excuse to go through our data thoroughly?



