“Google just started mass banning/limiting Archive Team downloads”

_0nac · on April 1, 2019

According to ArchiveTeam's own tracker, there was a temporary dip around 1 AM PT (when this tweet was posted), but the speed has recovered and they are again crunching through 100k+ items/hour.

http://tracker.archiveteam.org/googleplus/

(the graphs show up in the pink block at the bottom, which can take a while to render)

Where is the graph in the tweet from? If it's just measuring successful downloads, how did it conclude that it's Google at fault? Is there another tracker of download failures that shows quota errors/DOS blocking etc?

dredmorbius · on April 1, 2019

I've been in the ArchiveTeam's IRC channel for ... the past three months or so. This wasn't just a temporary dip.

Workarounds are applied, but Google not fighting us would be a Really Nice Thing presently.

(Google's performance throughout this episode has been ... poor.)

The tracker is also only very partial, showing about 1/50th of all activity at any one time.

A better picture is at the Grafana tracker:

https://atdash.meo.ws/d/BQbN9QEiz/archive-team-tracker-chart...

Dylan16807 · on April 1, 2019

Is there anything that says how many items haven't been added to the download queue yet? Or a rough sense of the overall percentage somewhere that I'm missing?

Edit: I see the post a few days ago said about 80%, that's at least something.

dredmorbius · on April 1, 2019

88% complete, on target to hit 92% of targets.

dredmorbius · on April 4, 2019

And actually hit 98.5%

Operyl · on April 1, 2019

They found a workaround: https://twitter.com/jrwr/status/1112511926557396992?s=21

andrewstuart · on April 1, 2019

The front page of Hacker News - the only way to get your support issue with (insert big tech company name here) fixed.

As I keep suggesting, these big companies each need their own internal ombudsman, because at the moment they provide no way to fix things when their systems have gone crazy.

The other reason (big companies X/Y/Z) need to each have an internal ombudsman is that it's getting tedious reading on Hacker News about how (big company X/Y/Z) has done something crazy and the user/developer/customer cannot get it fixed.

Maybe it's an entrepreneurial opportunity - someone could make a website that pays a fee to someone with Google/Microsoft/Amazon who is willing to use their influence to solve a crazy problem. "Rent a manager friend at Google".

burtonator · on April 1, 2019

You just gave me an idea actually...

I don't have time to implement it but if you like it go ahead and steal it.

Basically a Hacker News-style site for complaining about companies that don't fix their problems or own their support issues, where their user base can upvote complaints to embarrass them into actually fixing the problem.

bsparker · on April 1, 2019

I’m building this at the moment :)

dcow · on April 1, 2019

Doesn’t this exist?

theoh · on April 1, 2019

"Get Satisfaction" partly fits this description. https://en.wikipedia.org/wiki/Get_Satisfaction

TheSpiciestDev · on April 1, 2019

This is available ;)

thingsarebadat.com

totorokun · on April 1, 2019

Doesn't work :/

TheSpiciestDev · on April 2, 2019

Whoops - what I meant was that the domain name is available!

pishpash · on April 1, 2019

They're not embarrassed, that's the problem, or embarrassment is just not enough. Usually money talks but "take your money elsewhere" also doesn't work when it's free.

A1kmm · on April 1, 2019

"Take your $thingOfValue elsewhere" still works, it is just that when thingOfValue == "eye balls", the number of people that need to do it before the cost of the loss of those eye balls outweighs the cost of doing something about it is relatively high.

ehsankia · on April 1, 2019

It's hard to know, because in the majority of these cases, people don't even actually bother trying to contact Google and go straight for the outrage post.

It's Sunday afternoon, the much more likely explanation is that it's an automated system, or something broke.

pishpash · on April 1, 2019

Where do you "contact Google" such that you don't get to play with a bot for free over the course of several weeks?

dredmorbius · on April 2, 2019

Here's what going through channels (those that haven't been shut down) is like:

“Google’s attention is not required. Everything is working as intended” as Google blocks Archive Team

https://old.reddit.com/r/plexodus/comments/b87hpi/googles_at...

mxd3 · on April 1, 2019

The Twitter OP is just sharing something he’s found. And now it’s front page on HN. Most likely there will be a response/fix tomorrow morning. A much faster response here than through traditional channels (emailing Google). I’m thankful we have this format, one of the good things that’s come of social media.

dredmorbius · on April 1, 2019

I've been part of a 4500+ member group of Google+ users trying to get any sort of response or assistance out of Google since the G+ shutdown was announced. That's a large set of the core G+ userbase of ~10 - 100k users, with probably 10-20k as the core cohort.[1]

https://old.reddit.com/r/plexodus/comments/9ulvr0/dear_googl...

That was submitted to multiple channels on G+, to the Google Press email address, and through numerous Googler contacts. No response (though many of the issues were eventually addressed, often only in the past few weeks and days; others never were).

It's been exceedingly difficult to get any response, and that's included what's effectively been a war against us by much of the Google+ Help community moderators. Google shut down the Communities on Google+ itself for Community owners and moderators, in one case for six weeks with no notice at all, in another, with at best a few hours' notice.

https://old.reddit.com/r/plexodus/comments/aev39g/google_shu...

The details of this sort of behaviour tend toward petty and tedious, but that's precisely what's been seen:

https://old.reddit.com/r/plexodus/comments/9wi6sg/a_curiosit...

https://old.reddit.com/r/plexodus/comments/agm2j6/banned_fro...

https://imgur.com/a/5Sq0aAf

Google didn't even bother putting a shutdown notice on their G+ app and About pages (the latter still doesn't say anything):

https://old.reddit.com/r/plexodus/comments/ae5lsv/failure_of...

________________________________

Notes:

1. I've made (or inspired) several estimates of total Google+ user activity, beginning in 2015 using the service's own sitemaps to show that only 9% of profiles were active at all and that a fraction of a percent could be considered regular public posters. That's not all Google+ activity, but it's a very, very large share of it.

Stone Temple Consulting's follow-up study, based on my methods, looking at 500,000 profiles, gives a pretty solid view, and showed 50-100k highly active posters. There's a fair bit of churn at the top due to spam accounts:

https://www.stonetemple.com/real-numbers-for-the-activity-on...
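
For anyone wanting to reproduce this kind of estimate, here is a minimal sketch of the arithmetic, assuming the profiles checked are a simple random sample of all profiles. The function name `estimate_active_share` and the numbers in the usage note are illustrative only; they are not the figures or the code from either study.

```python
import math


def estimate_active_share(sampled, active, z=1.96):
    """Estimate the share of active profiles from a random sample.

    Returns (share, lower, upper) using the normal approximation;
    z=1.96 corresponds to a roughly 95% confidence interval.
    """
    share = active / sampled
    # Standard error of a sample proportion.
    margin = z * math.sqrt(share * (1 - share) / sampled)
    return share, max(0.0, share - margin), min(1.0, share + margin)
```

For example, `estimate_active_share(1000, 90)` (made-up counts) gives a point estimate of 9% with an interval a bit under two percentage points either side. The interval narrows with the square root of the sample size, which is why a larger sample gives a more solid view.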

Further information:

A Change.org petition has 38,580 signatures presently: https://www.change.org/p/google-inc-don-t-shut-down-google-p...

The Pluspora pod of Diaspora has 12,000 members.

Other signs of activity (and expanding to include non-native English speakers) probably double the petition and Pluspora numbers (largely English-speaking, at least as a second language).

duxup · on April 1, 2019

>talk to Google

Is that even a thing?

Google doesn't seem to want to be talk'n to outside of my information and in some cases credit card.

dmitrygr · on April 1, 2019

Historically if Google wrongs you, the only ways to get it fixed are:

1. Get lucky and some googler sees your official bug report

2. Know a googler who can file a bug internally

3. Make noise on social media till someone at Google notices

joecool1029 · on April 1, 2019

Oh, you forgot the best way to get a response! Spam emails to common names @google.com.

mherdeg · on April 1, 2019

Do you have any firsthand experience trying this strategy? This sounds kind of genius.

joecool1029 · on April 1, 2019

No, I learned about the tactic from a security researcher's blog. Google wasn't responding (as usual) so he started to poke common names and finally got a response.

We have our own internal list in case we have a major issue. It's just something we keep in the toolchest that's cheaper (and faster!) than poking the lawyer or relying on a blog post to go viral. They don't want to act like a normal company with regular support and escalation channels, so we have no intention of treating them like a normal company.

rohan1024 · on April 1, 2019

I've uploaded a lot of data to Google (mostly photos), and if someday Google decides that I'm in violation of their policy and chooses to lock me out of the system, I'm screwed.

I need to start thinking about a proper storage system for my pictures and other data.

torgian · on April 1, 2019

I use hard drives.

dredmorbius · on April 1, 2019

One of the initiatives to come out of this is to try to raise awareness, and tool availability, for doing just this.

Though local control of your own data, plus reliable and secure off-site replication, helps a lot.

I'm strongly partial to git-based publishing tools -- SSGs and markdown hosted on the site of my choice, though GitLab and GitHub are both convenient options.

The fact that everything is in Git makes rehoming vastly simpler. And I've been bit a few times too many by service shutdowns, with about three consequential projects still in process (the earliest, Readability's shutdown, is over 2 years ago, and I'm still digging out).

0815test · on April 1, 2019

To be fair, Google does allow access to your own data via Google Takeout.

kg · on April 1, 2019

Also keep in mind that most Googlers can't change anything anyway

smt88 · on April 1, 2019

This is, frustratingly, false. They have access to internal customer support that you can't even pay for as an external user. It's fucking infuriating.

My brother is disabled and hadn't logged into his Gmail account for a few months.

I helped him log in, which somehow triggered some kind of security system. It thought we were hacking the account. We got stuck in an infinite loop of "verify your 2FA, answer personal questions, wait for customer support to review the situation, repeat".

It was hellish. We tried over and over again for months to get back in. I still even had access to the Gmail account that sent him the original invite. But when no human will review the situation, you can't break out of the loop.

The only way we got his email account back was by talking to one of my friends who works at Google.

q3k · on April 1, 2019

There's an internal form for accelerating account recovery. There's no 'access to internal customer support'. And it would be bad karma to create (internally public) bug reports with PII (like someone's email address).

frutiger · on April 1, 2019

Were you (or your brother) paying for his Gmail service?

Dylan16807 · on April 1, 2019

It doesn't matter. We already know google will lock out paid accounts without the ability to talk to a human.

frutiger · on April 1, 2019

It does matter.

Expecting something for free is exactly what leads to being ghosted, whether by a person (see the lament of the GNOME Calendar maintainer that was up here earlier) or a company.

mostlysimilar · on April 1, 2019

It's "free" in the sense that we aren't giving them money directly, but they make their billions by relying on us to use these services. Implying users who aren't entering a credit card number are of no value to Google is either disingenuous or short-sighted.

Beyond just that point I think a critical dependence on these services, mixed with a general tech illiteracy and an uncaring corporation is a dangerous situation to be in. One could argue that it's the user who is responsible for maintaining access to their email account, but the reality is that the vast majority of users are not educated in these areas and absolutely need help. A corporation that won't help them use the services they provide deserves to be replaced by one that will -- unfortunately, Google is so dominant that the chance of someone seeking out and finding ProtonMail or rolling their own server is tiny. It's hard to compete with Gmail, even when you have options like ProtonMail which provide excellent customer service (with phone numbers!)

I'm just thinking out loud. It seems like a problem to me. Dunno how to solve it.

macintux · on April 1, 2019

I’m confused. What did his brother expect other than adequate user support?

If you can’t support a service, don’t offer it.

Dylan16807 · on April 1, 2019

They do the same thing to paid accounts.

How can I be clearer.

frutiger · on April 1, 2019

Providing some data comparing relative rates of such incidents of paid vs. free customers would be a start.

Repeating something does not make it clearer nor more convincing.

Dylan16807 · on April 1, 2019

The claim is that you can't even pay for good support.

The relative badness between paid and free is irrelevant.

Have you honestly never seen one of the stories about how badly google treats paying customers, and specifically how automated it all is? I can do the job of searching that if you need me to, but I'm not going to argue about free vs. paid because that's not the topic at hand.

frutiger · on April 1, 2019

You went from “how can I be clearer” to deciding my criteria for clarity are not relevant in this discussion because we are not arguing free vs. paid. I thought that was exactly what we were discussing based on my original question.

Furthermore, sporadic blog posts/tweets here and there are not indicative of how the vast majority of users feel about this service. Especially paid users who generally are too busy adding value to complain.

It’s worth remembering what email was like before Gmail came around. With their spam analysis they basically fixed email and moved the world towards a simpler web ui for email (as opposed to hotmail etc. of the day).

Dylan16807 · on April 1, 2019

> You went from “how can I be clearer” to deciding my criteria for clarity are not relevant in this discussion because we are not arguing free vs. paid. I thought that was exactly what we were discussing based on my original question.

You were asking about free vs. paid in a way that would provide no value to the discussion except to discredit a particular anecdote. But that's a distraction from the real problem, the inability to get good support, and that anecdote was just an example and not necessary as proof.

I say your criteria are irrelevant because the answer doesn't change the root complaint at all.

> Especially paid users who generally are too busy adding value to complain.

Adding value to what? I doubt that argument for email, which usually costs a couple dollars a month. It's not a thousand dollar product that excludes amateurs. And the problem is not the typical user, it's the one that gets banned suddenly. If that happens, the more value you're making by doing business over email, the worse it gets!

And none of your points about gmail being useful are relevant to support either.

craftinator · on April 1, 2019

"Especially paid users who generally are too busy adding value to complain." Sooooo, people who are too busy paying Google for services, while also being data mined for profit, to complain about being on fully automated support? You need to add a disclaimer that you work for Google.

frutiger · on April 2, 2019

Paid Gmail does not datamine per their Terms. It would be a massive legal minefield if they did secretly anyway.

I do not work for Google or any Alphabet company.

craftinator · on April 2, 2019

And Google's track record and extremely money-centric business style, along with a complete lack of humanity towards their users, make me not trust their ToS. Do you?

And excuse me, your comments gave me the impression that you did.

frutiger · on April 3, 2019

I don’t know their balance sheets, but the organization I work for (~30k employees), and I’m sure many other much larger ones, have enterprise G Suite accounts. Those accounts alone are worth a lot of money to Google, and I think even they would not be foolish enough to slice open the goose that lays the golden egg.

What’s more, the lack of humanity is completely unrelated to lack of legality and lack of desire for income. Both of which Google have mostly adhered to (unlike e.g. Facebook, Uber, AirBnB).

pishpash · on April 1, 2019

Maybe they should make it clear then that all Google accounts/services are disposable like used toilet paper. See how that goes over.

smt88 · on April 1, 2019

I would pay $10/mo for ad-free Gmail without hesitation. I'd pay $20 to speak to customer support.

In that particular case, I would've paid $1,000 for a Google investigator to spend 10 min verifying the account.

None of those options are available at any price.

frutiger · on April 1, 2019

You can pay $5/mo for ad free gmail (I do). You cannot pay for this $1000 special investigation option. That doesn’t exist in almost any service, I don’t know why Gmail would be any different.

Dylan16807 · on April 1, 2019

Most services will let you get 10 minutes of someone's time, and they won't even charge per incident.

smt88 · on April 1, 2019

Gmail should be different because losing access can destroy someone's life. Literally.

And every paid service I've ever used has at least email support. Gmail has nothing.

craftinator · on April 1, 2019

Google was making advertising revenue from their data profiles... We ALL pay for Google Service, even if we don't make an account.

dmitrygr · on April 1, 2019

But they can log an internal bug or send an outraged email to eng-misc@, which often helps.

dman · on April 1, 2019

[flagged]

Operyl · on April 1, 2019

Kind of in poor taste given what your employer has done to block the Archive Team (in this thread at least).

est31 · on April 1, 2019

They made it to the front page of hn. Now they have great chances of being heard.

kerng · on April 1, 2019

Best chances are if you know someone inside to get traction or you have a dedicated account manager (if you are a large company with lots of money to burn).

zxcvbn4038 · on April 1, 2019

Doesn’t matter if you are a large company; Google doesn’t want to talk to you. Just like Microsoft used to write horrible code hoping that someday someone would invent a compiler good enough to optimize it, Google doesn’t want to talk to you and is hoping one day to have an AI that is good enough to speak to people.

When I was at Tumblr we were just as helpless and subject to the whims of Googlebot as a mom and pop. One day Google decided to ramp up our Googlebot rate ten times. This was significant because they were already a double-digit percentage of our traffic, so going 10x placed an unexpected and worrisome strain on our backends. The only way we could get them to throttle back was through personal contacts people had with people over there. We didn’t want to tweet out to them and advertise we were under stress because it would bring out all the weirdos to ddos us (we even had to adopt rules to not talk about company holidays and offsites until after they happened because people would monitor our blogs and ramp up their bots when they thought we were out of the office).

Adopting Kubernetes at my current employer, we had many discussions about going with AWS or Google Cloud. What it always came down to is AWS has support where we can write/chat/phone someone at any time and get a solution to our problem, and Google Cloud can only offer our account rep’s email address.

dredmorbius · on April 1, 2019

I'm absolutely not a fan of Microsoft.

But, some years back when I discovered problems with an online service of theirs, I cold called the switchboard, asked for the VP of the business unit, who picked up on the first ring, chatted for a few minutes, told me to expect a call from the product manager, who rang within 15 minutes, and worked with me for 6 weeks as they addressed (and I monitored) the problem, to satisfactory resolution.

Entirely unlike Google.

cannonedhamster · on April 1, 2019

We've had issues with Google and we're a peered network backbone. Literally took knowing someone to get them to fix DOSing our customer with one of their servers not liking our DNS entries. Never found out what happened either.

metildaa · on April 1, 2019

Sounds like a solid reason to null route the offending netblock or depeer entirely. If Google is hosting an abusive user/users and isn't willing to remediate the issue, there is no reason to glad hand them.

cannonedhamster · on April 1, 2019

Unfortunately this was Google DNS itself munging DNS requests on a few of their servers. Not much we could do about it.

mentat · on April 1, 2019

This is just false in many ways. Paid support for GCP is decent and if you can pay for the higher levels, even better. I've spoken with engineering teams in GCP about issues several times after support escalation.

On consumer side, if you buy Google drive you get phone, email, and chat for all questions. Google Fi is the same.

dredmorbius · on April 2, 2019

“Google’s attention is not required. Everything is working as intended” as Google blocks Archive Team

https://old.reddit.com/r/plexodus/comments/b87hpi/googles_at...

gumby · on April 1, 2019

This is the internet archive. Plenty of people there know plenty of people at google!

JeremyBanks · on April 1, 2019

Archive Team is not the Internet Archive, even though they post much of their content there. If I understand correctly, Google has blocked an Archive Team tool that was used to download data in bulk using a distributed set of volunteers. I don't think they have blocked any of the Internet Archive's own crawlers.

gumby · on April 1, 2019

thanks

metildaa · on April 1, 2019

That doesn't mean anything gets done because of said internal contacts, and you can only leverage internal contacts so many times before you've worn out your welcome. That is what I've learned from internal contacts :P

KirinDave · on April 1, 2019

It is 4pm on a Sunday at Google HQ. It seems a bit premature to start reading deep policy decisions into this.

dredmorbius · on April 1, 2019

Context:

Saving of public Google+ content at the Internet Archive's Wayback Machine by the Archive Team has begun

https://old.reddit.com/r/plexodus/comments/az285j/saving_of_...

Previously on HN:

https://news.ycombinator.com/item?id=19407865

itchyjunk · on April 1, 2019

Ah thanks, context helps. Maybe the G+ servers just have security measures to limit bandwidth usage? Do you think it's malice?

_ugfj · on April 1, 2019

No, it's just stupidity. If they weren't stupid, this entire effort would be unnecessary, as they would be sending over the content to the Internet Archive themselves, probably along with a cheque that looks big to the IA (10 mil/yr budget) and is not even a rounding error to G (109B cash at hand).

KirinDave · on April 1, 2019

Why would that be good?

_ugfj · on April 1, 2019

Allow me to be off topic and rant about git here for a hot second. It is not as off topic as you'd think.

It has been proven again and again, over thousands of years but especially recently, that the really valuable data is often in the raw notes of research and not the published material. git allows you, very easily, to destroy history just to make the log prettier later. Instead, it should allow you to construct a hierarchy of logs and hide the unimportant details but allow them to be shown later. Similarly, git reset --hard should not throw away work, that's criminal. https://gist.github.com/chx/85db0ebed1e02ab14b1a65b6024dea29 fixes it but bah.
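
To make the "reset without throwing away work" idea concrete, here is a minimal sketch of one way to do it. This is not taken from the linked gist, which may work differently; `safe_hard_reset` and `run_git` are hypothetical names. It relies only on the standard `git stash create` and `git stash store` subcommands, and it only protects changes to tracked files.

```python
import subprocess


def run_git(args):
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def safe_hard_reset(target="HEAD", git=run_git):
    """Snapshot uncommitted tracked changes, then hard-reset to `target`.

    Returns the snapshot commit hash, or None if there was nothing to save.
    """
    # `git stash create` builds a stash commit without touching the working
    # tree and prints its hash; it prints nothing when the tree is clean.
    snapshot = git(["stash", "create", "before reset --hard"])
    if snapshot:
        # Record the commit in the stash list so it stays reachable.
        git(["stash", "store", "-m", f"before reset --hard {target}", snapshot])
    git(["reset", "--hard", target])
    return snapshot or None
```

After a call like `safe_hard_reset("HEAD~1")`, the discarded work should show up in `git stash list` and can be brought back with `git stash apply`.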

Back to topic: when there are hundreds of thousands of public posts, the mind boggles that you are asking whether there's any value to it. There's no question some of it is valuable. I do not know how to impress this mindset, where losing information created by people is heinous, unthinkable.

jplayer01 · on April 1, 2019

I was part of a decently sized reverse engineering group there. A lot of valuable knowledge and interesting discussion is stored on G+ for some bizarre reason. It was never a great idea, but it's there and I suspect that there are plenty of similar groups that used G+ that simply aren't hellholes. Your anecdotal experience doesn't mean anything when there are clearly people who believe there's something there worth saving.

Dylan16807 · on April 1, 2019

Because there is a huge amount of important public posting on there. It didn't get billions of users but it had enough. Keeping up a read-only blob, or even better letting someone else do it, should be in the shutdown considerations of any major product.

xorand · on April 1, 2019

G+ had a nerdier audience and many interesting communities and posts. I don't get this ghost town line. For example HN is a niche site, with almost no audience, with almost all accounts dead. Right?

KirinDave · on April 1, 2019

I ran several G+ groups and they were destroyed over time by unfettered harassment, unstoppable spam, and just a lack of interest.

I've yet to see these things happen to HN.

dredmorbius · on April 1, 2019

HN has niche appeal, a strongly focused discussion (there's one story feed, and effectively about 30-40 stories that really register on the front page, though many more are submitted), and pretty dedicated professional moderators. As of 2015:

Roughly 2.6M views a day, 300K daily uniques, 3 to 3.5M monthly uniques. It depends on how you count, of course.

https://news.ycombinator.com/item?id=9220098

Which ... actually probably compares favourably with Google+, which had a core of about 50-100k highly active users (posting 50-100x monthly), and maybe an extended set of as much as 100 million who'd interacted with the site at one time or another significantly.

I've done a fair bit of measurement (limited by available resources and indicators), and one conclusion I'm coming to is that raw numbers do a pathetic job of indicating media or forum vitality. Most especially raw census numbers.

Looking at G+ communities, and running grid plots of engagements, it turned out that posts drove other engagement, not members, and in fact it seems as if there's some kind of fall-off (at least on a per-member basis) when a given forum gets above about 5,000 members (though I need to check this).

Google had more users. But they were spread out over a vastly larger set of forums and discussion, there was no central "square" (as with HN's "new" or "news" pages), moderation was exceedingly uneven, and often entirely absent, and there were (and remain) huge barriers for like minds to come together.

HN's overall focus is fairly (but not excessively) narrow, and much of the conversation takes itself too seriously (and certainly myself), but relative to the rest of the Net it's an exemplar. Good conversation remains exceedingly hard to find.

xorand · on April 3, 2019

> it turned out that posts drove other engagement, not members

Yes, this was something particular to G+. Incidentally see these in Linus interview [0], HN post [1]

" The whole "liking" and "sharing" model is just garbage. There is no effort and no quality control. In fact, it's all geared to the reverse of quality control"

"I'm not on any social media (I tried G+ for a while, because the people on it weren't the mindless usual stuff, but it obviously never went anywhere)"

[0] https://www.linuxjournal.com/content/25-years-later-intervie...

[1] https://news.ycombinator.com/item?id=19559970

KirinDave · on April 1, 2019

What kind of things were important there? I left because after I came out enby it was non-stop harassment. It seems to me like a hellhole.

chessturk · on April 1, 2019

Hellholes are historically and anthropologically valuable.

A Diary of a Napoleonic Foot Soldier[1] is a hellhole.

[1] https://www.penguinrandomhouse.com/books/185435/diary-of-a-n...

rasz · on April 1, 2019

Umm I don't know, maybe to start counteracting the reputation (https://killedbygoogle.com/) Google managed to earn over the years?

dredmorbius · on April 1, 2019

It's really hard to judge. Google are complex and opaque, G+ itself is pretty complex.

That said, the company's proved less than helpful to date in the process of the G+ shutdown.

And incompetence is difficult to distinguish from malevolence.

est31 · on April 1, 2019

More context: https://www.reddit.com/r/DataHoarder/comments/b69d1b/google_...

jonas21 · on April 1, 2019

Did they bother talking to anyone at Google before they started doing this? It sounds like they’re making a massive number of requests in a short period of time which is probably indistinguishable from abuse to Google’s automated systems.

ClassyJacket · on April 1, 2019

"Did they bother talking to anyone at Google before they started doing this?"

How would you go about contacting Google? As far as I've ever heard they're notoriously impossible to contact by anyone who isn't paying large amounts to run ads.

baroffoos · on April 1, 2019

You don't even have to pay much for ads to get help. They will do a call and walk you through the steps for setting up everything to start spending. It's just that advertisers are the only customers Google has.

gscott · on April 1, 2019

Only in the past year or two. Before that you could only contact them via a support form and someone in India would answer the support.

tedivm · on April 1, 2019

People who do pay large amounts to run ads get to talk to someone, but that someone normally has absolutely no power whatsoever.

KirinDave · on April 1, 2019

It is in fact a service Google sells to cloud customers, same as AWS.

tjpnz · on April 1, 2019

It's easier to get a job interview at Google.

zaarn · on April 1, 2019

New strategy: Get tech support from google by applying to them and then complaining during the interview.

craftinator · on April 1, 2019

Okay, I'll figure out how to diagram a binary tree that has elements of my complaint on each leaf. Then during the whiteboard, I can reconstruct the complaint and just stare into the interviewer's cold, robotic eyes...

ocdtrekkie · on April 1, 2019

Given how long Archive Team has been working on this (two weeks), my guess is the automated systems did not catch it, but that someone there decided to put a stop to it.

est31 · on April 1, 2019

The Archive Team made a public call for help two days ago, which has possibly increased the number of IPs that download content. This might have triggered automated defenses that weren't triggered before, or were manually disabled before.

kg · on April 1, 2019

Try ever talking to Google about anything and you'll understand the problem with this suggestion.

I mean, it's a reasonable suggestion, but it doesn't work

jijji · on April 1, 2019

The one time I had to call Google was when I was working at a large company down the street and they had blocked all of our netblocks, so we couldn't use their systems for a few days. Some young teenager-sounding male answered the phone, sounded annoyed that I called, said that we should write a message on the forums, and then hung up on me.

user5994461 · on April 1, 2019

Looks like the support improved dramatically.

They now have a phone number and actively pick up the phone to imply you'll never get any support.

notacoward · on April 1, 2019

What's really stupid about this is that it would have been less effort and expense for everyone involved if Google had arranged to work with Archive Team directly instead of forcing them to scrape content via HTTP over the public internet. That includes Google themselves. What's happening now is probably that they're ramping down the number of servers handling Google+ content, which they could have done even sooner if they'd cooperated on a proper archiving strategy.

I guess sometimes Google does this kind of crap just to prove (mostly to themselves) that they can. Truly, the bully of the internet.

identity-haver · on April 1, 2019

There was a claim [1] that the G+ terms of service might legally prohibit them from doing this after the service is shut down. I haven't verified it.

However, it's clear that for an archiving effort this big, people at Google are explicitly allowing it. The user agents and fetch patterns of the Archive Team crawler were clearly distinct enough to get caught by an automated tool, and someone knew someone at Google in order to get it unblocked.

Unfortunately, any archival effort that requires the "Warrior" crawler (and not just a guy with a 4TB disk) is at the mercy of the website's remaining staff and management. Just ask Soundcloud. Archive Team started to archive their stuff when it looked like they were going to go under, but Soundcloud shut them down.

[1] https://news.ycombinator.com/item?id=19410050

notacoward · on April 1, 2019

That's a really good point. OTOH, I think it would be nearly impossible for anyone to make a claim that their privacy has been violated by archiving public posts. In that case, rights have been granted to everyone (i.e. the rights Archive Team is currently exercising without issue) so limitations on rights granted to Google itself are irrelevant. OTOOH, IANAL. ;)

mirimir · on April 1, 2019

In my experience, scraping Google data is hard. I did it years ago for a project. And I had to lease a huge block of private proxies. Each one only lasted a few minutes. But with a large enough block, they'd come back.

dredmorbius · on April 1, 2019

Google Web Search especially -- I've found that more than a query every 5-10 minutes will start throwing CAPTCHAs. For what I was doing, there was no other way to get the information I was looking for, so I just resigned myself to very slow crawls.

For Google+ itself, over a period of years, I'd hammered it with 100s to ~100k requests from residential IP space without ever throwing rate-limiting, at a rate of 2-20 queries/second or so, roughly.

We've started getting news of rate limiting over the past few months as archival activity has proceeded.
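A "very slow crawl" of the kind described above comes down to enforcing a minimum gap between requests. The following is a minimal sketch under stated assumptions: a single-threaded crawler, a hypothetical `Throttle` class, and an illustrative interval. It is not the code anyone in this thread actually ran.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive requests (illustrative)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            delay = max(0.0, self.min_interval_s - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler would call `throttle.wait()` before each fetch, for example with `Throttle(300)` for one request every five minutes. The clock and sleep functions are parameters only so that the spacing logic can be checked without real delays.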

tanilama · on April 1, 2019

I mean, enforcing traffic-control restrictions is an everyday operation for big companies. Why would they assume that Google will treat them differently?

29ssyg · on April 1, 2019

I'm getting lots of 500 errors on Google Plus over the last few hours, so maybe they're not getting throttled and it's just Google servers crapping out.

throwawaygoog10 · on April 1, 2019

When we turned down the G+ API we injected synthetic errors in increasing fractions of requests a few days ahead of the actual turndown to alert users. I'm not sure if it's the same case here, but it's possible that's what's happening.
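As a rough illustration of what a ramped synthetic-error scheme could look like, here is a minimal sketch. The schedule values, the function names, and the choice of a 503 status are all assumptions made for the example; none of it reflects Google's actual turndown mechanism.

```python
import random

# Hypothetical ramp: fraction of requests to fail, keyed by days left before turndown.
ERROR_SCHEDULE = {3: 0.05, 2: 0.25, 1: 0.50, 0: 1.00}


def error_fraction(days_until_turndown):
    """Return the fraction of requests that should fail synthetically."""
    if days_until_turndown < 0:
        return 1.0  # past the turndown date: everything fails
    # Days earlier than the start of the ramp get no injected errors.
    return ERROR_SCHEDULE.get(days_until_turndown, 0.0)


def handle_request(days_until_turndown, rng):
    """Return an HTTP-style status: 503 for an injected error, 200 otherwise."""
    if rng.random() < error_fraction(days_until_turndown):
        return 503
    return 200


# Simulate one day before turndown with a seeded generator for repeatability.
rng = random.Random(42)
statuses = [handle_request(1, rng) for _ in range(10_000)]
observed = statuses.count(503) / len(statuses)
```

With a scheme like this, a client sees failures become steadily more frequent as the date approaches, which from the outside would be hard to tell apart from throttling or flaky servers.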

nexuist · on April 1, 2019

Isn't G+ due to be shut down in two days?

KirinDave · on April 1, 2019

Yes and it seems that's what Archive Team has been trying to archive.

dredmorbius · on April 1, 2019

Consumer G+, yes, which is what the archive is about.

The service itself will continue as a paid product "G Suite Google+".

Context comment: https://news.ycombinator.com/item?id=19539373

anticensor · on April 2, 2019

G Suite Google+ only has a few thousand users compared to common Google+, which has had more than 200 million.

dredmorbius · on April 2, 2019

Source?

The core G+ userbase was vastly smaller.

(I've done, and am continuing to do, estimation of that.)

There were 3.4 billion profiles; about 110-120 million ever posted anything publicly, but it was a much smaller core of 50-100k who were posting 50-100 times monthly (roughly 1-3x daily). And I figure a large share of that was spam churn (the 50-100k was basically a month's worth of activity, and the attrition rate of active profiles was high).

We're collecting metrics of active G+ users through the shutdown, from various sources:

4650 or so registered to the Google+ Mass Migration community on G+ (main migration community), since October 2018. (I'm a mod/owner).

12,000 at Pluspora (since Sept 2018)

38,000 signed the Change.org petition.

MeWe and a few other sites claim upwards of 10,000 G+ registrations, though they don't give sources or retention rates. Given that you have to sign in just to see content that's ... probably inflated.

Exodus communities in various places tend to be in the tens to 100s of members, from what I've seen (and again, I've been compiling sets).

This all skews heavily English-language speaking (though not necessarily native), and G+ was huge among Spanish, Arabic, Indonesian, Japanese, and Chinese speakers based on some indicators (mostly Community titles -- I've archived 8.1 million of those).

Upshot: if G Suite hits 10k or so members, that's an appreciable fraction of the core community G+ base, and might be more strongly motivated to use and like the product. Though of course, it's not public Internet.

(I'm frankly not a fan, but I do have a good sense of the numbers.)

toomuchtodo · on April 1, 2019

> Google servers crapping out.

This seems....highly unlikely for Google.

kg · on April 1, 2019

If they don't view G+ as an important product and it's about to be shut down, it wouldn't surprise me if they set an aggressively low quota on its compute and storage resources. Google internally bills teams and product groups for services they use, AFAIK.

kabwj · on April 1, 2019

Throughout the years I’ve seen most if not all Google services have glitches or go down completely.

KirinDave · on April 1, 2019

Google has all sorts of awesome things, but fault-proof hardware is not achievable in the real world.

ummonk · on April 1, 2019

You've never opened Youtube, have you?

soup10 · on April 1, 2019

You don't get it, guys: when Google scrapes the web, downloads everyone's data, and then serves up parts of it with sponsored ads next to it in searches, it's OK because they are Google. But if you scrape their data, it's not OK because you're not Google. Once you understand this it makes perfect sense.

SpicyLemonZest · on April 1, 2019

In case this is meant to be a serious comment, there's a standard mechanism called the robots.txt file to tell crawlers you don't want them to scrape your website. You don't have to let them if you don't want to.
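For illustration, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` and `OtherBot` user agents, and the example.com URLs are made up for the example; nothing here reflects Google's or Archive Team's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ExampleBot is only barred from /private/; every other agent is barred entirely.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/public/page")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/page")
allowed_other = parser.can_fetch("OtherBot", "https://example.com/public/page")
```

Note that the file is purely advisory: it only works if the crawler chooses to call something like `can_fetch` and respect the answer.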

HeWhoLurksLate · on April 1, 2019

Not to be argumentative here, I'm seriously asking: is there anything keeping them from doing so anyway, and just not publishing it?

dTal · on April 1, 2019

Yes, everyone who ran a server on the internet would know, and make a big stink about it.

HeWhoLurksLate · on April 2, 2019

Well, that makes sense.

wsh · on April 1, 2019

Except archive.org doesn’t obey robots.txt files any more [1], and they also ignore requests to remove content.

[1] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

0815test · on April 1, 2019

They don't obey robots.txt files posted after-the-fact by domain hoarders that have zilch to do with the original content. This is entirely proper on their part.

superkuh · on April 1, 2019

Archive.org is not the archive team.

trevyn · on April 1, 2019

I doubt they ignore DMCA requests.

OrgNet · on April 1, 2019

but google often violates copyright by showing so much of your data on the search engine's page that users might not even need to visit your page to get what they need... I'm surprised nobody is suing Google for that yet (or maybe they did and I missed it)

Operyl · on April 1, 2019

The problem with your line of thinking is that even that can be manipulated with meta tags and what have you (oembed, etc) and that’s what Google would argue in court.

OrgNet · on April 2, 2019

what can be manipulated with meta tags?

soup10 · on April 1, 2019

how does that make it any less hypocritical for Google and others that vacuum up everyone's data for free and monetize it to ban or api-restrict others?

KirinDave · on April 1, 2019

It's not even clear that Google is specifically doing that at this point, but ultimately a service like G+ is quite expensive to run. It's a bit weird to suggest it's public property because Google has an unrelated product it does make available without direct monetary cost to most of the world.

soup10 · on April 1, 2019

Google and other tech companies shouldn't be banning services that do productive things with their data, especially not when that data was cheaply collected or volunteered to them. Bandwidth is not expensive. Not saying that's what happened here, though it very likely could be.

KirinDave · on April 1, 2019

I don't understand why you think a problem that was only there briefly and went away in less than an hour is proof positive of a positive policy action.

xorand · on April 1, 2019

How can I archive my photos from G+ posts i.e. this https://get.google.com/albumarchive/110322266958783287132/al...

rasz · on April 1, 2019

In theory you can use Google Takeout. In reality Takeout has NEVER been able to back up my YT comments, and there is no way of reporting a bug/appealing/speaking to anyone about it.

xorand · on April 2, 2019

I made available all the sources for all those animations, but this will not bring back 150K views/day. It's a shallow metric of interest, I know; however, I am human too and I appreciated that, given that presently there is, technically, no scientific publication avenue for this. https://doi.org/10.6084/m9.figshare.4747390.v1

xorand · on April 1, 2019

archive.is can do only this, i.e. a tiny tiny drop https://archive.is/jHg3h

kazinator · on April 1, 2019

Banning downloads of what, exactly?

kristofferR · on April 1, 2019

Google+ before Google deletes it in a few days.

mcv · on April 1, 2019

A few days? I think it's tomorrow.

dredmorbius · on April 1, 2019

https://news.ycombinator.com/item?id=19539373

mrhappyunhappy · on April 1, 2019

What do they mean by this? I did notice that looking for archives of a few domains, including my own, returned no results.

dredmorbius · on April 1, 2019

Google+ is shutting down on April 2.

There's a bunch of folks who are interested in preserving content from the site. Some personally/individually, a whole host of communities, and, well, because their mission is to preserve the world's data, the Internet Archive and an unaffiliated though closely-working group, the Archive Team: https://archiveteam.org

That's a bunch of volunteers who basically suck the guts out of failing websites, and they've got some big ambitions.

I'm from the Google+ user community side and have been helping organise information and activity on behalf of others. There's a pretty comprehensive (and messy) wiki at https://social.antefriguserat.de with FAQs and directories and guidance and a whole bunch of other stuff, plus other resources -- G+ communities, subreddits, and more.

We (the G+ community side) stumbled across Archive Team this past January and were delighted to discover they existed. (They've told us they're also glad they exist.) And we've been working together to coordinate this pull, with AT providing the technical capabilities -- code, servers, bandwidth, and storage -- and us mostly pestering them with questions but also knowing a good bit about the characteristics of Google+ data and some of its organisation. (I've made a hobby of ... measuring much of that.) I was finally able to answer my mother's persistent question "but who cares about any of this" with "arkiver, of the Archive Team".

Anyway: the Archive Team grab is in full swing, we've got about 24-48 hours left, depending on when Google shuts shit down, and all of a sudden all the archiving agents ("Warriors") stop collecting data. We're 86% through the working set, and are on track to complete 92% of all scheduled data requests by the earliest anticipated shutdown window. (Earlier estimates, before the throttling, were over 100% completion, meaning we had some slack space.)

Answer your question?

mrhappyunhappy · on April 1, 2019

Yes, thank you