
I've noticed that I have started doing that recently - appending reddit to my queries.

There just seems to be a load of imitation sites now: 6 different wrapper sites for GitHub, 8 for Stack Overflow, a couple for GitLab, something aggregating a load of forums. So the first couple of pages of results are the exact same content, just from 15 different sites that copy the originals.

At least with a community site there tends to be actual discussion and/or useful links to the relevant content.
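
For example, either of these works (the query itself is just an illustration; site: restricts results to the domain, while the plain keyword only biases the ranking):

    rust borrow checker error in closure reddit
    rust borrow checker error in closure site:reddit.com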




Those are infuriating. I hate to see ACTUAL content creators having their livelihoods stolen this way. Why wouldn't Google filter out the worst offenders? It takes literally one minute to get a nice list of a dozen imitation sites that nobody would miss. Maybe Google feels a little inhibited from 'choosing the winners' for all but the largest cases?


One FTE at Google could probably filter out like 99% of the SEO spam sites in technical English queries.

It would be a winning battle, since it is less work to blacklist a site than to make a high-scoring one.

I guess Google Search internally is a mess. Maybe they have no clue what they are doing or have some really bad directors and lower managers messing stuff up.

Maybe there is so much black-box ML called from thousands of Perl files that the engineers don't understand what is happening.


I often wonder how much modern IT infrastructure is simply this mess of 'we have no idea how it really works' black boxes strung together with API calls.

I suspect you're right about how much of a true understanding they (at Google) still have of the behaviour of their search engine.


>Why wouldn't Google filter out the worst offenders?

There are no Google adverts on GitHub, Stack Overflow, etc., but there are on many of the copycat sites.


I'm not sure about these days, but historically the engineers on Google search wanted to fix these problems algorithmically, rather than delisting specific sites by hand.


And, again historically, Amit Singhal and team preferred hand-tuned ranking algorithms to powerful-but-opaque L2R (learning-to-rank) approaches.


Here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.
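
For anyone wondering what the entries look like: they're ordinary uBlock Origin cosmetic filters that hide search result blocks linking to a copycat domain. Roughly like this (the domain is a made-up placeholder, and the .g selector is just Google's current result container class, so treat it as a sketch rather than a line copied from the list):

    ! Hide Google results that link to a known copycat domain
    google.*##.g:has(a[href*="sowrapper.example"])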


If you can do this, so can Google. This just shows they refuse to.


> If you can do this, so can Google. This just shows they refuse to.

If they immediately blocked these sites then Google would get a lot of flak for censoring the web.

I dislike these sites as much as anyone. A while back I even tweeted about[0] having a dream where I wrote a browser extension to intercept and redirect these copycat sites to the real site.
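
For what it's worth, that redirect would be roughly one declarativeNetRequest rule per copycat domain in a Chrome MV3 extension. A minimal sketch, assuming the copycat keeps the original's URL paths (the domain names here are made-up placeholders):

    [
      {
        "id": 1,
        "priority": 1,
        "action": {
          "type": "redirect",
          "redirect": { "transform": { "host": "stackoverflow.com" } }
        },
        "condition": {
          "urlFilter": "||so-mirror.example^",
          "resourceTypes": ["main_frame"]
        }
      }
    ]

In practice many copycats rewrite their URL paths, so each one would also need its own path mapping on top of this.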

In my mind this falls into the same category as phone spam. The phone networks could block these calls, but how would you feel if you knew your phone company was auto-filtering incoming calls without you having any control over that? It's a very thin line.

Hopefully one day algorithms will be smart enough to auto de-rank copycat sites or blatant plagiarism so they don't show up on the first page.

[0]: https://twitter.com/nickjanetakis/status/1473671136928018434


They already de-rank plenty of sites for countless abuses, especially for gaming search. They have been doing this for a long time, and no one has ever called it censorship. This is the first time I've heard of anyone even suggesting this.

Also, their ranking algorithm is extremely complex. To suggest one complex algorithm is censorship and another is unbiased search results is to have a very naive understanding of how search works.


>algorithms will be smart enough to auto de-rank copycat sites or blatant plagiarism

So... if google creates an algorithm to detect copycatting/plagiarism it's okay for them to deploy it, but it's not okay if they do it by hand?


> So... if google creates an algorithm to detect copycatting/plagiarism it's okay for them to deploy it, but it's not okay if they do it by hand?

No. Having thought more about my comment a day later, I don't know what a fair answer is. Being ranked on page 216 by an algorithm or being de-listed manually is basically the same outcome.


I have found that installing uBlacklist (a browser extension) and blocking these sites from search results as I encounter them helps noticeably. There are only so many of these "clone" sites that rank highly on Google, so I found it pretty easy to keep up with them for the things I usually search for. There are even shared uBlacklist lists for things like SO clones, but I haven't bothered to use them.
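
If anyone wants to try it, uBlacklist entries are just match patterns, one per line; the domains below are made-up placeholders, not real clone sites:

    *://*.sowrapper.example/*
    *://githubmirror.example/*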


Yeah, I have that one too, and search results get notably better just by adding some 20 sites to it for tech queries.

It makes me wonder how Google can mess this up.


It's not a bug, it's a feature. You search more times, see more ads.


There HAS to be a way for Google to detect that a site is a copy and de-rank it. I refuse to believe their army of PhDs can't figure this out. Google's incentives are wrong. They make more money from SEO spam with ads than from the original sites.
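
Detecting wholesale copies isn't even exotic. A toy version is just comparing word-shingle sets between two pages; real systems use things like MinHash/SimHash to do it at web scale, and none of this is a claim about what Google's pipeline actually runs:

    # Toy near-duplicate check: word n-gram shingles + Jaccard similarity.
    # Illustrative only -- not what any search engine actually uses.

    def shingles(text, n=5):
        """Set of n-word shingles for a piece of text."""
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))}

    def jaccard(a, b):
        """Jaccard similarity: |a & b| / |a | b|."""
        return len(a & b) / len(a | b) if (a or b) else 1.0

    def looks_like_copy(original, candidate, threshold=0.8):
        """Flag a candidate page whose shingles overlap heavily with the original."""
        return jaccard(shingles(original), shingles(candidate)) >= threshold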


appending "wiki" is also really useful if you're looking for straight facts


It’s sad, but I’ve also noticed I have to add "wiki" more and more, because Wikipedia is increasingly not the first result for searches where it clearly should be. Instead there’s often the stupid Google widget obviously copying Wikipedia’s content without a direct link to the actual page.


I've seen Encyclopedia Britannica ranking above Wikipedia. It was really weird; I read both, and Wikipedia was better.


We need AdBlock lists for search engines at this point.




