Google now shows why it ranked a specific search result

83 · on July 23, 2021

I would pay good money for a search engine where each result had a button next to it that would add that domain to my personal blacklist for future searches. (goodbye pinterest)

chrisshroba · on July 23, 2021

If you can deal with the aesthetics of every search query containing your blacklist, try creating a custom search engine in your browser with a keyword like "g" (for google) that searches for your query plus "-site:pinterest.com -site:example.com"

Beached · on July 23, 2021

one of the reasons I borderline hate google now is that the search results do not honor double quoted text, or subtract text like you mentioned. I have tried searched with -site: and google just ignores it and gives me results form there anyways

chrisshroba · on July 23, 2021

It's always worked for me. Would you happen to have an example query that doesn't respect those things?

prophesi · on July 24, 2021

Not OP, but Google Search is a blackbox so it's possible the same query would properly block the designated URLs for you, while remaining unblocked for other users.

EastSmith · on July 23, 2021

How many domains will you be able to remove this way? There is probably some low limit like 50?

bagomips · on July 23, 2021

Not sure if Google does anything to restrict plug-ins from tampering with its results, but domains in DDG results are canonicalized making it trivial to use them as keys in localStorage. So filtering after the fact, while slightly awkward, can work for a list of thousands or millions?

ceh123 · on July 23, 2021

I'm actually working on this [0]

Right now I just have my own custom filters and since I'm in grad school this is taking a much lower priority than I would have liked it to, but I have a goal of making user accounts with customizable filters by the end of the summer.

[0] https://hadal.io

justnotworthit · on July 23, 2021

For firefox, i use https://addons.mozilla.org/en-US/firefox/addon/personal-bloc... . It puts a "block this domain" link under the result (gotta block each pinterest.* though).

But if you blocked each bad result, I'm not sure there'd be any good left.

For example, I search "a bougainvillea without pruning" and all the results are "how to prune bougainvillea". Try: "How do I search the web without e commerce results?" and all you'll get is "how to do ecommerce better" pages.

marojejian · on July 23, 2021

uBlock Origin has a filter that blocks Pinterest quite effectively:

google.##.g:has(a[href=".pinterest."])

  google.*##a[href*=".pinterest."]:nth-ancestor(1)

charlesdaniels · on July 23, 2021

I have found uBlacklist[0] works very well for this purpose.

0 - https://addons.mozilla.org/en-US/firefox/addon/ublacklist/

ravenstine · on July 23, 2021

There used to be a Chrome extension that did exactly that but Google removed it. (They were the authors)

wolrah · on July 23, 2021

> There used to be a Chrome extension that did exactly that but Google removed it. (They were the authors)

Not sure about something Google themselves made, but I've been using this one called "Unpinterested" for a few years and it works as intended.

https://chrome.google.com/webstore/detail/unpinterested/gefa...

Fuck Pinterest for how they've ruined google image searches.

Abishek_Muthian · on July 24, 2021

Google also changes the front page results if you put the same query multiple times thereby bringing even phishing sites with low age as top result.

I recently caught a phishing site of my Govt. Tax website like that [1].

[1] https://twitter.com/Abishek_Muthian/status/14097614168657469...

didip · on July 23, 2021

If Google decides to make this available again and put it behind a paid subscription, I would still pay.

Fuck SEO spammer.

RileyJames · on July 23, 2021

I made Pinterest list for uBlacklist which does exactly this: https://github.com/rjaus/awesome-ublacklist

pknerd · on July 23, 2021

You can do something like: Keyword - Quora

blacktriangle · on July 23, 2021

goodbye w3schools

agucova · on July 24, 2021

I'm glad I was not the only one thinking of this.

zwkrt · on July 23, 2021

To paraphrase Ummon in the Hyperion book series, “science tells us that giraffes evolved long necks to eat leaves at the top of trees. What it doesn’t tell us is why no other animals evolved long necks”

Whenever we try to simplify AI decision into an “explanation” we throw out 99.9% of the information. In fact, all we did was make the AI more complicated by requiring that it spit out a “reason” alongside its result.

spywaregorilla · on July 23, 2021

This is not accurate for evolution or AI. For evolution, not all animals evolved long necks because evolution is a random search on a large, constantly changing space. Giraffes are also not the only animals with long necks. We have no reason to believe no other animals will evolve long necks in the future either.

For AI, stating that we throw out 99.9% of the information is wrong or misguided depending on interpretation. A lot of people are fixated on ML 101 belief that models are black boxes and impossible to unravel. This really isn't true, and significant progress on explainability tooling has been made in the past several years. Most of which (by which I mean none that I'm aware of) require changing the underlying AI model.

2 examples: 1) The well known 20 questions genie, akinator. Not really ML, but works similarly to decision trees. You have a space of information. The underlying tree model learns what the most relevant questions are to divide the space to optimize entropy for information gain. 100% explainable. Surprisingly accurate for many people. https://en.akinator.com/

2) Shap values for AI models are one example of tooling to make ai models more interpretable. They show you which features mattered for a single prediction, and by how much in which direction. If we imagine a decision tree again, we can assume that this bears a strong correlation to the path followed through the tree and the associated information gain at each step. If we imagine a complex image recognition neural net, we can highlight which pixels in an image contributed to the outcome the most. It wouldn't be terribly surprising if shap is the tool being used by google here specifically. Focusing on a few important features rather than considering every single is ideal.

If I, for example, give you an extensive fecal analysis, psychological, infared, radiological, psychic medium prediction, and survey results from 5 normal people who got to look at the animal- and I asked you to identify if it was a cow or not... It would be appropriate to just say "survey said cow, so its a cow".

dalbasal · on July 23, 2021

Wonderful analogy. I'm keeping it.

That said, this is not totally unlike many human explanations. In fact, increasingly, if a decision can be explained entirely as a product of decision making policies, it's likely to have been automated already.

Eg. If I ask you why you chose to give this analogy, you will probably be able to provide a plausible explanation. But, that explanation is probably constructed after the fact. What we decide to say or write "comes to us" and we don't really have access to how that works.

This is also true of judge's rulings, for example. They refer to rules (laws, precedents and legal concepts), and rulings must be justifiable in these terms... but the giraffe metaphor still plays.

I suppose this is equivocation in a sense, but I'm not arguing that human and AI decision making are equivalents whether or not they share this core feature.

TeMPOraL · on July 23, 2021

This looks very similar to the argument that there's no problem with black-box DNN-based algorithms making important decisions about people, because humans are also black boxes and their explanations are just after-the-fact rationalizations.

To which I counter: yes, humans are black boxes too. But they're all similar black boxes. We all share the same brain architecture, the same firmware. We have specialized hardware in our brains to simulate each other, predict other people's reactions. This is what allows us to work in groups and form societies.

Additionally, as societies, we've learned to force predictability on each other. In the past, high-variance individuals - ones that are too unpredictable - were shunned or outright killed. Today, we isolate most severe cases from society entirely, by putting them in prisons or mental hospitals, and isolate less severe cases from high-impact responsibilities - e.g. by gating pilot licenses, driver licenses and gun permits behind various kinds of physical and mental health evaluations.

For the day-to-day needs, humans are very predictable indeed, and their explanations are good enough. We know how to compensate for unreliable introspection. That's not the case with modern machine learning - the way DNN models "think" is completely alien to us. We can't work with this. We also don't have to, those algorithms are our creations. We can make them explainable. And we need to.

dalbasal · on July 23, 2021

As I said, while I'm technically equivocating (you too), I am not really saying that there are no reasons to be skeptical of black box algorithms. We agree on this, I think.

That said, I'm not sure that we can make algorithms explainable. We may get better at interrogating them, and the analogy to human "black boxes" may get spookier, but I suspect that they're unexplainable in a "hard" sense. An explainable algorithm would have made different decisions, and probably won't be as good at making them.

hypertele-Xii · on July 23, 2021

Science says evolution occurs from random mutations. Because they're random, there's no guarantee that every place with tops of trees evolve animals with long necks to eat them, just as flipping a coin no matter how many times gives no guarantee it ever comes out heads.

0898 · on July 23, 2021

Apologies if this is a very stupid question but is random mutation a fact or a theory? It’s something I’ve never quite been able to wrap my head around. If you corrupt a line of code you don’t get a different program you just get something that won’t compile. I suppose nature works very differently to programming - but it’s still something I struggle with.

apinstein · on July 23, 2021

The outcome of a mutation on the biology isn’t binary.

Some mutations are so bad that they are fatal to the organism (don’t compile). Where a gene isn’t that important, the mutation will just confer a fitness change.

this is mediated through a variety of mechanisms. Some mutations just alter the performance of a protein, or the expression dynamics (when, for how long, and how much) of the gene expressed as protein. Many proteins don’t act directly on things but are part of complexes of proteins that perform a function, or are receptors that trigger other things, and thus many genes are “meta” in this way.

Imagine a complex sequence if gene expression during vertebrate development. At one point during elongation, a series of enzymes alter how this process starts and stops. Small variations in the performance and availability of the genes involved in this will nudge the outcome of elongation to result in a longer (or shorter) necked animal. If that is helpful for that individual, the gene will survive and flourish.

This gets more complex as in animals w sexual selection we have multiple versions (alleles) of each gene. There is also a huge realm of epigenetics and modulation (methylation, promoters, antisense, etc).

It’s really stunningly beautiful and amazing.

titzer · on July 23, 2021

Also, many mutations are completely benign and don't influence anything right now. They confer genetic diversity to a population. Later, if a selective pressure appears via a rapid change in the environment, such as the introduction of a disease, a new predator, or climate change, what was a benign mutation might suddenly confer a survival advantage, and will begin being selected for.

This is why genetic diversity is important. Populations aren't just more copies of a single genome. They are in fact reservoirs of genetic diversity which makes ecosystems more resilient to change. This partly explains why there are so many species of insects. As the recyclers at the bottom of the foodchain, they are the boots on the ground that do the heavy lifting and are subjected to the first shocks. They evolve into many species not only to specialize, but to guard against rapid changes at the bottom of a food web.

codedokode · on July 23, 2021

Cannot evolution theory be validated by calculations?

We know the size of genome of an animal, we can estimate the maximum possible number of mutations throughout history. And then calculate if it is possible to create such genome with this number of mutations or not.

For example, if a mutation happens once per K animals, if one successful mutation happens once per L mutations, and there have lived M animals then maximum number of successful mutations is M / K / L. We can compare this number to size of a genome and verify whether it could be created as a result of an evolution.

Smaug123 · on July 23, 2021

This question is something of a category error - there are no "facts", only "theories which have a greater or lesser amount of evidence in favour of them". Some theories have such an overwhelming amount of evidence in favour of them that we call them "facts".

But the presence of random mutation is extremely well documented, yes - how else do you explain why identical twin humans don't have identical genomes? A great majority of corruptions have no effect, because biological systems are highly resilient to change; the analogy you want is less "delete a line of code" and more "delete a single machine from AWS". Some corruptions are so bad that they cause the organism to fail completely at the embryo stage. Some corruptions have an effect which is detrimental in one environment but beneficial in another; the whims of nature decide whether such an organism survives. Some corruptions are simply beneficial, and if they're sufficiently beneficial then they take over the gene pool in not many generations.

shadowgovt · on July 23, 2021

There's a pretty good book of essays, "Group Theory In the Bedroom," which touches on the quest to unravel the DNA code in the time between the discovery of DNA chemically and the invention of the high resolution microscopy technology that allowed for direct observation of DNA strands.

Multiple mathematical approaches were taken based on myriad theories of information encoding, compression, and redundancy. For a brief period of history, the question of how DNA worked wasn't one of biology, but one of statistics and information theory.

Unfortunately, once they had the technology to observe DNA directly, they discovered that almost all of the theories were just wrong, because the encoding from DNA trigraphs to amino acids is basically nonsense. It's not structured, or organized, or elegant. Like so much of biology, it's a hot mess of accident and coincidence, and massively, inefficiently redundant... But, that redundancy has an interesting property. The most commonly used amino acids in protein formation are encoded by a large number of DNA triplets, and in general, one edit to those triplets results in a triplet that codes for the same amino acid.

In hindsight, it is perhaps not surprising that the encoding schema from DNA to amino acids is edit-resistant.

(What cooks my noodle regarding the redundancy is: do we see the redundant DNA encoding because those amino acids are most commonly used, or are those amino acids most commonly used because a large number of DNA triplets code for them so they are statistically likely to be more prevalent? ;) ).

specialist · on July 23, 2021

Just bookmarked "Group Theory in the Bedroom", thanks.

I just listened to interview with Rama Ranganathan about "evolutionary physics".

https://news.uchicago.edu/story/big-brains-podcast-explores-...

I couldn't really follow their discussion. From your post I'm guessing you'd get a lot more out of it.

Your questions about redundant encodings seems related to Ranganathan's assertion that biology can be both efficient and adaptive. Whereas with human design, efficiency and resilience is a more of a tradeoff.

Any who...

wirthjason · on July 23, 2021

Love the title! Can never go wrong adding “in the bedroom” to liven up a title.

There’s a math book that also has a catchy title: Mathematics and Sex, by Clio Cresswell

nicoburns · on July 23, 2021

> If you corrupt a line of code you don’t get a different program you just get something that won’t compile.

You might be interested to know how common (naturally) aborted pregnancies and eggs that don't hatch are.

epivosism · on July 23, 2021

Something I learned recently is that evolution is less like a program than I thought; to take an example, if during development you move a proto-eye in a frog to the end of their arm, the eye will still connect to the nervous system, get blood vessels, etc. connected to it. Why in the world does that work? Because morphology is a system that dynamically takes in information and adjusts to it; not a system like "grow a blood vessel exactly here for 2cm".

This is why mutations can often not cause immediate "compilation failure" since they just alter a parameter of the environment a specific morphological system is growing in, and the change is usually just noise which the system already has a large buffer of variability it can use to adjust.

mig39 · on July 23, 2021

The mutation is random, but the selection isn't random.

If a mutation offers an advantage, even a tiny one, over time and generations is will be selected more than the non-mutation.

wongarsu · on July 23, 2021

I can't find it right now, but there are some optimizers that optimize code by swapping, deleting, inserting or mutating lines of code, then discarding all mutations that don't compute the same result (which for integer arithmetic you can prove with a SAT solver). Maybe somebody has the link, I can't find it right now.

I think it's also important to realize that making new stuff, like creating a new type of protein, is very rare in nature. Most evolution is just number tweaking: make the neck or beak a bit longer, increase melanin levels, etc. And the way those "variables" are encodes seems to lend itself to natural variations, which is (one reason) why humans have so different height.

codedokode · on July 23, 2021

This is called superoptimization.

zbentley · on July 23, 2021

> If you corrupt a line of code you don’t get a different program you just get something that won’t compile.

Usually, but not always.

Add to that an astounding number of iterations (random point changes), the fact that changes are guaranteed to be ... well, the metaphor breaks down quickly, but it would be something like "syntactically valid in the language being used, and no larger or smaller than a certain size of diff", and a sort of error correcting/weighted (as opposed to absolute/binary) behavior as a result of each change, and it starts to seem a bit more workable.

But even that is barely scratching the surface of how this stuff works. Code is far from a perfect analogy.

nindalf · on July 23, 2021

I like apinstein’s explanation to your query. If you’d like a book length explanation, The Gene is excellent. Provides excellent explanation as well as historical context of how our understanding evolved over time.

spywaregorilla · on July 23, 2021

Probably better to imagine it as a strictly typed language and you're swapping in compatible components so it does compile. There's a vaguely understood second level here where it's not just first order changes mutating in and out, but second and third and fourth order change that enable an organism to evolve more easily in specific directions.

edit: although some genetic mutations do of course result in organisms that will certainly die, possibly before being born

leephillips · on July 23, 2021

You may be interested to look up genetic algorithms.

hliyan · on July 23, 2021

The correct analogy is more like this:

DNA in nucleus - source code on disk

Messenger RNA - source code in memory

Ribosome - JIT compiler

Protein - tiny running program, think unix utilities. Also, not saved to disk.

There is source code for millions of such programs in the DNA.

The compiler (Ribosome) recognizes any combination of the base constructs of the language (C, G, T, A) as valid, but the compiled program may not work.

hypertele-Xii · on July 23, 2021

Computer programming has existed for some one hundred years.

Genetic encoding has existed for billions. A lot of time and randomness have driven it to become resilient to changes in a way we struggle to yet even comprehend.

prepend · on July 23, 2021

Lots of mutations. Most result in not compiling, but some result in long necks.

tobr · on July 23, 2021

I don’t know much about this subject, but logically that seems to only be a good explanation if the long giraffe neck happened in a single lucky coin flip, ie in one mutation in one individual. That goes against my intuition of how evolution works, through many small steps over many, many generations. Happy to be corrected by someone who knows more.

naasking · on July 23, 2021

It's an arm's race. A lucky coin flip evolved a longer neck, they survived better and bred more, but then trees evolved to be taller because of the proliferation of longer necks, then a luck coin flip evolved an even longer neck, etc.

Long necks aren't the only possible adaptations to get to taller trees though. For instance, tree climbing is another.

tobr · on July 23, 2021

So if it requires multiple lucky coin flips, shouldn’t we expect it to happen in many places, not just in giraffes? What explanatory power does “randomness” really have, then?

saalweachter · on July 23, 2021

It depends how common the niche is and how accessible the body-plan is in the genetic search-space.

We do see convergent evolution into the "tree" and "crab" body-plans. Dozens of unrelated species have evolved along similar lines, filling similar niches by evolving similar forms.

There were long-necked animals in the past -- a well-known type of dinosaur comes to mind -- but the niche isn't so large or successful as the "eat a bunch of grass" niche.

xbmcuser · on July 23, 2021

larger the number of coin flip needed the less likely it s to happen not more.

tobr · on July 23, 2021

I don’t think so? A larger number just demonstrates that lucky coin flips are common and abundant.

eurasiantiger · on July 23, 2021

Forests are another (competition for sunlight).

hliyan · on July 23, 2021

I think chromosome selection plays a much larger role in generation-to-generation variation than mutations do (which are rare).

eurasiantiger · on July 23, 2021

On the contrary, coin flips are probabilistically independent from each other, while evolutionary events certainly are not, however small the causal influence between them may be. Thus, while an infinite number of coin flips does not guarantee a single heads, an infinite number of evolutionary events must eventually result in any given adaptation. Lim P(p) = 1.0 as p -> inf.

kennyadam · on July 23, 2021

We can make an educated guess though, right? They didn't evolve them quick enough, so were out competed by the giraffes, for example. I know you were making a more general point, but I just wanted to throw that in.

gpm · on July 23, 2021

Moreover just glancing at Giraffe physiology shows us all the compromises that a long neck entails.

For instance it's long neck traps a lot of air in it's breathing cycle, requiring larger lungs for the same oxygen capacity. It has twice the blood pressure of a human to enable pumping up the long distance, and as such a giant heart. It has a complex circulatory system to control the change of blood pressure at it's brain when it bends over, and handle the high blood pressure in it's legs. And so on and so forth.

inglor_cz · on July 23, 2021

On the other hand, plenty of species developed fins and did not give up only because they were outcompeted by someone else.

But the ocean is a big place. Tree tops in the savannah may not be as important resource.

im3w1l · on July 23, 2021

Adding to that. There used to be ~150k giraffes (recently their numbers are declining) divided amongst 4 giraffe species. So while the strategy of using a long neck to eat the high leaves is viable, it's not crazy successful. It's not "everything should get in on this".

z3t4 · on July 23, 2021

Or giraffes find long necks (and other body parts) very attractive.

Smaug123 · on July 23, 2021

That is a glib answer which misses the point entirely.

Why do they find long necks attractive? The answer to that sort of question usually turns out to be "because animals who randomly happened to be a little more attracted to long necks were more likely to breed with the long-necked animals, so their genes became better represented in the gene pool thanks to the long-necked animals' better survival power".

gpm · on July 23, 2021

Why do peacocks (or really peahens) find giant feather displays attractive?

I think the answer is usually actually "because it is indicative of a healthy mate". If you need to be well fed and healthy to grow tall, or to maintain a giant set of ridiculous feathers, then it will probably be an attractive trait a few generations later, since it will predict success.

snovv_crash · on July 23, 2021

What about all the places where there aren't giraffes?

Akronymus · on July 23, 2021

Much faster conversion to savanahs, which resulted in not being able to adapt fast enough; no evolutionary pressures to evolve to be that tall; evolutionary pressures exists/ed but there were stronger pressures to not evolve to be tall, and similar reasons are what come to my mind.

high_byte · on July 23, 2021

but you aren't surprised eyes evolved separately multiple times? limbs come in even numbers (or at least symmetric if you count tails - but you don't see many 3-tailed animals)

you took the one example there is only one of.

besides it isn't necessarily AI decisions. these are statistical results, e.g. showing you which parameters have strong correlations. (some would argue this is all there is to AI)

EamonnMR · on July 23, 2021

We have even numbered legs because we all descended from a bilaterally symmetrical common ancestor and apparently bilateral symmetry is really good for building mobile animals of all descriptions. What's really wild is how Starfish manage to develop out of an initially symmetrical larva.

high_byte · on July 23, 2021

I just love how hacker news threads can randomly lead you to wildly different conversations topics :D I would've never learned about this stuff otherwise

but aren't starfish symmetrical either way? in the same way we have a head, but our head is also symmetric to itself. actually they have 5x symmetries :) would this be considered rotational invariance? idk

EamonnMR · on July 23, 2021

Radial symmetry is the term usually used for animals, yeah. It's just nuts that they switch body plans so radically.

zwkrt · on July 24, 2021

I guess I didn’t expect my comment to be so popular, and I was kind of tired when I wrote it, it was 2 in the morning. I guess my point was not anything really about complexity (or the evolution of giraffes as HN seems to be very excited to correct me about).

My point was that an AI giving an explanation for its output will almost certainly not explain how the model was generated or trained, because that is the secret sauce. It can only give an explanation from within its own domain of knowledge, unlike a human who can make an informed decision based on facts and then explain it from a higher-level viewpoint, or from the viewpoint of the person asking for the explanation.

For a concrete example, imagine an AI for determining if a given video is of someone shoplifting. I give it a video and it says “yes, suspicious hand movements (78%), avoiding staff (65%), change of clothing (45%).” Here the AI has said why it thinks shoplifting is occurring, and a savvy user could guess what types of videos the AI was trained on.

Buuuut, we are left wondering all sorts of important questions. How was the training data collected? Did it represent a diverse set of shoplifters and patrons? What movements are considered suspicious? What constitutes avoidance? What is the false positive rate of the AI? What was the motive of the people creating the AI…

So to go back to my initial analogy, The AI can explain its internal process of coming to a decision, but it cannot explain the human environment in which it was created. In the same way, we can use science to construct useful narratives for “why” specific features of an animal fit with its environment, but we fail miserably to explain “why” the entire environment came to be.

tetromino_ · on July 23, 2021

Longer necks for eating high leaves did evolve in other species! Think of all the sauropod dinosaurs, or more recently, of the giant herbivorous long-necked birds (elephant birds, moas) that humanity drove to extinction.

So the question is rather why so few mammals evolved long necks. Maybe there's something about mammalian physiology that makes long neck mutations usually unviable?

Gravityloss · on July 23, 2021

Specialization and competition explain this.

Taylor_OD · on July 23, 2021

Damn. I've read the Hyperion books a dozen or more times over the past 10 years. I couldnt have pulled this quote out but remember it now that you brought it up. Do you have a great memory or have you read them a lot?

codedokode · on July 23, 2021

You are assuming that search ranking is determined solely by an intelligent AI. This is not how search engines work. Search ranking is determined by people that decide that sites like A must be ranked higher than sites like B, update the rules and then maybe AI is somehow used to apply those rules made by humans.

So it shouldn't be difficult to explain why the site is ranked higher. But I don't think Google will ever disclose that secrets.

Ueland · on July 23, 2021

Looking forward to testing this on all the searches I do that gives me heaps of spam results. Seems that Google's focus on search spam has more or less disappeared.

What I've seen a alot of the last few years is sites with simply copied content republished on new domains. And I don't mean just some text, I mean the whole website.

Take this search for example: https://www.google.no/search?q=nrk+site%3A.it

It show's a bunch of Italian domains with mostly spam based on Norwegian content. This content regularly ranks high when searching for Norwegian cotent.

nisa · on July 23, 2021

gitmemory or these translated stackeroverflow clones are a pain - the original result doesn't even appear anywhere in the results - it's the same in german.

Overall google result quality declined a lot from my perspective (and colleagues also agree) in the past years.

the info-boxes and sidebars are often just plain wrong (wrong product, wrong answers) - have to add quotes to every other query but these also don't seem to work - websites that used to be indexed disappear and I get lot's of noise and outright spam websites with redirection spam (I've tested on Linux with different browsers, unlikely that it's hijacked).

Not sure what's going on - I guess they made a lot of progress in NLP with BERT and other models and nobody internally want to see the drawbacks?

lolinder · on July 23, 2021

Gitmemory et al are awful. They've made me think seriously about the plausibility of building a custom search engine.

From what I've heard, the biggest problem with building a general-purpose search engine seems to be the index—indexing the entire web requires data centers many times larger than most can afford, and deciding what is relevant in such an index requires excessive training data. But while working, I typically only want my search engines to search the same 200-odd sites (GitHub issues, Stack Overflow, MDN, language docs, high-quality technical blogs, etc.) It's too many to know which "site:" query to use, but it's a small enough set that it should be a manageable problem for an upstart search engine.

It also seems like this is likely true in many domains. Taking on Google directly by trying to build a general purpose search engine seems to be a non-starter (most competitors just crib off of an existing player, and don't do much better). But a bunch of sharded search engines for specific domains could theoretically outperform Google/DDG/Bing within their respective domains, because the human using them has already narrowed the domain down dramatically. They wouldn't dethrone Google (general search is always going to be needed) but they could compete for a segment of traffic.

Akronymus · on July 23, 2021

> it's the same in german.

reminds me of msdn, where I always search in english through some search engine, then get the german version presented with a SMALL button to turn it to english.

hypertele-Xii · on July 23, 2021

This is the real information apocalypse. Machines cloning knowledge ad infinitum with increasing amounts of real content drowning in an endless ocean of... ads; Economic propaganda.

Until content itself is cryptographically signed, there's no verifying where it come from or how true it is, if it ever were.

d110af5ccf · on July 23, 2021

> Until content itself is cryptographically signed

At which point you have to determine which signatories to trust. Which is the same issue we currently have - which publishers (ie websites) should the search engine trust (ie rank highly)?

So really what we need is a more robust method of attributing publisher quality. Except that perception of quality varies drastically between different consumers. Also, most solutions you might propose quickly start looking an awful lot like a distributed web of trust but no one has ever managed to pull that off at scale so far.

vodkapump · on July 23, 2021

This shows me almost exclusively pinterest results. A website I've never in my life used, but for some reason always get recommended to go to by google.

Sad to see NRK being abused like this tho

lgats · on July 23, 2021

So glad I'm not the only one experiencing the abundance of .it TLD spam.

I've seen this rotating around different country-level TLDs, .de & .bg spam domains were filling the results previously.

I suspect some big-name spammer rents domains for a short period while they're pending expiration.

lgats · on July 23, 2021

Several of the .it spam results are provisioned by 'Hosting Concepts B.V.'

A company perhaps also involved in some shady prescription med domains:

https://www.fda.gov/consumers/hosting-concepts-bv-dba-openpr...

https://www.fda.gov/consumers/health-fraud-scams/hosting-con...

blackcat201 · on July 23, 2021

Same in chinese search results for programming related question, most of the time I get was some bad translation from stackoverflow or github issue. Now I had to rely on browser plugins to automatically remove these domains from search results.

raxxorrax · on July 23, 2021

They certainly promotes established newspapers. If you search for a term that is in any capacity part of a prominent news cycle, you can forget the results.

corentin88 · on July 23, 2021

Google's Blogpost: https://blog.google/products/search/learn-more-and-get-more-...

Hacker News Post: https://news.ycombinator.com/item?id=27928454

hbcondo714 · on July 23, 2021

Thanks for posting the original source

speedgoose · on July 23, 2021

Could a search engine improve its results with upvote and downvote buttons? It would of course be a lot of work to prevent SEO people to destroy the technology again, but it may be a simple solution to remove the scams and spams.

ricardo81 · on July 23, 2021

Indeed any metric that can be influenced will end up being gamed.

Voting would likely require being logged in which is not so good for the more privacy orientated search engines.

Could flip the idea on its head and look at ways to verify the owner of a site/page (again, with privacy issues associated), and use that as a quality metric.

d110af5ccf · on July 23, 2021

> verify the owner of a site/page

To what end? A verified real identity could post crap. Meanwhile an anonymous actor could publish gems. It comes down to how much trust you place in a given publisher; that currently goes by domain. The only thing verified identity buys you is the ability to trust snippets from that person published across a variety of untrusted domains.

Meanwhile, analyzing per-account feedback patterns along with other session behavior significantly raises the bar for abuse.

ricardo81 · on July 23, 2021

Would generally agree about the anonymous example.

The general idea would be that people are harder to fake as a signal than links.

A lot of sites associate social profiles with their domain/page. Something like a topical author Pagerank would be interesting, with endorsements. That too, of course could be bought and gamed. Pagerank was a game changer when it arrived, essentially acting as a 'vote' for a page, where some votes are more important than others. Associating votes with people (like the OP thought) is interesting.

weird-eye-issue · on July 23, 2021

Very easy to game, actually it already is gamed in a couple different ways. One method - let's say you have a site about nutrition, you could find a doctor who wants to improve their search rankings (or find an agency that works with doctors to improve their rankings). Ask to put them on your website as the author or editor with a link back to their site. It's a win for both parties and Google can't really detect that since it isn't a two way link scheme.

AussieWog93 · on July 23, 2021

>The general idea would be that people are harder to fake as a signal than links.

I don't think it would be as hard as you think. There are already "captcha farms" in India and Bangladesh - "Identity farms" would be just as simple to set up.

pharmakom · on July 23, 2021

Google already has great spam control and bot detection for serving ads though. They dont care about privacy (except in so far as it hurts the brand).

z3t4 · on July 23, 2021

Detecting bots is very hard, and there are a lot of bots on Google Adsense. They could however add "captcha codes" for clicking on ads/links search results which would prevent bots, but users would likely find it annoying.

speedgoose · on July 23, 2021

Google knows already which link you click and can estimate how long you visit a website when you click another link or do a new search.

quantumofalpha · on July 23, 2021

Exactly. Click = upvote. Returning to SERP after a short dwell time = downvote. It's noisy but because you can mine such things implicitly by just logging user's behavior, the volume of data is way bigger than what an explicit feedback system can provide.

z3t4 · on July 23, 2021

Could use a different algoritms then on HN et. al. For example, show results based on how you and people that vote like you vote. So the ranking/order would be skewed for SEO spammers, but hopefully better for those that upvote based on the quality and information on the page.

ItsMonkk · on July 23, 2021

This has always seemed obvious to me.

You would have 4 options for each item, ( Upvoted, Downvoted, Not Voted, and Blocked). If a SEO manipulator is Upvoting a bunch of spam content that you didn't vote(that you've seen) for, down-voted, or blocked, they would quickly un-pair with you. The next step for SEO is for them to start voting identically and predicting to how you would vote so that a few paid ads didn't skew the results to much, but isn't that mission accomplished[0]?

This isn't done because it's grouping the users and not the content? It doesn't scale? And yet thematically this seems close to the model used by Google FLoC. If Google would just make a sub-reddit like system for each cohort so that we could submit and discuss content, I have a feeling that would be very successful and people would opt-in.

[0]: https://xkcd.com/810/

prophesi · on July 23, 2021

Could they also display the "personalized" aspect of results? If I do use !g on DuckDuckGo, my results for Java or Apache are vastly different than various friends/family.

bb123 · on July 23, 2021

I wonder if it will show when Google has manually edited the ranking of results to suit their agenda (US election, COVID?). Another interesting example of this is the image result for “European art”. The image results have been modified to the increase diversity of the skin colours of people portrayed. I don’t have a problem with it, but they definitely surface specific results to fit their political/ideological leanings. It would be nice if they were at least transparent about it.

pgcj_poster · on July 23, 2021

> Another interesting example of this is the image result for “European art”. The image results have been modified to the increase diversity of the skin colours of people portrayed.

One of the first results I get on Baidu is the cover of a book called European Art and the Wider World: 1450-1550 [1]. It shows what I assume are the three kings/magi presenting baby Jesus with gifts. Each of them has a different skin color and style of dress, demonstrating how European art portrayed people from the wider world. I don't think a Chinese search company has any particular motivation to try to influence people's perception of European racial makeup, so I would guess that these results occur naturally because the phrase "European Art" is mainly used to contrast with non-European art or people, as in the book cover. When people are writing about European art in a generic context, they'll usually be more specific and say "French art" or "Dutch art" or whatever rather than "European art," so that phrase is less likely to associated with images of European art that aren't showing such a contrast.

[1] https://m.360buyimg.com/mobilecms/s750x750_jfs/t8641/58/7295...

Cthulhu_ · on July 23, 2021

I think they would avoid doing that at all costs - their defense in court has always been "it's our algorithms", because if they had to admit "it was one of our employees" they would be liable.

There's ongoing antitrust lawsuits about them pushing their own services (e.g. Docs) in front of competitors, and some fuckery with Shopping. Their defense leans heavily on the statement that they didn't do it on purpose or manually, but that their algorithm considers their services the best. For that they need to lift the veil a bit on their algorithms (and iirc there is an outstanding court order for that one already).

brigandish · on July 24, 2021

There was the hugely, erm, coincidental timing of Winston Churchill's image going missing in Google's search results while there were calls from "progressives" to have his statue removed.[1]

Whether or not any of the specific instances are Google actually interfering or not the slew of evidence is all in one direction, hence I'm not sure why anyone doubts that these huge media companies are interfering in what we see, huge media companies have always done that, it's just that the newest ones have even more power to do so.

[1] https://reclaimthenet.org/google-removed-churchill-image/

berdario · on July 23, 2021

Is yours a genuine concern?

To me it seems a rehash of this years old controversy:

https://www.tweettabs.com/googling-european-people-history-t...

bb123 · on July 23, 2021

I did mention that I don’t have a problem with them altering the results as they see fit, I just think it would be ethical to mention that they have done so.

rscoots · on July 23, 2021

Underlying point I'd say is a concern, even if the specific example might not be applicable.

Thanks for the article on that!

nairboon · on July 23, 2021

interesting query when compared to yandex.com image results

semiconduction · on July 23, 2021

European art query: nice spot!

Agree with the edition of news results.

It becomes very hard to find information about specific topics. Most often you find articles with "theory X has ben deboonked!" at the top

ecommerceguy · on July 23, 2021

More fun ones to compare Google with Bing/DDG/Yandex/Baidu:

White Family image search (hilarious)

Hillary + anything autocomplete - Google shows nothing negative

various certain outlets article titles verbatim don't appear

Google bias has been on display for quite some time. And Yes Dang, I'd be honored to show evidence if you give me a forum.

JohnFen · on July 23, 2021

Interesting. That box does explain some of why I get consistently poor results from Google search.

sssilver · on July 23, 2021

What’s a lot more interesting is why certain matching results were not shown.

high_byte · on July 23, 2021

because the subset of lower ranked results is the set of all results minus the shown results, ie the entire index?

sssilver · on July 23, 2021

I meant results that would have been highly ranked but were censored out

thiht · on July 24, 2021

I read this as Google admitting they failed as an efficient search engine. They have to explain people why their results are good because the results themselves don't convey that anymore. It's over, Google is now useless. For the life of me, I just can't find anything on Google anymore without adding explicitly "reddit", "stackoverflow" or "github" in my query. I've abandoned the idea to find new user blogs using Google, it's over. It's all click farms and paywalls. And I don't care that their results match what I searched for, these results are useless if the spams are not filtered out.

menrris · on July 23, 2021

This feature might help a lot of businesses that struggle in the search results. This will help them get the SEO performance that they deserve.

hbcondo714 · on July 23, 2021

Yeah, it would be nice if this was integrated with Google Search Console so webmasters can this better.