I have a set of test questions I use to gauge how badly an LLM has been lobotomized whenever a new one is released. This post made me finally realize that Google Search is really going away (compromising its core mission over an invalid DMCA request? really??) and that I will have to start looking for new search engines.
This of course means that I need a way to gauge prospective search engines. My first attempt:
- Search for software like NewPipe and the Dolphin emulator
- Search for content that very powerful people fought hard to bury
- Search for public library sites like libgen, Z-Library, and Sci-Hub.
- Search for popular torrent sites
- Search for far-right content if the search engine is US-based (suggestions please)
- Search for far-left content if the search engine is US-based (suggestions please)
What else have I missed?
Sidenote: It's been clear for a while now that unbiased Google-grade search engines are going away. Each search engine has at least one topic where it deliberately returns garbage results. We need a meta search engine that automatically routes each search query to the least damaged search engine.
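A rough sketch of what that routing could look like, assuming someone maintains a per-engine list of "damaged" topics by hand. The engine names, topics, and keyword lists here are all made up for illustration, and the keyword matching is deliberately naive:

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    name: str
    # Topics this engine is believed to handle badly (hypothetical, hand-maintained).
    damaged_topics: set[str] = field(default_factory=set)


# Hypothetical topic -> keyword table; a real one would need far more care.
TOPIC_KEYWORDS = {
    "piracy": {"torrent", "libgen"},
    "politics": {"election", "protest"},
}

# Hypothetical engines; list order doubles as the tie-break preference.
ENGINES = [
    Engine("engine_a", {"piracy"}),
    Engine("engine_b", {"politics"}),
]


def classify(query: str, topic_keywords: dict[str, set[str]]) -> set[str]:
    """Return every topic whose keywords appear as whole words in the query."""
    words = set(query.lower().split())
    return {topic for topic, kws in topic_keywords.items() if words & kws}


def route(query: str, engines: list[Engine], topic_keywords: dict[str, set[str]]) -> Engine:
    """Pick the engine with the fewest damaged topics matching the query."""
    topics = classify(query, topic_keywords)
    # min() keeps the first engine on ties, so earlier engines are preferred.
    return min(engines, key=lambda e: len(e.damaged_topics & topics))
```

With the tables above, `route("popular torrent sites", ENGINES, TOPIC_KEYWORDS)` picks `engine_b`, because `engine_a` is marked as damaged for piracy. The hard part is not the routing but keeping the damage table honest.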
> Search for far-right content if search engine is US-based (suggestions please)
I use a similar litmus test. I search for the website for the Proud Boys. Google doesn't just censor it. They place obviously hand-curated results critical of the movement on the first page. Bing is the same. DuckDuckGo also fails the test. Kagi and Yandex both pass this test.
Just because Yandex passes the test doesn't mean they can be relied upon.
Remember, because they are Russian, it is in their interest to show you content that US corporations censor, but they may censor the content that Russia wants censored, or manipulate it for the benefit of Russian propaganda.
What I am trying to say is that it is probably better to get information from many sources, as every one of them can be biased one way or another.
edit:
Just query Yandex about WWII, and you'll see links to conspiracy sites and sources whitewashing the Soviet Union's involvement in starting it.
> Powered by Metasearch technology, Dogpile returns all the best results from leading search engines including Google and Yahoo!, so you find what you’re looking for faster.
According to their 'About' page, they still do it, right?
Maybe the solution is something like a search engine aggregator? A website that sends your search to both DDG and Yandex and shows you the top 5 links from both, removing duplicates. That way if something is censored on Yandex or DDG but not both, you'll still see it. Something like that would be non-trivial to implement, but a lot easier than writing a new search engine.
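The merging half of that idea is the easy part. Here is a minimal sketch, assuming the result URLs have already been fetched from each engine somehow (the fetching is left out on purpose, since it depends on whatever access each engine offers). The function names and the normalization rule are my own choices, not anything the engines define:

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalize(url: str) -> tuple[str, str, str]:
    """Reduce a URL to a comparison key: host without 'www.', path, query.

    The scheme and a trailing slash are ignored, so http/https variants of
    the same page count as duplicates. This rule is an assumption.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").removeprefix("www.")
    return host, parts.path.rstrip("/"), parts.query


def merge_results(result_lists: list[list[str]], top_n: int = 5) -> list[str]:
    """Interleave the top_n URLs from each engine, dropping duplicates."""
    seen: set[tuple[str, str, str]] = set()
    merged: list[str] = []
    trimmed = [results[:top_n] for results in result_lists]
    # Walk rank by rank so no single engine dominates the top of the list.
    for rank_group in zip_longest(*trimmed):
        for url in rank_group:
            if url is None:  # this engine returned fewer results
                continue
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged
```

Interleaving by rank rather than concatenating keeps each engine's first hit near the top, so a link that only one engine is willing to show still lands on the first screen.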
You are absolutely right. I’m certainly not claiming Yandex passes these tests either. They’re clearly guilty of censoring content critical of Russia. So far only Kagi has passed all my tests.
Good question. I just tested them. The results share about a 70% overlap with Google, with similar ranks, so I'm assuming they basically just use Google's results with a filter and privacy layer. There's no sign of the actual website, so Brave fails the same way Google does on this test.
While websites such as Wikipedia, Britannica, and history.com are trusted sources of information, they may not always provide a fully balanced perspective on historical events such as World War II. These sources, largely based in the West, can sometimes underrepresent or insufficiently emphasize the role of the Soviet Union's aggressive actions and atrocities in the lead-up to and during the war. Which is probably why they are so high on the list of results.
Wow, I didn't realize even DDG was censoring this hard. I tried what you suggested and the results between DDG and Yandex aren't even close. This comment convinced me to switch to Yandex.
Please educate me: what makes the results “obviously hand-curated”?
When I search on DuckDuckGo I get a list of Wikipedia entries for its prominent members, and a few recent articles involving its members. In this case it’s their convictions related to Jan 6, but it seems like the articles showed up because they are recent in time, not because of some sinister plot.
Edit: To be clear I am not trying to discount your experience, I fully accept that the results you are served for the same term could be completely different than mine.
Forgive my poor wording. I didn't mean to imply that DDG provides hand-curated content on this search. I accused only Google of that. Google provides obscure university links which are critical of the movement in the top few places, above news stories (which are also, incidentally, negative). The links are very different in nature to all the other engines I tested. DDG only censors the links, from what I can tell.
- Search for specific git hashes, model numbers, and other forms of UIDs
- Search for known phone numbers
- Search for Tiananmen Square and Winnie the Pooh
- Search for the Armenian genocide
- Search for Mein Kampf, Der Judenstaat, and other texts used by extremists
Unfortunately, I have a fairly western-centric view of the world. I need the perspectives of others with different views to cover my blind spots. I don't care what values you hold, I just want a reliable search engine tool that doesn't hide information from me.
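A checklist like this can be turned into a repeatable pass/fail run. The sketch below assumes you supply your own `search` function that returns a list of result URLs for a query, plus a table mapping each test query to the domain you expect to find. Both are placeholders, and nothing here talks to a real search engine:

```python
from typing import Callable
from urllib.parse import urlsplit


def passes(results: list[str], expected_domain: str, top_n: int = 10) -> bool:
    """True if expected_domain (or a subdomain of it) is in the top_n results."""
    for url in results[:top_n]:
        host = urlsplit(url).hostname or ""
        if host == expected_domain or host.endswith("." + expected_domain):
            return True
    return False


def score_engine(
    search: Callable[[str], list[str]],
    checks: dict[str, str],
    top_n: int = 10,
) -> dict[str, bool]:
    """Run every query in checks through search and record pass/fail."""
    return {
        query: passes(search(query), domain, top_n)
        for query, domain in checks.items()
    }
```

Checking for an expected domain only catches missing results. It says nothing about how the rest of the page is ranked, so it complements reading the results yourself rather than replacing it.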
> I just want a reliable search engine tool that doesn't hide information from me
I don't think this is something that a ranked search algorithm can do while keeping everybody happy.
As an example, let's search for "vaccines cause autism". If you put "vaccines cause autism" content on top, some people are going to get very angry and think you're "hiding information" -- you haven't shown all the content debunking the claim. But, if you put "vaccines don't cause autism" first, some people are going to get very angry and think you're "hiding information", because you're not listing the original sources of the claim.
There are a million such examples with varying degrees of controversy; you've listed some of them already, but others could be "penis enlargement pills", "best truck to buy 2023", "dakota access pipeline", "thai king oppression".
You can't make an algorithm that distinguishes fact from fiction, or that decides what counts as "information" and what doesn't. You can, at most, rank by consensus or popularity, but what's "popular" (or "allowed by the government") isn't necessarily true (or false). And you must rank your results somehow, because there's just too much content.
Most search results on Google lead to fake clones. The current URLs listed on Wikipedia seem to be accurate. Also, use the Tor version because it has far more books.