I have a set of test questions I use to gauge how badly a LLM model has been lob...

Gareth321 · on July 12, 2023

> Search for far-right content if search engine is US-based (suggestions please)

I use a similar litmus test. I search for the website for the Proud Boys. Google doesn't just censor it. They place obviously hand-curated results critical of the movement on the first page. Bing is the same. DuckDuckGo also fails the test. Kagi and Yandex both pass this test.

Google fails almost every one of these tests.

varispeed · on July 12, 2023

Just because Yandex passes the test doesn't mean they can be relied upon.

Remember, because they are Russian, it is in their interest to show you content that US corporations censor, but they may be censoring content that Russia wants censored, or manipulating it for the benefit of Russian propaganda.

What I am trying to say is that it is probably better to get information from many sources, as every one can be biased one way or another.

edit: Just query Yandex about WWII, you'll see links to conspiracy sites and sources whitewashing the Soviet Union's involvement in starting it.

ChoGGi · on July 12, 2023

> What I am trying to say is that it is probably better to get information from many sources, as every one can be biased one way or another.

I guess it's back to dogpile for me...

(if they still did the multiple search engine thing)

heap_perms · on July 12, 2023

> Powered by Metasearch technology, Dogpile returns all the best results from leading search engines including Google and Yahoo!, so you find what you’re looking for faster.

According to their 'About' page, they still do it, right?

ChoGGi · on July 13, 2023

I only did a quick search, I probably just remember their interface from when Google started :)

MichaelDickens · on July 12, 2023

Maybe the solution is something like a search engine aggregator? A website that sends your search to both DDG and Yandex and shows you the top 5 links from both, removing duplicates. That way if something is censored on Yandex or DDG but not both, you'll still see it. Something like that would be non-trivial to implement, but a lot easier than writing a new search engine.
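
As a rough illustration of the aggregator idea above, here is a minimal Python sketch of the merge-and-deduplicate step only. It assumes each engine's results have already been fetched as a ranked list of URLs; the function names `normalize` and `merge_results` and the engine labels are hypothetical, and no real search API is called.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key (assumption: scheme, 'www.' and
    trailing slashes do not distinguish two results)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(results_by_engine, top_n=5):
    """Interleave the top_n results of each engine, dropping duplicates.

    results_by_engine maps an engine label to its ranked list of URLs.
    Returns a list of (url, [engines that returned it]) in merged order.
    """
    merged = {}  # comparison key -> (first URL seen, list of engine labels)
    for rank in range(top_n):
        for engine, urls in results_by_engine.items():
            if rank >= len(urls):
                continue
            key = normalize(urls[rank])
            if key not in merged:
                merged[key] = (urls[rank], [])
            merged[key][1].append(engine)
    return list(merged.values())


# Hypothetical, hand-written result lists standing in for two engines.
sample = {
    "engine_a": ["https://www.example.com/a/", "https://example.org/b"],
    "engine_b": ["http://example.com/a", "https://example.net/c"],
}
merged = merge_results(sample)
```

A result that only one engine returns still appears in the merged list, tagged with that engine alone, which is the property the comment is after. Fetching the results in the first place is the harder part and is left out here.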

callalex · on July 12, 2023

Dogpile has been around since 1996. Now I feel like an old greybeard ;)

nioj · on July 14, 2023

You could try something like Searx which is an open source metasearch engine. You can host it yourself if you want to.

Here is the Wikipedia page for it: https://en.wikipedia.org/wiki/Searx

Gareth321 · on July 12, 2023

You are absolutely right. I’m certainly not claiming Yandex passes these tests either. They’re clearly guilty of censoring content critical of Russia. So far only Kagi has passed all my tests.

Kerbonut · on July 13, 2023

Where does Brave Search fall in?

Gareth321 · on July 13, 2023

Good question. I just tested them. The results share about a 70% overlap with Google, with similar ranks, so I'm assuming they basically just use Google's results with a filter and privacy layer. There's no sign of the actual website, so Brave fails the same way Google does on this test.
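
One way an overlap figure like this could be computed is as the share of one engine's results that also appear in the other's. The sketch below is only an assumption about the method, not necessarily how the number above was obtained; `overlap_fraction` is a hypothetical helper, and it ignores rank entirely.

```python
def overlap_fraction(results_a, results_b):
    """Fraction of the distinct URLs in results_a that also occur in results_b."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)


# Hypothetical result lists: 10 URLs from one engine, 7 of them shared.
engine_a = [f"https://example.com/{i}" for i in range(10)]
engine_b = engine_a[:7] + ["https://example.org/x", "https://example.org/y"]
share = overlap_fraction(engine_a, engine_b)
```

Comparing ranks as well would need a rank-aware measure, which this sketch does not attempt.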

MichaelDickens · on July 12, 2023

FWIW when I searched WWII on Yandex, the top 5 results I got were:

1. Wikipedia

2. Call of Duty: WWII official website

3. Britannica

4. The YouTube channel @WorldWarTwo

5. history.com

varispeed · on July 12, 2023

While websites such as Wikipedia, Britannica, and history.com are trusted sources of information, they may not always provide a fully balanced perspective on historical events such as World War II. These sources, largely based in the West, can sometimes underrepresent or insufficiently emphasize the role of the Soviet Union's aggressive actions and atrocities in the lead-up to and during the war. Which is probably why they are so high on the list of results.

On the first page you get a link to https://wwiifoundation.org/timeline-of-wwii/ that doesn't even mention that the Soviet Union invaded Poland and other countries.

MichaelDickens · on July 12, 2023

Wow, I didn't realize even DDG was censoring this hard. I tried what you suggested and the results between DDG and Yandex aren't even close. This comment convinced me to switch to Yandex.

callalex · on July 12, 2023

Please educate me: what makes the results “obviously hand-curated”?

When I search on DuckDuckGo I get a list of Wikipedia entries for its prominent members, and a few recent articles involving its members. In this case it’s their convictions related to Jan 6, but it seems like the articles showed up because they are recent in time, not because of some sinister plot.

Edit: To be clear I am not trying to discount your experience, I fully accept that the results you are served for the same term could be completely different than mine.

Gareth321 · on July 13, 2023

Forgive my poor syntax. I didn't mean to imply that DDG provides hand-curated content on this search. I accused only Google of that. Google provides obscure university links which are critical of the movement in the top few places, above news stories (which are also, incidentally, negative). The links are very different in nature to those from all the other engines I tested. DDG only censors the links, from what I can tell.

FireInsight · on July 12, 2023

Tried this with ISIS. Same results: Yandex doesn't shy away from gore materials, and finding the supposed ISIS website is easy.

Zuiii · on July 12, 2023

Thought of a few more search engine tests:

- Search for specific git hashes, model numbers, and other forms of UIDs

- Search for known phone numbers

- Search for Tiananmen Square and Winnie the Pooh

- Search for the Armenian genocide

- Search for Mein Kampf, Der Judenstaat, and other symbols used by extremists

Unfortunately, I have a fairly western-centric view of the world. I need the perspectives of others with different views to cover my blind spots. I don't care what values you hold, I just want a reliable search engine tool that doesn't hide information from me.

TremendousJudge · on July 12, 2023

> I just want a reliable search engine tool that doesn't hide information from me

I don't think this is something that a ranked search algorithm can do while keeping everybody happy.

As an example, let's search for "vaccines cause autism". If you put "vaccines cause autism" content on top, some people are going to get very angry and think you're "hiding information" -- you haven't shown all the content debunking the claim. But, if you put "vaccines don't cause autism" first, some people are going to get very angry and think you're "hiding information", because you're not listing the original sources of the claim.

There are a million such examples with varying degrees of controversy; you've listed some of them already, but others could be "penis enlargement pills", "best truck to buy 2023", "dakota access pipeline", "thai king opression".

You can't make an algorithm to distinguish fact from fiction, what counts as "information" and what doesn't. You can, at most, rank by consensus or popularity, but what's "popular" (or "allowed by the government") isn't necessarily true (or false). And you must rank your results somehow, there's just too much content.

d33 · on July 12, 2023

Is "zstd" a typo? Couldn't find any sources on it (unless we mean the compression standard), including Wikipedia.

Zuiii · on July 12, 2023

Sorry, I always mix the two together. I meant zlibrary: https://en.wikipedia.org/wiki/Z-Library

Most search results on Google give fake clones. The current URLs on Wikipedia seem to be accurate. Also, use the Tor version because it has far more books.

pallas_athena · on July 12, 2023

z-library? wild guess.

talldatethrow · on July 12, 2023

As a test for far right, try to find the Daily Stormer site. It moves every few months, or at least once a year. Currently at dailystormer.in

Comes up on Yandex. Hidden by Google.

guerrilla · on July 12, 2023

Use Brave Search. It's way better.