
I have a set of test questions I use to gauge how badly an LLM has been lobotomized whenever a new one is released. This post finally made me realize that Google Search is really going away (compromising its core mission over an invalid DMCA request? Really??) and that I will have to start looking for new search engines.

This of course means that I need a way to gauge prospective search engines. My first attempt:

- Search for software like NewPipe and Dolphin Emulator

- Search for content that very powerful people fought hard to bury

- Search for public library sites like libgen, zstd, and scihub.

- Search for popular torrent sites

- Search for far-right content if the search engine is US-based (suggestions please)

- Search for far-left content if the search engine is US-based (suggestions please)

What else have I missed?

Sidenote: It's been clear for a while now that unbiased, Google-grade search engines are going away. Each search engine has at least one topic on which it deliberately returns garbage results. We need a metasearch engine that automatically routes each query to the least damaged search engine.
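One way to picture that router, as a sketch only: assume you have already run litmus tests like the ones above against each engine and recorded pass/fail per topic, and that the query's topic is already known (classifying a free-text query into a topic is the hard part and is not shown). The engine and topic names used with it are hypothetical placeholders.

```python
# Hypothetical litmus-test log: engine -> topic -> list of pass/fail outcomes.
TestResults = dict[str, dict[str, list[bool]]]


def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of litmus tests passed; 0.0 if no tests were run for this topic."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def route(topic: str, results: TestResults) -> str:
    """Pick the engine with the best pass rate for this topic.

    Assumes the caller already knows the query's topic. Ties (including
    topics nobody has tested) are broken alphabetically so the choice is
    deterministic.
    """
    return min(
        results,
        key=lambda engine: (-pass_rate(results[engine].get(topic, [])), engine),
    )
```

For example, with `{"engine_a": {"torrents": [True, False]}, "engine_b": {"torrents": [True, True]}}`, a torrent query would go to `engine_b`. A real router would also need a fallback for untested topics smarter than alphabetical order.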



> Search for far-right content if search engine is US-based (suggestions please)

I use a similar litmus test: I search for the website of the Proud Boys. Google doesn't just censor it; it places obviously hand-curated results critical of the movement on the first page. Bing is the same. DuckDuckGo also fails the test. Kagi and Yandex both pass.

Google fails almost every one of these tests.


Just because Yandex passes the test doesn't mean it can be relied upon.

Remember: because Yandex is Russian, it is in its interest to show you content that US corporations censor, but it may censor the content Russia wants censored, or manipulate it for the benefit of Russian propaganda.

What I am trying to say is that it is probably better to get information from many sources, as every source can be biased one way or another.

edit: Just query Yandex about WWII; you'll see links to conspiracy sites and sources whitewashing the Soviet Union's involvement in starting it.


> What I am trying to say is that it is probably better to get information from many sources, as every source can be biased one way or another.

I guess it's back to Dogpile for me...

(if they still do the multiple-search-engine thing)


> Powered by Metasearch technology, Dogpile returns all the best results from leading search engines including Google and Yahoo!, so you find what you’re looking for faster.

According to their 'About' page, they still do it, right?


I only did a quick search, I probably just remember their interface from when Google started :)


Maybe the solution is a search engine aggregator? A website that sends your query to both DDG and Yandex and shows you the top 5 links from each, removing duplicates. That way, if something is censored on Yandex or DDG but not both, you'll still see it. Something like that would be non-trivial to implement, but a lot easier than writing a new search engine.
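A minimal sketch of just the merge step, assuming both engines' results have already been fetched as ranked lists of URLs (fetching from DDG or Yandex is not shown, and `normalize` and `merge_top` are hypothetical helper names). It interleaves the top 5 from each engine and drops duplicates after a deliberately crude URL normalization:

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude canonical form so the same page from two engines compares equal.

    Lowercases the host, drops a leading "www." and a trailing slash.
    Real sites need far more rules than this.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}?{parts.query}" if parts.query else f"{host}{path}"


def merge_top(results_a: list[str], results_b: list[str], k: int = 5) -> list[str]:
    """Interleave the top-k links from two engines, keeping the first copy of each page."""
    merged: list[str] = []
    seen: set[str] = set()
    top_a, top_b = results_a[:k], results_b[:k]
    for i in range(k):
        # Alternate engines rank by rank so neither one dominates the top slots.
        for results in (top_a, top_b):
            if i < len(results):
                key = normalize(results[i])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[i])
    return merged
```

Because the output is a union of both lists, a link buried by one engine still shows up as long as the other engine returns it in its top 5.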


Dogpile has been around since 1996. Now I feel like an old greybeard ;)


You could try something like Searx, which is an open-source metasearch engine. You can host it yourself if you want to.

Here is the wikipedia page for it: https://en.wikipedia.org/wiki/Searx


You are absolutely right. I’m certainly not claiming Yandex passes these tests either. They’re clearly guilty of censoring content critical of Russia. So far only Kagi has passed all my tests.


Where does Brave Search fall?


Good question. I just tested it. The results overlap about 70% with Google's, with similar ranks, so I'm assuming Brave basically just uses Google's results with a filter and privacy layer on top. There's no sign of the actual website, so Brave fails this test the same way Google does.
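For anyone who wants to repeat this kind of comparison, one rough way to put numbers on "overlap" and "similar ranks" is sketched below. It assumes both engines' results for the same query are available as ranked URL lists and compares URLs exactly; the comment above doesn't say how the 70% figure was computed, so this is only one plausible method.

```python
from typing import Optional


def overlap(a: list[str], b: list[str], k: int = 10) -> float:
    """Fraction of engine A's top-k results that also appear in engine B's top-k."""
    top_a, top_b = a[:k], set(b[:k])
    if not top_a:
        return 0.0
    return sum(url in top_b for url in top_a) / len(top_a)


def mean_rank_shift(a: list[str], b: list[str], k: int = 10) -> Optional[float]:
    """Average absolute position difference for results both engines share.

    Returns None when the two top-k lists have nothing in common.
    """
    pos_b = {url: i for i, url in enumerate(b[:k])}
    shifts = [abs(i - pos_b[url]) for i, url in enumerate(a[:k]) if url in pos_b]
    return sum(shifts) / len(shifts) if shifts else None
```

An `overlap()` near 0.7 combined with a small `mean_rank_shift()` would correspond to the pattern described above; a single query is weak evidence, so averaging over many queries would be more convincing.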


FWIW, when I searched WWII on Yandex, the top 5 results I got were:

1. Wikipedia

2. Call of Duty: WWII official website

3. Britannica

4. The YouTube channel @WorldWarTwo

5. history.com


While websites such as Wikipedia, Britannica, and history.com are trusted sources of information, they may not always provide a fully balanced perspective on historical events such as World War II. These sources, largely based in the West, can sometimes underrepresent or insufficiently emphasize the Soviet Union's aggressive actions and atrocities in the lead-up to and during the war, which is probably why they rank so high in Yandex's results.

On the first page you get a link to https://wwiifoundation.org/timeline-of-wwii/ that doesn't even mention that the Soviet Union invaded Poland and other countries.


Wow, I didn't realize even DDG was censoring this hard. I tried what you suggested and the results between DDG and Yandex aren't even close. This comment convinced me to switch to Yandex.


Please educate me: what makes the results “obviously hand-curated”?

When I search on DuckDuckGo I get a list of Wikipedia entries for its prominent members, and a few recent articles involving its members. In this case it’s their convictions related to Jan 6, but it seems like the articles showed up because they are recent in time, not because of some sinister plot.

Edit: To be clear I am not trying to discount your experience, I fully accept that the results you are served for the same term could be completely different than mine.


Forgive my poor syntax. I didn't mean to imply that DDG provides hand-curated content for this search; I accused only Google of that. Google places obscure university links critical of the movement in the top few spots, above news stories (which are also, incidentally, negative). Those links are very different in nature from the results on all the other engines I tested. DDG only censors the links, from what I can tell.


Tried this with ISIS. Same results: Yandex doesn't shy away from gore material, and finding the supposed ISIS website is easy.


I thought up a few more search engine tests:

- Search for specific git hashes, model numbers, and other forms of UIDs

- Search for known phone numbers

- Search for Tiananmen Square and Winnie the Pooh

- Search for the Armenian genocide

- Search for Mein Kampf, Der Judenstaat, and other symbols used by extremists

Unfortunately, I have a fairly western-centric view of the world. I need the perspectives of others with different views to cover my blind spots. I don't care what values you hold, I just want a reliable search engine tool that doesn't hide information from me.


>I just want a reliable search engine tool that doesn't hide information from me

I don't think this is something that a ranked search algorithm can do while keeping everybody happy.

As an example, let's search for "vaccines cause autism". If you put "vaccines cause autism" content on top, some people are going to get very angry and think you're "hiding information" -- you haven't shown all the content debunking the claim. But, if you put "vaccines don't cause autism" first, some people are going to get very angry and think you're "hiding information", because you're not listing the original sources of the claim.

There are a million such examples with varying degrees of controversy; you've listed some of them already, but others could be "penis enlargement pills", "best truck to buy 2023", "dakota access pipeline", "thai king oppression".

You can't make an algorithm to distinguish fact from fiction, what counts as "information" and what doesn't. You can, at most, rank by consensus or popularity, but what's "popular" (or "allowed by the government") isn't necessarily true (or false). And you must rank your results somehow, there's just too much content.
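To make "rank by consensus" concrete: one standard way to combine several engines' rankings is reciprocal rank fusion (RRF), sketched below as a swapped-in illustration rather than anything described in this thread. Each document earns 1/(k + rank) from every list it appears in, with k = 60 being the constant commonly used. Note that the fused order rewards agreement between engines, which, as the comment says, is not the same thing as truth.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists into one consensus ranking.

    A document appearing near the top of many lists beats one that is
    first in a single list. Ties are broken alphabetically.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

With inputs `[["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]`, `b` wins because two of the three engines put it first, while `d`, returned by only one engine, lands last.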


Is "zstd" a typo? I couldn't find any sources on it (unless you mean the compression standard), including Wikipedia.


Sorry, I always mix the two up. I meant Z-Library: https://en.wikipedia.org/wiki/Z-Library

Most search results on Google lead to fake clones. The current URLs on Wikipedia seem to be accurate. Also, use the Tor version, because it has far more books.


Z-Library? Wild guess.


As a test for far-right content, try to find the Daily Stormer site. It moves every few months, or at least once a year. It's currently at dailystormer.in

It comes up on Yandex but is hidden by Google.


Use Brave Search. It's way better.



