Nice to see search projects are still popping up. After a move, family life taking over, and me getting more interested in Unreal Engine, my poor search engine is now more of an experiment in seeing how well it runs on nothing but the life-support maintenance updates I do. Starting to think I honestly should just take it down and save the $50 a month I spend maintaining it.
But I'll post it in a Hacker News comment and maybe you all will give it enough traffic I can get excited about it again, lol
I just used Postgres to build my search engine, and it also helps with the last 2 questions. Keeping the content context consistent helps with the first. Unscatter.com, for example, is content shared only in the last 30 days. That helps with keeping my operating costs under $50 a month too.
I wish I had time to mess with it more. Job and life have taken over. My first goal with AI would be to use it for keyword and phrase extraction, and also for analyzing all the links I pull in hourly to see if there is a larger story I could make visible.
I've seen some people use () and other syntax on the stable diffusion subreddit. I've been trying to find a guide for the syntax but haven't had much luck with Google. Is there a resource for this?
It's been the same for unscatter.com for years, but I've always attributed that to me not having a real marketing strategy or even sticking with the ones I've tried to start.
I built https://www.unscatter.com using Reddit to source links for the search index. Last year I added Twitter as another source.
I don't think it's a "better" search engine. It's a different lens through which to search. Reddit and Twitter are an information source for what people are talking about. This is why I limit my index to articles that have popped up in my Reddit/Twitter input in the last 30 days, deleting anything older.
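The 30-day cutoff described above can be sketched as a simple prune step. This is only an illustration: the `last_seen` field, the in-memory list, and the example URLs are assumptions, and the real index lives in a database rather than a Python list.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=30)  # the 30-day window mentioned above


def prune_old_links(links, now, max_age=MAX_AGE):
    """Keep only links whose last sighting falls inside the window."""
    cutoff = now - max_age
    return [link for link in links if link["last_seen"] >= cutoff]


# Hypothetical sample data: one fresh link, one stale link.
now = datetime(2022, 2, 18, tzinfo=timezone.utc)
links = [
    {"url": "https://example.com/fresh", "last_seen": now - timedelta(days=2)},
    {"url": "https://example.com/stale", "last_seen": now - timedelta(days=45)},
]
kept = prune_old_links(links, now)
print([link["url"] for link in kept])
```

Keying the cutoff on the last sighting rather than the first means a link that keeps getting shared stays in the index; keying on the first sighting would be the stricter alternative.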
I've actually had it up for years now, just don't know what to do with it. Been focused on my career in IT rather than entrepreneurship because, well, life. I can say just this morning I saw "Stanytsia Luhanska" pop as a trending term on the front page of Unscatter, and at the time mainstream media had not picked up the story of the school being hit by Russian shells.
I think overall the quality results still come from Reddit. Twitter often gets gamed and I see content terms pop up in the trending list. However, Twitter overnight (my time, US East) gets a more international flavor, with lots of Korean and other Asia Pac country content bubbling to the top during that time.
Very cool site! Is there a way to get the words in a list instead of a cloud?
I am interested in what your thoughts are about being in an echo chamber and getting out of it, but this is a great way to get a high-level view of what's happening on Reddit.
Right now the terms are available only via the word cloud. I have considered trying to put together an API and also keeping more metrics about each link: for example, how many times it's popped up, the Reddit and Twitter users that posted it, and which subreddits the post is in. I just haven't gotten around to it.
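The per-link metrics listed above (how often a link pops up, who posted it, which subreddits it appeared in) could be tracked with a small record like this. It is a sketch under assumptions: the `LinkStats` class, the `record_sighting` helper, and the sample users are all hypothetical, not anything the site currently does.

```python
from dataclasses import dataclass, field


@dataclass
class LinkStats:
    """Hypothetical per-link metrics record."""
    url: str
    times_seen: int = 0
    reddit_users: set = field(default_factory=set)
    twitter_users: set = field(default_factory=set)
    subreddits: set = field(default_factory=set)


def record_sighting(stats, url, source, user, subreddit=None):
    """Update the stats dict for one sighting of `url` from `source`."""
    entry = stats.setdefault(url, LinkStats(url))
    entry.times_seen += 1
    if source == "reddit":
        entry.reddit_users.add(user)
        if subreddit:
            entry.subreddits.add(subreddit)
    elif source == "twitter":
        entry.twitter_users.add(user)
    return entry


stats = {}
record_sighting(stats, "https://example.com/a", "reddit", "alice", "worldnews")
record_sighting(stats, "https://example.com/a", "twitter", "bob")
print(stats["https://example.com/a"])
```

Using sets means a user or subreddit that shares the same link repeatedly is stored once, while `times_seen` still counts every sighting.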
The concern of it being an echo chamber is one of the major reasons I added Twitter, and I still look for more sources. For most of Trump's presidency some form of his name was the top trend 24 hours a day. Crypto is another trend. It's great for me because I'm interested in it as well, but I question if it's really reflective of what the world is talking about.
I have considered creating some sub-sites as well to try and dig more, focusing on subreddits for specific categories, but the Twitter API (at least what I can afford, which means free) isn't quite as flexible for doing that kind of thing while staying inside my API call limits.
> I question if it's really reflective of what the world is talking about.
I'm very curious what each country is talking about. I made a little script that would show me what each country's subreddit was talking about, but the fact that it is on Reddit means that it's already biased. Maybe using the most popular source inside each country would be a way out of the echo chamber, but I imagine the work of maintaining that might be a bit much.
This is pretty cool but am wondering, how are you choosing which subreddits/tweets are indexed?
I notice for instance "manga updates" is in the word cloud but a search for "kanye west" doesn't return any relevant results despite him being in the news a bunch lately for being kind of nuts.
I get the top 100 posts from the top 100 popular subreddits for the past hour (as defined by Reddit). I then do some basic filtering to exclude a few subreddits that are often in the top but that I find are mostly text or media content, since I'm looking for links.
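The selection step described above might look roughly like this. It is a sketch, not the actual crawler: the input shape (a dict of subreddit name to ranked posts), the `is_self` and `url` fields, and the excluded subreddit names are assumptions for illustration, and it reads "top 100 posts" as 100 per subreddit.

```python
def select_links(posts_by_subreddit, excluded, per_subreddit=100):
    """Collect link URLs from ranked posts, skipping excluded subreddits.

    `excluded` is expected to hold lowercase subreddit names.
    """
    links = []
    for subreddit, posts in posts_by_subreddit.items():
        if subreddit.lower() in excluded:
            continue  # subreddit is mostly text or media, not links
        for post in posts[:per_subreddit]:
            if post.get("is_self") or not post.get("url"):
                continue  # text post, nothing to index
            links.append(post["url"])
    return links


# Hypothetical sample input.
sample = {
    "worldnews": [
        {"url": "https://example.com/story", "is_self": False},
        {"url": "", "is_self": True},
    ],
    "AskReddit": [{"url": "", "is_self": True}],
}
print(select_links(sample, excluded={"askreddit"}))
```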
The lack of Kanye West content is interesting. I'm going to try to find some time this weekend to dive into that. The only thing I can think of is that it's so well known that fewer people are sharing links and more people are just making text posts on Reddit about it.
However, it could be an indexing issue too. I'm using Postgres for the index, so maybe there is something there. I'll research that. Thanks for noticing and calling it out!
So if anyone is following this, right now it appears that not many Kanye stories are bubbling to the top. It's simply not in the top 100 of the top 100. For the full 30 day index, checking just on Kanye in story titles, this is all I have in my index.
Kanye West - Gold Digger
Kanye wants Billie Eilish to say sorry or he'll pull out of Coachella
Kanye West: ‘Stop Asking Me to Do NFTs... Ask Me Later’
Kanye West Does Not Want to Get Involved With NFTs
Kanye West Rejects NFTs, Tells Fans To Stop Asking: 'I Make Music In The Real World'
So long story short, I think I may need to consider increasing my pull. Going from the top 100 of the top 100 to maybe the top 1000 of the top 100 or the top 100 of the top 1000. I'll have to do some research and also validate my crawl can support it.
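A rough way to size the options above before changing the crawl. This is back-of-the-envelope only: the page size of 100 posts per API request is an assumption, and the numbers are upper bounds since many posts get filtered out.

```python
import math

PAGE_SIZE = 100  # assumed maximum posts returned per API request


def pull_cost(subreddits, posts_per_subreddit, page_size=PAGE_SIZE):
    """Upper bound on posts fetched and requests made for one hourly pull."""
    requests_per_subreddit = math.ceil(posts_per_subreddit / page_size)
    return {
        "max_posts": subreddits * posts_per_subreddit,
        "requests": subreddits * requests_per_subreddit,
    }


options = [
    ("top 100 posts of top 100 subreddits (current)", 100, 100),
    ("top 1000 posts of top 100 subreddits", 100, 1000),
    ("top 100 posts of top 1000 subreddits", 1000, 100),
]
for label, subs, posts in options:
    print(label, pull_cost(subs, posts))
```

Under these assumptions both alternatives are ten times the current pull (up to 100,000 posts and 1,000 requests per hour instead of 10,000 and 100), so the choice between them is more about which posts surface than about crawl cost.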
> Reddit and Twitter are an information source for what people are talking about.
It's a source of information for what Reddit and Twitter owners want you to see. Both websites are heavily manipulated in a myriad of ways. (The simplest one is via massive amounts of accounts banned for wrongthink.) This is blatantly obvious if instead of passively absorbing news you deep-dive into a specific issue and then look up the discussions and trends, especially on Reddit. Sometimes discussions for major events just aren't there, which is nigh impossible organically on a website with tens of millions of users.
Manipulated or not, I do agree it's an echo chamber. I added Twitter in an attempt to break the echo chamber aspect but wasn't as successful as I hoped. I do occasionally look for other sources to add, but finding similar sites that can match the volume of those two is difficult, which makes it harder to figure out how to weigh other sites' results against those two.
This discussion did make me poke around some more. I may consider using the free tier of Bing's API just to pull in trending topics.
What are some examples of this extreme and blatantly obvious manipulation? Certainly it happens but you're making it sound like we live in the Truman Show.
A subreddit with 480K users manages to upvote the topic 1.3K times with 78% up ratio. A subreddit with 1.7 million users upvotes a similarly themed topic (expressing a different opinion) only to get it 555 points with 91% up ratio. Ten times the difference in engagement.
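Taking the quoted figures at face value, the comparison can be worked out as score per 1,000 subscribers. This is a sketch with assumptions: "1.3K" is treated as 1,300, post score is not a raw upvote count, and subscriber totals say nothing about how many users were active.

```python
def score_per_thousand(score, subscribers):
    """Post score normalized per 1,000 subscribers."""
    return score / subscribers * 1000


smaller_sub = score_per_thousand(1_300, 480_000)   # 480K-user subreddit
larger_sub = score_per_thousand(555, 1_700_000)    # 1.7M-user subreddit
ratio = smaller_sub / larger_sub

print(f"{smaller_sub:.2f} vs {larger_sub:.2f} per 1,000 subscribers")
print(f"ratio: {ratio:.1f}x")
```

On these numbers the gap comes out at roughly 8x, the same order of magnitude as the "ten times" quoted.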
This is the norm. Reddit bans the fuck out of people with opinions that don't align with the hivemind. You can search HN for "reddit banned" and see a limitless supply of stories. At one point they deleted two thousand subreddits in one go.
I'm not going into certain topics magically not appearing anywhere on Reddit, because it would require significantly more detailed examples, but it's a thing as well. It is the Truman Show for public opinion.
My 14 year old daughter managed to find it by herself through her friends. I've only really ever used it to find nice wallpapers but my daughter is a pretty talented artist and she loves the site.
Postgres is really just great for building just about anything when you need to get that first viable product out. It's basically the Swiss Army knife for anything data, in my opinion. You've got SQL, NoSQL, job queues, full text indexing. It's great.
I use it as a SQL database and full text search for a little personal project I work on off and on, and it works great. In the months since I got a promotion I haven't touched it except to check every few weeks for security updates, and it, the Golang app server, and the Python scripts have had no issue just churning along keeping a 30 day archive of links found via Reddit and Twitter. Postgres is great.
https://www.unscatter.com
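For anyone curious what Postgres full text search plus a 30 day archive can look like, here is a minimal sketch. The `links` table and its `url`, `title`, and `last_seen` columns are assumptions, not the site's actual schema. The Python only builds the statements; running them needs a Postgres connection and a driver that uses `%s` placeholders.

```python
# Hypothetical schema: links(url text, title text, last_seen timestamptz)

# Expression index so title searches do not scan the whole table.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS links_title_fts
    ON links USING GIN (to_tsvector('english', title));
"""

# Match titles against the search term and rank the hits.
SEARCH_SQL = """
SELECT url, title,
       ts_rank(to_tsvector('english', title),
               plainto_tsquery('english', %s)) AS rank
  FROM links
 WHERE to_tsvector('english', title) @@ plainto_tsquery('english', %s)
 ORDER BY rank DESC
 LIMIT %s;
"""

# Drop anything that has not been seen in the last 30 days.
PRUNE_SQL = "DELETE FROM links WHERE last_seen < now() - interval '30 days';"


def build_search(term, limit=20):
    """Return the search statement and its parameters for a DB driver."""
    return SEARCH_SQL, (term, term, limit)


sql, params = build_search("kanye west")
print(params)
```

With a driver such as psycopg, the pair can be passed straight through, e.g. `cursor.execute(*build_search("kanye west"))`.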