This has been stuck in my mind for a while. For the past few months, whenever I searched for something I would get buried in pages upon pages of SEO-friendly copywriter bullshit without real content; it's driving me mad.
How feasible would it be to build something like this?
I think about this problem constantly too. SEO destroyed the internet for me. I keep searching HN through https://hn.algolia.com, but obviously that only works for specific topics. I think there is a market for solving this problem.
You could start a search engine with data from Common Crawl; maybe you could even get other projects like archive.is to ship you some hard drives. Then you just need to build an index and serve search queries. Plenty of open source search engines have been attempted, which gives you good templates or even directly usable implementations.
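To make "build an index and serve search queries" concrete, here is a minimal sketch of an in-memory inverted index in Python. The function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for illustration; a real engine would add ranking, stemming, and on-disk storage, none of which is attempted here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample pages standing in for crawled data.
docs = {
    "a": "Rust borrow checker notes",
    "b": "Top 10 rust removal products",
    "c": "How to borrow a book",
}
index = build_index(docs)
matches = search(index, "rust borrow")
```

With these sample pages, `matches` contains only `"a"`, since it is the only page with both terms. Unranked set intersection is the simplest thing that works; it says nothing about which matching page is actually worth reading.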
The hard part is distinguishing personal blogs from blogspam and other worthless content. Performance is a huge issue, since you want to spend at most double-digit milliseconds per page, but it may be getting viable now that ML is becoming commoditized. Getting this right would make or break the project.
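As a rough idea of what a cheap per-page check could look like, here is a toy heuristic scorer in Python, not an ML model. Everything in it is an assumption for illustration: the phrase list, the two signals (sales-phrase density and word repetition), and the equal weights are invented, and a real classifier would be trained on labelled pages.

```python
import re
import time

# Hypothetical signal phrases; a real system would learn these from labelled data.
SPAM_PHRASES = ("best", "top 10", "buy now", "review", "affiliate")


def spam_score(text):
    """Return a score in [0, 1]; higher means more blogspam-like (toy heuristic)."""
    lowered = text.lower()
    words = re.findall(r"[a-z0-9']+", lowered)
    if not words:
        return 1.0  # Assumption: an empty page is treated as worthless.
    # Signal 1: how often sales-style phrases appear relative to page length.
    phrase_hits = sum(lowered.count(p) for p in SPAM_PHRASES)
    phrase_density = min(1.0, phrase_hits / len(words) * 10)
    # Signal 2: share of repeated words, a crude proxy for keyword stuffing.
    repetition = 1.0 - len(set(words)) / len(words)
    return 0.5 * phrase_density + 0.5 * repetition


def timed_score(text):
    """Score a page and report how many milliseconds the scoring took."""
    start = time.perf_counter()
    score = spam_score(text)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return score, elapsed_ms


blog = ("Yesterday I finally tracked down a memory leak in my parser "
        "by reading the allocator logs line by line.")
spam = ("Best vacuum review: top 10 best vacuum picks, buy now, "
        "best vacuum deals, buy now.")
blog_score, blog_ms = timed_score(blog)
spam_score_value, spam_ms = timed_score(spam)
```

On these two made-up snippets the spam text scores higher than the blog text. Measuring `elapsed_ms` per page is how you would check whether a candidate classifier fits inside the millisecond budget before running it over a whole crawl.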