I want a curated search engine so badly. I feel like some enterprising developer could craft one specifically geared towards developers. Imagine a search that doesn't ignore operators like '?.' and that lets you set persistent conditions that apply to every single query. That way you wouldn't need to type the language you are searching for in every query.
Oh, and the cherry on top is completely abandoning the idea of Natural Language Processing. Go right back to keywords only.
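To make that concrete, here is a rough sketch of what keyword-only matching with operator-preserving tokens and persistent conditions might look like. Everything in it (the `tokenize` and `search` names, the `persistent` parameter, the sample documents) is made up for illustration, not taken from any real engine.

```python
import re

def tokenize(text):
    # Split into runs of word characters and runs of symbols, so an
    # operator like '?.' survives as its own token instead of being dropped.
    return re.findall(r"\w+|[^\w\s]+", text.lower())

def search(query, docs, persistent=()):
    # Persistent conditions are extra keywords silently added to every query.
    terms = tokenize(query)
    for condition in persistent:
        terms.extend(tokenize(condition))
    # Plain keyword matching: a document matches only if it has every term.
    return [doc_id for doc_id, text in docs.items()
            if all(term in tokenize(text) for term in terms)]

# Hypothetical sample data.
docs = {
    "ts-optional-chaining": "TypeScript optional chaining: user?.name",
    "cs-null-conditional": "C# null-conditional operator: user?.Name",
    "ts-ternary": "TypeScript ternary: x ? a : b",
}

print(search("?.", docs, persistent=["typescript"]))
```

Here the '?.' query only hits the TypeScript optional chaining entry: the C# one fails the persistent condition, and the ternary one contains '?' but not '?.'.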
I've considered taking a stab at it, but I haven't, likely for the same reason no one else has: I have no idea how to monetize it. Sure, you can run ads, but your conversion rates will likely stink because you'd be selecting for non-suckers.
Not trying to spam, but I'll mention DEVONagent once more in this thread. They just sell it, with an old-fashioned perpetual license. It respects boolean operators.
You might be able to monetize through closer partnerships with companies like Microsoft, Atlassian, Amazon, etc. I'm envisioning an ad system that is basically geared towards selling their cloud and DevOps services. Because the engine is geared towards programming queries, you might also entice these companies by showing that developers have an easier time troubleshooting their systems. Perhaps, eventually, the engine could be bought out and act as a loss leader or funnel to their services. I know Microsoft is trying to be very developer-friendly.
Just some random thoughts. Monetization is a huge bootstrapping challenge for something like this.
I’d love to take a stab at it. I think there are ways to monetize it that don’t involve advertising (at least enough to cover salaries and infrastructure). The only reason I haven’t is I don’t have access to the initial capital.
It can be solved without a centralized server: every content site, page, or paper publishes a Bloom filter for its content, discoverable via the sitemap, and you subscribe to it via RSS/Atom. When you need to search, you hash the words in your query, then check each Bloom filter for potential matches. When a potential match is found, the full page source can be downloaded and checked more carefully for ranking.
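A minimal sketch of that flow, in case it helps. The filter size, the number of hashes, the SHA-256 hashing scheme, and the `fetch` callback are all my assumptions for illustration; how filters would actually be published and subscribed to is left out entirely.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter; size and hash count are arbitrary illustration values."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, word):
        # Derive several bit positions per word by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word):
        # True can be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(word))

def build_filter(text):
    # What a site would publish for one page.
    bloom = BloomFilter()
    for word in text.lower().split():
        bloom.add(word)
    return bloom

def search_filters(query, filters, fetch):
    words = query.lower().split()
    results = []
    for url, bloom in filters.items():
        # Cheap local check against the subscribed filter first.
        if all(bloom.might_contain(w) for w in words):
            # Only now download the page and confirm the match for real.
            page_words = set(fetch(url).lower().split())
            if all(w in page_words for w in words):
                results.append(url)
    return results

# Hypothetical pages standing in for the web.
pages = {
    "https://example.org/a": "bloom filters trade memory for false positives",
    "https://example.org/b": "rss and atom feeds syndicate updates",
}
filters = {url: build_filter(text) for url, text in pages.items()}
print(search_filters("bloom filters", filters, fetch=pages.get))
```

The second check against the downloaded page matters because a Bloom filter can report a word that was never added, but it never misses one that was.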
Perhaps technical users are more likely to be running an ad blocker and/or are overall less likely to actually click on ads. If you're going to have 1/10th the CPM by making this search engine vs. one geared towards everyone, why would you spend the extra time and CPU cycles powering it? Maybe a subscription model would work, but you'd have to show just how much better it is, since you're competing against good-enough free search engines.
Content curation doesn't scale, so I believe it's out of the question for the modern web. The best you can hope for is to have someone throw some quality AI at it, plus a lot of manual tweaking.
Given that the goal is only a small subsection of topics, might scalability not be as critical to the success of the project? If the engine is specifically crawling official programming language documentation, help forums like StackOverflow, and select sites/pages (think a university professor's resources on how to implement xyz), might that be enough that it never needs to scale any further?