You could, but I don't know what you gain out of it. The underlying index would be almost the same size, and an n-gram index would also let you search for something like e.t, which you lose with this approach.
Code search is indeed hard. Stop words, stemming and such do rule out most off-the-shelf indexing solutions, but you can usually turn them off. You can even get around the splitting issues of things like
a.toString()
with some pre-processing of the content. However, where you really get into a world of pain is allowing someone to search for ring in that example. You can use partial term search (prefix, infix, or suffix), but this massively bloats the index and is slow to run.
The next thing you try is trigrams, and suddenly you have to deal with false positive matches. So you add a positional portion to your index, and all of a sudden the underlying index is larger than the content you are indexing.
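To make the false-positive problem concrete, here is a minimal Python sketch of a trigram index. It is not how any particular engine does it; the function names and the tiny sample corpus are made up for illustration. It shows why a document can contain every trigram of the query without containing the query itself, and it drops those candidates by re-checking the real text rather than by storing positions.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def candidates(index, query):
    """Documents containing every trigram of the query (may be false positives)."""
    grams = trigrams(query)
    if not grams:
        # Queries shorter than 3 characters have no trigrams in this sketch.
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))


def search(index, docs, query):
    """Verify candidates against the real text to drop false positives."""
    return {d for d in candidates(index, query) if query in docs[d]}


# Hypothetical sample corpus.
docs = {
    1: "a.toString()",
    2: "abc xbcd",  # contains "abc" and "bcd" but never "abcd"
    3: "abcd",
}
index = build_index(docs)
```

Here `candidates(index, "abcd")` includes document 2 even though the string never appears in it, while `search` filters it out. A positional index avoids the re-check by storing where each trigram occurs, which is where the size blow-up comes from.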
It's good fun though. For those curious about it, I would also suggest reading posts by Michael Stapelberg (https://michael.stapelberg.ch/posts/), who writes about Debian Code Search (which I believe he started), in addition to the other posts mentioned here. Shameless plug: I also write about this at https://boyter.org/posts/how-i-built-my-own-index-for-search... where I go into some of the issues when building a custom index for searchcode.com
Oddly enough, I think you can go a long way brute forcing the search if you don't do anything obviously wrong. For situations where you are only allowed to search a small portion of the content, say just your own (which looks applicable in this situation), that's what I would do. Adding an index is really only useful when you start searching at scale or you are getting semantic search out of it. For keyword search, which is what the article appears to be talking about, brute force is what I would be inclined to use.
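For a sense of how little is needed, a brute-force keyword search can be sketched in a few lines of Python. This assumes the content is already in memory as a dict of name to text; the function name and the `ignore_case` parameter are hypothetical, not taken from any real tool.

```python
def brute_force_search(docs, query, ignore_case=True):
    """Scan every document line by line; return (doc_id, line_no, line) for each matching line."""
    needle = query.lower() if ignore_case else query
    hits = []
    for doc_id, text in docs.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            haystack = line.lower() if ignore_case else line
            if needle in haystack:
                hits.append((doc_id, line_no, line))
    return hits


# Hypothetical sample content.
files = {
    "a.java": "int x = 1;\nString s = a.toString();",
    "b.py": "print('hi')",
}
```

Because it is a plain substring scan, a search for ring finds the `toString` line with no index at all. The cost is linear in the amount of text scanned, which is why it only holds up when the searchable portion is small.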
The preprocessing you need is what Lucene calls an Analyzer (the principle is the same for search in general): the component that knows how to prepare incoming plain text for storage in an index, along with its counterpart for search queries, in this case one made for code search. That's no different from analyzers for other languages (stemming sucks for almost everything but English). Thinking about it, the frontend of most compilers for a language could maybe make a pretty good Analyzer. It already knows the language-specific components and can split the source into the parts it needs for further processing, which is basically what an analyzer does.
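As a rough illustration of what such an analyzer does, here is a small regex-based sketch in Python. It is not Lucene's API and not a compiler frontend; the `analyze` function and its splitting rules are assumptions chosen for the example. It emits each identifier whole plus its camelCase or snake_case parts, and the same function would be applied to the query so both sides produce comparable terms.

```python
import re

# Identifiers: runs of letters, digits and underscores.
_IDENT = re.compile(r"[A-Za-z0-9_]+")
# Sub-words: acronyms, capitalised or lowercase words, digit runs.
_PARTS = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def analyze(text):
    """Turn source text into lowercase terms: whole identifiers plus their sub-word parts."""
    terms = []
    for ident in _IDENT.findall(text):
        terms.append(ident.lower())
        parts = [p.lower() for p in _PARTS.findall(ident)]
        if len(parts) > 1:
            # Only add parts when the identifier actually splits.
            terms.extend(parts)
    return terms
```

With this, `analyze("a.toString()")` yields the terms a, tostring, to and string, so a search for string matches. It still does not help with ring, which is the partial-match problem described earlier.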
I actually half wrote an RFC-style spec and 2 implementations of a federated search last year, rather than doing the distributed hash table that yacy does.
I wanted results to be re-rankable by the peers by sharing the scores that went into them. The idea being that, with a common protocol based on the ideas of ActivityPub, you could get search peers working together to hopefully surface interesting things.
Something I should probably finish and publish at some point. It worked with up to the hundreds of peers I tested.
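To illustrate the re-ranking idea only, here is a hypothetical Python sketch. The unpublished spec is not shown here, so the result shape, the score names (`text_match`, `freshness`) and the weighted-sum scoring are all invented for the example: each peer returns the component scores behind a result, and the receiving peer re-scores them with its own weights.

```python
def rerank(results, weights):
    """Re-score results locally from the component scores each peer shared."""
    def local_score(result):
        # Weighted sum; components without a local weight count as zero.
        return sum(weights.get(name, 0.0) * value
                   for name, value in result["scores"].items())
    return sorted(results, key=local_score, reverse=True)


# Hypothetical results from two peers; every field name is an assumption.
results = [
    {"url": "https://example.org/a", "peer": "peer-1",
     "scores": {"text_match": 0.9, "freshness": 0.1}},
    {"url": "https://example.org/b", "peer": "peer-2",
     "scores": {"text_match": 0.5, "freshness": 0.9}},
]
```

A peer that only cares about text match would rank the first result on top, while one that weights freshness heavily would flip the order, without either remote peer having to re-run the query.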
The reason I mention this is because I also wanted to add a front end to yacy, which turned out to be harder than I expected. It’s a wonderful project and you can find great stuff through it, but because of the way the peers return results it’s sometimes hard to find it again. It’s also not quite as hackable as I would have hoped at the time, probably due to the project’s age.
I still think there is value in it though, and I’d love to see yacy have its protocol explained as a spec so people could build implementations in other languages more easily.
I remember the first days of gopher browsing were like that. Gopher browsing to me was like swinging from vine to vine. The trick was remembering/documenting where each vine went.
It's one I point people at all the time when they ask me why something isn't working as expected in any standard search tool, and something I reference from time to time to refresh my own knowledge.
Just had all my connections cancelled. So extra day in San Fran for me which is less than ideal, but probably better than being on the flight if something happens.
It was total bedlam at the airport when I got in this morning, however, with almost no flights available to replace the grounded ones.
Another red eye special for me tonight but at least no connections.
I don't think this would be practical for a static site. You still need to maintain a list of followers of your account somewhere, and that needs to be dynamic if you want it to work the way people expect it to, where they follow you from other instances.
Assuming you kept the list of @ accounts through some other means, and had your webfinger set up with your public key, you could sign the publish events after creating new content and push them to those followers.