Programmable Search Engine

karterk · on Oct 6, 2020

I've implemented Google Custom Search Engine on a few sites before, and it's not clear if this is the same (repackaged?) or different. The biggest question (as with many Google products) is when this is going to be shut down, given that Google Search Appliance itself reached its end of life in 2019.

If you want a dead simple way to integrate search onto your site/app but find the learning curve of something like Elasticsearch steep, take a look at this project that I've been working on:

https://github.com/typesense/typesense

Typesense is an attempt to offer a great search experience out of the box -- with minimal configuration. It also supports clustering/HA, so you can certainly go a long way before considering something like ES (which is a great product but also pretty beefy).

redslazer · on Oct 6, 2020

I have been playing around with Typesense and MeiliSearch. Typesense was quicker to get started with, but MeiliSearch's sorting made more sense.

karterk · on Oct 6, 2020

Happy to hear any feedback on what I can do to make it easier. Typesense does not require you to define a full index-time sorting order (except for a default sorting field) so it's pretty flexible on what fields you can sort on at query time.
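To illustrate query-time sorting, here is a rough sketch that only builds a search URL. It assumes Typesense's documented search endpoint (`/collections/<name>/documents/search` with `q`, `query_by` and `sort_by` parameters); the host, the `products` collection and the `name` and `price` fields are hypothetical, and `buildSearchUrl` is a made-up helper, not part of any client library.

```javascript
// Hypothetical helper: builds a Typesense search URL with a query-time sort.
// Host, collection and field names are invented for illustration.
function buildSearchUrl(host, collection, { q, queryBy, sortBy }) {
  const params = new URLSearchParams({ q, query_by: queryBy });
  // sort_by is chosen per query, e.g. "price:asc" or "price:desc"
  if (sortBy) params.set("sort_by", sortBy);
  return `${host}/collections/${encodeURIComponent(collection)}/documents/search?${params}`;
}

const url = buildSearchUrl("http://localhost:8108", "products", {
  q: "shoe",
  queryBy: "name",
  sortBy: "price:asc",
});
console.log(url);
```

The request itself would also need the API key, which Typesense expects in the `X-TYPESENSE-API-KEY` header.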

franze · on Oct 5, 2020

Oh, the old Google Custom Search Engine is still around, rebranded. I think it was also called Google Co-op once.

I used it for some hacks, e.g. a people search engine once, and for some SEO indexing checks before Google Search Console existed. It became pretty useless once its index was separated from the real Google index.

Looking at it, there is a way to upload XML files. Mhmmmmm... probably a good starting point for the next Google bug bounty hunt.

paxys · on Oct 5, 2020

I remember when they used to sell a physical search box that you could plug into your server.

codazoda · on Oct 5, 2020

I had the pleasure of working with two of these. They did not give you the same quality of results that Google's website did. Utter garbage in a blue rack mounted case.

jfim · on Oct 5, 2020

One thing I've heard from multiple googlers is that their internal wiki search is crap exactly because there's limited hyperlinking to give a good ranking for pages. Google's algorithm basically crunches a lot of human provided data (inbound links, clicks, dwell, etc.) into an algorithmic score, but with limited data it's not much better than tf-idf.

That's probably the same thing that you've encountered with the search appliance.

tyrust · on Oct 6, 2020

I think you might have to change your "is" to a "was". wiki isn't really used anymore and the internal search is really really good.

deanCommie · on Oct 6, 2020

You guys don't use a wiki anymore? What replaced it?

Seems some form of wiki always has value inside a company?

cameronbrown · on Oct 6, 2020

There's a ton of different forms of documentation within Google, but mainly Google Docs or markdown.

no_wizard · on Oct 6, 2020

What is tf-idf, if you don’t mind me asking?

8note · on Oct 6, 2020

It's a search scoring algorithm, combining how frequently the search term is used in a document with how frequently it appears in other documents.

A word that shows up in one document and no others gets a very high score for that document, whereas "and" shows up in every document and thus doesn't score well for any.
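A minimal sketch of that scoring idea in JavaScript. The tokenizer, the length-normalized term frequency and the plain log(N/df) inverse document frequency are illustrative assumptions; real engines use smoothed variants, and the sample documents are made up.

```javascript
// Minimal tf-idf sketch. Tokenization and the exact idf formula are
// illustrative assumptions, not any particular engine's implementation.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Score every document for a single (lowercase) query term.
function tfidfScores(term, docs) {
  const tokenized = docs.map(tokenize);
  // Document frequency: number of documents containing the term.
  const df = tokenized.filter((tokens) => tokens.includes(term)).length;
  // Inverse document frequency: 0 when the term is in every document
  // (and 0 when it is in none, to avoid dividing by zero).
  const idf = df === 0 ? 0 : Math.log(docs.length / df);
  return tokenized.map((tokens) => {
    if (tokens.length === 0) return 0;
    const count = tokens.filter((t) => t === term).length;
    const tf = count / tokens.length; // term frequency, length-normalized
    return tf * idf;
  });
}

const docs = [
  "gin and tonic with lime",
  "rum and cola",
  "whiskey and soda",
];
const andScores = tfidfScores("and", docs);   // "and" is in every document
const limeScores = tfidfScores("lime", docs); // "lime" is in only one
console.log(andScores, limeScores);
```

With these inputs "and" scores zero everywhere because its idf is log(3/3) = 0, while "lime" gets a positive score only for the first document.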

nullsense · on Oct 6, 2020

Term Frequency/Inverse Document Frequency.

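It's what Apache Lucene, which Elasticsearch uses under the hood, did by default for years. Recent Lucene and Elasticsearch versions default to BM25, which builds on the same idea.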

dwhit · on Oct 6, 2020

https://lmgtfy.app/?q=tf+idf

throwaways885 · on Oct 6, 2020

Pretty much. Internal search at Google actually used these appliances for a long time. Fortunately it has kept evolving and was externalised as a Google Cloud product a few years back.

mtmail · on Oct 5, 2020

https://en.wikipedia.org/wiki/Google_Search_Appliance 2002-2019

omgwtfbyobbq · on Oct 6, 2020

The loss of merch was the writing on the wall. ;)

The appliance used to come with a T-shirt, but no longer includes one.

gogopuppygogo · on Oct 6, 2020

I remember when Apple sold me a developer license that came with a T shirt each year...

dhosek · on Oct 6, 2020

I was on a team that set one of those up for the Fox Filmed Entertainment intranet. It was a mess to set up, but it did look cool.

fergie · on Oct 6, 2020

I used to sell those. They were actually quite good for their time, especially when they first came out, but they were really expensive.

It's weird that Google has been so half-hearted about «enterprise search» since they stopped selling GSAs. You would think that they would be itching to know what lies behind your firewall.

caro_douglos · on Oct 6, 2020

Remember hosted urchin?

vram22 · on Oct 5, 2020

That was probably the Google Search Appliance.

franze · on Oct 5, 2020

I.e. my old "Google Digg" search engine, called "Gigg":

https://cse.google.com/cse?cx=001532657764357279270:7uvopk6d...

seg_lol · on Oct 5, 2020

TiL that Digg still exists, https://digg.com/video/nickelbot-turns-nickelback-lyrics-int...

wbarber · on Oct 6, 2020

Googled for confirmation that the Google CSE index is different from "the real Google Index" and haven't been able to confirm that.

I've tinkered with Google CSE a lot and it does have its uses. The knowledge map options are fun to tinker with even if a bit of a black box.

As for the xml - when you download the xml that's intended to be your starting point for an engine, it doesn't include parts of the definition of the CSE like sites you've listed to exclude. So that functionality seems to be a little broken, but I've confirmed you can add and remove elements of the knowledge graph at least via the xml file.

ipsum2 · on Oct 5, 2020

It looks like it's still CSE, based on the URL and the outdated pre-Material design language (<2014? https://programmablesearchengine.google.com/cse/create/new)

shaneapen · on Oct 6, 2020

Exactly! Are there any interesting new features at least? Nothing stands out to me except the stats.

bluewavescrash · on Oct 6, 2020

LOL I thought I was going crazy.

softwaredoug · on Oct 5, 2020

From a relevance PoV, I think the docs here[1] tell you everything you need to know

In short it's a rules and keyword-tagging heavy experience, not at all comparable to Elasticsearch or Lucene (which let you really get down to the metal to customize core algorithm behavior).

BTW if you want this kind of experience in an open source stack, I would check out Querqy for Elasticsearch and Solr[2]

[1] https://developers.google.com/custom-search/docs/ranking

[2] http://querqy.org

pknopf · on Oct 6, 2020

I like Google's product, because it looks like it would work with static sites, like GitHub Pages.

softwaredoug · on Oct 6, 2020

Totally sounds great for that use case

m-i-l · on Oct 6, 2020

As others have noted, this is Google's Custom Search Engine rebranded, which will show adverts in the free version.

If you want to add search to a small static site, e.g. a small personal website, then https://lunrjs.com is good.

Or you could take a look at my side project which includes a search as a service with a simple API: https://searchmysite.net/

ignoramous · on Oct 6, 2020

LunrJS looks great, but if one needs an instant text-based search, this is what I've got on my FAQ page:

        function search(querybox, searchables, searchstatus) {
            const t = querybox.value.toLowerCase()

            const hits = [];
            const misses = [];
            // loop variables are declared with const so they don't leak as implicit globals
            for (const s of searchables) {
                // empty "t" matches everything
                if (s.innerText.toLowerCase().indexOf(t) >= 0) {
                    hits.push(s)
                } else {
                    misses.push(s)
                }
            }
            if (hits.length > 0) {
                for (const g of misses) g.style.display = "none";
                for (const k of hits) k.style.display = ""; // default display style
                searchstatus.innerText = hits.length + " results found."
            } else {
                searchstatus.innerText = "No results found."
                for (const g of misses) g.style.display = "" // show all
            }
        }

        function setupSearch() {
            const searchables = document.getElementsByClassName("searchme")
            const querybox = document.getElementById("searchbox") // <input>
            const searchstatus = document.getElementById("searchstatus")
            // getElementsByClassName always returns a collection, so check its length
            if (searchables.length === 0 || !querybox || !searchstatus) {
                console.log("search setup skipped")
                return
            }
            querybox.addEventListener("keyup", e => search(querybox, searchables, searchstatus), false)
        }

        window.addEventListener('load', function(event) {
            setupSearch();
        }, false)

m-i-l · on Oct 9, 2020

Looks nice, but as you say that just searches the current page. Here's how I search my entire site with slightly fewer lines of JavaScript:

  // Construct the API query
  const apiEndpoint = 'https://searchmysite.net/api/v1/search/michael-lewis.com';
  let urlParams = new URLSearchParams(window.location.search);
  let queryParam = urlParams.get('q');
  if (queryParam == null || queryParam == '') { queryParam = '*' }
  // Encode the query so characters like & or # don't break the URL
  let apiQuery = apiEndpoint.concat('?q=', encodeURIComponent(queryParam));

  // Build the results (using fetch rather than XMLHttpRequest)
  fetch(apiQuery)
    .then((resp) => resp.json()) 
    .then(function(data) {
      let searchResults = data.results;
      document.getElementById('query').value = queryParam; // Set the value of the search box to the query
      if (searchResults && searchResults.length > 0) {
        return searchResults.map(function(result) {
          // Each result is going to be displayed as <li><a href="${result.url}" class="title">${result.title}</a></li>
          let li = document.createElement('li'), a = document.createElement('a');
          a.appendChild(document.createTextNode(`${result.title}`));
          a.href = `${result.url}`;
          a.classList.add('title');
          li.appendChild(a);
          // Each result is added to the <ul id="results"></ul>
          document.getElementById('results').appendChild(li);
        })
      }
      else {
        // If there are no results update the <h1 class="title" id="results-title">Results</h1>
        document.getElementById('results-title').innerText = 'No results';
      }
    })
    .catch(function(error) {
      console.log(error);
    });

tenkabuto · on Oct 5, 2020

I wonder if there's a limit on how many URLs it can search and if I can import a list of URLs rather than inputting them one by one.

EDIT: It appears that you can import URLs by pasting in a list once you've made a simple search engine. But supposedly there is a limit of 5000 URLs: https://support.google.com/programmable-search/thread/200713...

rc_mob · on Oct 6, 2020

5000 is not very many at all

tenkabuto · on Oct 6, 2020

Yeah. I tried looking for some alternative service, but Bing's custom search is paid and has even lower limits.

isnt · on Oct 6, 2020

It's 5000 URL patterns, e.g. www.example.com/*.pdf or example.com/foo*

https://support.google.com/programmable-search/answer/451388...
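As a rough illustration of why one pattern can cover many URLs, here is a sketch that treats `*` as "any run of characters" and matches against host plus path. This is an assumption made for illustration only; Google's actual matching rules may differ, and `patternToRegExp` and `matchesAny` are hypothetical helpers.

```javascript
// Illustrative only: "*" is treated as "any run of characters" and the
// pattern is matched against host + path. Google's real rules may differ.
function patternToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    // escape regex metacharacters in the literal parts of the pattern
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

function matchesAny(url, patterns) {
  const { host, pathname } = new URL(url);
  const target = host + pathname;
  return patterns.some((p) => patternToRegExp(p).test(target));
}

const patterns = ["www.example.com/*.pdf", "example.com/foo*"];
console.log(matchesAny("https://www.example.com/docs/menu.pdf", patterns));
console.log(matchesAny("https://example.com/bar", patterns));
```

Under this reading, two patterns already cover every PDF on one host and every path starting with `/foo` on another, so 5000 patterns go a lot further than 5000 individual URLs.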

transitivebs · on Oct 6, 2020

Hmmm, I just tried setting up a new site search for an existing site and can't get any search results to show up. I'm guessing it takes a while for the search index to populate, but that isn't really communicated anywhere in the product UX.

The last solution I used was https://www.meilisearch.com/ -- it's open source and way cheaper than Algolia.

neurobashing · on Oct 6, 2020

I set this up a while ago but never turned it on. I have a very simple web site - it's basically my mixology recipe book - and I started adding https://lunrjs.com to it this weekend. I ended up yak-shaving for the bulk of my time, but initial experiments were promising; anyone used it?

Fileformat · on Oct 6, 2020

I highly recommend lunr.js. I've used it on side projects both client side [1] and server side [2] and it works great. My only Yak-shaving was tweaking the separator regex.

[1] https://github.com/VectorLogoZone/vectorlogozone/blob/gh-pag...

[2] https://github.com/VectorLogoZone/logosearch/blob/main/src/s...

freakynit · on Oct 6, 2020

Same old Google trick to gain more control over others' content. Naah... not interested

reilly3000 · on Oct 6, 2020

How does this compare to Algolia? I was shocked at the irony when I discovered that Firebase docs recommended using Algolia for full-text search queries over Firebase data.

sarthakjshetty · on Oct 6, 2020

I think that has to do with the fact that both Algolia and Firebase went through YC. IIRC, PG stated that the first 100 or so customers of YC companies are in fact YC Companies. I think Firebase decided to give Algolia a try early on and decided to stick with them since.

reilly3000 · on Oct 6, 2020

The thing that’s kind of wild is that Firestore, which is their new reimplementation of the original Firebase db (now Realtime Database), is what is recommending Algolia:

https://firebase.google.com/docs/firestore/solutions/search

Given a clean slate I would have thought they might push Cloud Search or this service, even without any special integration. Candidly I haven’t taken the time to understand either of those offerings but I have to assume you could index JSON documents just as easily as with Algolia. Not that there’s anything wrong with that; it was just surprising to see them promote a third party in general and a search engine specifically. After having used 90+ Google products over the last 17 years, transacted many millions with them, and read innumerable pages of docs, I’ve not really seen anything quite like that, where a quite common use case of their product actively promotes a competitor and links to them.

Buetol · on Oct 6, 2020

Took 2 seconds to set up and works very well!

EDIT: To me it's really Custom Search Engine rebranded. It doesn't index all of your website, so I'd still prefer Algolia if I had to have an external search provider.

EDIT 2: Seems very customizable. Maybe by adding all your URLs it will index all of your site, making it a viable alternative to Algolia.

jpadkins · on Oct 6, 2020

wow. what a surprise to see this on hn today.

I am a product manager that works on programmable search engine. AMA*(that I can actually answer)

If you are interested in what has been changing recently, https://programmablesearchengine.googleblog.com/

Edit: and yes, this is a new name for what many knew as Google Custom Search Engine. Also, here is our community site where a lot of Q&A happens: https://support.google.com/programmable-search/community

Nazzareno · on Oct 6, 2020

They claim "pay a low price for an ad-free experience", but what is the price?

Nazzareno · on Oct 7, 2020

I found the answer: https://support.google.com/programmable-search/answer/906910...

norswap · on Oct 6, 2020

Was curious, and found a "custom search engine" (previous name of this product) for my tumblr blogs (which are still online) that I made back in the day.

I tried it ... and it's terrible for my use case. Because these blogs are mostly not hyperlinked anywhere (one is a personal quote collection, another is a collection of hacks and tech tips I found useful), Google doesn't seem to even have them indexed. Never mind that I manually submitted them for indexing back in the day.

Not that I expected much better from Google at this point.

skytreader · on Oct 6, 2020

What a horrible product name. I've always lamented the fact that big brands need very little creativity in product naming. Though Google is often lazy about it, I can at least ascertain some reason behind their product names:

- The really boring ones like Google Docs, Google Maps, Google Talk, even GMail and Inbox, etc. were all potential generic B2B products so there's an incentive to make it as simple (read: boring) as possible. Even your Executive Emeritus couldn't second-guess what Google Docs does.

- Nexus, Pixel, and Stadia were all competing for a demographic that needed to catch attention; as names these aren't boring at all. "Google Phone"/"Google Smartphone" will be dead even before it launches against iPhone.

- Kubernetes kinda falls between the two. It's a B2B product but it introduces a completely new concept so it needs its own name. Plus they made it a standard so you can't slap the big G before it.

Now, "Programmable Search Engine" just sounds like a product name concocted by naive CS-undergrad seniors[1]. Did no one suggest "Google Search Appliance Plus" (no more confusing than "YouTube Red")?

[1] My CS undergrad senior batch would've named this "Programmable Search Engine System". God forbid anyone think what we're doing is not related to software engineering.

Ocelot20 · on Oct 6, 2020

In case you're curious...

No customization applied, pointed at Hacker News: https://cse.google.com/cse?cx=37e5b152f44307b8a&q=%22search%...

Hacker News' own search: https://hn.algolia.com/?q=%22search+engine%22

sa46 · on Oct 6, 2020

Interesting, the Google results for "search engine" are heavily weighted by recency compared to Algolia.

Google looks like it's much better at non-title searches:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

https://cse.google.com/cse?cx=37e5b152f44307b8a&q=%22high-ca...

fsflover · on Oct 6, 2020

See also: https://yacy.net, open source p2p search engine, which can be used locally too.

schwinn140 · on Oct 5, 2020

I wonder if they're getting interested in this again after seeing Algolia and Elasticsearch take off? I suppose this was only a matter of time.

xtiansimon · on Oct 6, 2020

Off-Topic question for Search-heads. Anyone know of an in-browser solution to make web search start with your bookmarks (rank previous bookmarked items higher)?

[Solution to search browser-URL-history first before the rest of the WWW][1]

[1]: https://softwarerecs.stackexchange.com/questions/46270/solut...

Havoc · on Oct 6, 2020

Well that was underwhelming

dimator · on Oct 5, 2020

Now everyone that's ever said "I would pay for an ad free experience" can get what they want.... and come up with another reason to be unsatisfied.

t0astbread · on Oct 5, 2020

I propose data collection!