I like the term "faceted search", which I first experienced working with Solr. F...

masklinn · on Aug 27, 2023

Afaik faceted search just means the system provides filtering through predefined categories (taxonomies). In TFA both “advanced search” and “filters” are faceted.

simonw · on Aug 27, 2023

This is the single biggest problem I've been having with the term "faceted search": I can't find a single, universally agreed upon definition of exactly what it means!

But I really need it to have one, because it's a key feature of the software I am building.

If there's a more widely accepted term for my version of it - filters with displayed counts - I'd love to know about it.

As it is, most people have never heard the term anyway.

bryanrasmussen · on Aug 27, 2023

Daniel Tunkelang wrote a book on Faceted Search that is pretty good, turns out he's also on Medium https://dtunkelang.medium.com/facets-of-faceted-search-38c3e... not sure if he's on HN.

dtunkelang · on Aug 28, 2023

Thanks for the kind words! I'm not usually on HN, but I just discovered this thread and am happy to contribute if I have anything useful to add. The book is a bit dated, but I have continued to post about search more broadly on Medium, particularly on the topic of query understanding.

joedevon · on Aug 27, 2023

I saw him give a great keynote and have been a fan of faceted search for years, but did not know that. Learn something new every day.

aidos · on Aug 27, 2023

I also think of faceted search as having the counts and narrowing the options as you drill down.

Ironically enough, that was through Endeca, which was introduced to my company (at great expense) as I was pushing for Solr. Eventually Solr became the tool of choice because it was more flexible and less cumbersome.

avremel · on Aug 28, 2023

Endeca was killed by Oracle.

I recently completed an Endeca to Algolia migration. I spent quite a bit of time auditing the Endeca implementation, from the XML files to the Windows desktop application. It was pretty good for its era.

dtunkelang · on Aug 28, 2023

While the acquisition was good for me financially, I agree that Oracle didn't really invest in sustaining Endeca as a product. I am proud of what we achieved at Endeca, but search has come a long way since then.

Coincidentally, I now consult for Algolia. :-)

avremel · on Aug 28, 2023

Cool, I thought I was the only one who has uttered Endeca and Algolia in the same sentence.

I wrote up some tips/impressions on configuring Algolia for ecomm:

https://www.avikaminetzky.dev/posts/algolia-ecommerce-nextjs

canadiantim · on Aug 27, 2023

I’d argue the beauty of faceted search is that it doesn’t require predefined categories, which (e.g. if using something like Datasette) helps you explore the data even if it’s a new dataset.

hermanradtke · on Aug 27, 2023

Last time I used Solr, the facets were defined in the schema so Lucene could create an index on that field.

psadri · on Aug 27, 2023

The facets need to be defined but their possible values are computed from the data and don’t need to be predefined. These are also recomputed based on existing filters.

dtunkelang · on Aug 28, 2023

I've found that, for mobile screens, the most valuable refinement real estate is often the top row of the results. Since that space is very limited, there's only room for the most useful facets or filters. And you have to decide whether to show keys or values, e.g., "Brand" as a key or "Nike", "Adidas", etc. as values. Showing keys takes up less space and allows you to cover more ground, but showing values may be more useful -- and certainly more discoverable -- to the user, since there's one less step. As with all things, it's a tradeoff, and I don't think there's been that much research on optimizing it.

goodoldneon · on Aug 27, 2023

How do you calculate the count for each filter without running a separate query for each filter?

mrkeen · on Aug 27, 2023

I worked in a search engine :D

In short, we needed to approximate the count.

We had some kind of hard limit for matching documents during a regular keyword search (80k maybe? It's been years since I worked there). The ranking was done after this, as well as the aggregation of data for the facets. So if your query of "cute bunnies" (faceted on file type) filled up all 80k results by the time the query processor made it through 25% of the data, and those 80k results contained 5k gifs, then we'd display '20k' next to the 'gifs' check box.
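The extrapolation in that example can be sketched as below. This is only an illustration of the arithmetic, not the actual engine's code; the function name and the assumption that matches are spread evenly through the unscanned data are mine.

```python
def estimate_facet_count(observed_count: int, fraction_scanned: float) -> int:
    """Scale a facet count seen in a truncated result set up to the whole corpus.

    Assumes matches are distributed evenly across the data that was not scanned.
    """
    if not 0 < fraction_scanned <= 1:
        raise ValueError("fraction_scanned must be in (0, 1]")
    return round(observed_count / fraction_scanned)


# 5k gifs found after scanning 25% of the data
print(estimate_facet_count(5_000, 0.25))  # 20000
```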

dtunkelang · on Aug 28, 2023

In general, the two ways to compute counts are top-down, by making a separate query for each filter, or bottom-up, by scanning the results and aggregating the counts, like a group-by. Top-down is good for a small universe of values, but bottom-up tends to be the scalable approach. And, as has been pointed out, you can produce approximations by aggregating a sample of the results -- as long as it is a representative random sample. Just be mindful of statistics, particularly confidence intervals.
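A minimal sketch of the two approaches over an in-memory list, to make the contrast concrete. The data, the `search` helper and both function names are hypothetical; a real engine would work against an index rather than a Python list.

```python
from collections import Counter

DOCS = [
    {"id": 1, "brand": "Nike", "type": "shoe"},
    {"id": 2, "brand": "Adidas", "type": "shoe"},
    {"id": 3, "brand": "Nike", "type": "shirt"},
    {"id": 4, "brand": "Nike", "type": "shoe"},
]


def search(docs, **filters):
    """Return the documents whose fields equal every given filter value."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]


def counts_top_down(docs, facet, values, **filters):
    """Top-down: run one extra query per candidate facet value."""
    return {v: len(search(docs, **{**filters, facet: v})) for v in values}


def counts_bottom_up(docs, facet, **filters):
    """Bottom-up: one pass over the results, aggregating like a GROUP BY."""
    return Counter(d[facet] for d in search(docs, **filters))


print(counts_top_down(DOCS, "brand", ["Nike", "Adidas"], type="shoe"))
print(counts_bottom_up(DOCS, "brand", type="shoe"))
```

Top-down needs the list of candidate values up front and costs one query per value; bottom-up discovers the values from the results themselves.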

A related issue is that counts tend to treat all results as equal. If you retrieve a lot of results but most of them are not relevant -- as can happen with full-text search -- then the counts can be misleading. You may have the converse problem if your retrieval excludes a lot of relevant results. So, if you are implementing a faceted search application where you use and show counts, you should keep in mind that it will only work if your retrieval does a reasonable job of balancing precision and recall.

Finally, remember that supply != demand. The distribution of a facet in your index may be different from the distribution of that facet in searcher intent. A bit more on that here: https://dtunkelang.medium.com/search-intent-not-inventory-28...

bryanrasmussen · on Aug 27, 2023

Indexing technologies tend to have support for facets https://en.wikipedia.org/wiki/Faceted_search

Generally, what you have is lightweight searches that return just the count of documents you would receive. In the case of facets, you can generally get a list of facets, or even a list of facets relating to a particular search, along with a count for each of these facets.

simonw · on Aug 27, 2023

Search indexes like Solr and Elasticsearch have the ability to calculate multiple facet counts efficiently in a single operation.

My application Datasette runs a separate SQL query for each one, which works fine if you are using SQLite and only have a few hundred thousand rows of data.
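A rough sketch of the one-query-per-facet approach using Python's built-in `sqlite3` module. This is not Datasette's actual SQL; the table, columns and the `facet_counts` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, brand TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("Air Max", "Nike", "shoe"),
        ("Samba", "Adidas", "shoe"),
        ("Dri-FIT Tee", "Nike", "shirt"),
        ("Pegasus", "Nike", "shoe"),
    ],
)


def facet_counts(conn, table, facet_columns, where="1=1", params=()):
    """Run one GROUP BY query per facet column, restricted by the current filters.

    Table and column names are interpolated into the SQL, so they must come
    from trusted code, never from user input. Filter values go through params.
    """
    results = {}
    for column in facet_columns:
        sql = (
            f'SELECT "{column}", COUNT(*) FROM "{table}" '
            f'WHERE {where} GROUP BY "{column}" '
            f'ORDER BY COUNT(*) DESC, "{column}"'
        )
        results[column] = conn.execute(sql, params).fetchall()
    return results


# Counts over the whole table, then counts after filtering to shoes.
print(facet_counts(conn, "items", ["brand", "category"]))
print(facet_counts(conn, "items", ["brand"], "category = ?", ("shoe",)))
```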

nelsondev · on Aug 27, 2023

Counts per filter can be stored pre-computed, using the same inverted index structure the search uses.

Keyword -> (num_docs, doc1,doc3,doc4,…)

So you can quite quickly look up the number of matches.
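As a toy illustration of that structure, here is an inverted index that stores the document count next to each posting list. The `build_index` name and the sample documents are assumptions for the sketch.

```python
from collections import defaultdict


def build_index(docs):
    """Map each keyword to (num_docs, sorted list of doc ids)."""
    postings = defaultdict(set)
    for doc_id, keywords in docs.items():
        for keyword in keywords:
            postings[keyword].add(doc_id)
    # Store the count up front so a lookup does not need to walk the list.
    return {kw: (len(ids), sorted(ids)) for kw, ids in postings.items()}


docs = {
    1: ["gif", "bunny"],
    2: ["jpg", "bunny"],
    3: ["gif"],
    4: ["gif", "kitten"],
}
index = build_index(docs)

num_docs, doc_ids = index["gif"]
print(num_docs, doc_ids)  # 3 [1, 3, 4]
```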

simonw · on Aug 27, 2023

That only works for the simplest case though - showing a count where the starting point was the entire corpus.

With faceted search you usually need to do things like "the user searched for 'x' and filtered for 'price less than $Y', now show counts for each of the different categories" - where pre-calculated counts won't help you.

Search indexes still help here, because they are really good at fast set intersections - so you take the set of document IDs matching your filters so far, then intersect them against the set of IDs that are listed for each of those categories and count the size of that intersection.
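A small sketch of that intersection step with plain Python sets standing in for posting lists. The IDs, category names and the `intersect_counts` helper are invented for the example; real search indexes use compressed posting lists or bitsets rather than Python sets.

```python
def intersect_counts(matching_ids, facet_postings):
    """Count, for each facet value, how many currently matching docs carry it."""
    return {value: len(matching_ids & ids) for value, ids in facet_postings.items()}


# Posting lists: category value -> set of document IDs in that category.
category_postings = {
    "shoes": {1, 2, 4, 7},
    "shirts": {3, 5},
    "hats": {6},
}

matched_query = {1, 2, 3, 4, 5}   # docs matching the search for 'x'
under_price = {2, 3, 4, 6}        # docs with price less than $Y
current = matched_query & under_price

print(intersect_counts(current, category_postings))
```

Each extra filter just narrows `current` with another intersection, and the counts are recomputed against the narrowed set.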

bakugo · on Aug 27, 2023

In Elasticsearch, you can do this with filter aggregations.
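For illustration, a request body along these lines could be built as a plain Python dict. The field names (`title`, `price`, `category`) and the `build_facet_request` helper are hypothetical, and this only constructs the JSON; it does not call any client library.

```python
import json


def build_facet_request(text, max_price, categories):
    """Build a search body with one `filter` aggregation per category value."""
    return {
        "size": 0,  # only the aggregation counts are wanted, not the hits
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"range": {"price": {"lt": max_price}}}],
            }
        },
        "aggs": {
            f"category_{c}": {"filter": {"term": {"category": c}}}
            for c in categories
        },
    }


body = build_facet_request("running", 100, ["shoes", "shirts"])
print(json.dumps(body, indent=2))
```

Each named aggregation in the response should then carry a `doc_count` for the documents that match both the query and that aggregation's filter.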

adambsilver · on Aug 28, 2023

Yep, I have a whole thing on how to design filters for small screens. In short: partially overlay the screen. There’s nothing else you can really do that doesn’t have downsides. As you say, discoverability goes down, but it basically has to.