
I like the term "faceted search", which I first experienced working with Solr.

Faceted search is effectively what this person is calling "filters", but often comes with an amazing bonus feature: each filter shows a count of how many results will be returned if you click it.

This turns them into a powerful way to summarize your data.

My project Datasette (https://datasette.io) has faceted search as a core feature. Demo here: https://global-power-plants.datasettes.com/global-power-plan...

One challenge I've found with both filters and filters-with-counts is the best way to design them for mobile screens, since they can take up a lot of valuable real estate.

It can be done though: e-commerce sites in particular often come up with neat UIs for tucking the filters away in an easily accessible tray - though that can hurt discoverability of the feature in the same way this author complains about the advanced search pattern.



Afaik faceted search just means the system provides filtering through predefined categories (taxonomies). In TFA both “advanced search” and “filters” are faceted.


This is the single biggest problem I've been having with the term "faceted search": I can't find a single, universally agreed upon definition of exactly what it means!

But I really need it to have one, because it's a key feature of the software I am building.

If there's a more widely accepted term for my version of it - filters with displayed counts - I'd love to know about it.

As it is, most people have never heard the term anyway.


Daniel Tunkelang wrote a pretty good book on faceted search. Turns out he's also on Medium: https://dtunkelang.medium.com/facets-of-faceted-search-38c3e... Not sure if he's on HN.


Thanks for the kind words! I'm not usually on HN, but I just discovered this thread and am happy to contribute if I have anything useful to add. The book is a bit dated, but I have continued to post about search more broadly on Medium, particularly on the topic of query understanding.


I saw him give a great keynote and have been a fan of faceted search for years, but I did not know that. Learn something new every day.


I also think of faceted search as having the counts and narrowing the options as you drill down.

Ironically enough, I learned that through Endeca, which was introduced to my company (at great expense) while I was pushing for Solr. Eventually Solr became the tool of choice because it was more flexible and less cumbersome.


Endeca was killed by Oracle.

I recently completed an Endeca to Algolia migration. I spent quite a bit of time auditing the Endeca implementation, from the XML files to the Windows desktop application. It was pretty good for its era.


While the acquisition was good for me financially, I agree that Oracle didn't really invest in sustaining Endeca as a product. I am proud of what we achieved at Endeca, but search has come a long way since then.

Coincidentally, I now consult for Algolia. :-)


Cool, I thought I was the only one who has uttered Endeca and Algolia in the same sentence.

I wrote up some tips/impressions on configuring Algolia for ecomm:

https://www.avikaminetzky.dev/posts/algolia-ecommerce-nextjs


I’d argue the beauty of faceted search is that it doesn’t require predefined categories, which (e.g. if using something like Datasette) helps to explore the data even if it’s a new dataset.


Last time I used Solr, the facets were defined in the schema so Lucene could create an index on that field.


The facets need to be defined but their possible values are computed from the data and don’t need to be predefined. These are also recomputed based on existing filters.


I've found that, for mobile screens, the most valuable refinement real estate is often the top row of the results. Since that space is very limited, there's only room for the most useful facets or filters. And you have to decide whether to show keys or values, e.g., "Brand" as a key or "Nike", "Adidas", etc. as values. Showing keys takes up less space and allows you to cover more ground, but showing values may be more useful -- and certainly more discoverable -- to the user, since there's one less step. As with all things, it's a tradeoff, and I don't think there's been that much research on optimizing it.


How do you calculate the count for each filter without running a separate query for each filter?


I worked in a search engine :D

In short, we needed to approximate the count.

We had some kind of hard limit for matching documents during a regular keyword search (80k maybe? It's been years since I worked there). The ranking was done after this, as well as the aggregation of data for the facets. So if your query of "cute bunnies" (faceted on file type) filled up all 80k results by the time the query processor made it through 25% of the data, and those 80k results contained 5k gifs, then we'd display '20k' next to the 'gifs' check box.
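The extrapolation in that example is simple scaling. Here is a minimal sketch of the arithmetic, assuming matches are spread evenly through the data (the function name and parameters are made up for illustration, not taken from that search engine):

```python
def estimate_facet_count(facet_hits_seen: int, fraction_scanned: float) -> int:
    """Scale a facet count seen in a partial scan up to the full data set.

    Assumes documents matching the facet are evenly distributed, so the
    scanned portion is representative of the rest.
    """
    if not 0 < fraction_scanned <= 1:
        raise ValueError("fraction_scanned must be in (0, 1]")
    return round(facet_hits_seen / fraction_scanned)


# 5k gifs found by the time the query processor had covered 25% of the data.
estimate = estimate_facet_count(5_000, 0.25)
print(estimate)  # 20000
```

If the data is ordered in a way that correlates with the facet (for example, by date or by file type), this estimate can be badly off.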


In general, the two ways to compute counts are top-down, by making a separate query for each filter, or bottom-up, by scanning the results and aggregating the counts, like a group-by. Top-down is good for a small universe of values, but bottom-up tends to be the scalable approach. And, as has been pointed out, you can produce approximations by aggregating a sample of the results -- as long as it is a representative random sample. Just be mindful of statistics, particularly confidence intervals.
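To make the top-down versus bottom-up distinction concrete, here is a toy sketch over an in-memory list. The documents, field names and helper functions are all invented for illustration; a real engine would work against an index rather than scanning a list.

```python
from collections import Counter

# Hypothetical toy corpus.
DOCS = [
    {"id": 1, "brand": "Nike", "type": "shoe"},
    {"id": 2, "brand": "Adidas", "type": "shoe"},
    {"id": 3, "brand": "Nike", "type": "shirt"},
    {"id": 4, "brand": "Nike", "type": "shoe"},
]


def search(docs, **filters):
    """Return the docs whose fields equal every given filter value."""
    return [d for d in docs if all(d[k] == v for k, v in filters.items())]


def counts_top_down(docs, facet, values, **filters):
    """One extra query per candidate value: fine for a small set of values."""
    return {v: len(search(docs, **{**filters, facet: v})) for v in values}


def counts_bottom_up(docs, facet, **filters):
    """One query, then aggregate over its results, like a GROUP BY."""
    return Counter(d[facet] for d in search(docs, **filters))


print(counts_top_down(DOCS, "brand", ["Nike", "Adidas"], type="shoe"))
# {'Nike': 2, 'Adidas': 1}
print(dict(counts_bottom_up(DOCS, "brand", type="shoe")))
# {'Nike': 2, 'Adidas': 1}
```

Top-down needs the list of candidate values up front, while bottom-up discovers them from the results.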

A related issue is that counts tend to treat all results as equal. If you retrieve a lot of results but most of them are not relevant -- as can happen with full-text search -- then the counts can be misleading. You may have the converse problem if your retrieval excludes a lot of relevant results. So, if you are implementing a faceted search application where you use and show counts, you should keep in mind that it will only work if your retrieval does a reasonable job of balancing precision and recall.

Finally, remember that supply != demand. The distribution of a facet in your index may be different from the distribution of that facet in searcher intent. A bit more on that here: https://dtunkelang.medium.com/search-intent-not-inventory-28...


Indexing technologies tend to have support for facets: https://en.wikipedia.org/wiki/Faceted_search

Generally, you can run a lightweight search that returns just the count of documents you would receive. With facets, you can usually get a list of facets, or a list of the facets relating to a particular search, along with a count for each of them.


Search indexes like Solr and Elasticsearch have the ability to calculate multiple facet counts efficiently in a single operation.

My application Datasette runs a separate SQL query for each one, which works fine if you are using SQLite and only have a few hundred thousand rows of data.
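The one-query-per-facet approach can be sketched with Python's built-in sqlite3 module. This is not Datasette's actual SQL, just an illustration; the table, columns and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (name TEXT, country TEXT, fuel TEXT)")
conn.executemany(
    "INSERT INTO plants VALUES (?, ?, ?)",
    [
        ("A", "USA", "Solar"),
        ("B", "USA", "Coal"),
        ("C", "India", "Coal"),
        ("D", "India", "Coal"),
    ],
)

FACET_COLUMNS = ("country", "fuel")  # allow-list: column names are interpolated


def facet_counts(conn, column, where="1=1", params=()):
    """Run one GROUP BY query for a single facet column."""
    if column not in FACET_COLUMNS:
        raise ValueError(f"unknown facet column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) FROM plants WHERE {where} "
        f"GROUP BY {column} ORDER BY COUNT(*) DESC, {column}"
    )
    return conn.execute(sql, params).fetchall()


# One separate query per facet, all sharing the user's current filter.
for column in FACET_COLUMNS:
    print(column, facet_counts(conn, column, "fuel = ?", ("Coal",)))
# country [('India', 2), ('USA', 1)]
# fuel [('Coal', 3)]
```

Each facet costs a scan (or index lookup) of its own, which is why this approach suits smaller tables.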


Counts per filter can be stored pre-computed, using the same inverted index structure the search uses.

Keyword -> (num_docs, doc1,doc3,doc4,…)

So you can quite quickly look up the number of matches.
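A minimal sketch of that structure, assuming a tiny in-memory corpus (the document IDs, keywords and function name are made up):

```python
from collections import defaultdict


def build_index(docs):
    """Map each keyword to (num_docs, sorted list of doc IDs)."""
    postings = defaultdict(list)
    for doc_id, keywords in docs.items():
        for keyword in set(keywords):  # count each doc once per keyword
            postings[keyword].append(doc_id)
    return {kw: (len(ids), sorted(ids)) for kw, ids in postings.items()}


docs = {
    1: ["gif", "bunny"],
    2: ["jpeg", "bunny"],
    3: ["gif"],
    4: ["gif", "cat"],
}
index = build_index(docs)

# The count is stored alongside the postings list, so reading it is a lookup.
num_docs, doc_ids = index["gif"]
print(num_docs, doc_ids)  # 3 [1, 3, 4]
```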


That only works for the simplest case though - showing a count where the starting point was the entire corpus.

With faceted search you usually need to do things like "the user searched for 'x' and filtered for 'price less than $Y', now show counts for each of the different categories" - where pre-calculated counts won't help you.

Search indexes still help here, because they are really good at fast set intersections - so you take the set of document IDs matching your filters so far, then intersect them against the set of IDs that are listed for each of those categories and count the size of that intersection.
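The intersection step can be sketched with plain Python sets. The posting sets and document IDs below are invented; a real index would use compressed postings lists or bitmaps rather than Python sets.

```python
# Hypothetical postings: category value -> set of document IDs.
category_postings = {
    "shoes": {1, 2, 5},
    "shirts": {3, 4},
    "hats": {6},
}

# IDs already matching the user's search for 'x' plus the price filter.
matching_ids = {1, 3, 4, 5}

# Count for each category = size of its intersection with the current matches.
facet_counts = {
    category: len(matching_ids & ids)
    for category, ids in category_postings.items()
}
print(facet_counts)  # {'shoes': 2, 'shirts': 2, 'hats': 0}
```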


In Elasticsearch, you can do this with filter aggregations.
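As a rough sketch, a request body for this might look like the dict below, built in Python and serialized to JSON. The field names (`title`, `price`, `category`), the aggregation names and the price threshold are assumptions for illustration, and `category` is assumed to be a keyword field. Nothing here talks to a cluster.

```python
import json

# Search for 'x', then count categories only among docs priced under 100.
body = {
    "size": 0,  # only the aggregation results are wanted, not the hits
    "query": {"match": {"title": "x"}},
    "aggs": {
        "under_price": {
            # filter aggregation: narrows the docs seen by the nested aggs
            "filter": {"range": {"price": {"lt": 100}}},
            "aggs": {
                # terms aggregation: one bucket (with a count) per category
                "by_category": {"terms": {"field": "category"}}
            },
        }
    },
}

print(json.dumps(body, indent=2))
```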


Yep, I have a whole thing on how to design filters for small screens. In short: partially overlay the screen. There’s nothing else you can really do without downsides. Like you say, discoverability goes down, but it basically has to.



