I like the term "faceted search", which I first experienced working with Solr.
Faceted search is effectively what this person is calling "filters", but often comes with an amazing bonus feature: each filter shows a count of how many results will be returned if you click it.
This turns them into a powerful way to summarize your data.
One challenge I've found with both filters and filters-with-counts is the best way to design them for mobile screens, since they can take up a lot of valuable real estate.
It can be done though: e-commerce sites in particular often come up with neat UIs for tucking the filters away in an easily accessible tray - though that can hurt discoverability of the feature in the same way this author complains about the advanced search pattern.
Afaik faceted search just means the system provides filtering through predefined categories (taxonomies). In TFA both “advanced search” and “filters” are faceted.
This is the single biggest problem I've been having with the term "faceted search": I can't find a single, universally agreed upon definition of exactly what it means!
But I really need it to have one, because it's a key feature of the software I am building.
If there's a more widely accepted term for my version of it - filters with displayed counts - I'd love to know about it.
As it is, most people have never heard the term anyway.
Thanks for the kind words! I'm not usually on HN, but I just discovered this thread and am happy to contribute if I have anything useful to add. The book is a bit dated, but I have continued to post about search more broadly on Medium, particularly on the topic of query understanding.
I also think of faceted search as having the counts and narrowing the options as you drill down.
Ironically enough that was through Endeca, which was introduced to my company (at great expense) as I was pushing for Solr. Eventually Solr became the tool of choice because it was more flexible and less cumbersome.
I recently completed an Endeca to Algolia migration. I spent quite a bit of time auditing the Endeca implementation, from the XML files to the Windows desktop application. It was pretty good for its era.
While the acquisition was good for me financially, I agree that Oracle didn't really invest in sustaining Endeca as a product. I am proud of what we achieved at Endeca, but search has come a long way since then.
I’d argue the beauty of faceted search is that it doesn’t require predefined categories, which (eg if using something like datasette) helps to explore the data even if it’s a new dataset
The facets need to be defined but their possible values are computed from the data and don’t need to be predefined. These are also recomputed based on existing filters.
I've found that, for mobile screens, the most valuable refinement real estate is often the top row of the results. Since that space is very limited, there's only room for the most useful facets or filters. And you have to decide whether to show keys or values, e.g., "Brand" as a key or "Nike", "Adidas", etc. as values. Showing keys takes up less space and allows you to cover more ground, but showing values may be more useful -- and certainly more discoverable -- to the user, since there's one less step. As with all things, it's a tradeoff, and I don't think there's been that much research on optimizing it.
We had some kind of hard limit for matching documents during a regular keyword search (80k maybe? It's been years since I worked there). The ranking was done after this, as well as the aggregation of data for the facets. So if your query of "cute bunnies" (faceted on file type) filled up all 80k results by the time the query processor made it through 25% of the data, and those 80k results contained 5k gifs, then we'd display '20k' next to the 'gifs' check box.
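The '20k' in that example looks like a straight extrapolation: 5k gifs seen after scanning 25% of the data, scaled up to the whole corpus. A minimal sketch of that arithmetic, assuming matches are spread evenly through the index (the function name and parameters are made up for illustration, not taken from the system described):

```python
def extrapolate_facet_count(observed_count: int, fraction_scanned: float) -> int:
    """Scale a facet count seen in a truncated scan up to the whole corpus.

    Assumes matching documents are evenly distributed across the index,
    which is an assumption of this sketch, not a guarantee.
    """
    if not 0 < fraction_scanned <= 1:
        raise ValueError("fraction_scanned must be in (0, 1]")
    return round(observed_count / fraction_scanned)


# 5k gifs among the results collected after scanning 25% of the data
estimate = extrapolate_facet_count(observed_count=5_000, fraction_scanned=0.25)
print(estimate)  # 20000
```

If the index is ordered in a way that correlates with the facet (for example by date or by source), the even-distribution assumption breaks and the displayed number can be well off.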
In general, the two ways to compute counts are top-down, by making a separate query for each filter, or bottom-up, by scanning the results and aggregating the counts, like a group-by. Top-down is good for a small universe of values, but bottom-up tends to be the scalable approach. And, as has been pointed out, you can produce approximations by aggregating a sample of the results -- as long as it is a representative random sample. Just be mindful of statistics, particularly confidence intervals.
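To make the two approaches concrete, here is a rough sketch over an in-memory list of results. The documents, field names, and function names are all hypothetical, and the interval uses the simple normal approximation for a proportion, so treat it as an illustration rather than a recommendation:

```python
import math
import random
from collections import Counter

# Hypothetical result set: each document is a dict of field -> value.
results = [
    {"id": 1, "file_type": "gif"},
    {"id": 2, "file_type": "jpg"},
    {"id": 3, "file_type": "gif"},
    {"id": 4, "file_type": "png"},
]


def counts_top_down(docs, field, values):
    """Top-down: one pass (standing in for one query) per candidate value."""
    return {v: sum(1 for d in docs if d.get(field) == v) for v in values}


def counts_bottom_up(docs, field):
    """Bottom-up: a single pass over the results, aggregating like a GROUP BY."""
    return Counter(d[field] for d in docs if field in d)


def estimate_count_from_sample(docs, field, value, sample_size, seed=0, z=1.96):
    """Estimate a count from a uniform random sample.

    Returns (estimate, low, high), where low/high come from the normal
    approximation to a binomial proportion (z=1.96 is roughly 95%).
    """
    rng = random.Random(seed)
    sample = rng.sample(docs, sample_size)
    p = sum(1 for d in sample if d.get(field) == value) / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    n = len(docs)
    return p * n, max(0.0, (p - margin) * n), min(float(n), (p + margin) * n)


print(counts_top_down(results, "file_type", ["gif", "jpg"]))
print(counts_bottom_up(results, "file_type"))
```

The top-down version does work proportional to the number of candidate values, which is why it only suits a small universe of values. The bottom-up version does one pass regardless of how many distinct values turn up.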
A related issue is that counts tend to treat all results as equal. If you retrieve a lot of results but most of them are not relevant -- as can happen with full-text search -- then the counts can be misleading. You may have the converse problem if your retrieval excludes a lot of relevant results. So, if you are implementing a faceted search application where you use and show counts, you should keep in mind that it will only work if your retrieval does a reasonable job of balancing precision and recall.
Generally, what you have are lightweight searches that return just the count of documents you would receive. In the case of facets, you can generally get a list of facets, or a list of facets relating to a particular search, along with a count for each of those facets.
Search indexes like Solr and Elasticsearch have the ability to calculate multiple facet counts efficiently in a single operation.
My application Datasette runs a separate SQL query for each one, which works fine if you are using SQLite and only have a few hundred thousand rows of data.
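For anyone curious what "a separate SQL query for each one" can look like, here is a small sketch using Python's built-in sqlite3 module. The table, columns, and helper are invented for the example and this is not Datasette's actual SQL, just the general GROUP BY shape:

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (name TEXT, country TEXT, fuel TEXT)")
conn.executemany(
    "INSERT INTO plants VALUES (?, ?, ?)",
    [
        ("A", "US", "Solar"),
        ("B", "US", "Coal"),
        ("C", "FR", "Nuclear"),
        ("D", "FR", "Solar"),
        ("E", "US", "Solar"),
    ],
)

# Column names cannot be bound as SQL parameters, so check them against an allow-list.
FACET_COLUMNS = ("country", "fuel")


def facet_counts(conn, column, where="1 = 1", params=()):
    """Run one GROUP BY query for a single facet column.

    `where` is a trusted SQL fragment describing the filters applied so far;
    user-supplied values go in `params`.
    """
    if column not in FACET_COLUMNS:
        raise ValueError(f"not a facet column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) FROM plants WHERE {where} "
        f"GROUP BY {column} ORDER BY COUNT(*) DESC, {column}"
    )
    return conn.execute(sql, params).fetchall()


# One query per facet over the whole table...
all_facets = {col: facet_counts(conn, col) for col in FACET_COLUMNS}
# ...and the same facet recomputed after a filter has been applied.
us_fuel = facet_counts(conn, "fuel", "country = ?", ("US",))
print(all_facets)
print(us_fuel)
```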
That only works for the simplest case though - showing a count where the starting point was the entire corpus.
With faceted search you usually need to do things like "the user searched for 'x' and filtered for 'price less than $Y', now show counts for each of the different categories" - where pre-calculated counts won't help you.
Search indexes still help here, because they are really good at fast set intersections - so you take the set of document IDs matching your filters so far, then intersect them against the set of IDs that are listed for each of those categories and count the size of that intersection.
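A toy version of that intersection step, with plain Python sets standing in for the index's posting lists. The index contents and names here are made up, and real engines use far more compact structures:

```python
# Hypothetical inverted index: category value -> set of document IDs.
category_index = {
    "shoes": {1, 2, 3, 7},
    "shirts": {4, 5},
    "hats": {6, 7, 8},
}


def facet_counts_for(matching_ids, index):
    """Count, per facet value, how many of the currently matching docs carry it."""
    counts = {value: len(matching_ids & ids) for value, ids in index.items()}
    # Drop values that would lead to zero results.
    return {value: count for value, count in counts.items() if count > 0}


# IDs matching the search and filters applied so far.
matching = {2, 3, 4, 7}
print(facet_counts_for(matching, category_index))
# {'shoes': 3, 'shirts': 1, 'hats': 1}
```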
Yep, I have a whole thing on how to design filters for small screens. But in short, partially overlay the screen. There’s nothing else you can really do without having downsides. But like you say discoverability goes down but it basically has to.
I recently spent a few months buying a new sectional sofa. Boy is the interface for that process terrible. The experience went something like this:
1. Pick the overall style
2. Find vendors that make that style
3. Physically drive to far-away stores to get a sense for what our “comfort” KPIs are in terms of seat depth/height, back height, cushion fill, etc.
4. Go back online, manually compile a table of the above parameters across vendors, mainly pulling from PDF brochures with incomplete or incompatible information. Down filter to a few main candidates that satisfy major comfort and aesthetic objectives.
5. Take room measurements to understand constraints on configuration
6. With narrowed candidates, again manually compile available sectional piece options to determine the configuration that best fits our room.
7. Down to 2-3 options, go back to physical stores to test comfort (if they weren’t part of first round) and evaluate fabrics and finishes.
8. Final discussions, including considerations about price comp and lead times (varied from 2-6 months!). Finally make purchase.
Maybe this is a little off-topic. But the point is this process was ultimately a couple of nested Sort/Filter operations, once the data was structured in a format relevant to our decision. That’s the biggest problem with online shopping - most of the data relevant to decisions is unstructured or unmeasurable. Once you get through that, then yes, filters are almost always the goal. I don’t really care about your fancy UI - I just want the information relevant to me in a spreadsheet so I can do a nested Sort.
I recently bought a computer chair. I guess I did everything on your list, for months, and couldn't find anything actually good anywhere, mostly because the dimensions that I cared for were never listed as filters.
Then I came to a nearby manufacturer's store, just tried all the models and found an ok one.
I also had to change a light-switch. No store would both let me filter the assembly, switches and finishing by model line and the complete trio. So I came into a physical store and got a preassembled set from the display.
Webstores have become completely incompetent pieces of garbage since they all consolidated into an oligopoly.
I recently bought some new bike hubs. There are many ways hubs can vary, many of which are critical for compatibility, so you have to get them right. Good faceted search makes this very easy, and its absence makes it agony. This is good:
Ha, realized this when writing it out. We actually did start with measuring the room, but that only narrows it to a pretty wide range of configurations. This step was more about getting precise measurements to think about how different layouts would look and feel. We also just didn’t know what options there were until exploring some of the top candidates in detail. Definitely not as linear as I described it. The config is more like a layer of nuance on top of the hard constraints of style and comfort.. although I’m sure others would consider those soft constraints.
LLMs with access to real-time data and reviews should be able to simplify this process a lot. “Show me a white and gold sectional in the art deco style for under $2500.” “Actually none of these look comfortable but I like the vertical lines on this one. Also make sure that any option you present does not contain toxic fire retardants or otherwise contains materials that result in the California prop whatever warning” “this looks great but can you find one with power reclining” Etc…buying something linked from the LLM will count as a referral and generate revenue for its developer.
Lol. I should have said AI rather than LLM because it would look at product images too - GPT-4 is already multi-modal. GPT-5 (and equivalents) is probably where services like this will start trickling out, and by then the model should have a lower error rate. The jump from 3.5 to 4 was huge.
The AI doesn’t need to be perfect - even if it makes an occasional error such as showing you a couch with flame retardants, you can spot that in the listing yourself. Most other qualities of a couch are very visible and you’ll be able to tell visually how good of a job the AI is doing.
Can you please give us "grep -v" or inverse search: allow us to enable filters which find everything that doesn't contain such a property. This is a horrible omission in so many search/filter interfaces.
Also, give me the ability to find nulls. Call it "other" or "unknown" if that's more accessible to normal people. In an ideal world, every product listing would have all relevant fields populated (and with correct values). In the real world, practically all databases have errors.
There are 419 results. On the left, the Package filter has Bottle (168 results) and Can (139 results), which totals 307. All 419 products actually are bottles or cans, so 112 products just don't have the data. If you filter, they will be missing.
Yes, yes and yes. When the filter for a given property is set to a value, but a specific item has a null for that property, the correct thing is to match the item (since it is unknown, it is wrong to assume the value doesn't match!).
I would even prefer two result groups: matching filters and "maybe matching filters". But most faceted search engines just filter out any item whose properties may or may not match the filter.
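A sketch of both ideas: an explicit "Unknown" bucket in the counts, and a split into "matching" and "maybe matching" groups. The product data and function names are invented for the example. The last line just redoes the arithmetic from the bottle/can example above.

```python
from collections import Counter

UNKNOWN = "Unknown"

# Hypothetical products; two of them lack packaging data.
products = [
    {"name": "Lager", "package": "Bottle"},
    {"name": "Pils", "package": "Can"},
    {"name": "Stout", "package": None},  # field present but null
    {"name": "Porter"},                  # field absent entirely
]


def value_or_unknown(item, field):
    """Return the field value, or the UNKNOWN label when it is null or absent."""
    value = item.get(field)
    return UNKNOWN if value is None else value


def facet_counts_with_unknown(items, field):
    """Facet counts that give null/missing values their own bucket."""
    return Counter(value_or_unknown(item, field) for item in items)


def split_matches(items, field, wanted):
    """Split items into definite matches and 'maybe' matches (value unknown)."""
    matching = [i for i in items if i.get(field) == wanted]
    maybe = [i for i in items if i.get(field) is None]
    return matching, maybe


print(facet_counts_with_unknown(products, "package"))
matching, maybe = split_matches(products, "package", "Bottle")
print([p["name"] for p in matching], [p["name"] for p in maybe])

# From the example above: 419 results, 168 bottles, 139 cans.
print(419 - (168 + 139))  # 112 products with no package data
```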
Getting the UI for that right seems like the real trick. We have checkboxes and radio buttons that are well-understood to give us the thing(s) that are selected or checked, but we don’t have anti-checkboxes.
We actually do. Thanks to the "indeterminate" state on HTML checkboxes (that one where it's sort of filled in instead of checked), you can use unchecked for "exclude" or "must not have", indeterminate for "don't care" or "both" and checked for "include" or "must have".
Unfortunately, in typical browsers there is no way for the user to set them back to indeterminate unless the developer implements that using javascript.
An interesting suggestion, but ‘indeterminate’ maps to nullish values, not to negative values.
If you have options such as “oak”, “teak” and “walnut”, and there are some options outside of those, how do you allow users to select none of oak, teak, or walnut? It’s not through selecting one or all as indeterminate — it’s gotta be through some other mechanism.
In truth, it's something that the majority of people will never feel the need to use. So it could realistically be hidden/unlocked upon request. In fact, it would probably be better UX/UI to have it as a directive in an Advanced Search interface.
Just have boolean operators and a little tooltip how to write them. Abstraction here isn’t needed. You have to learn the ux anyhow, may as well have the users learn boolean operators so you don’t have to bother with a ux that might end up not being that optimal in the end.
Most filters don't work like that. If you check a box, it excludes everything without that facet. If you uncheck a box, that facet is ignored (it is neither included nor excluded.)
For example, if I want to search for size-9 blue Adidas shoes, I find the three checkboxes for "9", "blue" and "Adidas" in the shoe department. If I uncheck "blue" then the search will show me every size-9 Adidas shoe, whatever its color.
Since unchecked boxes are simply ignored, there is no signal I can use in such a filtered search to exclude facets which I specify. I have never seen a filtered-search interface which permits this in the sectors where I most need it.
For example, search for a therapist who practices CBT. Easy. Now try listing all therapists who do not practice CBT. Impossible!
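Setting the UI question aside, the logic behind a three-state filter (include / exclude / don't care) is small. A sketch with made-up data, where every name is hypothetical:

```python
from enum import Enum


class Mode(Enum):
    IGNORE = "ignore"    # don't care
    INCLUDE = "include"  # must have
    EXCLUDE = "exclude"  # must not have


# Hypothetical listings; "methods" is a multi-valued field.
therapists = [
    {"name": "Ana", "methods": {"CBT", "EMDR"}},
    {"name": "Ben", "methods": {"Psychodynamic"}},
    {"name": "Cy", "methods": {"CBT"}},
]


def apply_filters(items, field, modes):
    """Keep items that have every INCLUDE value and none of the EXCLUDE values."""
    def keep(item):
        values = item.get(field, set())
        for value, mode in modes.items():
            if mode is Mode.INCLUDE and value not in values:
                return False
            if mode is Mode.EXCLUDE and value in values:
                return False
        return True

    return [item for item in items if keep(item)]


not_cbt = apply_filters(therapists, "methods", {"CBT": Mode.EXCLUDE})
print([t["name"] for t in not_cbt])  # ['Ben']
```

An ordinary checkbox filter is the special case where only INCLUDE and IGNORE are available, which is exactly why "everyone who does not practice CBT" can't be expressed.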
For those like me who will be initially perplexed:
Context filters take the results of a search and allow you to filter them by things like "five stars" or labels. There is much discussion about why advanced search is bad, but the case for filters is a one liner - filters solve all the problems. And the "how to use filters" part is the last drawing in the article.
Once I got that it is an interesting thought, but I'm left trying to understand if there are downsides to filters?
B) UX - if you offer some filters but not the one(s) I want or need, then I'll get frustrated. No filters at all won't set me up for that. For example, eBay's filters are good for some types of products but not others.
C) A and B combined. I've noticed that some filtering UX - Home Depot, I believe is an example - will update the count of the # of products matching your filters. Check a box...wait. Check another box...wait. Fat finger the wrong box...wait...uncheck...wait.
I'm not saying these are reasons to not use filters, only pointing out some friction.
Filters are also slow as fuck because they almost always need to apply before you can go on. So you are on a results page, you want 4-5 stars with attributes x and y, that’s not just 4 clicks, that’s 4 clicks and waiting for the entire thing to reload every time before you can continue.
Another issue more intrinsic to filters is their inconsistency: because only applicable filters are shown, entire categories can be missing and you need to hunt through them every time.
If you make users click a Go button, they won't notice it, and get frustrated.
It's one of very few times I'll allow an animation. If the go button has been available but not clicked for some seconds, it will call attention to itself. I hate that but the only users who see it are the confused ones who need it. (Hopefully.)
Yes and no. If you wait to indicate # of results from filter choices you run the risk of the user "drilling down" to zero.
That is...pick...pick...pick...Go.."No results for your filters". That's not fun either.
I find I use filters - again Home Depot is a good example of bad - because the search is too loose. I'll enter a brand in the search term and have to use filters to pick only the brand I want to see.
The search should first try to answer the user's search, not dump more possible products in the customer's face hoping they buy something.
Our solution for this was to put something like a 5-second debounce on automatically updating, but skip the delay and update immediately if the mouse moves outside of the filter area.
I'll add that eBay's search is really good if you know how to use it. By use it I mean encapsulating every important phrase that you want to include or exclude within parentheses and using commas as "or" statements within the parentheses, such that "(phrase 1) -(phrase 2, phrase 3)" will return results for phrase 1 and exclude any listings that contain phrase 2 or phrase 3. It also pays attention to spacing so that only exact matches are returned.
Obvious one to me is arbitrary complexity: You can't do "A or (B and C)"
But even without such a complex use case, you have to consider multi-valued fields where a single search result can have two or more values. For example "charging method" for phones - if one supports both "usb-c" and "wireless", does selecting both of those filters mean AND or OR? (You could add a toggle for it, sure, but this is just a simple example that's easily missed, for example on Amazon)
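The AND/OR ambiguity is easy to see in code. A small sketch with invented phone data, where `match_all` plays the role of the toggle mentioned above:

```python
# Hypothetical phones; "charging" is a multi-valued field.
phones = [
    {"model": "P1", "charging": {"usb-c", "wireless"}},
    {"model": "P2", "charging": {"usb-c"}},
    {"model": "P3", "charging": {"wireless"}},
    {"model": "P4", "charging": {"lightning"}},
]


def filter_multi(items, field, selected, match_all):
    """Filter on a multi-valued field.

    match_all=True  -> AND: the item must have every selected value.
    match_all=False -> OR: the item must have at least one selected value.
    """
    selected = set(selected)
    if not selected:
        return list(items)  # nothing selected means no filtering
    if match_all:
        return [i for i in items if selected <= i[field]]
    return [i for i in items if selected & i[field]]


both = {"usb-c", "wireless"}
print([p["model"] for p in filter_multi(phones, "charging", both, match_all=True)])
# ['P1']
print([p["model"] for p in filter_multi(phones, "charging", both, match_all=False)])
# ['P1', 'P2', 'P3']
```

The same two checkboxes give one result or three depending on the interpretation, so the displayed counts need to follow whichever semantics the filter actually uses.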
It relies on things being correctly tagged. Over or under tagging will reduce discoverability. If you have a product (or tag) that is loosely defined or changes over time, this can cause issues as well. It's easy to do this for something rigid like technical specs. but if you have ever looked for something hyperspecific (e.g. multiple tags) on a site such as Newegg you've no doubt had some run-ins here and there with incorrect tagging. This is further exacerbated if you allow individual vendors to do the tagging of their own products, who will have a perverse incentive to do "SEO", or may not spend enough time/effort resulting in undertagging.
Often times what you are left with are only a handful of "high-level" tags/filters that don't actually filter much, forcing you to spend more time looking at and evaluating multiple products (which marketing/sales teams of companies probably view as a net positive).
TL;DR: Filtering is an absolute boon when there is proper tag curation.
Edit: I should clarify that I think Newegg is really solid as far as tagging is concerned, but the fact that this occasionally happens even in that environment just goes to show that it requires effort to do right and you may not get 100% discoverability.
Doesn't Advanced Search have the same problems? Ultimately, every input on an advanced search page can be a filter. The tagging issue, while incredibly valid, doesn't seem to be an inherent part of either approach.
I'm working on how to implement search (and filters) for classified ads and this was helpful.
For ad entry, it seems you can either constrain the user's input (essentially forcing them to choose from a list of options - ad type, the thing, category, etc.) or let them do a more free form entry (just title, description, price) and then add tags to help people find their ad.
Which raises the issue of how to curate tags. After reading your comment, I am thinking of building a list of the tags people search for ("jobs, tractors, ...") and making that into a suggestion list shown after people have entered their free form ad.
The world is messy. For sale, resumes, wanted, car parts, used cars, car repair, cars for parts. Oh my.
The space constraints can be solved. The filters can be neatly arranged under the search bar, in dropdowns, with some indication of activity state when not expanded. Any active filter can appear as a tag label in the search bar (with a click to dismiss, or backspace at the start of the search bar to pop).
An excellent search bar might offer filter-suggestions as you type into the search bar, e.g. typing “Richmond” might show you a neighborhood filter for “The Richmond”, a city filter for “Richmond” along with the general search term “Richmond”.
This way the filters only occupy some cheap vertical space which can be scrolled past, with only active filters taking some horizontal space from the search bar (which can always just wrap anyway).
It seems more like the writer is advocating "search configuration included with the results" vs "search configuration as a separate page"; which I agree with. That being said, I don't refer to "search configuration as a separate page" as "advanced search". Rather, advanced search to me is just search with more options (to filter, generally, but also sort, etc). As a general rule, automatically entering into search with EVERY available option visible tends to be a bad idea, leading to analysis paralysis. Rather, I like the idea of starting off with the most common filters and allowing the user to ask for more (advanced search, I guess?). If possible, filters should be faceted (with counts), as noted by simonw in another thread.
One of the best implementations for an advanced search I know is the price comparison platform geizhals. They allow you to quickly drill down to a subset of relevant products, the filters give you a quick overview of the options you have.
Check out the desktop version, the mobile version doesn’t give the same feeling.
I agree with the author’s notion. In ecomm your goal is zero clicks to purchase. Search is used differently by your site’s differing cohorts let alone wildly different search demands depending on your product. Anyone doing advanced search is not at the bottom of your funnel yet; they’re still deciding between products let alone ready to click buy on one.
Search bar tech is really good these days and solves a lot as it tries to drag out inferencing with suggestions and autocomplete.
(this comment probably isn't that interesting to anyone here, but it's an expression of how this article was useful for me, take it as a thank you :)
That's kinda what the sidebar in my CMS is, which I settled on 11 years ago. I just skip the search part, or you could say I treat everything as a search (of either all nodes or the subnodes of the node currently viewed). Think WordPress tag clouds with taxonomies and authors: if you click a tag, you'll see all nodes with that tag, and the sidebar now shows only the tags and authors of nodes that have the currently selected tag. "Drilling down" would be too big a word here, because you can only have one tag and one author selected, but that's the basic idea. The selected author and tag are put into the URL parameters of relevant links, so you can e.g. use pagination or click an author and not lose the tag you initially selected.
I was kinda proud of it, but the code is horrible so I'm working on the successor, and this article got me thinking: I already plan on having more complex filters with AND, OR and nesting ("show all nodes that have author A and either tag X or tag Y, but not both tags, with tag Z" etc), but I should probably generalize this completely, instead of implementing it only for the stuff where it "makes sense" (tags, authors) to me now.
Because in some spots it might make sense to allow grouping nodes by title, or if they're a link, by the domain they point to, etc. images could be grouped by width, and it would be great to not have to care about that now, if I can just make it work with any metadata a node can have, be it "on" the node or in other tables that need to be joined, and make it configurable which properties are displayed and can be filtered for by default, plus a way for users to look at and filter by all the properties any of the nodes currently displayed have. Keep it simple, right?
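One way to keep nested AND/OR/NOT filters general is to treat the filter as a small expression tree that is evaluated against whatever metadata a node has. A minimal sketch, assuming nodes are dicts of field -> set of values (the tuple-based expression format is made up for this example):

```python
def matches(node, expr):
    """Evaluate a nested filter expression against one node.

    expr is one of:
      ("has", field, value)
      ("and", expr, expr, ...)
      ("or", expr, expr, ...)
      ("not", expr)
    """
    op = expr[0]
    if op == "has":
        _, field, value = expr
        return value in node.get(field, set())
    if op == "and":
        return all(matches(node, sub) for sub in expr[1:])
    if op == "or":
        return any(matches(node, sub) for sub in expr[1:])
    if op == "not":
        return not matches(node, expr[1])
    raise ValueError(f"unknown operator: {op}")


# Hypothetical nodes with multi-valued metadata.
nodes = [
    {"id": 1, "authors": {"A"}, "tags": {"X"}},
    {"id": 2, "authors": {"A"}, "tags": {"X", "Y"}},
    {"id": 3, "authors": {"B"}, "tags": {"Y"}},
    {"id": 4, "authors": {"A"}, "tags": {"Y", "Z"}},
]

# "author A and either tag X or tag Y, but not both tags"
has_x = ("has", "tags", "X")
has_y = ("has", "tags", "Y")
query = ("and", ("has", "authors", "A"), ("or", has_x, has_y), ("not", ("and", has_x, has_y)))

print([n["id"] for n in nodes if matches(n, query)])  # [1, 4]
```

Because the evaluator only looks up `field` by name, the same code works for tags, authors, link domains, or image widths, as long as the values are normalised into sets first.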
They don’t really explain how filters differ from advanced search in their mind, other than apparently assuming that advanced search must be on a separate page. IMO it’s a continuum.
The approach they are arguing against has a “single box” keyword search that returns results in a clean set of results. The only way to refine results is to change the text in the keyword box. And then there is a little link to “advanced search” which is a separate page that offers filters in addition to the keyword box, before returning results. In both cases you have to re-run the search after making any changes to keywords or filters.
Basically: the site has two separate search experiences: simple keyword, and keyword + filters, connected by a little dinky text link.
What they are arguing for is a single search experience that starts with a single text box, but then exposes filter options as part of the results UI. You don’t have to hit the “search” button again, instead you “apply filters.”
This is an example of “progressive disclosure,” where more powerful features of the software are revealed to the user as they try to do more complex tasks.
Honestly, I think this article is at least 5 years too late, though. The value of filters in the results page is so obvious that even Google has adopted it. While I’m sure there are still some “advanced search” pages hanging around out there on the web, no one builds site search that way anymore. They all provide filters now.
I would suspect advanced search boils down to advancifying your search term with specific properties about that term, or even the term itself with specifying logic operators.
A filter would be not targeting a specific term, and would take no search terms, but would allow them to find the specific item they're wanting.
Intuitively, filters seem far superior to advanced search, as long as your search is good for one off items. I haven't ever really thought about this explicitly but now that I do, I always reach for:
1. very solid search for single items
2. very good filters
My experience is more on the content side than the product side, but we have mostly just abandoned search as a feature. 99.9% of searchers just go to Google. Our on-site search gets nearly zero traction. That's partly self-fulfilling because we've deemphasized it as it dropped in use and it is now buried behind nav. But still, I'd generally recommend you invest in SEO and just ignore search.
Yeah, I mean ideally your nav is so simple that you don’t need search beyond Google. But it’s not always possible. Take an ecommerce store where the user wants to drill down on particular facets. That needs search (and filter).
No one uses our nav either. Like I said, we produce content so we have >100,000 articles. Our nav lets you drill by broad topics, but really it's just window dressing. It's almost all Google, social and direct traffic just looking for whatever is latest. You have to think about what use case search is solving. Usually, it's discovery. People discover through Google first and foremost, social and peers secondary. Unless you're Amazon, nobody is using you as a primary source to discover facets of anything.
One thing about facets is it probably depends a lot on what kind of data you have - you have retail data you know very well what your facets are - shoes, men's wear, women's wear, accessories etc. etc.
If you have a large number of historical documents that have been scanned in you probably think you know what your facets are but there are also likely to be ones that you miss, thus you want to do some sort of facet extraction to determine what you have and combine that with the facets you have defined.
Obviously not every facet that is found will be one you want. For example, when I was at Thomson Reuters we did a facet extraction on all Danish laws and cases and one of the facets was Knife Killing, which was not useful to know.
None of these are good since they involve way too many clicks on tiny square/round buttons with a lot of scrolling and expanding/collapsing various filter categories, so users can't "easily find" what they need
BTW, I highly recommend the Relevance Slack as a good space to discuss this and search-related topics generally. It's run by OpenSource Connections, but it's free and anyone can join, even folks who compete with them. It's the best community I have found for search developers.
It depends on the situation. I absolutely believe the article's author would choose advanced search over filters when the list is long and the load is slow. In that case it is best to give the user an advanced search possibility first, and maybe filter after that. Think Amazon's style: do you really want to wait 2 minutes to get all video cards when entering that category, and only after that filter for the 4090 ones?
I wouldn’t no. I’d look to sort out the root cause (slow server perf) in this particular case. But obviously we’re talking in the abstract. Advanced search is basically the same as in context filters without the downsides.
I don't think it's either or. There are people who type in specific things in the search bar and then there are those who prefer to use toggles to narrow down to the results and both the audiences need to be catered to.
My https://datasette.io has faceted search as a core feature - demo here: https://global-power-plants.datasettes.com/global-power-plan...