BM25 – The Next Generation of Lucene Relevance (opensourceconnections.com)
95 points by based2 on March 13, 2016 | 7 comments


Xapian has been using BM25 forever[1] - what made Lucene switch now?

[1] https://xapian.org/docs/bm25.html


If I remember correctly, BM25 has been available for Lucene since version 4, thanks to a refactoring that allowed anyone to plug in a custom relevance score. The big news is that starting with version 6, BM25 is the default.


Yes, so I am asking: why is Lucene switching now?


Here's the tracking issue[1].

Basically, it looks like the typical conservative approach most large-scale open source projects are forced to take because of cross-dependencies.

It's a (potentially) breaking change, so they waited for a major release.

[1] https://issues.apache.org/jira/browse/LUCENE-6789


BM25 has been available for a while, so is there a reason it is becoming the default soon, besides being better in certain scenarios? Have there been improvements to the implementation recently?


I went to the Elastic conference last month, and there was a great talk about BM25. It mentioned that BM25 is more robust against common words than plain tf*idf.

https://www.elastic.co/elasticon/conf/2016/sf/improved-text-...
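
To make the "more robust against common words" point concrete, here is a rough Python sketch. It is not Lucene's code. It assumes the textbook BM25 term weight with the commonly quoted defaults k1=1.2 and b=0.75, and it uses raw tf times the same idf as a stand-in for "plain tf*idf" (Lucene's classic similarity differs in the details). The function names and the collection numbers are made up for illustration.

```python
import math


def idf(num_docs, doc_freq):
    """Smoothed inverse document frequency (always positive)."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def tfidf_score(tf, num_docs, doc_freq):
    """Plain tf*idf stand-in: grows linearly with term frequency."""
    return tf * idf(num_docs, doc_freq)


def bm25_score(tf, num_docs, doc_freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 weight for one term in one document.

    The tf part saturates: it can never exceed (k1 + 1), no matter how
    often the term repeats. b controls document length normalization.
    """
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf(num_docs, doc_freq) * tf * (k1 + 1) / (tf + length_norm)


if __name__ == "__main__":
    # Hypothetical collection: 1000 docs, a common word found in 900 of them,
    # scored against a document of average length.
    for tf in (1, 2, 5, 20, 100):
        plain = tfidf_score(tf, 1000, 900)
        bm25 = bm25_score(tf, 1000, 900, doc_len=100, avg_doc_len=100)
        print(f"tf={tf:3d}  tf*idf={plain:7.3f}  bm25={bm25:6.3f}")
```

Under these assumptions, repeating a common word 100 times multiplies the plain score by 100, while the BM25 score stays below (k1 + 1) times the idf, so a document cannot win just by stuffing in frequent terms.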


Doesn't Lucene let you choose your own relevance algorithms, smoothing, metrics, etc. anyway?



