I've worked on commercial systems where N <= 10,000 in the evaluation set, and even there the confidence interval is probably nowhere near as tight as 0.1%. For instance, there is a lot of work on this data set (https://ir-datasets.com/gov2.html), which we used to tune up a search engine, and sometimes it's as bad as N=50 queries with judgements. I don't see papers that are part of TREC, or based on TREC data, dealing with sampling error in any systematic way.
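To make the sampling-error point concrete, here is a minimal sketch (plain Python, with hypothetical per-query average-precision scores, not taken from any real TREC run) of a percentile-bootstrap confidence interval for MAP over N=50 judged queries. The resulting 95% interval comes out orders of magnitude wider than 0.1%:

```python
import random
import statistics

# Hypothetical per-query AP scores for N=50 judged queries
# (illustrative values only; not from any real run).
random.seed(0)
ap_scores = [random.betavariate(2, 5) for _ in range(50)]

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean of per-query scores."""
    n = len(scores)
    means = []
    for _ in range(n_resamples):
        resample = [random.choice(scores) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples)]
    return lo, hi

lo, hi = bootstrap_ci(ap_scores)
print(f"MAP = {statistics.fmean(ap_scores):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With 50 queries the interval spans several points of MAP, which is why comparing systems on differences of a fraction of a point is dubious without some such error analysis.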
NIST's TREC workshop series uses Cyril Cleverdon's methodology (the "Cranfield paradigm") from the 1960s, and more could surely be done on the evaluation front:
- systematically addressing sampling error (see the sketch after this list);
- more than 50 queries;
- more/all QRELs;
- full evaluation instead of system pooling;
- studying IR beyond the English language (this has been picked up by CLEF and NTCIR in Europe and Japan, respectively);
- devising metrics that take energy efficiency into account;
- ...
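On the first point, one standard way to address sampling error systematically when comparing two systems is a paired bootstrap over queries. A minimal sketch, again with hypothetical scores rather than real runs:

```python
import random
import statistics

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000):
    """Paired bootstrap over queries: fraction of resamples in which
    system A's mean score exceeds system B's.

    scores_a and scores_b are per-query scores on the SAME query set.
    """
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [random.randrange(n) for _ in range(n)]
        diff = statistics.fmean(scores_a[i] - scores_b[i] for i in idx)
        if diff > 0:
            wins += 1
    return wins / n_resamples

# Hypothetical runs: B is a small, noisy improvement over A on 50 queries.
random.seed(1)
a = [random.betavariate(2, 5) for _ in range(50)]
b = [max(0.0, min(1.0, s + random.gauss(0.01, 0.05))) for s in a]
print(f"Fraction of resamples with mean(A) > mean(B): {paired_bootstrap(a, b):.3f}")
```

Reporting something like this alongside the usual MAP/nDCG tables would make it much clearer whether a claimed improvement survives the sampling noise of a 50-query topic set.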
At the same time, we have to be very grateful to NIST/TREC for running an open, international benchmark annually, which has moved the field forward a lot over the last 25 years.