Evaluating Search Engine Relevance with Click-Based Metrics

Filip Radlinski, Madhu Kurup, Thorsten Joachims: Evaluating Search Engine Relevance with Click-Based Metrics. In: Fürnkranz, J. and Hüllermeier, E. (eds.): Preference Learning, 2011, 337–361.




Automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user-centered. However, the relationship between observable user behavior and retrieval quality is not yet fully understood. In this chapter, we expand upon Radlinski et al. (How does clickthrough data reflect retrieval quality? In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 43–52, 2008), presenting a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (including the number of clicks observed, the frequency with which users reformulate their queries, and how often result sets are abandoned) reliably reflect retrieval quality for the sample sizes we consider. However, we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and find that both give accurate and consistent results. We conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than the absolute usage metrics in our domain.



booktitle={Preference Learning},
editor={Fürnkranz, Johannes and Hüllermeier, Eyke},
title={Evaluating Search Engine Relevance with Click-Based Metrics},
publisher={Springer Berlin Heidelberg},
author={Radlinski, Filip and Kurup, Madhu and Joachims, Thorsten},



