Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain



Robert Arens: Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain. In: Fürnkranz, J. and Hüllermeier, E.: Preference Learning, 2011, 363-383.




Information overload is a well-known problem facing biomedical professionals. MEDLINE, the biomedical bibliographic database, adds hundreds of articles daily to the millions already in its collection. This overload is exacerbated by the lack of relevance-based ranking for search results, as well as disparate levels of search skill and domain experience of professionals using systems designed to search MEDLINE. We propose to address these problems through learning ranking functions from user relevance feedback. Simple active learning techniques can be used to learn ranking functions using a fraction of the available data, with performance approaching that of functions learned using all available data. Furthermore, ranking functions learned using metadata features from the Medical Subject Heading (MeSH) terms associated with MEDLINE citations greatly outperform functions learned using textual features. An in-depth investigation is made into the effect of a number of variables in the ranking round, while further investigation is made into peripheral issues such as users providing inconsistent data.
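As a rough illustration of what learning a ranking function from relevance feedback can look like, the sketch below trains a linear ranking function on pairwise preferences with a hinge loss, which is the general idea behind ranking SVMs. It is not the chapter's implementation: the binary feature vectors (standing in for the presence of three MeSH terms), the relevance grades, the plain subgradient solver, and all function names and hyperparameters are assumptions made for this example.

```python
import random

def pairwise_examples(docs):
    """Turn graded feedback into difference vectors x_i - x_j
    for every pair where document i was rated above document j."""
    pairs = []
    for xi, yi in docs:
        for xj, yj in docs:
            if yi > yj:
                pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs

def train_ranker(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on the L2-regularised hinge loss over
    preference pairs (a simple stand-in for a ranking SVM solver)."""
    rng = random.Random(seed)
    pairs = list(pairs)
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            # The preferred document should outscore the other by a margin of 1.
            margin = sum(wk * dk for wk, dk in zip(w, d))
            violated = margin < 1.0
            for k in range(dim):
                grad = reg * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w

def score(w, x):
    """Ranking score of one document under weight vector w."""
    return sum(wk * xk for wk, xk in zip(w, x))

# Hypothetical feedback: (feature vector, relevance grade given by the user).
docs = [
    ([1, 1, 0], 2),
    ([1, 0, 0], 1),
    ([0, 0, 1], 0),
    ([0, 1, 1], 1),
]

w = train_ranker(pairwise_examples(docs), dim=3)
ranked = sorted(docs, key=lambda d: score(w, d[0]), reverse=True)
for features, grade in ranked:
    print(grade, features)
```

An active learning loop in this setting would repeatedly pick a small batch of unjudged documents, ask the user for grades, and retrain on the enlarged set of pairs; which selection strategy to use is left open here.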

Extended Abstract


booktitle={Preference Learning},
editor={Fürnkranz, Johannes and Hüllermeier, Eyke},
title={Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain},
url={http://dx.doi.org/10.1007/978-3-642-14125-6_17, http://de.evo-art.org/index.php?title=Learning_SVM_Ranking_Functions_from_User_Feedback_Using_Document_Metadata_and_Active_Learning_in_the_Biomedical_Domain },
publisher={Springer Berlin Heidelberg},
author={Arens, Robert},

Used References

1. R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval (Addison-Wesley, 1999)

2. S. Blott, F. Camous, C. Gurrin, G.J.F. Jones, A.F. Smeaton, On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search, in Proceedings Conference on Information Research and Applications (CORIA ’05) (2005)

3. K. Brinker, Active learning of label ranking functions, in Proceedings of the International Conference on Machine Learning (ICML 2004) (2004)

4. K.A. Bronander, P.H. Goodman, T.F. Inman, T.L. Veach, Boolean search experience and abilities of medical students and practicing physicians. Teach. Learn. Med. 16(3), 284–289 (2004) http://dx.doi.org/10.1207/s15328015tlm1603_12

5. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, Learning to rank using gradient descent, in Proceedings of the 22nd International Conference on Machine Learning (2005)

6. Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, H.-W. Hon, Adapting ranking SVM to document retrieval, in Proceedings of the ACM SIGIR International Conference on Information Retrieval (SIGIR ’06) (2006)

7. N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, 2000)

8. G.A. Churchill, J. Peter, Research design effects on the reliability of rating scales: A meta-analysis. J. Mark. Res. 21(4), 360–375 (1984) http://dx.doi.org/10.2307/3151463

9. W.W. Cohen, R.E. Schapire, Y. Singer, Learning to order things. J. Artif. Intell. Res. 10, 243–270 (1999)

10. D. Cohn, L. Atlas, R. Ladner, Improving generalization with active learning. Mach. Learn. 15(2), 201–221 (1994)

11. N. Craswell, D. Hawking, Overview of the TREC 2004 web track, in Proceedings of the Text Retrieval Conference (TREC ’04) (2004)

12. H. Drucker, B. Shahraray, D.C. Gibbon, Support vector machines: Relevance feedback and information retrieval. Inf. Process. Manag. 38, 305–323 (2002) http://dx.doi.org/10.1016/S0306-4573(01)00037-1

13. S. Ertekin, J. Huang, L. Bottou, C.L. Giles, Learning on the border: Active learning in imbalanced data classification, in Proceedings of the ACM Conference on Information and Knowledge Management (CIKM ’07) (2007)

14. Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. (1997)

15. J. Fürnkranz, E. Hüllermeier, Pairwise preference learning and ranking, in Proceedings of the 14th European Conference on Machine Learning (ECML-03) (Springer, 2003), pp. 145–156

16. T. Goetz, C.-W. von der Lieth, PubFinder: A tool for improving retrieval rate of relevant pubmed abstracts. Nucleic Acids Res. 33, W774–W778 (2005). Web Server issue

17. S. Har-Peled, D. Roth, D. Zimak, Constraint classification: A new approach to multiclass classification, in Algorithmic Learning Theory (Springer Berlin/Heidelberg, 2002)

18. R.B. Haynes, K.A. McKibbon, N.L. Wilczynski, S.D. Walter, S.R. Werre, Optimal search strategies for retrieving scientifically strong studies of treatment from MEDLINE: analytical survey. Br. Med. J. 330(7501), 1179 (2005)

19. W. Hersh, C. Buckley, T. Leone, D. Hickam, OHSUMED: An interactive retrieval evaluation and new large test collection for research, in Proceedings of the ACM SIGIR International Conference on Information Retrieval (SIGIR ’94) (1994)

20. J.R. Herskovic, E.V. Bernstam, Using incomplete citation data for MEDLINE results ranking, in Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA ’05) (2005), pp. 316–320

21. K. Järvelin, J. Kekäläinen, Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002) http://dx.doi.org/10.1145/582415.582418

22. T. Joachims, Optimizing search engines using clickthrough data, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD ’02) (2002), pp. 133–142

23. J. Lewis, S. Ossowski, J. Hicks, M. Errami, H.R. Garner, Text similarity: an alternative way to search MEDLINE. Bioinformatics 22(18), 2298–2304 (2006) http://dx.doi.org/10.1093/bioinformatics/btl388

24. Y. Lin, W. Li, K. Chen, Y. Liu, A document clustering and ranking system for exploring MEDLINE citations. J. Am. Med. Inf. Assn. 14(5), 651–661 (2007) http://dx.doi.org/10.1197/jamia.M2215

25. T.-Y. Liu, J. Xu, T. Qin, W. Xiong, H. Li, Letor: Benchmark dataset for research on learning to rank for information retrieval, in Proceedings of the ACM SIGIR International Conference on Information Retrieval (SIGIR ’07) (2007)

26. M. Muin, P. Fontelo, Technical development of PubMed interact: an improved interface for MEDLINE/PubMed searches. BMC Bioinformatics 6(36), (2006)

27. National Library of Medicine. Introduction to MeSH. http://www.nlm.nih.gov/mesh/introduction.html (2009)

28. National Library of Medicine. MEDLINE fact sheet. http://www.nlm.nih.gov/pubs/factsheets/medline.html (2009)

29. M.V. Plikus, Z. Zhang, C.-M. Chuong, Pubfocus: Semantic MEDLINE/PubMed citations analytics through integration of controlled biomedical dictionaries and ranking algorithm. BMC Bioinformatics 7, 424 (2006) http://dx.doi.org/10.1186/1471-2105-7-424

30. F. Radlinski, T. Joachims, Query chains: Learning to rank from implicit feedback, in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD ’05) (2005)

31. S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, M. Gatford, Okapi at TREC-3, in Proceedings of the 3rd Text Retrieval Conference (TREC-3) (1995)

32. B.P. Suomela, M.A. Andrade, Ranking the whole MEDLINE database according to a large training set using text indexing. BMC Bioinformatics 6, 75 (2005) http://dx.doi.org/10.1186/1471-2105-6-75

33. S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of the ACM International Conference on Multimedia (MM ’01) (2001)

34. G. You, S. Hwang, Personalized ranking: A contextual ranking approach, in Proceedings of the ACM Symposium on Applied Computing (SAC ’07) (2007)

35. H. Yu, SVM selective sampling for ranking with application to data retrieval, in Proceedings of the International ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD ’05) (2005)


Full Text


internal file

Other Links
