Genetic-based approaches in ranking function discovery and optimization in information retrieval – a framework
Reference
W. Fan, P. Pathak, M. Zhou: Genetic-based approaches in ranking function discovery and optimization in information retrieval – a framework. Decision Support Systems 47 (2009) 398–407.
DOI
http://dx.doi.org/10.1016/j.ipm.2003.08.001
Abstract
Ranking functions play a substantial role in the performance of information retrieval (IR) systems and search engines. Although there are many ranking functions available in the IR literature, various empirical evaluation studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). Moreover, it is often difficult and very expensive for human beings to design optimal ranking functions that work well in all these contexts. In this paper, we propose a novel ranking function discovery framework based on Genetic Programming and show through various experiments how this new framework helps automate the ranking function design/discovery process.
Extended Abstract
Bibtex
@article{Fan2004587,
  title = "A generic ranking function discovery framework by genetic programming for information retrieval",
  journal = "Information Processing & Management",
  volume = "40",
  number = "4",
  pages = "587 - 602",
  year = "2004",
  note = "",
  issn = "0306-4573",
  doi = "http://dx.doi.org/10.1016/j.ipm.2003.08.001",
  url = "http://www.sciencedirect.com/science/article/pii/S0306457303000700 http://de.evo-art.org/index.php?title=Genetic-based_approaches_in_ranking_function_discovery_and_optimization_in_information_retrieval_%E2%80%93_a_framework",
  author = "Weiguo Fan and Michael D Gordon and Praveen Pathak",
  keywords = "Information retrieval, Ranking function, Genetic algorithms, Genetic programming, Text mining"
}
Used References
Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. (1998). Genetic programming: an introduction: on the automatic evolution of computer programs and its applications. San Francisco, CA: Morgan Kaufmann Publishers.
Bartell, B. T., Cottrell, G. W., & Belew, R. K. (1994). Automatic combination of multiple ranked retrieval systems. In The proceedings of seventeenth annual international ACM SIGIR conference on research and development in information retrieval (pp. 173–181). http://www.citeseer.nj.nec.com/bartell94automatic.html
Chen, H., Chung, Y., Ramsey, M., & Yang, C. (1998). A smart itsy bitsy spider for the web. Journal of the American Society for Information Science, 49(7), 604–618. http://dx.doi.org/10.1002%2F(SICI)1097-4571(19980515)49%3A7%3C604%3A%3AAID-ASI3%3E3.0.CO%3B2-T
Fan, W., Gordon, M. D., & Pathak, P. (2000). Personalization of search engine services for effective retrieval and knowledge management. In Proceedings of 2000 international conference on information systems (ICIS), Brisbane, Australia (pp. 20–34).
Fox, E. A. (1983). Extending the boolean and vector space models of information retrieval with p-norm queries and multiple concept types. Ph.D. thesis, Cornell University.
Fox, E. A., Koushik, M. P., Shaw, J., Modlin, R., & Rao, D. (1993). Combining evidence from multiple searches. In Proceedings of the first text retrieval conference (TREC-1). NIST Special Publication 500-207 (pp. 319–328).
Fuhr, N., & Buckley, C. (1991). A probabilistic learning approach for document indexing. ACM Transactions on Information Systems, 9(3), 223–248. http://www.citeseer.nj.nec.com/fuhr91probabilistic.html
Fuhr, N., & Pfeifer, U. (1994). Probabilistic information retrieval as combination of abstraction, inductive learning and probabilistic assumptions. ACM Transactions on Information Systems, 12(1), 92–115. http://www.citeseer.nj.nec.com/fuhr94probabilistic.html http://dx.doi.org/10.1145%2F174608.174612
Gey, F. C. (1994). Inferring probability of relevance using the method of logistic regression. In The proceedings of seventeenth annual international ACM SIGIR conference on research and development in information retrieval (pp. 222–231).
Gordon, M. (1988). Probabilistic and genetic algorithms for document retrieval. Communications of the ACM, 31(2), 152–169.
Gordon, M. (1991). User-based document clustering by redescribing subject descriptions with a genetic algorithm. Journal of the American Society for Information Science, 42(5), 311–322. http://dx.doi.org/10.1002%2F(SICI)1097-4571(199106)42%3A5%3C311%3A%3AAID-ASI1%3E3.0.CO%3B2-J
Gordon, M., & Pathak, P. (1999). Finding information on the World Wide Web: the retrieval effectiveness of search engines. Information Processing and Management, 35(2), 141–180. http://www.sciencedirect.com/science/article/pii/S0306457398000417 http://www.sciencedirect.com/science/article/pii/S0306457398000417/pdfft?md5=eeabf079b811ae5d899f4243073930f2&pid=1-s2.0-S0306457398000417-main.pdf
Harman, D. K. (1993). Overview of the first text retrieval conference (TREC-1). In D. K. Harman (Ed.), Proceedings of the first text retrieval conference. NIST Special Publication 500-207 (pp. 1–20).
Harman, D. K. (1996). Overview of the fourth text retrieval conference (TREC-4). In D. K. Harman (Ed.), Proceedings of the fourth text retrieval conference. NIST Special Publication 500-236 (pp. 1–24).
Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing and Management, 36(2), 207–227.
Jones, W. P., & Furnas, G. W. (1987). Pictures of relevance: a geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6), 420–442. http://dx.doi.org/10.1002%2F(SICI)1097-4571(198711)38%3A6%3C420%3A%3AAID-ASI3%3E3.0.CO%3B2-S
Koza, J. R. (1992). Genetic programming: on the programming of computers by means of natural selection. Cambridge, MA, USA: MIT Press.
Langdon, W. B. (1998). Data structures and genetic programming: genetic programming + data structures = automatic programming. Kluwer Publishing.
Lee, J. H. (1997). Analyses of multiple evidence combination. In The proceedings of twentieth annual international ACM SIGIR conference on research and development in information retrieval (pp. 267–276).
Martin-Bautista, M. J., Vila, M., & Larsen, H. L. (1999). A fuzzy genetic algorithm approach to an adaptive information retrieval agent. Journal of the American Society for Information Science, 50(9), 760–771. http://dx.doi.org/10.1002%2F(SICI)1097-4571(1999)50%3A9%3C760%3A%3AAID-ASI4%3E3.0.CO%3B2-O
Mitchell, T. M. (1997). Machine learning. New York, NY: McGraw Hill.
Pathak, P., Gordon, M., & Fan, W. (2000). Effective information retrieval using genetic algorithms based matching function adaptation. In Proceedings of the 33rd Hawaii international conference on system science (HICSS), Hawaii, USA.
Pitkow, J., Schutze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., Adar, E., & Breuel, T. (2002). Personalized search. Communications of the ACM, 45(9), 50–55.
Raghavan, V. V., & Agarwal, B. (1987). Optimal determination of user-oriented clusters: an application for the reproductive plan. In Proceedings of the second international conference on genetic algorithms and their applications, Cambridge, MA (pp. 241–246).
Salton, G. (1971). The SMART retrieval system: experiments in automatic document processing. New Jersey: Prentice Hall.
Salton, G. (1989). Automatic text processing. Reading, MA: Addison-Wesley Publishing Co.
Salton, G., & Buckley, C. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513–523. http://www.sciencedirect.com/science/article/pii/0306457388900210 http://www.sciencedirect.com/science/article/pii/0306457388900210/pdf?md5=728e77cdd390e06b1c32ad998f9d2bd8&pid=1-s2.0-0306457388900210-main.pdf
Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. McGraw-Hill.
Singhal, A., Salton, G., Mitra, M., & Buckley, C. (1996). Document length normalization. Information Processing and Management, 32(5), 619–633. http://www.sciencedirect.com/science/article/pii/0306457396000088 http://www.sciencedirect.com/science/article/pii/0306457396000088/pdf?md5=b18f9a5640efeda48a66a4159a7e5f14&pid=1-s2.0-0306457396000088-main.pdf
Vogt, C. C., & Cottrell, G. W. (1999). Fusion via a linear combination of scores. Information Retrieval, 1(3), 151–173. http://dx.doi.org/10.1023%2FA%3A1009980820262
Voorhees, E. M., & Harman, D. K. (1998). Overview of the seventh text retrieval conference (TREC-7). In E. M. Voorhees & D. K. Harman (Eds.), Proceedings of the seventh text retrieval conference. NIST Special Publication 500-242 (pp. 1–24).
Zobel, J., & Moffat, A. (1998). Exploring the similarity space. SIGIR Forum, 32(1), 18–34. http://dx.doi.org/10.1145%2F281250.281256
Links
Full Text
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.4456&rep=rep1&type=pdf