Genetic-based approaches in ranking function discovery and optimization in information retrieval – a framework


Reference

W. Fan, P. Pathak, M. Zhou: Genetic-based approaches in ranking function discovery and optimization in information retrieval – a framework. Decision Support Systems 47 (2009) 398–407.

DOI

http://dx.doi.org/10.1016/j.ipm.2003.08.001

Abstract

Ranking functions play a substantial role in the performance of information retrieval (IR) systems and search engines. Although there are many ranking functions available in the IR literature, various empirical evaluation studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). Moreover, it is often difficult and very expensive for human beings to design optimal ranking functions that work well in all these contexts. In this paper, we propose a novel ranking function discovery framework based on Genetic Programming and show through various experiments how this new framework helps automate the ranking function design/discovery process.
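The idea of discovering a ranking function by Genetic Programming can be illustrated with a minimal sketch. It is not the authors' implementation: the terminal set (`tf`, `idf`, `doclen`), the operator set, the truncation selection scheme, the depth limit, and the toy collection `TOY_DOCS` are all assumptions chosen for brevity. Candidate ranking functions are expression trees, and fitness is the average precision of the ranking each tree induces.

```python
import operator
import random

# Assumed terminal set: per-term statistics a candidate ranking tree may read.
TERMINALS = ["tf", "idf", "doclen"]
# Assumed function set; division is "protected" so it never divides by zero.
FUNCTIONS = {
    "+": operator.add,
    "*": operator.mul,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

# Invented toy collection: one matched query term per document.
TOY_DOCS = [
    {"relevant": True, "terms": [{"tf": 3, "idf": 2.0, "doclen": 100}]},
    {"relevant": False, "terms": [{"tf": 1, "idf": 2.0, "doclen": 100}]},
    {"relevant": True, "terms": [{"tf": 2, "idf": 2.0, "doclen": 50}]},
    {"relevant": False, "terms": [{"tf": 5, "idf": 0.2, "doclen": 400}]},
]

def random_tree(rng, depth=3):
    """Grow a random tree: a terminal name or an (op, left, right) tuple."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def depth(tree):
    """Height of a tree; a bare terminal has depth 0."""
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))

def evaluate(tree, feats):
    """Evaluate a tree on the feature dict of one matched query term."""
    if isinstance(tree, str):
        return feats[tree]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, feats), evaluate(right, feats))

def score(tree, doc):
    """Document score: the tree's value summed over matched query terms."""
    return sum(evaluate(tree, feats) for feats in doc["terms"])

def average_precision(tree, docs):
    """Fitness: average precision of the ranking induced by the tree."""
    ranked = sorted(docs, key=lambda d: score(tree, d), reverse=True)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc["relevant"]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def random_subtree(rng, tree):
    """Walk down from the root, stopping at a random node."""
    while isinstance(tree, tuple) and rng.random() < 0.5:
        tree = tree[rng.choice([1, 2])]
    return tree

def crossover(rng, a, b):
    """Copy a, replacing one randomly chosen subtree with a subtree of b."""
    if isinstance(a, str) or rng.random() < 0.3:
        return random_subtree(rng, b)
    op, left, right = a
    if rng.random() < 0.5:
        return (op, crossover(rng, left, b), right)
    return (op, left, crossover(rng, right, b))

def evolve(docs, pop_size=30, generations=10, max_depth=6, seed=0):
    """Evolve ranking trees with truncation selection and subtree crossover."""
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: average_precision(t, docs), reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(children) < pop_size - len(parents):
            child = crossover(rng, rng.choice(parents), rng.choice(parents))
            if depth(child) <= max_depth:  # reject oversized offspring
                children.append(child)
        pop = parents + children
    return max(pop, key=lambda t: average_precision(t, docs))
```

Calling `evolve(TOY_DOCS)` returns the best tree found; on a real collection the fitness would be averaged over many training queries and checked on held-out queries to guard against overfitting.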

Extended Abstract

Bibtex

@article{Fan2004587,
title = "A generic ranking function discovery framework by genetic programming for information retrieval",
journal = "Information Processing \& Management",
volume = "40",
number = "4",
pages = "587--602",
year = "2004",
issn = "0306-4573",
doi = "http://dx.doi.org/10.1016/j.ipm.2003.08.001",
url = "http://www.sciencedirect.com/science/article/pii/S0306457303000700",
author = "Weiguo Fan and Michael D Gordon and Praveen Pathak",
keywords = "Information retrieval, Ranking function, Genetic algorithms, Genetic programming, Text mining"
}

Used References

Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. (1998). Genetic programming: an introduction; on the automatic evolution of computer programs and its applications. San Francisco, CA: Morgan Kaufmann Publishers.

Bartell, B. T., Cottrell, G. W., & Belew, R. K. (1994). Automatic combination of multiple ranked retrieval systems. In Proceedings of the seventeenth annual international ACM SIGIR conference on research and development in information retrieval (pp. 173–181). http://www.citeseer.nj.nec.com/bartell94automatic.html

Chen, H., Chung, Y., Ramsey, M., & Yang, C. (1998). A smart itsy bitsy spider for the web. Journal of the American Society for Information Science, 49(7), 604–618. http://dx.doi.org/10.1002%2F(SICI)1097-4571(19980515)49%3A7%3C604%3A%3AAID-ASI3%3E3.0.CO%3B2-T

Fan, W., Gordon, M. D., & Pathak, P. (2000). Personalization of search engine services for effective retrieval and knowledge management. In Proceedings of 2000 international conference on information systems (ICIS), Brisbane, Australia (pp. 20–34).

Fox, E. A. (1983). Extending the boolean and vector space models of information retrieval with p-norm queries and multiple concept types. Ph.D. thesis, Cornell University.

Fox, E. A., Koushik, M. P., Shaw, J., Modlin, R., & Rao, D. (1993). Combining evidence from multiple searches. In Proceedings of the first text retrieval conference (TREC-1). NIST Special Publication 500-207 (pp. 319–328).

Fuhr, N., & Buckley, C. (1991). A probabilistic learning approach for document indexing. ACM Transactions on Information Systems, 9(3), 223–248. http://www.citeseer.nj.nec.com/fuhr91probabilistic.html

Fuhr, N., & Pfeifer, U. (1994). Probabilistic information retrieval as combination of abstraction, inductive learning and probabilistic assumptions. ACM Transactions on Information Systems, 12(1), 92–115. http://www.citeseer.nj.nec.com/fuhr94probabilistic.html http://dx.doi.org/10.1145%2F174608.174612

Gey, F. C. (1994). Inferring probability of relevance using the method of logistic regression. In Proceedings of the seventeenth annual international ACM SIGIR conference on research and development in information retrieval (pp. 222–231).

Gordon, M. (1988). Probabilistic and genetic algorithms for document retrieval. Communications of the ACM, 31(2), 152–169.

Gordon, M. (1991). User-based document clustering by redescribing subject descriptions with a genetic algorithm. Journal of the American Society for Information Science, 42(5), 311–322. http://dx.doi.org/10.1002%2F(SICI)1097-4571(199106)42%3A5%3C311%3A%3AAID-ASI1%3E3.0.CO%3B2-J

Gordon, M., & Pathak, P. (1999). Finding information on the World Wide Web: the retrieval effectiveness of search engines. Information Processing and Management, 35(2), 141–180. http://www.sciencedirect.com/science/article/pii/S0306457398000417 http://www.sciencedirect.com/science/article/pii/S0306457398000417/pdfft?md5=eeabf079b811ae5d899f4243073930f2&pid=1-s2.0-S0306457398000417-main.pdf

Harman, D. K. (1993). Overview of the first text retrieval conference (TREC-1). In D. K. Harman (Ed.), Proceedings of the first text retrieval conference. NIST Special Publication 500-207 (pp. 1–20).

Harman, D. K. (1996). Overview of the fourth text retrieval conference (TREC-4). In D. K. Harman (Ed.), Proceedings of the fourth text retrieval conference. NIST Special Publication 500-236 (pp. 1–24).

Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing and Management, 36(2), 207–227.

Jones, W. P., & Furnas, G. W. (1987). Pictures of relevance: a geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6), 420–442. http://dx.doi.org/10.1002%2F(SICI)1097-4571(198711)38%3A6%3C420%3A%3AAID-ASI3%3E3.0.CO%3B2-S

Koza, J. R. (1992). Genetic programming: on the programming of computers by means of natural selection. Cambridge, MA, USA: MIT Press.

Langdon, W. B. (1998). Data structures and genetic programming: genetic programming + data structures = automatic programming. Kluwer Publishing.

Lee, J. H. (1997). Analyses of multiple evidence combination. In Proceedings of the twentieth annual international ACM SIGIR conference on research and development in information retrieval (pp. 267–276).

Martin-Bautista, M. J., Vila, M., & Larsen, H. L. (1999). A fuzzy genetic algorithm approach to an adaptive information retrieval agent. Journal of the American Society for Information Science, 50(9), 760–771. http://dx.doi.org/10.1002%2F(SICI)1097-4571(1999)50%3A9%3C760%3A%3AAID-ASI4%3E3.0.CO%3B2-O

Mitchell, T. M. (1997). Machine learning. New York, NY: McGraw Hill.

Pathak, P., Gordon, M., & Fan, W. (2000). Effective information retrieval using genetic algorithms based matching function adaptation. In Proceedings of the 33rd Hawaii international conference on system science (HICSS), Hawaii, USA.

Pitkow, J., Schutze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., Adar, E., & Breuel, T. (2002). Personalized search. Communications of the ACM, 45(9), 50–55.

Raghavan, V. V., & Agarwal, B. (1987). Optimal determination of user-oriented clusters: an application for the reproductive plan. In Proceedings of the second international conference on genetic algorithms and their applications, Cambridge, MA (pp. 241–246).

Salton, G. (1971). The SMART retrieval system: experiments in automatic document processing. New Jersey: Prentice Hall.

Salton, G. (1989). Automatic text processing. Reading, MA: Addison-Wesley Publishing Co.

Salton, G., & Buckley, C. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513–523. http://www.sciencedirect.com/science/article/pii/0306457388900210 http://www.sciencedirect.com/science/article/pii/0306457388900210/pdf?md5=728e77cdd390e06b1c32ad998f9d2bd8&pid=1-s2.0-0306457388900210-main.pdf

Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. McGraw-Hill.

Singhal, A., Salton, G., Mitra, M., & Buckley, C. (1996). Document length normalization. Information Processing and Management, 32(5), 619–633. http://www.sciencedirect.com/science/article/pii/0306457396000088 http://www.sciencedirect.com/science/article/pii/0306457396000088/pdf?md5=b18f9a5640efeda48a66a4159a7e5f14&pid=1-s2.0-0306457396000088-main.pdf

Vogt, C. C., & Cottrell, G. W. (1999). Fusion via a linear combination of scores. Information Retrieval, 1(3), 151–173. http://dx.doi.org/10.1023%2FA%3A1009980820262

Voorhees, E. M., & Harman, D. K. (1998). Overview of the seventh text retrieval conference (TREC-7). In E. M. Voorhees & D. K. Harman (Eds.), Proceedings of the seventh text retrieval conference. NIST Special Publication 500-242 (pp. 1–24).

Zobel, J., & Moffat, A. (1998). Exploring the similarity space. SIGIR Forum, 32(1), 18–34. http://dx.doi.org/10.1145%2F281250.281256

Links

Full Text

http://www.sciencedirect.com/science/article/pii/S0306457303000700/pdfft?md5=da311f2d2f22cd793e30c2ae54142379&pid=1-s2.0-S0306457303000700-main.pdf

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.4456&rep=rep1&type=pdf

internal file


Other Links