An upperbound to the performance for ranked-output searching: optimal weighting of query terms using a genetic algorithm


Reference

A. Robertson, P. Willett: An upperbound to the performance for ranked-output searching: optimal weighting of query terms using a genetic algorithm. Journal of Documentation, 52 (4) (1996), pp. 405–420

DOI

http://dx.doi.org/10.1108/eb026973

Abstract

This paper describes the development of a genetic algorithm (GA) for the assignment of weights to query terms in a ranked-output document retrieval system. The GA involves a fitness function that is based on full relevance information, and the rankings resulting from the use of these weights are compared with the Robertson-Sparck Jones F4 retrospective relevance weight. Extended experiments with seven document test collections show that the GA can often find weights that are slightly superior to those produced by the deterministic weighting scheme. That said, there are many cases where the two approaches give the same results, and a few cases where the F4 weights are superior to the GA weights. Since the GA has been designed to identify weights yielding the best possible level of retrospective performance, these results indicate that the F4 weights provide an excellent and practicable alternative. Evidence is presented to suggest that negative weights may play an important role in retrospective relevance weighting.
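The comparison described in the abstract can be illustrated with a small sketch. The abstract does not give the GA's encoding, operators, or exact fitness measure, so everything beyond the F4 formula is an assumption made for illustration: the function names (`f4_weight`, `average_precision`, `evolve_weights`), the toy collection, average precision as the fitness function, tournament selection, uniform crossover, and Gaussian mutation. The F4 weight is written in its commonly cited form with the 0.5 correction, where `r` is the number of relevant documents containing the term, `n` the number of documents containing the term, `R` the number of relevant documents, and `N` the collection size.

```python
import math
import random


def f4_weight(r, n, R, N):
    """Robertson-Sparck Jones F4 relevance weight with the 0.5 correction."""
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))


def average_precision(weights, docs, relevant):
    """Rank documents by the summed weights of the query terms they contain,
    then score the ranking with average precision (assumed fitness measure)."""
    scores = [sum(w for w, present in zip(weights, doc) if present)
              for doc in docs]
    # Stable sort: tied documents keep collection order (an assumption).
    ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(ranking, start=1):
        if i in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def evolve_weights(docs, relevant, seed_weights,
                   pop_size=30, generations=40, rng=None):
    """Hypothetical real-valued GA; not the operators used in the paper."""
    rng = rng or random.Random(0)
    n_terms = len(seed_weights)

    def fitness(w):
        return average_precision(w, docs, relevant)

    # Seed the population with the F4 weights; negative weights are allowed.
    pop = [list(seed_weights)] + [
        [rng.uniform(-5, 5) for _ in range(n_terms)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [pop[0]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Uniform crossover followed by occasional Gaussian mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0, 0.5) if rng.random() < 0.1 else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


# Toy collection: each tuple marks which of three query terms a document contains.
docs = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 1),
        (0, 0, 1), (0, 1, 0), (1, 1, 1), (0, 0, 0)]
relevant = {0, 1, 3}  # full relevance information (indices of relevant documents)
N, R = len(docs), len(relevant)

f4 = []
for t in range(3):
    n = sum(doc[t] for doc in docs)
    r = sum(docs[i][t] for i in relevant)
    f4.append(f4_weight(r, n, R, N))

best = evolve_weights(docs, relevant, f4)
print("F4 weights:", f4)
print("F4 fitness:", average_precision(f4, docs, relevant))
print("GA fitness:", average_precision(best, docs, relevant))
```

Because the F4 weights are placed in the initial population and the best individual is always carried over, this sketch can never return weights that score below F4 on the chosen fitness measure. That is a property of the sketch, not a claim about the paper's GA, which the abstract reports as occasionally being outperformed by F4.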

Extended Abstract

Bibtex

@article{robertson1996upperbound,
author = {A. Robertson and P. Willett},
title = {An upperbound to the performance for ranked-output searching: optimal weighting of query terms using a genetic algorithm},
journal = {Journal of Documentation},
volume = {52},
number = {4},
pages = {405--420},
year = {1996},
keywords = {},
doi = {10.1108/eb026973},
url = {http://dx.doi.org/10.1108/eb026973 http://de.evo-art.org/index.php?title=An_upperbound_to_the_performance_for_ranked-output_searching:_optimal_weighting_of_query_terms_using_a_genetic_algorithm},
}

Used References

1. SALTON, G., ed. The SMART retrieval system: experiments in automatic document processing. Englewood Cliffs, NJ: Prentice-Hall, 1971.

2. FRAKES, W.B. and BAEZA-YATES, R. Information retrieval: data structures and algorithms. Englewood Cliffs, NJ: Prentice Hall, 1992.

3. PRITCHARD-SCHOCH, T. Natural language comes of age. Online, 17(3), 1993, 33-43.

4. TENOPIR, C. and CAHN, P. TARGET and Freestyle. DIALOG and Mead join the relevance ranks. Online, 18(3), 1994, 31-47.

5. SPARCK JONES, K. A statistical interpretation of term specificity and its application in information retrieval. Journal of Documentation, 28, 1972, 11-21.

6. SPARCK JONES, K. Index term weighting. Information Storage and Retrieval, 9, 1973, 619-633. http://www.sciencedirect.com/science/article/pii/0020027173900430

7. CROFT, W.B. and HARPER, D.J. Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35, 1979, 285-295.

8. CROFT, W.B. Experiments with representation in a document retrieval system. Information Technology: Research and Development, 2, 1983, 1-21.

9. ROBERTSON, S.E. On relevance weight estimation and query expansion. Journal of Documentation, 42, 1986, 182-188.

10. SALTON, G. and BUCKLEY, C. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24, 1988, 513-523. http://www.sciencedirect.com/science/article/pii/0306457388900210

11. ROBERTSON, S.E. and WALKER, S. Some simple approximations to the 2-Poisson model for probabilistic weighted retrieval. In: CROFT, W.B. and VAN RIJSBERGEN, C.J., eds. SIGIR '94: proceedings of the seventeenth international conference on research and development in information retrieval. London: Springer-Verlag, 1994, 232-241.

12. ROBERTSON, S.E. and SPARCK JONES, K. Relevance weighting of search terms. Journal of the American Society for Information Science, 27, 1976, 129-145. http://onlinelibrary.wiley.com/doi/10.1002/asi.4630270302/abstract

13. PORTER, M. and GALPIN, V. Relevance feedback in a public access catalogue for a research library: Muscat at the Scott Polar Research Institute. Program, 22, 1988, 1-20.

14. SALTON, G. and BUCKLEY, C. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41, 1990, 288-297. http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-4571(199006)41:4%3C288::AID-ASI8%3E3.0.CO;2-H/abstract

15. KEEN, E.M. The use of term position devices in ranked output experiments. Journal of Documentation, 47, 1991, 1-22.

16. WILKINSON, R. Effective retrieval of structured documents. In: CROFT, W.B. and VAN RIJSBERGEN, C.J., eds. SIGIR '94: proceedings of the seventeenth international conference on research and development in information retrieval. London: Springer-Verlag, 1994, 311-317.

17. JACKSON, D.M. Classification, relevance and information retrieval. Advances in Computers, 11, 1971, 60-125. http://www.sciencedirect.com/science/article/pii/S0065245808606300

18. SPARCK JONES, K. A performance yardstick for test collections. Journal of Documentation, 31, 1975, 266-272.

19. STIRLING, K.H. The effect of document ranking on retrieval system performance: a search for an optimal ranking rule. PhD thesis, University of California, 1977.

20. SCHÄUBLE, P. On the compatibility of retrieval functions, preference relations, and document descriptions. Zurich: Eidgenössische Technische Hochschule, Institut für Informationssysteme, 1989.

21. HEINE, M.H. and TAGUE, J.M. An investigation of the optimization of search logic for the Medline database. Journal of the American Society for Information Science, 42, 1991, 267-278. http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-4571(199105)42:4%3C267::AID-ASI3%3E3.0.CO;2-Y/abstract

22. KEEN, E.M. Presenting results of experimental retrieval comparisons. Information Processing and Management, 28, 1992, 491-502. http://www.sciencedirect.com/science/article/pii/030645739290006L

23. LOSEE, R.M. Upper bounds for retrieval performance and their use for generating optimal Boolean queries: can it get any better than this? Information Processing and Management, 30, 1994, 193-203. http://www.sciencedirect.com/science/article/pii/0306457394900647

24. SHAW, W.M. Term-relevance computations and perfect retrieval performance. Information Processing and Management, 31, 1995, 491-498. http://www.sciencedirect.com/science/article/pii/0306457395000115

25. GOLDBERG, D.E. Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley, 1989.

26. DAVIS, L., ed. Handbook of genetic algorithms. New York: Van Nostrand Reinhold, 1991.

27. MICHALEWICZ, Z. Genetic algorithms + data structures = evolution programs. Berlin: Springer-Verlag, 1992.

28. MITCHELL, M. An introduction to genetic algorithms. Cambridge, MA: MIT Press, 1996.

29. RAGHAVAN, V.V. and AGARWAL, B. Optimal determination of user-orientated clusters: an application for the reproductive plan. In: GREFENSTETTE, J.J., ed. Genetic algorithms and their applications: proceedings of the second international conference on genetic algorithms and their applications. Hillsdale, NJ: Erlbaum, 1987, 241-246.

30. GORDON, M. Probabilistic and genetic algorithms for document retrieval. Communications of the ACM, 31, 1988, 1208-1218. http://dl.acm.org/citation.cfm?doid=63039.63044

31. FRIEDER, O. and SIEGELMANN, H.T. On the allocation of documents in multiprocessor information retrieval systems. In: BOOKSTEIN, A., CHIARAMELLA, Y., SALTON, G. and RAGHAVAN, V.V., eds. SIGIR '91: proceedings of the fourteenth annual international ACM/SIGIR conference on research and development in information retrieval. New York: ACM Press, 1991, 230-239. http://dx.doi.org/10.1145/122860.122884

32. PETRY, F.E., BUCKLES, B.P., PRABHU, D. and KRAFT, D.H. Fuzzy information retrieval using genetic algorithms and relevance feedback. In: BONZI, S., ed. ASIS '93: proceedings of the 56th ASIS annual meeting. Medford, NJ: American Society for Information Science, 1993, 122-125.

33. YANG, J-J., KORFHAGE, R.R. and RASMUSSEN, E.M. Query improvement in information retrieval using genetic algorithms - a report on the experiments of the TREC project. In: HARMAN, D.K., ed. The first text retrieval conference (TREC-1). Washington: National Institute of Standards and Technology, 1993, 31-58. (NIST Special Publication 500-207)

34. ROBERTSON, A.M. and WILLETT, P. Generation of equifrequent groups of words using a genetic algorithm. Journal of Documentation, 50, 1994, 213-232.

35. SMITH, M., SMITH, M.P. and WADE, S.J. Applying genetic programming to the problem of term weight algorithms. New Review of Document and Text Management, 1, 1995, 101-110.

36. WADE, S.J., SMITH, M. and WOLSTENHOLME, M. Application of a genetic algorithm to the production of text signatures. New Review of Document and Text Management, 1, 1995, 147-166.

37. ROBERTSON, A.M. and WILLETT, P. The use of genetic algorithms in information retrieval. London: British Library Research and Development Department, 1995. (British Library R&D report 6201).

38. SPARCK JONES, K. Search term relevance weighting given little relevance information. Journal of Documentation, 35, 1979, 30-48.

Links

Full Text

internal file


Other Links