Genetic algorithms in relevance feedback: a second test and new contributions
Inhaltsverzeichnis
Referenz
C. López-Pujalte, V. Guerrero, F. Moya: Genetic algorithms in relevance feedback: a second test and new contributions. Information Processing & Management, 39 (5) (2003), pp. 669–807
DOI
http://dx.doi.org/10.1016/S0306-4573(02)00044-4
Abstract
The present work is the continuation of an earlier study which reviewed the literature on relevance feedback genetic techniques that follow the vector space model (the model that is most commonly used in this type of application), and implemented them so that they could be compared with each other as well as with one of the best traditional methods of relevance feedback--the Ide dec-hi method. We here carry out the comparisons on more test collections (Cranfield, CISI, Medline, and NPL), using the residual collection method for their evaluation as is recommended in this type of technique. We also add some fitness functions of our own design.
Extended Abstract
Bibtex
@article{Lopez-Pujalte:2003:GAR:937531.937532, author = {L\'{o}pez-Pujalte, Cristina and Guerrero-Bote, Vicente P. and de Moya-Aneg\'{o}n, F{\'e}lix}, title = {Genetic Algorithms in Relevance Feedback: A Second Test and New Contributions}, journal = {Inf. Process. Manage.}, issue_date = {September 2003}, volume = {39}, number = {5}, month = sep, year = {2003}, issn = {0306-4573}, pages = {669--687}, numpages = {19}, url = {http://dx.doi.org/10.1016/S0306-4573(02)00044-4 http://de.evo-art.org/index.php?title=Genetic_algorithms_in_relevance_feedback:_a_second_test_and_new_contributions}, doi = {10.1016/S0306-4573(02)00044-4}, acmid = {937532}, publisher = {Pergamon Press, Inc.}, address = {Tarrytown, NY, USA}, keywords = {automatic indexing, genetic algorithms, information retrieval, relevance feedback, test collections}, }
Used References
1 Ricardo A. Baeza-Yates , Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1999 http://dl.acm.org/citation.cfm?id=553876&CFID=558819604&CFTOKEN=68186175
2 R. K. Belew, Adaptive information retrieval: using a connectionist representation to retrieve and learn about documents, Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, p.11-20, June 25-28, 1989, Cambridge, Massachusetts, USA http://doi.acm.org/10.1145/75334.75337
3 Chang, C.-H., & Hsu, C.-C. (1999). The design of an information system for hypertext retrieval and automatic discovery on WWW. Ph.D. Thesis, Department of CSIE, National Taiwan University.
4 Chang, Y. K., Cirillo, C., & Razon, J. (1971). Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton (Ed.), The smart retrieval system-experiments in automatic document processing (pp. 355-370). Englewood Cliffs, NJ: Prentice-Hall.
5 Hsinchun Chen, Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms, Journal of the American Society for Information Science, v.46 n.3, p.194-216, April 1995 http://dx.doi.org/10.1002/(SICI)1097-4571(199504)46:3%3C194::AID-ASI4%3E3.0.CO;2-S
6 Hsinchun Chen , Yi-Ming Chung , Marshall Ramsey , Christopher C. Yang, A smart itsy bitsy spider for the web, Journal of the American Society for Information Science, v.49 n.7, p.604-618, May 15, 1998 http://dx.doi.org/10.1002/(SICI)1097-4571(1998)49:7%3C604::AID-ASI3%3E3.3.CO;2-X
7 Hsinchun Chen , Ganesan Shankaranarayanan , Linlin She , Anand Iyer, A machine learning approach to inductive query by examples: an experiment using relevance feedback, ID3, genetic algorithms, and simulated annealing, Journal of the American Society for Information Science, v.49 n.8, p.639-705, June 1998
8 Cordón, O., Moya, F., & Zarco, M. C. (1999). Breve estudio sobre la aplicacióón de los algoritmos genéticos a la recuperación de la información. In IV Congreso ISKO (pp. 179-186). London: Granada.
9 Cordón, O., Moya, F., & Zarco, M. C. (2000). A GA-P algorithm to automatically formulate extended boolean queries for a fuzzy information retrieval system by means of GA-P techniques. Mathware & Soft Computing, 7(2-3), 309-322.
10 Cordón, O., Moya, F., & Zarco, M. C. (2002). A new evolutionary algorithm combining simulated annealing and genetic programming for relevance feedback in fuzzy information retrieval systems. Soft Computing, in press.
11 Davis, L. (Ed.). (1991). Handbook of genetic algorithms. New York: Van Nostrand Reinhold.
12 Kenneth Alan De Jong, An analysis of the behavior of a class of genetic adaptive systems., University of Michigan, Ann Arbor, MI, 1975 http://dl.acm.org/citation.cfm?id=907087&CFID=558819604&CFTOKEN=68186175
13 Harris Drucker , Behzad Shahrary , David C. Gibbon, Support vector machines: relevance feedback and information retrieval, Information Processing and Management: an International Journal, v.38 n.3, p.305-323, May 2002 http://dx.doi.org/10.1016/S0306-4573(01)00037-1
14 William B. Frakes , Ricardo Baeza-Yates, Information retrieval: data structures and algorithms, Prentice-Hall, Inc., Upper Saddle River, NJ, 1992 http://dl.acm.org/citation.cfm?id=129687&CFID=558819604&CFTOKEN=68186175
15 Glover, D. E. (1987). Solving a complex keyboard configuration problem through a generalized adaptive search. In L. Davis (Ed.), Genetic algorithms and simulated annealing (pp. 12-31). San Mateo, CA: Morgan Kaufmann.
16 David E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1989 http://dl.acm.org/citation.cfm?id=534133&CFID=558819604&CFTOKEN=68186175
17 M. Gordon, Probabilistic and genetic algorithms in document retrieval, Communications of the ACM, v.31 n.10, p.1208-1218, Oct. 1988 http://doi.acm.org/10.1145/63039.63044
18 Michael D. Gordon, The necessity for adaptation in modified Boolean document retrieval systems, Information Processing and Management: an International Journal, v.24 n.3, p.339-347, May 1988 http://dx.doi.org/10.1016/0306-4573(88)90100-8
19 Gordon, M. D. (1991). User-based document clustering by redescribing subject descriptors with a genetic algorithms. Journal of the American Society for Information Science, 42(5), 311-322.
20 J Grefenstette, Optimization of control parameters for genetic algorithms, IEEE Transactions on Systems, Man and Cybernetics, v.16 n.1, p.122-128, Jan./Feb. 1986 http://dx.doi.org/10.1109/TSMC.1986.289288
21 Lawrence Davis, Genetic Algorithms and Simulated Annealing, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1987 http://dl.acm.org/citation.cfm?id=535531&CFID=558819604&CFTOKEN=68186175
22 Vicente P. Guerrero , Félix de Moya Anegón, Reduction of the dimension of a document space using the fuzzified output of a Kohonen network, Journal of the American Society for Information Science and Technology, v.52 n.14, p.1234-1241, December 2001 http://dx.doi.org/10.1002/asi.1189
23 Vicente P. Guerrero Bote , Félix de Moya Anegón , Victor Herrero Solana, Document organization using Kohonen's algorithm, Information Processing and Management: an International Journal, v.38 n.1, p.79-89, January 2002 http://dx.doi.org/10.1016/S0306-4573(00)00066-2
24 Donna Harman, Relevance feedback and other query modification techniques, Information retrieval: data structures and algorithms, Prentice-Hall, Inc., Upper Saddle River, NJ, 1992 http://dl.acm.org/citation.cfm?id=129698&CFID=558819604&CFTOKEN=68186175
25 M. R. Hilliard , G. E. Liepins , Mark Palmer , Michael Morrow , Jon Richardson, A classifier based system for discovering scheduling heuristics, Proceedings of the Second International Conference on Genetic Algorithms on Genetic algorithms and their application, p.231-235, October 1987, Cambridge, Massachusetts, USA http://dl.acm.org/citation.cfm?id=42543&CFID=558819604&CFTOKEN=68186175
26 John H. Holland, Adaptation in natural and artificial systems, MIT Press, Cambridge, MA, 1992 http://dl.acm.org/citation.cfm?id=129194&CFID=558819604&CFTOKEN=68186175
27 Jorng-Tzong Horng , Ching-Chang Yeh, Applying genetic algorithms to query optimization in document retrieval, Information Processing and Management: an International Journal, v.36 n.5, p.737-759, Sept. 2000 http://dx.doi.org/10.1016/S0306-4573(00)00008-X
28 Ide, E. (1971). New experiments in relevance feedback. In G. Salton (Ed.), The SMART retrieval system (pp. 337-354). Englewood Cliffs, NJ: Prentice-Hall.
29 Kraft, D. H., Petry, F. E., Buckles, B. P., & Sadasivan, T. (1994). The use of genetic programming to build queries for information retrieval. In Proceedings of IEEE symposium on evolutionary computation. Orlando, FL.
30 Kraft, D. H., Petry, F. E., Buckles, B. P., & Sadasivan, T. (1995). Applying genetic algorithms to information retrieval systems via relevance feedback. In P. Bosc & J. Kacprzyk (Eds.), Fuzzy sets and possibility theory in database management systems, studies in fuzziness series (pp. 330-346).
31 Kraft, D. H., Petry, F. E., Buckles, B. P., & Sadasivan, T. (1997). Genetic algorithms for query optimization in information retrieval: relevance feedback. In E. Sanchez, T. Shibata, & L. A. Zadeh (Eds.), Genetic algorithms and fuzzy logic systems. Soft computing perspectives (pp. 155-173).
32 Kucera, H., & Francis, N. (1967). Computational analysis of present-day American English. Providence, RD: Brown University Press.
33 K. L. Kwok, Comparing representations in Chinese information retrieval, ACM SIGIR Forum, v.31 n.SI, p.34-41 http://doi.acm.org/10.1145/278459.258531
34 López-Pujalte, C. (2000). Algoritmos genéticos aplicados a la retroalimentación por relevancia. Ph.D. Thesis, Library and Information Science Faculty, University of Granada.
35 Cristina Lopez-Pujalte , Vicente P. Guerrero Bote , Félix de Moya Anegón, A test of genetic algorithms in relevance feedback, Information Processing and Management: an International Journal, v.38 n.6, p.793-805, November 2002 http://dx.doi.org/10.1016/S0306-4573(01)00061-9
36 Cristina López-Pujalte , Vicente P. Guerrero-Bote , Félix de Moya-Anegón, Order-based fitness functions for genetic algorithms applied to relevance feedback, Journal of the American Society for Information Science and Technology, v.54 n.2, p.152-160, January 2003 http://dx.doi.org/10.1002/asi.10179
37 Martín-Bautista, M. J. (2000). Modelos de computación flexible para la recuperación de informatión. Ph.D. Thesis, Computer Science Faculty, University of Granada.
38 Maria J. Martin-Bautista , Maria-Amparo Vila , Henrik Legind Larsen, A fuzzy genetic algorithm approach to an adaptive information retrieval agent, Journal of the American Society for Information Science, v.50 n.9, p.760-771, July 1999 http://dx.doi.org/10.1002/(SICI)1097-4571(1999)50:9%3C760::AID-ASI4%3E3.3.CO;2-F
39 Zbigniew Michalewicz, Genetic algorithms + data structures = evolution programs (3rd ed.), Springer-Verlag, London, UK, 1996 http://dl.acm.org/citation.cfm?id=229930&CFID=558819604&CFTOKEN=68186175
40 Terry Noreault , Michael McGill , Matthew B. Koll, A performance evaluation of similarity measures, document term weighting schemes and representations in a Boolean environment, Proceedings of the 3rd annual ACM conference on Research and development in information retrieval, p.57-76, June 23-27, 1980, Cambridge, England http://dl.acm.org/citation.cfm?id=636674&CFID=558819604&CFTOKEN=68186175
41 Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.
42 Vijay Raghavan , Brijesh Agarwal, Optimal determination of user-oriented clusters: an application for the reproductive plan, Proceedings of the Second International Conference on Genetic Algorithms on Genetic algorithms and their application, p.241-246, October 1987, Cambridge, Massachusetts, USA http://dl.acm.org/citation.cfm?id=42545&CFID=558819604&CFTOKEN=68186175
43 Vijay V. Raghavan , Kim Birchard, A clustering strategy based on a formalism of the reproductive process in natural systems, Proceedings of the 2nd annual international ACM SIGIR conference on Information storage and retrieval: information implications into the eighties, p.10-22, September 27-28, 1979, Dallas, Texas http://doi.acm.org/10.1145/511706.511709
44 Robertson, A. M., & Willett, P. (1994). Generation of equifrequent groups of words using a genetic algorithm. Journal of Documentation, 50(3), 213-232.
45 Robertson, A. M., Willett, P. (1995). Use of genetic algorithms in information retrieval. British Library, Research and Development Department, BLRD Report 6201.
46 Robertson, A. M., & Willett, P. (1996). An upperbound to the performance of ranked-output searching: optimal weighting of query terms using a genetic algorithm. Journal of Documentation, 52(4), 405-420.
47 George G. Robertson, Parallel implementation of genetic algorithms in a classifier system, Proceedings of the Second International Conference on Genetic Algorithms on Genetic algorithms and their application, p.140-147, October 1987, Cambridge, Massachusetts, USA http://dl.acm.org/citation.cfm?id=42531&CFID=558819604&CFTOKEN=68186175
48 Gerard Salton , Chris Buckley, Parallel text search methods, Communications of the ACM, v.31 n.2, p.202-215, Feb. 1988 http://doi.acm.org/10.1145/42372.42380
49 Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288-297.
50 Gerard Salton , Michael J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, Inc., New York, NY, 1986 http://dl.acm.org/citation.cfm?id=576628&CFID=558819604&CFTOKEN=68186175
51 Sanchez, E. (1994). Fuzzy logic and genetic algorithms in information retrieval. In Proceedings of the third international conference on fuzzy logic, neural networks and soft computing (pp. 29-35).
52 Sanchez, E., Miyano, H., & Brachet, J. (1995). Optimization of fuzzy queries with genetic algorithms. Application to a data base of patents in biomedical engineering. VI IFSA Congress, Sao Paulo, Brazil (Vol. II, pp. 293-296).
53 Sanchez, E., & Pierre, P. (1994). Fuzzy logic and genetic algorithms in information retrieval. In Third international conference on fuzzy logic, neural networks and soft computing, Izuaka, Japan (pp. 29-35).
54 Smith, M. P., & Smith, M. (1997). The use of genetic programming to build boolean queries for text retrieval through relevance feedback. Journal of Information Science, 23(6), 423-431.
55 Vrajitoru, D. (1997). Apprentissage en recherche d'informations. Doctoral thesis, Faculty of Science, University of Neuchâtel.
56 Dana Vrajitoru, Crossover improvement for the genetic algorithm in information retrieval, Information Processing and Management: an International Journal, v.34 n.4, p.405-415, July 1, 1998 http://dx.doi.org/10.1016/S0306-4573(98)00015-6
57 Yang, J. J., & Korfhage, R. R. (1992). Adaptive information retrieval systems in vector model. In Proceedings of the symposium on document analysis and information retrieval. Las Vegas, Nevada (pp. 134-150).
58 Jing-Jye Yang , Robert Korfhage, Query Optimization in Information Retrieval Using Genetic Algorithms, Proceedings of the 5th International Conference on Genetic Algorithms, p.603-613, June 01, 1993 http://dl.acm.org/citation.cfm?id=657582&CFID=558819604&CFTOKEN=68186175
59 Jing-Jye Yang , Robert R. Korfhage, Query modification using genetic algorithms in vector space models, International Journal of Expert Systems, v.7 n.2, p.165-191, 1994 http://dl.acm.org/citation.cfm?id=185123&CFID=558819604&CFTOKEN=68186175
Links
Full Text