User-based document clustering by redescribing subject descriptions with a genetic algorithm



Reference

M. Gordon: User-based document clustering by redescribing subject descriptions with a genetic algorithm. Journal of the American Society for Information Science, 42 (5) (1991), pp. 311–322

DOI

http://dx.doi.org/10.1002/(SICI)1097-4571(199106)42:5<311::AID-ASI1>3.0.CO;2-J

Abstract

Information retrieval systems have used clustering of documents and queries to improve both retrieval efficiency and retrieval effectiveness. Normally, clustering involves grouping together static descriptions of documents by their similarity to each other, though user-based clustering suggests that usage patterns concerning co-relevance can form a basis for clustering. This article reports that clusters of co-relevant documents obtain increasingly similar descriptions when a genetic algorithm is used to adapt subject descriptions so that documents become more effective in matching relevant queries and failing to match nonrelevant queries. As a result of the increased similarity, clustering algorithms can more accurately group documents into useful clusters. The findings of this work were reached through simulation experiments.
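The adaptation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the representation, fitness function, or genetic operators used in the paper, so everything below is an assumption made for illustration: descriptions are binary term vectors, matching is measured with Jaccard similarity, and the genetic algorithm uses truncation selection, one-point crossover, bit-flip mutation, and elitism. The names `jaccard`, `fitness`, and `evolve` are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two binary term vectors (assumed matching function)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def fitness(desc, relevant, nonrelevant):
    """Reward matching relevant queries and penalise matching nonrelevant ones."""
    gain = sum(jaccard(desc, q) for q in relevant) / len(relevant)
    loss = sum(jaccard(desc, q) for q in nonrelevant) / len(nonrelevant)
    return gain - loss


def evolve(population, relevant, nonrelevant, generations=30, p_mut=0.02, rng=None):
    """Adapt competing descriptions of ONE document; needs at least 4 descriptions."""
    rng = rng or random.Random(0)

    def score(desc):
        return fitness(desc, relevant, nonrelevant)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: len(ranked) // 2]      # truncation selection (assumed)
        children = [ranked[0]]                    # elitism: keep the current best
        while len(children) < len(population):
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, len(mum))      # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        population = children
    return max(population, key=score)


# Toy data (invented): 12 index terms, two relevant and two nonrelevant queries.
rng = random.Random(42)
relevant = [[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
            [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]]
nonrelevant = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
               [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1]]
population = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]

start_best = max(fitness(d, relevant, nonrelevant) for d in population)
best = evolve(population, relevant, nonrelevant, rng=rng)
final_best = fitness(best, relevant, nonrelevant)
print(f"best fitness before: {start_best:.3f}, after: {final_best:.3f}")
```

Because the best description is carried over unchanged in each generation, the best fitness cannot decrease. If two documents were adapted against the same relevant queries in this way, their descriptions would be pulled toward the same terms, which is the effect the abstract credits with making subsequent clustering more accurate.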

Extended Abstract

Bibtex

@article {ASI:ASI1,
author = {Gordon, Michael D.},
title = {User-based document clustering by redescribing subject descriptions with a genetic algorithm},
journal = {Journal of the American Society for Information Science},
volume = {42},
number = {5},
publisher = {Wiley Subscription Services, Inc., A Wiley Company},
issn = {1097-4571},
url = {http://dx.doi.org/10.1002/(SICI)1097-4571(199106)42:5<311::AID-ASI1>3.0.CO;2-J http://de.evo-art.org/index.php?title=User-based_document_clustering_by_redescribing_subject_description_with_a_genetic_algorithm},
doi = {10.1002/(SICI)1097-4571(199106)42:5<311::AID-ASI1>3.0.CO;2-J},
pages = {311--322},
year = {1991},
}

Used References

Blair, D. C. (1986). Indeterminacy in the subject access to documents. Information Processing and Management, 22, 229-241.

Bookstein, A. (1986). Performance of self-taught documents. In Conference on research and development in information retrieval (pp. 244-248). Pisa, Italy.

Cooper, M.D. (1973). A simulation model of an information retrieval system. Information Storage and Retrieval, 9, 13-22.

Croft, W. B. (1980). A model of cluster searching based on classification. Information Systems, 5, 189-195.



Everitt, B. (1980). Cluster analysis, Second edition. New York: Halsted Press.

Salton, G. (1968). Automatic information organization and retrieval. New York: McGraw-Hill.

Gordon, M. (1985). A learning algorithm applied to document description. Proceedings of the eighth annual international ACM SIGIR conference on research and development in information retrieval (pp. 179-185). Montreal.

Gordon, M. (1988). Probabilistic and genetic algorithms for document retrieval. Communications of the ACM, 31, 1208-1218.

Gordon, M. (1990). Evaluating the effectiveness of information retrieval systems using simulated queries. Journal of the American Society for Information Science, 41, 313-323.

Griffiths, J. M. (1978). The computer simulation of information retrieval systems, Ph.D. Thesis, University College London.

Holland, J. (1975). Adaptation in natural and artificial systems. Ann Arbor, Mi.: University of Michigan Press.

Jardine, N., & Van Rijsbergen, C. J. (1971). The use of hierarchic clustering in information retrieval. Information Storage and Retrieval, 7, 217-240.

Tague, J., Nelson, M., & Wu, H. (1981a). Problems in the simulation of bibliographic retrieval systems. In Information Retrieval Research, R. N. Oddy, S. E. Robertson, C. J. Van Rijsbergen, and P.W. Williams (Eds.). London: Butterworth.

Tague, J. M., & Nelson, M. J. (1981b). Simulation of user judgments in bibliographic retrieval systems. In ACM SIGIR forum, proceedings of the fourth international conference on information storage and retrieval (pp. 66-71). Oakland, California.

Tague, J., & Nelson, M. (1983). Simulation of bibliographic retrieval databases using hyperterms. In Research and Development in Information Retrieval, Proceedings. Gerard Salton and Hans-Jochen Schneider (Eds.), (pp. 194-207). Berlin: Springer-Verlag.

Tague, J., McClellan, C., & Nelson, M. (1984). The hyperterm model of a bibliographic database. Canadian Journal of Information Science, 9, 37-58.

Nelson, M. J., & Tague, J. M. (1985). Split size-rank models for the distribution of index terms. Journal of the American Society for Information Science, 36, 283-296.

Nelson, M. (1988). Correlation of term usage and term indexing frequencies. Information Processing and Management, 24, 541-547.

Raghavan, V., & Birchard, K. (1979). A clustering strategy based on a formalism of the reproductive process in natural systems. In Proceedings of the second international ACM SIGIR conference on information retrieval (pp. 10-22). Dallas.

Raghavan, V., & Deogun, J. S. (1986). User-oriented document clustering: A framework for learning in information retrieval. In Conference on research and development in information retrieval (pp. 157-163). Pisa, Italy.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Van Rijsbergen, C. J. (1979). Information retrieval, Second edition. London: Butterworths.

Voorhees, E. (1985, June). The cluster hypothesis revisited. In Proceedings of the eighth annual international ACM SIGIR conference on research and development in information retrieval (pp. 188-196). Montreal.

Raghavan, V.V., & Agarwal, B. (1987, July). Optimal determination of user-oriented clusters: An application for the reproductive plan. Second international conference on genetic algorithms and their applications, (pp. 241-246). Cambridge, MA.

Yu, C.T. (1974). A clustering algorithm based on user queries. Journal of the American Society for Information Science, 25, 218-226.

Yu, C.T., Wang, Y.T., & Chen, C. H. (1985, June). Adaptive document clustering. In Proceedings of the eighth annual international ACM SIGIR conference on research and development in information retrieval (pp. 197-203). Montreal.

Zunde, P., & Dexter, M. (1969). Indexing consistency and quality. American Documentation, 20, 259-264.


Links

Full Text

internal file


Other Links