Multi-modal Medical Image Retrieval
Reference
Yu Cao, Henning Müller, Charles E. Kahn, Jr., Ethan Munson: Multi-modal Medical Image Retrieval. Department of Computer Science & Engineering, University of Tennessee at Chattanooga
DOI
Abstract
Images are ubiquitous in biomedicine, and image viewers play a central role in many aspects of modern health care. Tremendous amounts of medical image data are captured and recorded in digital format during daily clinical practice, medical research, and education (in 2009, over 117,000 images per day in the Geneva radiology department alone). Facing such an unprecedented volume of image data with heterogeneous image modalities, it is necessary to develop an effective and efficient medical image retrieval system for clinical practice and research.

Traditionally, medical image retrieval systems rely on text-based retrieval techniques that use the captions associated with the images, and most often access is by patient ID only. Since the 1990s, there has been increasing interest in content-based image retrieval for medical applications. One of the promising directions in content-based medical image retrieval is to correlate multi-modal information (e.g., text and image information) to provide better insights. In this paper, we concentrate our efforts on how to retrieve the most relevant medical images using multi-modal information. Specifically, we use two modalities: the visual content of the images (represented by visual features) and the textual information associated with the images.

The core idea of multi-modal retrieval is rooted in information fusion. Existing literature on multi-modal retrieval can roughly be classified into two categories: feature fusion and retrieval fusion. The feature fusion strategy generates an integrated feature representation from multiple modalities. The retrieval fusion strategy refers to techniques that merge the retrieval results from multiple retrieval algorithms. Our proposed approach belongs to the first category (feature fusion) and is largely inspired by Pham et al. [1] and Lienhart et al. [2]. In [1], the features from different modalities are normalized and concatenated to generate the feature vectors; Latent Semantic Analysis (LSA) is then applied to these features for image retrieval. In [2], Lienhart et al. propose a multi-layer probabilistic Latent Semantic Analysis (pLSA) model to solve the multi-modal image retrieval problem. Our proposed approach differs from Pham et al. [1] in that we do not simply concatenate the features from different modalities. Instead, we represent the features from different modalities as a multi-dimensional matrix and incorporate these feature vectors using an extended pLSA model. Our method also differs from Lienhart et al. [2] in that we use a single pLSA model instead of multiple pLSA models. The major contribution of our work is the new representation of an image using visual-textual “words”. These “words” are generated from the visual descriptors and textual information using the extended pLSA model.
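For readers unfamiliar with pLSA, the sketch below illustrates the underlying decomposition P(w|d) = sum_z P(w|z) P(z|d) that approaches like [1], [2], and the extended model above build on. It fits a plain, single-layer pLSA with EM on a document-by-word count matrix whose vocabulary mixes visual words and caption terms; this is only a background illustration under that simplifying setup, not the authors' extended multi-dimensional model, and the function name, matrix, and parameters are hypothetical.

import numpy as np

def fit_plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit a plain pLSA model with EM on a document-by-word count matrix.

    counts : (n_docs, n_words) array of co-occurrence counts, e.g. each row
             a histogram over a joint vocabulary of visual words and caption terms.
    Returns P(w|z) with shape (n_topics, n_words) and
            P(z|d) with shape (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random initialisation of the two conditional distributions.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z | d, w) proportional to P(z|d) * P(w|z).
        posterior = p_z_d[:, None, :] * p_w_z.T[None, :, :]   # shape (d, w, z)
        posterior /= posterior.sum(axis=2, keepdims=True) + 1e-12
        # Weight the posterior by the observed counts n(d, w).
        weighted = counts[:, :, None] * posterior             # shape (d, w, z)
        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts.
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Toy usage: 4 "images", vocabulary of 3 visual words plus 3 caption terms.
counts = np.array([[5, 2, 0, 3, 1, 0],
                   [4, 3, 0, 2, 2, 0],
                   [0, 1, 6, 0, 0, 4],
                   [0, 0, 5, 1, 0, 5]], dtype=float)
p_w_z, p_z_d = fit_plsa(counts, n_topics=2)

In such a setup, each image is represented by its topic mixture P(z|d), and retrieval ranks database images by the similarity (e.g., cosine) of their mixtures to that of the query; the extended model described above differs in how the visual and textual features are combined before the pLSA-style decomposition.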
Extended Abstract
Bibtex
Used References
[1] T.-T. Pham, N. E. Maillot, J.-H. Lim, and J.-P. Chevallet, "Latent semantic fusion model for image retrieval and annotation," in Proc. of the sixteenth ACM conference on Conference on information and knowledge management (CIKM), Lisbon, Portugal, 2007, pp. 439-444.
[2] R. Lienhart, S. Romberg, and E. Hörster, "Multilayer pLSA for multimodal image retrieval," in Proc. of the ACM International Conference on Image and Video Retrieval (CIVR), Island of Santorini, Greece, 2009, pp. 1-8.
[3] J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, "Discovering object categories in image collections," in Proc. of the IEEE International Conference on Computer Vision (ICCV), Beijing, P.R. China, 2005, pp. 370-377.
[4] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, USA, 2006, pp. 2169-2178.
[5] H. Müller, J. Kalpathy-Cramer, I. Eggel, S. Bedrick, S. Radhouani, B. Bakke, C. E. Kahn, Jr., and W. Hersh, "Overview of the CLEF 2009 medical image retrieval track," in 10th Workshop of the Cross-Language Evaluation Forum, 2009, pp. 1-11.
[6] "trec_eval: A standard tool used by the TREC community for evaluating an ad hoc retrieval run," in http://trec.nist.gov/trec_eval/. Washington DC, 2010.
Links
Full Text
http://publications.hevs.ch/index.php/attachments/single/273