Deep multi-patch aggregation network for image style, aesthetics, and quality estimation



Xin Lu, Zhe Lin, Xiaohui Shen, Radomir Mech, James Z. Wang: Deep multi-patch aggregation network for image style, aesthetics, and quality estimation. The IEEE International Conference on Computer Vision (ICCV), 2015, pp. 990-998.



This paper investigates the problems of image style, aesthetics, and quality estimation, which require fine-grained details from high-resolution images, using a deep neural network training approach. Existing deep convolutional neural networks mostly extract one patch, such as a down-sized crop, from each image as a training example. However, one patch may not always represent the entire image well, which can cause ambiguity during training. We propose a deep multi-patch aggregation network training approach that allows us to train models using multiple patches generated from one image. We achieve this by constructing multiple, shared columns in the neural network and feeding multiple patches to each of the columns. More importantly, we propose two novel network layers (statistics and sorting) to support the aggregation of those patches. The proposed deep multi-patch aggregation network integrates shared feature learning and aggregation function learning into a unified framework. We demonstrate the effectiveness of the deep multi-patch aggregation network on three problems: image style recognition, aesthetic quality categorization, and image quality estimation. Our models trained using the proposed networks significantly outperform the state of the art in all three applications.
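The core idea — one shared feature extractor applied to several patches of the same image, followed by an order-independent aggregation layer — can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: `shared_column`, its weight shapes, and the particular statistics (min, max, mean, median) are assumptions standing in for the paper's learned convolutional columns and aggregation layers.

```python
import numpy as np

def shared_column(patch, W):
    # Shared feature extractor applied to every patch; all patches reuse the
    # same weights W (stand-in for the paper's shared convolutional column).
    return np.maximum(0.0, patch.reshape(-1) @ W)  # ReLU(flatten(patch) @ W)

def statistics_aggregation(features):
    # Statistics layer: permutation-invariant summary across the patch axis,
    # here min, max, mean, and median per feature dimension.
    f = np.stack(features)                         # (n_patches, d)
    return np.concatenate([f.min(0), f.max(0), f.mean(0), np.median(f, 0)])

def sorting_aggregation(features):
    # Sorting layer: sort each feature dimension across patches, giving a
    # fixed-size representation that ignores patch order.
    f = np.stack(features)
    return np.sort(f, axis=0).reshape(-1)          # (n_patches * d,)

rng = np.random.default_rng(0)
W = rng.standard_normal((32 * 32 * 3, 8))          # assumed patch/feature sizes
patches = [rng.standard_normal((32, 32, 3)) for _ in range(5)]  # patches of one image

feats = [shared_column(p, W) for p in patches]
stats_vec = statistics_aggregation(feats)          # length 4 * 8 = 32
sorted_vec = sorting_aggregation(feats)            # length 5 * 8 = 40
```

Either aggregated vector would then feed a classifier head; because both layers are invariant to the order of the patches, the network sees the image as a set of patches rather than a single crop.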

Extended Abstract


@inproceedings{lu2015deep,
  author    = {Lu, Xin and Lin, Zhe and Shen, Xiaohui and Mech, Radomir and Wang, James Z.},
  title     = {Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month     = {December},
  year      = {2015},
  pages     = {990--998},
  url       = {,_aesthetics,_and_quality_estimation},
}

Used References

[1] F. Agostinelli, M. Anderson, and H. Lee. Adaptive multicolumn deep neural networks with application to robust image denoising. In NIPS, pages 1493–1501. 2013.

[2] B. Babenko, N. Verma, P. Dollár, and S. Belongie. Multiple instance learning with manifold bags. In ICML, pages 81–88, 2011.

[3] L. Bourdev, S. Maji, and J. Malik. Describing people: A poselet-based approach to attribute classification. In ICCV, pages 1543–1550, 2011.

[4] L. Bourdev, F. Yang, and R. Fergus. Deep poselets for human detection. In arXiv:1407.0717v1, 2014.

[5] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In CVPR, pages 3642–3649, 2012.

[6] T. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.

[7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. TPAMI, 32(9):1627–1645, 2010.

[8] Z. Fu, A. Robles-Kelly, and J. Zhou. MILIS: Multiple instance learning with instance selection. TPAMI, 33(5):958–977, 2011.

[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, pages 580–587, 2014.

[10] Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale orderless pooling of deep convolutional activation features. In ECCV, pages 392–407, 2014.

[11] T. Gärtner, P. Flach, A. Kowalczyk, and A. Smola. Multi-instance kernels. In ICML, pages 179–186, 2002.

[12] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, pages 346–361, 2014.

[13] D. Heckerman. A tractable inference algorithm for diagnosing multiple diseases. In UAI, pages 163–171, 1989.

[14] J. Hoffman, D. Pathak, T. Darrell, and K. Saenko. Detector discovery in the wild: Joint multiple instance and representation learning. In arXiv:1412.1135v1, 2014.

[15] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer. DenseNet: Implementing efficient Convnet descriptor pyramids. Technical report, University of California, Berkeley, arXiv:1404.1869v1, 2014.

[16] M. Juneja, A. Vedaldi, C. Jawahar, and A. Zisserman. Blocks that shout: Distinctive parts for scene classification. In CVPR, pages 923–930, 2013.

[17] L. Kang, P. Ye, Y. Li, and D. Doermann. Convolutional neural networks for no-reference image quality assessment. In CVPR, pages 1733–1740, 2014.

[18] S. Karayev, A. Hertzmann, H. Winnemöller, A. Agarwala, and T. Darrell. Recognizing image style. In BMVC, 2014.

[19] J. Keeler, D. Rumelhart, and W. Leow. Integrated segmentation and recognition of hand-printed numerals. In NIPS, pages 557–563. 1991.

[20] M. Koskela and J. Laaksonen. Convolutional network features for scene recognition. In ACM MM, pages 1169–1172, 2014.

[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.

[22] S. Li, Z. Q. Liu, and A. B. Chan. Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network. In arXiv:1406.3474v1, 2014.

[23] Z. Lin, G. Hua, and L. Davis. Multiple instance feature for robust part-based object detection. In CVPR, pages 405–412, 2009.

[24] X. Lu, Z. Lin, H. Jin, J. Yang, and J. Wang. RAPID: Rating pictorial aesthetics using deep learning. In ACM MM, pages 457–466, 2014.

[25] O. Maron and T. Lozano-Pérez. A framework for multiple-instance learning. In NIPS, pages 570–576. 1998.

[26] N. Murray, L. Marchesotti, and F. Perronnin. AVA: A large-scale database for aesthetic visual analysis. In CVPR, pages 2408–2415, 2012.

[27] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Weakly supervised object recognition with convolutional neural networks. Technical Report HAL-01015140, INRIA, 2014.

[28] G. Papandreou, I. Kokkinos, and P. Savalle. Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection. In CVPR, 2015.

[29] J. Sun and J. Ponce. Learning discriminative part detectors for image classification and cosegmentation. In ICCV, pages 3400–3407, 2013.

[30] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, pages 1701–1708, 2014.

[31] R. R. Vatsavai. Gaussian multiple instance learning approach for mapping the slums of the world using very high resolution imagery. In SIGKDD, pages 1419–1426, 2013.

[32] Y. Wei, W. Xia, J. Huang, B. Ni, J. Dong, Y. Zhao, and S. Yan. CNN: Single-label to multi-label. In arXiv:1406.5726v3, 2014.

[33] N. Weidmann, E. Frank, and B. Pfahringer. A two-level learning method for generalized multi-instance problems. In ECML, pages 468–479, 2003.

[34] J. Wu, Y. Yu, C. Huang, and K. Yu. Deep multiple instance learning for image classification and auto-annotation. In CVPR, 2015.

[35] Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, and E.-C. Chang. Deep learning of feature representation with multiple instance learning for medical image analysis. In ICASSP, 2014.

[36] C. Yang and T. Lozano-Pérez. Image database retrieval with multiple-instance learning techniques. In International Conference on Data Engineering, pages 233–243, 2000.

[37] C. Zhang, J. Platt, and P. Viola. Multiple instance boosting for object detection. In NIPS, pages 1417–1424. 2005.

[38] N. Zhang, J. Donahue, R. Girshick, and T. Darrell. Part-based R-CNNs for fine-grained category detection. In ECCV, pages 834–849, 2014.

[39] N. Zhang, M. Paluri, M. Ranzato, T. Darrell, and L. Bourdev. PANDA: Pose aligned networks for deep attribute modeling. In CVPR, pages 1637–1644, 2014.

[40] Z. Zhou and M. Zhang. Multi-instance multi-label learning with application to scene classification. In NIPS, pages 1609–1616, 2006.


Full Text

internal file

Other Links