Rating Image Aesthetics Using Deep Learning



Lu, X.; Lin, Z.; Jin, H.; Yang, J.; Wang, J.Z.: Rating Image Aesthetics Using Deep Learning. IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2021–2034, 2015.




This paper investigates unified feature learning and classifier training approaches for image aesthetics assessment. Existing methods are built upon handcrafted or generic image features and rely on machine learning and statistical modeling techniques that learn from training examples. We adopt a novel deep neural network approach that allows unified feature learning and classifier training to estimate image aesthetics. In particular, we develop a double-column deep convolutional neural network to support heterogeneous inputs, i.e., global and local views, in order to capture both global and local characteristics of images. In addition, we employ the style and semantic attributes of images to further boost aesthetics categorization performance. Experimental results show that our approach produces significantly better results than previously reported results on the AVA dataset for both generic image aesthetics and content-based image aesthetics. Moreover, we introduce a 1.5-million-image dataset (IAD) for image aesthetics assessment, and we further boost performance on the AVA test set by training the proposed deep neural networks on the IAD dataset.
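The double-column idea described in the abstract can be illustrated with a small, framework-free Python sketch. It is not the authors' network: the two "columns" below are hypothetical toy feature extractors standing in for trained CNNs, the fusion weights are set by hand rather than learned, and the way each view is built (a nearest-neighbour resize of the whole image for the global view, a random fixed-size crop at native resolution for the local view) is an assumption made for illustration. What it does show is the data flow: one image, two heterogeneous views, one column per view, and a shared classifier on the concatenated features.

```python
"""Toy sketch of a double-column model with global and local views.

Assumptions (not taken from the paper): images are 2-D grayscale lists,
each "column" is a hypothetical hand-written feature extractor rather
than a trained CNN, and the fusion layer is a hand-set logistic unit.
"""
import math
import random
from typing import List

Image = List[List[float]]


def global_view(img: Image, size: int) -> Image:
    """Resize the whole image to size x size (nearest neighbour).
    Keeps the layout of the entire scene but discards fine detail."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def local_view(img: Image, size: int, rng: random.Random) -> Image:
    """Take a random size x size crop at the original resolution.
    Keeps fine detail but shows only part of the scene.
    Requires the image to be at least size x size."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]


def toy_column(view: Image) -> List[float]:
    """Hypothetical stand-in for one CNN column: returns the mean
    intensity and the mean absolute horizontal gradient of the view."""
    pixels = [p for row in view for p in row]
    mean = sum(pixels) / len(pixels)
    grads = [abs(row[i + 1] - row[i])
             for row in view for i in range(len(row) - 1)]
    sharp = sum(grads) / len(grads) if grads else 0.0
    return [mean, sharp]


def double_column_score(img: Image, weights: List[float], bias: float,
                        size: int = 4, seed: int = 0) -> float:
    """Concatenate the features of both columns and apply one logistic
    unit, giving P(high aesthetic quality) under this toy model."""
    rng = random.Random(seed)
    feats = (toy_column(global_view(img, size))
             + toy_column(local_view(img, size, rng)))
    z = bias + sum(wt * f for wt, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # 8x8 horizontal-and-vertical gradient image with values in [0, 1].
    img = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
    print(toy_column(global_view(img, 4)), toy_column(local_view(img, 4, random.Random(0))))
    print(double_column_score(img, weights=[0.5, 2.0, 0.5, 2.0], bias=-1.0))
```

On this 8×8 gradient image, `global_view(img, 4)` samples every other pixel of the whole frame while `local_view` returns a contiguous 4×4 patch, so the toy gradient feature is twice as large on the global view. In a real system both columns and the fusion layer would be trained jointly end to end, which is what makes feature learning and classifier training unified.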

Extended Abstract


author={Lu, X. and Lin, Z. and Jin, H. and Yang, J. and Wang, J.Z.},
journal={Multimedia, IEEE Transactions on},
title={Rating Image Aesthetics Using Deep Learning},
keywords={Computer architecture;Image color analysis;Machine learning;Neural networks;Semantics;Training;Visualization;Automatic feature learning;deep neural networks;image aesthetics},
url={http://dx.doi.org/10.1109/TMM.2015.2477040, http://de.evo-art.org/index.php?title=Rating_Image_Aesthetics_Using_Deep_Learning },

Used References

[1] R. Datta, D. Joshi, J. Li, and J. Wang, “Studying aesthetics in photographic images using a computational approach,” in European Conference on Computer Vision (ECCV), pp. 288–301, 2006.

[2] Y. Ke, X. Tang, and F. Jing, “The design of high-level features for photo quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 419–426, 2006.

[3] Y. Luo and X. Tang, “Photo and video quality evaluation: Focusing on the subject,” in European Conference on Computer Vision (ECCV), pp. 386–399, 2008. http://dx.doi.org/10.1007/978-3-540-88690-7_29

[4] S. Bhattacharya, R. Sukthankar, and M. Shah, “A framework for photo-quality assessment and enhancement based on visual aesthetics,” in ACM International Conference on Multimedia (MM), pp. 271–280, 2010. http://dx.doi.org/10.1145/1873951.1873990

[5] W. Luo, X. Wang, and X. Tang, “Content-based photo quality assessment,” in IEEE International Conference on Computer Vision (ICCV), pp. 2206–2213, 2011.

[6] S. Dhar, V. Ordonez, and T. Berg, “High level describable attributes for predicting aesthetics and interestingness,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1657–1664, 2011. http://dx.doi.org/10.1109/CVPR.2011.5995467

[7] M. Nishiyama, T. Okabe, I. Sato, and Y. Sato, “Aesthetic quality classification of photographs based on color harmony,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 33–40, 2011. http://dx.doi.org/10.1109/CVPR.2011.5995539

[8] P. O’Donovan, A. Agarwala, and A. Hertzmann, “Color compatibility from large datasets,” ACM Transactions on Graphics (TOG), vol. 30, no. 4, pp. 63:1–12, 2011. http://dx.doi.org/10.1145/1964921.1964958

[9] L. Marchesotti, F. Perronnin, D. Larlus, and G. Csurka, “Assessing the aesthetic quality of photographs using generic image descriptors,” in IEEE International Conference on Computer Vision (ICCV), pp. 1784–1791, 2011. http://dx.doi.org/10.1109/ICCV.2011.6126444

[10] N. Murray, L. Marchesotti, and F. Perronnin, “AVA: A large-scale database for aesthetic visual analysis,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2408–2415, 2012. http://dx.doi.org/10.1109/CVPR.2012.6247954

[11] L. Marchesotti and F. Perronnin, “Learning beautiful (and ugly) attributes,” in British Machine Vision Conference (BMVC), 2013.

[12] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91–110, 2004. http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94

[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS), pp. 1106–1114, 2012.

[14] H.-H. Su, T.-W. Chen, C.-C. Kao, W. Hsu, and S.-Y. Chien, “Scenic photo quality assessment with bag of aesthetics-preserving features,” in ACM International Conference on Multimedia (MM), pp. 1213–1216, 2011. http://dx.doi.org/10.1145/2072298.2071977

[15] A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic representation of the spatial envelope,” International Journal of Computer Vision (IJCV), vol. 42, no. 3, pp. 145–175, 2001. http://dx.doi.org/10.1023/A:1011139631724

[16] D. Ciresan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642–3649, 2012. http://dx.doi.org/10.1109/CVPR.2012.6248110

[17] Y. Sun, X. Wang, and X. Tang, “Hybrid deep learning for face verification,” in IEEE International Conference on Computer Vision (ICCV), 2013. http://dx.doi.org/10.1109/ICCV.2013.188

[18] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian detection with unsupervised multi-stage features learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3626–3633, 2013. http://dx.doi.org/10.1109/CVPR.2013.465

[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. http://dx.doi.org/10.1109/5.726791

[20] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006. http://dx.doi.org/10.1162/neco.2006.18.7.1527

[21] G. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002. http://dx.doi.org/10.1162/089976602760128018

[22] S. Karayev, A. Hertzmann, H. Winnemöller, A. Agarwala, and T. Darrell, “Recognizing image style,” in British Machine Vision Conference (BMVC), 2014. http://dx.doi.org/10.5244/C.28.122

[23] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “DeCAF: A deep convolutional activation feature for generic visual recognition,” Technical report, arXiv:1310.1531v1, 2013.

[24] F. Agostinelli, M. Anderson, and H. Lee, “Adaptive multi-column deep neural networks with application to robust image denoising,” in Advances in Neural Information Processing Systems (NIPS), pp. 1493–1501, 2013.

[25] A. Khosla, A. Das Sarma, and R. Hamid, “What makes an image popular?” in International World Wide Web Conference (WWW), pp. 867–876, 2014. http://dx.doi.org/10.1145/2566486.2567996

[26] O. Litzel, On Photographic Composition. New York: Amphoto Books, 1974.

[27] W. Niekamp, “An exploratory investigation into factors affecting visual balance,” Educational Communication and Technology: A Journal of Theory, Research, and Development, vol. 29, no. 1, pp. 37–48, 1981.

[28] R. Arnheim, Art and Visual Perception: A Psychology of the Creative Eye. Los Angeles, CA: University of California Press, 1974.

[29] D. Joshi, R. Datta, E. Fedorovskaya, Q. T. Luong, J. Z. Wang, J. Li, and J. B. Luo, “Aesthetics and emotions in images,” IEEE Signal Processing Magazine, vol. 28, no. 5, pp. 94–115, 2011. http://dx.doi.org/10.1109/MSP.2011.941851

[30] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 22, no. 10, pp. 1345–1359, 2010. http://dx.doi.org/10.1109/TKDE.2009.191

[31] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in International Conference on Machine Learning (ICML), pp. 160–167, 2008. http://dx.doi.org/10.1145/1390156.1390177

[32] X. Lu, P. Suryanarayan, R. B. Adams Jr., J. Li, M. G. Newman, and J. Z. Wang, “On shape and the computability of emotions,” in ACM International Conference on Multimedia (MM), pp. 229–238, 2012. http://dx.doi.org/10.1145/2393347.2393384

