Rating Image Aesthetics Using Deep Learning
Current version as of 20 June 2016, 17:16
Reference
Lu, X.; Lin, Z.; Jin, H.; Yang, J.; Wang, J.Z.: Rating Image Aesthetics Using Deep Learning. IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2021–2034, 2015.
DOI
http://dx.doi.org/10.1109/TMM.2015.2477040
Abstract
This paper investigates unified feature learning and classifier training approaches for image aesthetics assessment. Existing methods built upon handcrafted or generic image features and developed machine learning and statistical modeling techniques utilizing training examples. We adopt a novel deep neural network approach to allow unified feature learning and classifier training to estimate image aesthetics. In particular, we develop a double-column deep convolutional neural network to support heterogeneous inputs, i.e., global and local views, in order to capture both global and local characteristics of images. In addition, we employ the style and semantic attributes of images to further boost the aesthetics categorization performance. Experimental results show that our approach produces significantly better results than the earlier reported results on the AVA dataset for both the generic image aesthetics and content-based image aesthetics. Moreover, we introduce a 1.5-million image dataset (IAD) for image aesthetics assessment and we further boost the performance on the AVA test set by training the proposed deep neural networks on the IAD dataset.
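The double-column idea from the abstract can be illustrated schematically: one column sees a warped global view of the image, the other a fine-detail local crop, and their features are concatenated before classification. The sketch below is only an illustration of that data flow, not the authors' network: random linear maps stand in for the learned convolutional columns, and all function names (`global_view`, `local_view`, `column`, `double_column_score`) are hypothetical.

```python
import numpy as np

def global_view(img, size=64):
    # Warp the whole image to a fixed size (nearest-neighbor resize as a
    # stand-in), capturing global composition at the cost of detail.
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[np.ix_(ys, xs)]

def local_view(img, size=64):
    # Center crop at original resolution, preserving fine local detail.
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def column(view, rng):
    # Stand-in for one convolutional column: flatten the view and project
    # it to a fixed-length feature vector with a random linear map + ReLU.
    flat = view.reshape(-1).astype(np.float64)
    W = rng.standard_normal((64, flat.size)) / np.sqrt(flat.size)
    return np.maximum(W @ flat, 0.0)

def double_column_score(img, seed=0):
    # Run both views through their columns, concatenate the two feature
    # vectors, and apply a toy binary classifier head (linear + sigmoid).
    rng = np.random.default_rng(seed)
    f = np.concatenate([column(global_view(img), rng),
                        column(local_view(img), rng)])
    w = rng.standard_normal(f.size) / np.sqrt(f.size)
    return 1.0 / (1.0 + np.exp(-(w @ f)))  # pseudo-probability in (0, 1)
```

The point of the sketch is the input structure: because the two views are heterogeneous (warped whole image vs. native-resolution crop), each gets its own column, and only the learned features are merged.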
Extended Abstract
Bibtex
@ARTICLE{7243357,
  author={Lu, X. and Lin, Z. and Jin, H. and Yang, J. and Wang, J.Z.},
  journal={Multimedia, IEEE Transactions on},
  title={Rating Image Aesthetics Using Deep Learning},
  year={2015},
  volume={17},
  number={11},
  pages={2021-2034},
  keywords={Computer architecture;Image color analysis;Machine learning;Neural networks;Semantics;Training;Visualization;Automatic feature learning;deep neural networks;image aesthetics},
  doi={10.1109/TMM.2015.2477040},
  url={http://dx.doi.org/10.1109/TMM.2015.2477040, http://de.evo-art.org/index.php?title=Rating_Image_Aesthetics_Using_Deep_Learning},
  ISSN={1520-9210},
  month={Nov},
}
Used References
[1] R. Datta, D. Joshi, J. Li, and J. Wang, “Studying aesthetics in photographic images using a computational approach,” in European Conference on Computer Vision (ECCV), pp. 288–301, 2006.
[2] Y. Ke, X. Tang, and F. Jing, “The design of high-level features for photo quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 419–426, 2006.
[3] Y. Luo and X. Tang, “Photo and video quality evaluation: Focusing on the subject,” in European Conference on Computer Vision (ECCV), pp. 386–399, 2008. http://dx.doi.org/10.1007/978-3-540-88690-7_29
[4] S. Bhattacharya, R. Sukthankar, and M. Shah, “A framework for photo-quality assessment and enhancement based on visual aesthetics,” in ACM International Conference on Multimedia (MM), pp. 271–280, 2010. http://dx.doi.org/10.1145/1873951.1873990
[5] W. Luo, X. Wang, and X. Tang, “Content-based photo quality assessment,” in IEEE International Conference on Computer Vision (ICCV), pp. 2206–2213, 2011.
[6] S. Dhar, V. Ordonez, and T. Berg, “High level describable attributes for predicting aesthetics and interestingness,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1657–1664, 2011. http://dx.doi.org/10.1109/CVPR.2011.5995467
[7] M. Nishiyama, T. Okabe, I. Sato, and Y. Sato, “Aesthetic quality classification of photographs based on color harmony,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 33–40, 2011. http://dx.doi.org/10.1109/CVPR.2011.5995539
[8] P. O’Donovan, A. Agarwala, and A. Hertzmann, “Color compatibility from large datasets,” ACM Transactions on Graphics (TOG), vol. 30, no. 4, pp. 63:1–12, 2011. http://dx.doi.org/10.1145/1964921.1964958
[9] L. Marchesotti, F. Perronnin, D. Larlus, and G. Csurka, “Assessing the aesthetic quality of photographs using generic image descriptors,” in IEEE International Conference on Computer Vision (ICCV), pp. 1784–1791, 2011. http://dx.doi.org/10.1109/ICCV.2011.6126444
[10] N. Murray, L. Marchesotti, and F. Perronnin, “AVA: A large-scale database for aesthetic visual analysis,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2408–2415, 2012. http://dx.doi.org/10.1109/CVPR.2012.6247954
[11] L. Marchesotti and F. Perronnin, “Learning beautiful (and ugly) attributes,” in British Machine Vision Conference (BMVC), 2013.
[12] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91–110, 2004. http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS), pp. 1106–1114, 2012.
[14] H.-H. Su, T.-W. Chen, C.-C. Kao, W. Hsu, and S.-Y. Chien, “Scenic photo quality assessment with bag of aesthetics-preserving features,” in ACM International Conference on Multimedia (MM), pp. 1213–1216, 2011. http://dx.doi.org/10.1145/2072298.2071977
[15] A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic representation of the spatial envelope,” International Journal of Computer Vision (IJCV), vol. 42, no. 3, pp. 145–175, 2001. http://dx.doi.org/10.1023/A:1011139631724
[16] D. Ciresan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642–3649, 2012. http://dx.doi.org/10.1109/CVPR.2012.6248110
[17] Y. Sun, X. Wang, and X. Tang, “Hybrid deep learning for face verification,” in IEEE International Conference on Computer Vision (ICCV), 2013. http://dx.doi.org/10.1109/ICCV.2013.188
[18] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian detection with unsupervised multi-stage feature learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3626–3633, 2013. http://dx.doi.org/10.1109/CVPR.2013.465
[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. http://dx.doi.org/10.1109/5.726791
[20] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006. http://dx.doi.org/10.1162/neco.2006.18.7.1527
[21] G. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002. http://dx.doi.org/10.1162/089976602760128018
[22] S. Karayev, A. Hertzmann, H. Winnemöller, A. Agarwala, and T. Darrell, “Recognizing image style,” in British Machine Vision Conference (BMVC), 2014. http://dx.doi.org/10.5244/C.28.122
[23] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “DeCAF: A deep convolutional activation feature for generic visual recognition,” technical report, arXiv:1310.1531v1, 2013.
[24] F. Agostinelli, M. Anderson, and H. Lee, “Adaptive multi-column deep neural networks with application to robust image denoising,” in Advances in Neural Information Processing Systems (NIPS), pp. 1493–1501, 2013.
[25] A. Khosla, A. Das Sarma, and R. Hamid, “What makes an image popular?” in International World Wide Web Conference (WWW), pp. 867–876, 2014. http://dx.doi.org/10.1145/2566486.2567996
[26] O. Litzel, On Photographic Composition. New York: Amphoto Books, 1974.
[27] W. Niekamp, “An exploratory investigation into factors affecting visual balance,” Educational Communication and Technology: A Journal of Theory, Research, and Development, vol. 29, no. 1, pp. 37–48, 1981.
[28] R. Arnheim, Art and Visual Perception: A Psychology of the Creative Eye. Los Angeles, CA: University of California Press, 1974.
[29] D. Joshi, R. Datta, E. Fedorovskaya, Q. T. Luong, J. Z. Wang, J. Li, and J. B. Luo, “Aesthetics and emotions in images,” IEEE Signal Processing Magazine, vol. 28, no. 5, pp. 94–115, 2011. http://dx.doi.org/10.1109/MSP.2011.941851
[30] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 22, no. 10, pp. 1345–1359, 2010. http://dx.doi.org/10.1109/TKDE.2009.191
[31] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in International Conference on Machine Learning (ICML), pp. 160–167, 2008. http://dx.doi.org/10.1145/1390156.1390177
[32] X. Lu, P. Suryanarayan, R. B. Adams Jr, J. Li, M. G. Newman, and J. Z. Wang, “On shape and the computability of emotions,” in ACM International Conference on Multimedia (MM), pp. 229–238, 2012. http://dx.doi.org/10.1145/2393347.2393384
Links
Full Text
http://infolab.stanford.edu/~wangz/project/imsearch/Aesthetics/TMM15/lu.pdf