Rapid: Rating pictorial aesthetics using deep learning



Xin Lu, Zhe Lin, Hailin Jin, Jianchao Yang, James Z. Wang: Rapid: Rating pictorial aesthetics using deep learning. In: Proceedings of the ACM International Conference on Multimedia, pp. 457–466. ACM (2014)




Effective visual features are essential for computational aesthetic quality rating systems. Existing methods used machine learning and statistical modeling techniques on handcrafted features or generic image descriptors. A recently-published large-scale dataset, the AVA dataset, has further empowered machine learning based approaches. We present the RAPID (RAting PIctorial aesthetics using Deep learning) system, which adopts a novel deep neural network approach to enable automatic feature learning. The central idea is to incorporate heterogeneous inputs generated from the image, which include a global view and a local view, and to unify the feature learning and classifier training using a double-column deep convolutional neural network. In addition, we utilize the style attributes of images to help improve the aesthetic quality categorization accuracy. Experimental results show that our approach significantly outperforms the state of the art on the AVA dataset.
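The two-view idea in the abstract can be illustrated with a toy sketch. It is not the authors' network: the nearest-neighbour resize, the patch size, the helper names (`global_view`, `local_view`, `column_features`, `double_column_score`), and the hand-written statistics that stand in for learned convolutional columns are all assumptions made for illustration. It only shows how a global and a local view of the same image might be produced, processed in separate columns, and fused into one score.

```python
import math
import random


def global_view(image, size):
    """Warp the whole image to size x size (nearest-neighbour resize, an assumed choice)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def local_view(image, size, rng):
    """Cut a random size x size patch at the original resolution."""
    h, w = len(image), len(image[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def column_features(view):
    """Hypothetical stand-in for one column: mean and variance of the view.

    A real column would be a stack of learned convolutional layers.
    """
    pixels = [p for row in view for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, var]


def double_column_score(image, weights, bias, size=4, rng=None):
    """Concatenate both columns' features and apply a logistic classifier."""
    rng = rng or random.Random(0)
    feats = (column_features(global_view(image, size))
             + column_features(local_view(image, size, rng)))
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


# An 8 x 8 grayscale ramp used as a dummy image.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
score = double_column_score(image, weights=[0.0] * 4, bias=0.0)
print(score)  # 0.5, because all weights and the bias are zero
```

In a trained system the weights of both columns and of the final classifier would be learned jointly, which is what the abstract means by unifying feature learning and classifier training.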

Extended Abstract


author = {Lu, Xin and Lin, Zhe and Jin, Hailin and Yang, Jianchao and Wang, James Z.},
title = {RAPID: Rating Pictorial Aesthetics Using Deep Learning},
booktitle = {Proceedings of the 22nd ACM International Conference on Multimedia},
series = {MM '14},
year = {2014},
isbn = {978-1-4503-3063-3},
location = {Orlando, Florida, USA},
pages = {457--466},
numpages = {10},
url = {http://doi.acm.org/10.1145/2647868.2654927, http://de.evo-art.org/index.php?title=Rapid:_Rating_pictorial_aesthetics_using_deep_learning },
doi = {10.1145/2647868.2654927},
acmid = {2654927},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {deep learning, image aesthetics, multi-column deep neural networks},

Used References

1 F. Agostinelli, M. Anderson, and H. Lee. Adaptive multi-column deep neural networks with application to robust image denoising. In Advances in Neural Information Processing Systems (NIPS), pages 1493--1501. 2013.

2 R. Arnheim. Art and Visual Perception: A Psychology of the Creative Eye. Los Angeles, CA: University of California Press, 1974.

3 Subhabrata Bhattacharya, Rahul Sukthankar, Mubarak Shah, A framework for photo-quality assessment and enhancement based on visual aesthetics, Proceedings of the international conference on Multimedia, October 25-29, 2010, Firenze, Italy http://dl.acm.org/citation.cfm?id=1873990&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1145/1873951.1873990

4 Dan Cireşan, Ueli Meier, Jürgen Schmidhuber, Multi-column deep neural networks for image classification, Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.3642-3649, June 16-21, 2012 http://dl.acm.org/citation.cfm?id=2354694&CFID=558819604&CFTOKEN=68186175

5 Ronan Collobert, Jason Weston, A unified architecture for natural language processing: deep neural networks with multitask learning, Proceedings of the 25th international conference on Machine learning, p.160-167, July 05-09, 2008, Helsinki, Finland http://dl.acm.org/citation.cfm?id=1390177&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1145/1390156.1390177

6 Ritendra Datta, Dhiraj Joshi, Jia Li, James Z. Wang, Studying aesthetics in photographic images using a computational approach, Proceedings of the 9th European conference on Computer Vision, May 07-13, 2006, Graz, Austria http://dl.acm.org/citation.cfm?id=2129588&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1007/11744078_23

7 S. Dhar, V. Ordonez, T. L. Berg, High level describable attributes for predicting aesthetics and interestingness, Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, p.1657-1664, June 20-25, 2011 http://dl.acm.org/citation.cfm?id=2191942&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/CVPR.2011.5995467

8 J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. Technical report, arXiv:1310.1531v1, 2013.

9 Geoffrey E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, v.14 n.8, p.1771-1800, August 2002 http://dl.acm.org/citation.cfm?id=639730&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1162/089976602760128018

10 Geoffrey E. Hinton, Simon Osindero, Yee-Whye Teh, A fast learning algorithm for deep belief nets, Neural Computation, v.18 n.7, p.1527-1554, July 2006 http://dl.acm.org/citation.cfm?id=1161605&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1162/neco.2006.18.7.1527

11 D. Joshi, R. Datta, E. Fedorovskaya, Q. T. Luong, J. Z. Wang, J. Li, and J. B. Luo. Aesthetics and emotions in images. In IEEE Signal Processing Magazine, 2011.

12 S. Karayev, A. Hertzmann, H. Winnemoeller, A. Agarwala, and T. Darrell. Recognizing image style. In British Machine Vision Conference (BMVC), 2014.

13 Yan Ke, Xiaoou Tang, Feng Jing, The Design of High-Level Features for Photo Quality Assessment, Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.419-426, June 17-22, 2006 http://dl.acm.org/citation.cfm?id=1153495&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/CVPR.2006.303

14 Aditya Khosla, Atish Das Sarma, Raffay Hamid, What makes an image popular?, Proceedings of the 23rd international conference on World wide web, April 07-11, 2014, Seoul, Korea http://dl.acm.org/citation.cfm?id=2567996&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1145/2566486.2567996

15 A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1106--1114, 2012.

16 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278--2324, 1998.

17 O. Litzel. On Photographic Composition. New York: Amphoto Books, 1974.

18 David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, v.60 n.2, p.91-110, November 2004 http://dl.acm.org/citation.cfm?id=996342&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94

19 Xin Lu, Poonam Suryanarayan, Reginald B. Adams, Jr., Jia Li, Michelle G. Newman, James Z. Wang, On shape and the computability of emotions, Proceedings of the 20th ACM international conference on Multimedia, October 29-November 02, 2012, Nara, Japan http://dl.acm.org/citation.cfm?id=2393384&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1145/2393347.2393384

20 Wei Luo, Xiaogang Wang, Xiaoou Tang, Content-based photo quality assessment, Proceedings of the 2011 International Conference on Computer Vision, p.2206-2213, November 06-13, 2011 http://dl.acm.org/citation.cfm?id=2356469&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/ICCV.2011.6126498

21 Yiwen Luo, Xiaoou Tang, Photo and Video Quality Evaluation: Focusing on the Subject, Proceedings of the 10th European Conference on Computer Vision: Part III, October 12-18, 2008, Marseille, France http://dl.acm.org/citation.cfm?id=1478204&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1007/978-3-540-88690-7_29

22 L. Marchesotti and F. Perronnin. Learning beautiful (and ugly) attributes. In British Machine Vision Conference (BMVC), 2013.

23 Luca Marchesotti, Florent Perronnin, Diane Larlus, Gabriela Csurka, Assessing the aesthetic quality of photographs using generic image descriptors, Proceedings of the 2011 International Conference on Computer Vision, p.1784-1791, November 06-13, 2011 http://dl.acm.org/citation.cfm?id=2356439&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/ICCV.2011.6126444

24 Naila Murray, Luca Marchesotti, Florent Perronnin, AVA: A large-scale database for aesthetic visual analysis, Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.2408-2415, June 16-21, 2012 http://dl.acm.org/citation.cfm?id=2354807&CFID=558819604&CFTOKEN=68186175

25 W. Niekamp. An exploratory investigation into factors affecting visual balance. In Educational Communication and Technology: A Journal of Theory, Research, and Development, volume 29, pages 37--48, 1981.

26 M. Nishiyama, T. Okabe, I. Sato, Y. Sato, Aesthetic quality classification of photographs based on color harmony, Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, p.33-40, June 20-25, 2011 http://dl.acm.org/citation.cfm?id=2191891&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/CVPR.2011.5995539

27 Peter O'Donovan, Aseem Agarwala, Aaron Hertzmann, Color compatibility from large datasets, ACM Transactions on Graphics (TOG), v.30 n.4, July 2011 http://dl.acm.org/citation.cfm?id=1964958&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1145/2010324.1964958

28 Aude Oliva, Antonio Torralba, Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, International Journal of Computer Vision, v.42 n.3, p.145-175, May-June 2001 http://dl.acm.org/citation.cfm?id=598462&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1023/A:1011139631724

29 Sinno Jialin Pan, Qiang Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, v.22 n.10, p.1345-1359, October 2010 http://dl.acm.org/citation.cfm?id=1850545&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/TKDE.2009.191

30 Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann LeCun, Pedestrian Detection with Unsupervised Multi-stage Feature Learning, Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.3626-3633, June 23-28, 2013 http://dl.acm.org/citation.cfm?id=2516194&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/CVPR.2013.465

31 Hsiao-Hang Su, Tse-Wei Chen, Chieh-Chi Kao, Winston H. Hsu, Shao-Yi Chien, Scenic photo quality assessment with bag of aesthetics-preserving features, Proceedings of the 19th ACM international conference on Multimedia, November 28-December 01, 2011, Scottsdale, Arizona, USA http://dl.acm.org/citation.cfm?id=2071977&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1145/2072298.2071977

32 Yi Sun, Xiaogang Wang, Xiaoou Tang, Hybrid Deep Learning for Face Verification, Proceedings of the 2013 IEEE International Conference on Computer Vision, p.1489-1496, December 01-08, 2013 http://dl.acm.org/citation.cfm?id=2587089&CFID=558819604&CFTOKEN=68186175 http://dx.doi.org/10.1109/ICCV.2013.188


Full Text


internal file

Other Links