Depth-aware neural style transfer

From de_evolutionary_art_org


Xiao-Chang Liu, Ming-Ming Cheng, Yu-Kun Lai, Paul L. Rosin: Depth-aware neural style transfer.



Neural style transfer has recently received significant attention and demonstrated amazing results. One efficient solution, proposed by Johnson et al., trains feed-forward convolutional neural networks by defining and optimizing perceptual loss functions. Such methods are typically based on high-level features extracted from pre-trained neural networks, and the loss functions contain two components: style loss and content loss. However, these pre-trained networks were originally designed for object recognition, so the high-level features often focus on the primary target and neglect other details. As a result, when input images contain multiple objects, potentially at different depths, the stylized results are often unsatisfactory: the image layout is destroyed, and the boundaries between foreground and background, as well as between different objects, become obscured. We observe that the depth map effectively reflects the spatial distribution of an image, and that preserving the depth map of the content image during stylization helps retain its semantic content. In this paper, we introduce a novel approach to neural style transfer that integrates depth preservation as an additional loss, preserving the overall image layout while performing style transfer.
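The objective described in the abstract can be sketched as a weighted sum of three terms: content loss, style loss, and depth loss. The NumPy sketch below is illustrative only: in the actual paper the content and style losses are computed on features from a pre-trained network and the depth loss via a single-image depth-prediction network, whereas here plain arrays stand in for those features, and the function names and the weights `alpha`, `beta`, `gamma` are assumptions, not taken from the paper.

```python
import numpy as np

def gram_matrix(feat):
    # feat: (C, H, W) feature map; the Gram matrix of channel
    # correlations is the standard representation of style statistics
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def mse(a, b):
    # Mean squared error between two arrays of the same shape
    return float(np.mean((a - b) ** 2))

def total_loss(content_feat, style_feat, output_feat,
               content_depth, output_depth,
               alpha=1.0, beta=1e3, gamma=1e2):
    # Content loss: match high-level features of the content image
    l_content = mse(output_feat, content_feat)
    # Style loss: match Gram statistics of the style image
    l_style = mse(gram_matrix(output_feat), gram_matrix(style_feat))
    # Depth loss (the paper's addition): keep the stylized image's
    # depth map close to the content image's, preserving layout
    l_depth = mse(output_depth, content_depth)
    return alpha * l_content + beta * l_style + gamma * l_depth
```

When the output reproduces the content image's features and depth exactly, only the style term contributes; the depth weight trades off layout preservation against stylization strength.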

Extended Abstract


@inproceedings{Liu2017DepthAware,
  author = {Liu, Xiao-Chang and Cheng, Ming-Ming and Lai, Yu-Kun and Rosin, Paul L.},
  title = {Depth-aware Neural Style Transfer},
  booktitle = {Proceedings of the Symposium on Non-Photorealistic Animation and Rendering},
  series = {NPAR '17},
  year = {2017},
  isbn = {978-1-4503-5081-5},
  location = {Los Angeles, California},
  pages = {4:1--4:10},
  articleno = {4},
  numpages = {10},
  url = {},
  doi = {10.1145/3092919.3092924},
  acmid = {3092924},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {deep learning, depth, non-photorealistic rendering},
}

Used References

1 Weifeng Chen, Zhao Fu, Dawei Yang, and Jia Deng. 2016. Single-image depth perception in the wild. In NIPS. 730--738.

2 Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, and Philip Torr. 2014. BING: Binarized Normed Gradients for Objectness Estimation at 300fps. In CVPR.

3 Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014).

4 Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet. 2011. Torch7: A Matlab-like environment for machine learning. In BigLearn, NIPS Workshop.

5 Alexei A Efros and William T Freeman. 2001. Image quilting for texture synthesis and transfer. In ACM SIGGRAPH. 341--346.

6 Alexei A Efros and Thomas K Leung. 1999. Texture synthesis by non-parametric sampling. In ICCV, Vol. 2. 1033--1038.

7 David Eigen and Rob Fergus. 2015. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In ICCV. 2650--2658.

8 Leon A Gatys, Alexander S Ecker, and Matthias Bethge. 2016. Image style transfer using convolutional neural networks. In CVPR. 2414--2423.

9 Leon A Gatys, Alexander S Ecker, Matthias Bethge, Aaron Hertzmann, and Eli Shechtman. 2017. Controlling Perceptual Factors in Neural Style Transfer. In CVPR.

10 Andreas Geiger, Philipp Lenz, Christoph Stiller, and Raquel Urtasun. 2013. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research 32, 11 (2013), 1231--1237.

11 Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR. 580--587.

12 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778.

13 Aaron Hertzmann. 2010. Non-Photorealistic Rendering and the science of art. In Proceedings of the 8th International Symposium on Non-Photorealistic Animation and Rendering (NPAR '10). Annecy, France.

14 Aaron Hertzmann, Charles E Jacobs, Nuria Oliver, Brian Curless, and David H Salesin. 2001. Image analogies. In ACM SIGGRAPH. 327--340.

15 Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).

16 Tobias Isenberg. 2013. Evaluating and Validating Non-photorealistic and Illustrative Rendering. In Image and Video-Based Artistic Stylisation, Paul L. Rosin and John P. Collomosse (Eds.). Springer, 311--331.

17 Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In ECCV. Springer, 694--711.

18 Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).

19 Congyan Lang, Tam V Nguyen, Harish Katti, Karthik Yadati, Mohan Kankanhalli, and Shuicheng Yan. 2012. Depth matters: Influence of depth cues on visual saliency. In ECCV. 101--115.

20 Bo Li, Chunhua Shen, Yuchao Dai, Anton van den Hengel, and Mingyi He. 2015. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In CVPR. 1119--1127.

21 Chuan Li and Michael Wand. 2016. Combining Markov Random Fields and convolutional neural networks for image synthesis. In CVPR. 2479--2486.

22 Yanghao Li, Naiyan Wang, Jiaying Liu, and Xiaodi Hou. 2017. Demystifying Neural Style Transfer. CoRR abs/1701.01036 (2017).

23 Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV. 740--755.

24 Beyang Liu, Stephen Gould, and Daphne Koller. 2010. Single image depth estimation from predicted semantic labels. In CVPR. 1253--1260.

25 Fayao Liu, Chunhua Shen, and Guosheng Lin. 2015. Deep convolutional neural fields for depth estimation from a single image. In CVPR. 5162--5170.

26 Yun Liu, Ming-Ming Cheng, Xiaowei Hu, Kai Wang, and Xiang Bai. 2017. Richer Convolutional Features for Edge Detection. In CVPR.

27 Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In CVPR. 3431--3440.

28 Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In CVPR. 5188--5196.

29 Graeme McCaig, Steve DiPaola, and Liane Gabora. 2016. Deep Convolutional Networks as Models of Generalization and Blending Within Visual Creativity. CoRR abs/1610.02478 (2016).

30 Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).

31 Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS.

32 Paul L. Rosin and John P. Collomosse. 2013. Image and Video-based Artistic Stylisation. Springer.

33 Manuel Ruder, Alexey Dosovitskiy, and Thomas Brox. 2016. Artistic style transfer for videos. In GCPR. 26--36.

34 Ashutosh Saxena, Sung H Chung, and Andrew Y Ng. 2005. Learning depth from single monocular images. In NIPS, Vol. 18. 1--8.

35 Ahmed Selim, Mohamed Elgharib, and Linda Doyle. 2016. Painting style transfer for head portraits using convolutional neural networks. ACM TOG 35, 4 (2016), 129.

36 Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. 2012. Indoor segmentation and support inference from RGBD images. In ECCV.

37 Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv:1312.6034 (2013).

38 Thomas Strothotte and Stefan Schlechtweg. 2002. Non-photorealistic computer graphics: modeling, rendering, and animation. Morgan Kaufmann.

39 James T Todd and J Farley Norman. 2003. The visual perception of 3-D shape from multiple cues: Are observers capable of perceiving metric structure? Perception & Psychophysics 65, 1 (2003), 31--47.

40 Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky. 2016. Texture networks: Feed-forward synthesis of textures and stylized images. In ICML.

41 Jingdong Wang, Huaizu Jiang, Zejian Yuan, Ming-Ming Cheng, Xiaowei Hu, and Nanning Zheng. 2017. Salient object detection: A discriminative regional feature integration approach. International Journal of Computer Vision 123, 2 (2017), 251--268.

42 Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, and Alan L Yuille. 2015. Towards unified depth and semantic prediction from a single image. In CVPR. 2800--2809.

43 Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, and Shuicheng Yan. 2017. Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach. In CVPR.

44 Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming-Ming Cheng, Yao Zhao, and Shuicheng Yan. 2016. STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation. (2016).

45 Ziyu Zhang, Alexander G Schwing, Sanja Fidler, and Raquel Urtasun. 2015. Monocular object instance segmentation and depth ordering with CNNs. In ICCV. 2614--2622.

46 Daniel Zoran, Phillip Isola, Dilip Krishnan, and William T Freeman. 2015. Learning ordinal relationships for mid-level vision. In ICCV. 388--396.

