Deep Neural Networks Transfer The Style of an Image Onto Another

Editing a picture to make it “Instagram worthy” can be difficult. Most simple apps have the basic filters, highlighting and exposure tools that you might expect. But apps that transform photos into custom portraits in the style of your favorite artist need something more complex.

This is because imposing, for example, the distinctive brushstrokes and features of Vincent van Gogh’s The Starry Night onto an average photo can often distort the structure of the image. Existing programs focus on the content and style of images but usually do not preserve the edges and contours of the photographed subjects. As a result, the final image loses the structural details of the original photograph.

Prof. Kavita Bala, computer science, and Fujun Luan, grad, teamed up with researchers Sylvain Paris and Eli Shechtman from Adobe Systems to develop algorithms to solve this issue, which has drawn growing interest since apps like Prisma launched.

“Stylization of images can improve the aesthetic quality of images, which is what many professional photographers care about,” Bala said. “The key technical challenge is combining the idea of stylization, which is very powerful in changing images, while still preserving the realism of the image.”

“We tried a lot of ideas and performed many experiments before reaching the final solution,” Luan said. “Although those previous ideas didn’t work well, we learned from them and made use of that knowledge and refined them to arrive at this particular idea.”

To facilitate similar work in the field, they also created a supplementary file listing the procedures and ideas that did not work. The code for the project was also hosted on GitHub.


The input image on the left had its structure and content combined with the style of the image on the right.

“We did post the code,” Bala said. “How viral it went, took us completely by surprise.”

The rise in interest in such projects has to do with the deep learning revolution, Bala said. Deep learning algorithms date back to the 1980s, but numerous breakthroughs over the past five years have broadened their scope. Combining images and searching through collections of visual images are examples of tasks that now use deep learning.

This technology is based on the computational power of neural networks: computational models loosely based on the functioning of the human brain’s network of nerves. A network consists of different computational layers and is trained by being fed large amounts of image data. Through training, the network learns the numbers and matrices that represent aspects of an image. Together, these form a complex function that can be used to understand the different features, such as the style or the objects, that make up an image.
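The layered structure described above can be pictured in a few lines of code. This toy sketch (NumPy only, not the team’s code) passes a flattened “image” through two layers of weights, the numbers and matrices a real network would learn from data:

```python
import numpy as np

def relu(x):
    # A common nonlinearity applied between layers
    return np.maximum(0.0, x)

def forward(image_vec, weights):
    """Pass a flattened image vector through a stack of layers.
    Training would adjust the numbers in each weight matrix."""
    activation = image_vec
    for W in weights:
        activation = relu(W @ activation)
    return activation

rng = np.random.default_rng(0)
# Two layers: 64 input values -> 16 hidden units -> 8 output features
weights = [rng.standard_normal((16, 64)) * 0.1,
           rng.standard_normal((8, 16)) * 0.1]
features = forward(rng.standard_normal(64), weights)
```

A real network like those used for stylization has many more layers and millions of learned numbers, but the layer-by-layer flow is the same.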

“The core research vision that excites me is to use knowledge of human perception to drive better algorithms that produce better solutions,” Bala said.

The output image from this stylization process contains the content and structure from one image and the style, such as the colors and lighting, from another. Applications such as Prisma use the content and style from images but not necessarily the structure, which can make the results look unrealistic.
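In the style-transfer literature this work builds on, “style” is commonly summarized by the correlations between a network layer’s feature channels, known as a Gram matrix, while “content” is the activations themselves. A minimal sketch of that style statistic, assuming feature maps of shape (channels, height, width):

```python
import numpy as np

def gram_matrix(features):
    """features: (channels, height, width) activations from one layer.
    Channel-by-channel correlations capture texture ("style")
    while discarding spatial layout ("content")."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(gram_a, gram_b):
    """Mean squared difference between two style representations."""
    return np.mean((gram_a - gram_b) ** 2)

feats = np.random.default_rng(1).standard_normal((3, 8, 8))
G = gram_matrix(feats)  # a symmetric (3, 3) matrix
```

Matching these statistics transfers style; the Cornell team’s contribution is the extra machinery that keeps the result photorealistic while doing so.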

To ensure the structure is effectively recreated in the output image, the team attempted to improve the program’s understanding of the image’s semantics. To do so, they used algorithms that were specifically designed to identify certain objects in images. For example, such algorithms are able to identify where a building is, whether an object is a dog or a cat or if a section of the image is part of a lake or the sky. Such recognition is important because the different stylistic elements superimposed onto an image depend on the objects present in it.

“The style you apply to foliage is not the same you would apply to a lake or a building,” Bala said.
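One way to picture this region-dependent styling: statistics are gathered separately for each semantic label, so foliage, lake and building regions can each be treated differently. A toy sketch, assuming a per-pixel label map from a segmenter (mean color stands in for real style statistics):

```python
import numpy as np

def per_region_mean_color(image, labels):
    """image: (H, W, 3) floats; labels: (H, W) ints from a segmenter.
    Statistics (here just mean color, for illustration) are gathered
    separately for each semantic region."""
    stats = {}
    for label in np.unique(labels):
        mask = labels == label
        stats[label] = image[mask].mean(axis=0)
    return stats

# A 2x2 "image": top row dark (label 0), bottom row bright (label 1)
img = np.zeros((2, 2, 3))
img[1, :, :] = 1.0
labels = np.array([[0, 0], [1, 1]])
stats = per_region_mean_color(img, labels)
```

With per-region statistics in hand, the style applied to each region can be drawn from the matching region of the style image.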

Despite these strides in preserving realism and structure, the team says numerous challenges remain to be tackled.

One concern the team hopes to address is the program’s speed. Currently, rendering the final image takes a few minutes, but the team believes it is possible to do so in seconds. Students in Bala’s lab are working toward this goal.

Luan also stresses that the program’s understanding of the image’s semantics can be developed further.

“Currently, we use some state-of-the-art scene parsing algorithm but sometimes the results are not satisfying enough,” Luan said. “Exploring the dense semantic correspondence using the feature responses of the deep neural network would be helpful to address this problem.”

The team will be presenting their research at the Conference on Computer Vision and Pattern Recognition in July.