In this post, we'll see how the style of a reference image can be transferred to the background of a target image. We'll discuss neural style transfer, then semantic segmentation, and finally combine the two techniques to achieve localized style transfer. You'll also see how to achieve the portrait effect made famous by iPhones!
-- Input Image (Left); Reference Style Image (Middle); Output Image (Right) --
Let's explore the various computer vision techniques that are necessary to achieve localized style transfer.
Neural style transfer
In neural style transfer, you take two images -- a content image and a style reference image -- and blend them together so that the output retains the content of the content image rendered in the style of the style reference. Let's see some examples of style transfer below.
-- Content image (Left); Style image (Middle); Result of style transfer (Right) --
-- Content image (Left); Style image (Middle); Result of style transfer (Right) --
-- Content image (Top); Style image (Middle); Result of style transfer (Bottom) --
The principle is simple (albeit the math, which we won't delve into, is a bit involved): we define two distances, one for content and one for style. The content distance measures how different the content of two images is, while the style distance measures how different their styles are. We then take a third image (the resultant image) and iteratively update it to minimize both its content distance from the content image and its style distance from the style image.
Basically, it is a game of tug of war between matching the style of the style image and preserving the content of the content image, and the optimal resultant image balances the two.
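To make this concrete, here is a minimal sketch of that optimization loop. Note this is our own PyTorch illustration, not the TensorFlow code linked in the references: the total loss is a weighted sum `alpha * content_loss + beta * style_loss`, with the content distance taken at one VGG-19 layer and the style distance taken as the mismatch of Gram matrices across several layers. The `content` and `style` inputs are assumed to be VGG-normalized tensors of shape (1, 3, H, W).

```python
import torch
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen VGG-19 feature extractor (the standard backbone for style transfer).
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYER = 21                  # conv4_2
STYLE_LAYERS = (0, 5, 10, 19, 28)  # conv1_1 .. conv5_1

def extract_features(x):
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == CONTENT_LAYER or i in STYLE_LAYERS:
            feats[i] = x
    return feats

def gram(f):
    # Gram matrix of a (1, C, H, W) feature map, normalized by its size.
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

def style_transfer(content, style, steps=300, alpha=1.0, beta=1e6):
    content, style = content.to(device), style.to(device)
    c_feats = extract_features(content)
    s_grams = {i: gram(f) for i, f in extract_features(style).items()}
    result = content.clone().requires_grad_(True)  # start from the content image
    opt = torch.optim.Adam([result], lr=0.02)
    for _ in range(steps):
        opt.zero_grad()
        r_feats = extract_features(result)
        content_loss = F.mse_loss(r_feats[CONTENT_LAYER], c_feats[CONTENT_LAYER])
        style_loss = sum(F.mse_loss(gram(r_feats[i]), s_grams[i]) for i in STYLE_LAYERS)
        (alpha * content_loss + beta * style_loss).backward()  # the "tug of war"
        opt.step()
    return result.detach()
```

Increasing `beta` relative to `alpha` pulls the result toward the style image; decreasing it preserves more of the original photo.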
Semantic segmentation to extract the subject (using Google's DeepLab)
- Architecture showing how localized style transfer works -
Semantic segmentation is a computer vision task where specific regions of an image are labeled based on what's being shown. More specifically, each pixel of an image is labeled with the class it belongs to, and a single image can contain two or more classes. (You can refer to one of our blogs where this is explained in detail.)
- Semantic segmentation -
DeepLab is a state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign a semantic label (e.g., person, dog, cat, and so on) to every pixel of the input image. We'll use this model to extract the subject (i.e., the person in an image) and perform style transfer only on the background of the image.
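The post uses the TensorFlow DeepLab release linked in the references; as a self-contained stand-in for illustration, here is a sketch using torchvision's DeepLabV3, a different implementation of the same idea trained on the 21 Pascal VOC classes, where class 15 is "person".

```python
import torch
from torchvision import models
from PIL import Image

# Stand-in for the TensorFlow DeepLab release: torchvision's DeepLabV3,
# trained on the 21 Pascal VOC classes (class index 15 = "person").
weights = models.segmentation.DeepLabV3_ResNet101_Weights.DEFAULT
model = models.segmentation.deeplabv3_resnet101(weights=weights).eval()
preprocess = weights.transforms()

def person_mask(image: Image.Image) -> torch.Tensor:
    """Boolean mask that is True wherever a pixel is labeled 'person'."""
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        logits = model(batch)["out"][0]  # (num_classes, H, W)
    labels = logits.argmax(dim=0)        # per-pixel class index
    # Note: the preset resizes the input, so resize this mask back to the
    # original image size before compositing.
    return labels == 15
```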
Localized style transfer
To perform localized style transfer, we use DeepLab to identify the pixels classified as human. We then perform style transfer on the complete input image and, using simple image processing techniques, superimpose the subject (the human identified by the DeepLab model) onto the style-transferred image, as in the sketch below.
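The superimposition step itself is plain image processing. A minimal sketch, assuming `mask` is a boolean array with the same height and width as the input image (e.g., the output of the `person_mask` sketch above, resized back to the original resolution):

```python
import numpy as np
from PIL import Image

def composite(original: Image.Image, stylized: Image.Image,
              mask: np.ndarray) -> Image.Image:
    """Keep the subject's original pixels; take everything else stylized."""
    stylized = stylized.resize(original.size)  # align resolutions
    orig = np.asarray(original.convert("RGB"))
    styl = np.asarray(stylized.convert("RGB"))
    mask3 = mask[..., None]                    # broadcast over RGB channels
    return Image.fromarray(np.where(mask3, orig, styl).astype(np.uint8))
```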
-- Portrait effect using DeepLab to identify the subject using semantic segmentation model --
Results of localized style transfer
-- Input Image (Left); Reference Style Image (Middle); Output Image (Right) --
-- Input Image (Left); Reference Style Image (Middle); Output Image (Right) --
Conclusion
In this blog, we have blended two major computer vision algorithms (neural style transfer and semantic segmentation), and even though the blending is done using a naive approach (superimposing the masked subject onto the style-transferred image), the result looks pretty good. Further development and improvement of the techniques used in this blog could lead to new image filters for applications such as Instagram, Facebook, etc.
References
https://arxiv.org/abs/1508.06576 - Original style transfer paper
https://github.com/anishathalye/neural-style - Code for performing style transfer
https://github.com/tensorflow/models/tree/master/research/deeplab - DeepLab for semantic segmentation
https://colab.research.google.com/github/tensorflow/models/blob/master/research/deeplab/deeplab_demo.ipynb - DeepLab notebook for quick experimentation