Guided image optimization

This example showcases how a guided, i.e. regionally constrained, NST can be performed in pystiche.

Usually, the style_loss discards spatial information, since style elements should be synthesizable regardless of their position in the style_image. Especially for images with clearly separated regions, style elements might leak into regions where they fit well with respect to the perceptual loss but do not belong for a human observer. This can be overcome with spatial constraints, also called guides ([GEB+2017]).
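
To make the loss of spatial information concrete, here is a minimal pure-Python sketch on toy feature maps. It is not pystiche code: the helpers `gram` and `guided_gram` are hypothetical, and the normalization by the number of positions is an assumption made for illustration. The sketch shows that a Gram matrix is unchanged when the spatial positions are shuffled, while masking the features with a binary guide makes the layout matter again.

```python
# Illustrative sketch only: `gram` and `guided_gram` are hypothetical helpers,
# not part of the pystiche API.


def gram(features):
    """Gram matrix of `features`: a list of channels, each a flat list of
    activations over the spatial positions."""
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
        for ci in features
    ]


def guided_gram(features, guide):
    """Gram matrix restricted to the positions where the binary guide is 1."""
    masked = [[v * g for v, g in zip(channel, guide)] for channel in features]
    norm = sum(guide)  # assumed normalization: number of guided positions
    return [
        [sum(a * b for a, b in zip(ci, cj)) / norm for cj in masked]
        for ci in masked
    ]


# Two channels over four spatial positions.
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]

# The same activations with the positions shuffled consistently.
order = [2, 0, 3, 1]
shuffled = [[channel[i] for i in order] for channel in features]

# The unguided Gram matrix cannot tell the two layouts apart.
same_unguided = gram(features) == gram(shuffled)

# A guide that selects the first two positions makes the layout matter.
guide = [1, 1, 0, 0]
same_guided = guided_gram(features, guide) == guided_gram(shuffled, guide)

print(same_unguided, same_guided)
```

Because the unguided statistic is identical for both layouts, nothing stops the optimization from placing sky-like style elements on a building. The guided variant only compares statistics gathered inside matching regions.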

We start this example by importing everything we need and setting the device we will be working on.

import pystiche
from pystiche import demo, enc, loss, optim
from pystiche.image import guides_to_segmentation, show_image
from pystiche.misc import get_device, get_input_image

print(f"I'm working with pystiche=={pystiche.__version__}")

device = get_device()
print(f"I'm working with {device}")

In a first step we load and show the images that will be used in the NST.

images = demo.images()

size = 500

content_image = images["castle"].read(size=size, device=device)
show_image(content_image)

style_image = images["church"].read(size=size, device=device)
show_image(style_image)

Unguided image optimization

As a baseline we use a default NST with a FeatureReconstructionLoss as content_loss and GramLoss as style_loss.

multi_layer_encoder = enc.vgg19_multi_layer_encoder()

content_layer = "relu4_2"
content_encoder = multi_layer_encoder.extract_encoder(content_layer)
content_weight = 1e0
content_loss = loss.FeatureReconstructionLoss(
    content_encoder, score_weight=content_weight
)

style_layers = ("relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1")
style_weight = 1e4


def get_style_op(encoder, layer_weight):
    return loss.GramLoss(encoder, score_weight=layer_weight)


style_loss = loss.MultiLayerEncodingLoss(
    multi_layer_encoder, style_layers, get_style_op, score_weight=style_weight,
)

perceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)
print(perceptual_loss)

We set the target images for the optimization criterion.

perceptual_loss.set_content_image(content_image)
perceptual_loss.set_style_image(style_image)

We perform the unguided NST and show the result.

starting_point = "content"
input_image = get_input_image(starting_point, content_image=content_image)

output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)

show_image(output_image)

While the result is not completely unreasonable, the building has a strong blueish cast that looks unnatural. Since the optimization was unconstrained, the color of the sky was used for the building. In the remainder of this example we will solve this by dividing the images into multiple separate regions.

Guided image optimization

For both the content_image and style_image we load regional guides and show them.


In pystiche a guide is a binary image in which the white pixels make up the region that is guided. Multiple guides can be combined into a segmentation for a better overview. In a segmentation the regions are separated by color. You can use guides_to_segmentation() and segmentation_to_guides() to convert one format to the other.
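
The relation between the two formats can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the helpers `to_segmentation` and `to_guides` are hypothetical stand-ins, not the pystiche functions, and regions are marked with string labels instead of colors.

```python
# Hypothetical helpers that mirror the idea; they are not pystiche's
# guides_to_segmentation() / segmentation_to_guides() implementations.


def to_segmentation(guides, background="none"):
    """Combine binary guides {region: 2D list of 0/1} into one label map."""
    first = next(iter(guides.values()))
    segmentation = [[background for _ in row] for row in first]
    for region, guide in guides.items():
        for y, row in enumerate(guide):
            for x, value in enumerate(row):
                if value:
                    segmentation[y][x] = region
    return segmentation


def to_guides(segmentation):
    """Split a label map back into one binary guide per label."""
    regions = sorted({label for row in segmentation for label in row})
    return {
        region: [[int(label == region) for label in row] for row in segmentation]
        for region in regions
    }


# Three non-overlapping guides on a 2x2 image.
guides = {
    "sky": [[1, 1], [0, 0]],
    "building": [[0, 0], [1, 0]],
    "water": [[0, 0], [0, 1]],
}

segmentation = to_segmentation(guides)
round_trip = to_guides(segmentation)

print(segmentation)
print(round_trip == guides)
```

The round trip is lossless here only because the toy guides do not overlap and together cover every pixel.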


The guides used within this example were created manually. It is possible to generate them automatically [CZP+2018], but this is outside the scope of pystiche.

content_guides = images["castle"].guides.read(size=size, device=device)
content_segmentation = guides_to_segmentation(content_guides)
show_image(content_segmentation, title="Content segmentation")

style_guides = images["church"].guides.read(size=size, device=device)
style_segmentation = guides_to_segmentation(style_guides)
show_image(style_segmentation, title="Style segmentation")

The content_image is separated into three regions: the "building", the "sky", and the "water".


Since no water is present in the style image, we reuse the "sky" for the "water" region.

regions = ("building", "sky", "water")

style_guides["water"] = style_guides["sky"]

Since the stylization should be performed for each region individually, we also need separate losses. Within each region we use the same setup as before. Similar to how a MultiLayerEncodingLoss bundles multiple operators acting on different layers, a MultiRegionLoss bundles multiple losses acting in different regions.

The guiding is only needed for the style_loss since the content_loss by definition honors the position of the content during the optimization. Thus, the previously defined content_loss is combined with the new regional style_loss.
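
The bundling idea can be sketched without any deep learning machinery. The following toy example is not how pystiche implements MultiRegionLoss: the functions `region_style_loss` and `multi_region_loss` are hypothetical, the "style statistic" is just a guided mean, and aggregating by a weighted sum is an assumption made for illustration.

```python
# Toy sketch of a multi-region loss. All names are hypothetical and the
# aggregation (weighted sum over regions) is an assumption.


def guided_mean(values, guide):
    """Mean of `values` over the positions selected by the binary guide."""
    return sum(v * g for v, g in zip(values, guide)) / sum(guide)


def region_style_loss(input_values, input_guide, target_values, target_guide):
    """Squared difference of the guided means of input and target."""
    return (
        guided_mean(input_values, input_guide)
        - guided_mean(target_values, target_guide)
    ) ** 2


def multi_region_loss(
    regions, input_values, content_guides, style_values, style_guides, score_weight=1.0
):
    """Evaluate one loss per region and combine them into a single score."""
    per_region = {
        region: region_style_loss(
            input_values, content_guides[region], style_values, style_guides[region]
        )
        for region in regions
    }
    return score_weight * sum(per_region.values()), per_region


regions = ("building", "sky")
input_values = [1.0, 3.0, 5.0, 7.0]
style_values = [2.0, 2.0, 0.0, 0.0]
content_guides = {"building": [1, 1, 0, 0], "sky": [0, 0, 1, 1]}
style_guides = {"building": [1, 1, 0, 0], "sky": [0, 0, 1, 1]}

total, per_region = multi_region_loss(
    regions, input_values, content_guides, style_values, style_guides, score_weight=0.5
)
print(total, per_region)
```

Each region contributes its own term, so a mismatch in the "sky" region does not affect how the "building" region is scored.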

def get_region_op(region, region_weight):
    return loss.MultiLayerEncodingLoss(
        multi_layer_encoder, style_layers, get_style_op, score_weight=region_weight,
    )


style_loss = loss.MultiRegionLoss(regions, get_region_op, score_weight=style_weight)

perceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)
print(perceptual_loss)

The content_loss is unguided and thus the content image can be set as we did before. For the style_loss we use the same style_image for all regions and only vary the guides.

perceptual_loss.set_content_image(content_image)

for region in regions:
    perceptual_loss.set_style_image(
        style_image, guide=style_guides[region], region=region
    )
    perceptual_loss.set_content_guide(content_guides[region], region=region)

We rerun the optimization with the new constrained optimization criterion and show the result.

starting_point = "content"
input_image = get_input_image(starting_point, content_image=content_image)

output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)

show_image(output_image)

With regional constraints we successfully removed the blueish cast from the building, which leads to an overall higher quality. Unfortunately, reusing the sky region for the water did not work out too well: due to the vibrant color, the water looks unnatural.

Fortunately, this has an easy solution. Since we are already using separate losses for each region we are not bound to use only a single style_image: if required, we can use a different style_image for each region.

Guided image optimization with multiple styles

We load a second style image that has water in it.

second_style_image = images["cliff"].read(size=size, device=device)
show_image(second_style_image, "Second style image")

second_style_guides = images["cliff"].guides.read(size=size, device=device)
show_image(guides_to_segmentation(second_style_guides), "Second style segmentation")

We can reuse the previously defined criterion and only change the style_image and style_guides in the "water" region.

region = "water"
perceptual_loss.set_style_image(
    second_style_image, guide=second_style_guides[region], region=region
)

Finally, we rerun the optimization with the new constraints.

starting_point = "content"
input_image = get_input_image(starting_point, content_image=content_image)

output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)

show_image(output_image)

Compared to the two previous results, we have now achieved the highest quality. Nevertheless, this approach has its downsides: since we are working with multiple images in multiple distinct regions, the memory requirement is higher than for the other approaches. Furthermore, unlike the unguided NST, the guides have to be provided together with the content and style images.
