Image optimization with pyramid

This example showcases how an image pyramid is integrated in an NST with pystiche.

With an image pyramid the optimization is not performed at a single resolution but rather at multiple, increasing resolutions. This procedure is often dubbed coarse-to-fine, since coarse structures are synthesized at the lower resolutions whereas the details are carved out at the higher ones.

This technique has the potential to reduce the convergence time as well as to enhance the overall result [LW2016][GEB+2017].
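To make the coarse-to-fine control flow concrete, here is a minimal, library-free sketch on a toy 1-D signal. It is not how pystiche implements its pyramid; the helper names (`resize_nearest`, `optimize`, `coarse_to_fine`), the nearest-neighbour resizing, and the plain gradient descent are assumptions chosen purely for illustration.

```python
def resize_nearest(signal, size):
    """Resize a 1-D signal to `size` samples by nearest-neighbour lookup."""
    n = len(signal)
    return [signal[i * n // size] for i in range(size)]


def optimize(signal, target, num_steps, lr=0.5):
    """Gradient descent on the half squared error between signal and target."""
    for _ in range(num_steps):
        # The gradient of 0.5 * (s - t) ** 2 with respect to s is (s - t).
        signal = [s - lr * (s - t) for s, t in zip(signal, target)]
    return signal


def coarse_to_fine(signal, target, sizes, num_steps):
    """Optimize at each size in turn, carrying the result to the next level."""
    for size in sizes:
        # Both the current result and the target are resized to the level.
        signal = resize_nearest(signal, size)
        level_target = resize_nearest(target, size)
        signal = optimize(signal, level_target, num_steps)
    return signal


target = [float(i % 4) for i in range(16)]
start = [0.0] * 16
result = coarse_to_fine(start, target, sizes=(4, 8, 16), num_steps=20)
print(max(abs(r - t) for r, t in zip(result, target)))
```

The important part is the loop in `coarse_to_fine`: the result of each level is resized and used as the starting point of the next one, so the fine levels only have to refine what the coarse levels already produced.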

We start this example by importing everything we need and setting the device we will be working on.

import time

import pystiche
from pystiche import demo, enc, loss, optim, pyramid
from pystiche.image import show_image
from pystiche.misc import get_device, get_input_image

print(f"I'm working with pystiche=={pystiche.__version__}")

device = get_device()
print(f"I'm working with {device}")

First, we define a PerceptualLoss that is used as the optimization criterion.

multi_layer_encoder = enc.vgg19_multi_layer_encoder()


content_layer = "relu4_2"
content_encoder = multi_layer_encoder.extract_encoder(content_layer)
content_weight = 1e0
content_loss = loss.FeatureReconstructionLoss(
    content_encoder, score_weight=content_weight
)


style_layers = ("relu3_1", "relu4_1")
style_weight = 2e0


def get_style_op(encoder, layer_weight):
    return loss.MRFLoss(encoder, patch_size=3, stride=2, score_weight=layer_weight)


style_loss = loss.MultiLayerEncodingLoss(
    multi_layer_encoder, style_layers, get_style_op, score_weight=style_weight,
)

perceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)
print(perceptual_loss)

Next up, we load and show the images that will be used in the NST.

images = demo.images()

size = 500

content_image = images["bird2"].read(size=size, device=device)
show_image(content_image, title="Content image")
perceptual_loss.set_content_image(content_image)

style_image = images["mosaic"].read(size=size, device=device)
show_image(style_image, title="Style image")
perceptual_loss.set_style_image(style_image)

Image optimization without pyramid

As a baseline we use a standard image optimization without pyramid.

starting_point = "content"
input_image = get_input_image(starting_point, content_image=content_image)
show_image(input_image, title="Input image")

We time the NST performed by image_optimization() and show the result.

start_without_pyramid = time.time()
output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=400)
stop_without_pyramid = time.time()

show_image(output_image, title="Output image without pyramid")

elapsed_time_without_pyramid = stop_without_pyramid - start_without_pyramid
print(
    f"Without pyramid the optimization took {elapsed_time_without_pyramid:.0f} seconds."
)
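The start/stop bookkeeping used for timing can also be wrapped in a small context manager. The following is a sketch with a hypothetical `timer` helper that is not part of pystiche; it uses `time.perf_counter()`, which is intended for measuring short durations, and times a dummy workload in place of the actual optimization.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results):
    """Store the elapsed wall-clock seconds of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the block raises.
        results[label] = time.perf_counter() - start


timings = {}
with timer("dummy workload", timings):
    # Stand-in for the optimization call.
    sum(i * i for i in range(100_000))

print(f"The dummy workload took {timings['dummy workload']:.3f} seconds.")
```

Collecting the durations in one dictionary keeps them available for a later comparison between several runs.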

As you can see, the small blurry branches on the left side of the image were picked up by the style transfer. They distort the mosaic pattern, which diminishes the quality of the result. In the next section we tackle this by focusing on coarse elements first and adding the details afterwards.

Image optimization with pyramid

In contrast to the previous section, we now want to perform an NST on multiple resolutions. In pystiche this is handled by an ImagePyramid. The resolutions are selected by specifying the edge_sizes of the images on each level. The optimization is performed for num_steps on each level.

The resizing of all images, i.e. the input_image and the target images (content_image and style_image), is handled by the pyramid. For that we need to register the perceptual loss (criterion) as one of the resize_targets.


By default the edge_sizes correspond to the shorter edge of the images. To change that you can pass edge="long". For fine-grained control you can also pass a sequence comprising "short" and "long" to select the edge for each level separately. Its length has to match the length of edge_sizes.
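To illustrate what selecting the short or long edge means for the resulting image sizes, here is a small library-free sketch. The helpers `resized_image_size` and `level_sizes` are hypothetical and not part of pystiche, and the rounding of the other edge is an assumption, so the exact sizes pystiche produces may differ by a pixel.

```python
def resized_image_size(image_size, edge_size, edge="short"):
    """Return (height, width) after scaling so the chosen edge equals edge_size."""
    if edge not in ("short", "long"):
        raise ValueError(f"edge must be 'short' or 'long', got {edge!r}")
    height, width = image_size
    reference = min(height, width) if edge == "short" else max(height, width)
    scale = edge_size / reference
    # Assumption: the other edge is rounded to the nearest integer.
    return round(height * scale), round(width * scale)


def level_sizes(image_size, edge_sizes, edge="short"):
    """Compute the image size on every level; edge is a str or one str per level."""
    edges = [edge] * len(edge_sizes) if isinstance(edge, str) else list(edge)
    if len(edges) != len(edge_sizes):
        raise ValueError("edge and edge_sizes must have the same length")
    return [resized_image_size(image_size, s, e) for s, e in zip(edge_sizes, edges)]


print(level_sizes((600, 800), (250, 500)))
# [(250, 333), (500, 667)]
print(level_sizes((600, 800), (250, 500), edge=("short", "long")))
# [(250, 333), (375, 500)]
```

For a 600 x 800 image, selecting the long edge on the second level yields a noticeably smaller image than selecting the short edge, which in turn affects the cost of each step on that level.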


For fine-grained control over the number of steps on each level you can pass a sequence to select the num_steps for each level separately. Its length has to match the length of edge_sizes.
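The relation between a single num_steps value and a per-level sequence can be sketched as follows. The `build_schedule` helper is hypothetical and only mirrors the rule stated above (one entry per level, matching the length of edge_sizes); it is not pystiche's implementation.

```python
def build_schedule(edge_sizes, num_steps):
    """Pair every edge size with its number of steps.

    num_steps may be a single int (used on every level) or a sequence with
    one entry per level.
    """
    if isinstance(num_steps, int):
        num_steps = [num_steps] * len(edge_sizes)
    num_steps = list(num_steps)
    if len(num_steps) != len(edge_sizes):
        raise ValueError("num_steps and edge_sizes must have the same length")
    return list(zip(edge_sizes, num_steps))


print(build_schedule((250, 500), 200))
# [(250, 200), (500, 200)]
print(build_schedule((250, 500), (300, 100)))
# [(250, 300), (500, 100)]
```

The second call spends more steps on the cheap coarse level and fewer on the expensive fine level while keeping the total at 400 steps.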

edge_sizes = (250, 500)
num_steps = 200
image_pyramid = pyramid.ImagePyramid(
    edge_sizes, num_steps, resize_targets=(perceptual_loss,)
)
print(image_pyramid)

With a pyramid the NST is performed by pyramid_image_optimization(). We time the execution and show the result afterwards.


We regenerate the input_image since it was changed in place during the first optimization.

input_image = get_input_image(starting_point, content_image=content_image)

start_with_pyramid = time.time()
output_image = optim.pyramid_image_optimization(
    input_image, perceptual_loss, image_pyramid
)
stop_with_pyramid = time.time()

show_image(output_image, title="Output image with pyramid")

elapsed_time_with_pyramid = stop_with_pyramid - start_with_pyramid
relative_decrease = 1.0 - elapsed_time_with_pyramid / elapsed_time_without_pyramid
print(
    f"With pyramid the optimization took {elapsed_time_with_pyramid:.0f} seconds. "
    f"This is a {relative_decrease:.0%} decrease."
)

With the coarse-to-fine architecture of the image pyramid, the stylization of the blurry background branches is reduced, leaving the mosaic pattern mostly intact. On top of this quality improvement, the execution time is significantly lower while performing the same total number of steps (2 × 200 with the pyramid versus 400 without).
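A back-of-the-envelope estimate helps explain why the pyramid is faster at the same number of steps. The sketch below assumes that the cost of one step is proportional to the number of pixels, i.e. to the squared edge size for a fixed aspect ratio. This is only a rough model: the measured timings depend on the hardware and on fixed overheads, and losses that compare patches pairwise may scale differently. The `relative_cost` helper is hypothetical.

```python
def relative_cost(edge_sizes, num_steps, reference_edge_size, reference_steps):
    """Estimate the pyramid cost relative to a single-resolution run.

    Assumes the cost of one step is proportional to the squared edge size.
    """
    pyramid_cost = sum(
        steps * size ** 2 for size, steps in zip(edge_sizes, num_steps)
    )
    reference_cost = reference_steps * reference_edge_size ** 2
    return pyramid_cost / reference_cost


ratio = relative_cost(
    (250, 500), (200, 200), reference_edge_size=500, reference_steps=400
)
print(f"Estimated relative cost: {ratio:.3f} ({1 - ratio:.1%} decrease)")
# Estimated relative cost: 0.625 (37.5% decrease)
```

Under this model the 200 steps on the 250 pixel level cost only a quarter of the same number of steps on the 500 pixel level, which accounts for the estimated saving.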
