{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Guided image optimization\n\nThis example showcases how a guided, i.e. regionally constraint, NST can be performed\nin ``pystiche``.\n\nUsually, the ``style_loss`` discards spatial information since the style elements\nshould be able to be synthesized regardless of their position in the\n``style_image``. Especially for images with clear separated regions style elements\nmight leak into regions where they fit well with respect to the perceptual loss, but\ndon't belong for a human observer. This can be overcome with spatial constraints also\ncalled ``guides`` (:cite:`GEB+2017`).\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start this example by importing everything we need and setting the device we will\nbe working on.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pystiche\nfrom pystiche import demo, enc, loss, optim\nfrom pystiche.image import guides_to_segmentation, show_image\nfrom pystiche.misc import get_device, get_input_image\n\nprint(f\"I'm working with pystiche=={pystiche.__version__}\")\n\ndevice = get_device()\nprint(f\"I'm working with {device}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a first step we load and show the images that will be used in the NST.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"images = demo.images()\nimages.download()\nsize = 500"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"content_image = images[\"castle\"].read(size=size, device=device)\nshow_image(content_image)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"style_image = images[\"church\"].read(size=size, device=device)\nshow_image(style_image)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unguided image optimization\n\nAs a baseline we use a default NST with a\n:class:`~pystiche.loss.FeatureReconstructionLoss` as ``content_loss`` and\n:class:`~pystiche.loss.GramLoss` as ``style_loss``.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"multi_layer_encoder = enc.vgg19_multi_layer_encoder()\n\ncontent_layer = \"relu4_2\"\ncontent_encoder = multi_layer_encoder.extract_encoder(content_layer)\ncontent_weight = 1e0\ncontent_loss = loss.FeatureReconstructionLoss(\n content_encoder, score_weight=content_weight\n)\n\nstyle_layers = (\"relu1_1\", \"relu2_1\", \"relu3_1\", \"relu4_1\", \"relu5_1\")\nstyle_weight = 1e4\n\n\ndef get_style_op(encoder, layer_weight):\n return loss.GramLoss(encoder, score_weight=layer_weight)\n\n\nstyle_loss = loss.MultiLayerEncodingLoss(\n multi_layer_encoder, style_layers, get_style_op, score_weight=style_weight,\n)\n\n\nperceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)\nprint(perceptual_loss)"
]
},
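{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why a Gram-based ``style_loss`` ignores where a style element sits can be illustrated\nwith a minimal sketch in plain PyTorch. This is not how ``pystiche`` implements\n:class:`~pystiche.loss.GramLoss`; ``gram_matrix`` and the ``toy_*`` tensors are\nhypothetical names used only for this illustration. Rearranging the spatial positions\nof a feature map changes the features, but the Gram matrix, which sums over all\npositions, should stay the same.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\n",
"\n",
"\n",
"def gram_matrix(features):\n",
"    # Illustrative helper, not part of pystiche.\n",
"    # features has shape (channels, height, width).\n",
"    channels, height, width = features.shape\n",
"    flat = features.reshape(channels, height * width)\n",
"    # The matrix product sums over all positions, so their order is lost.\n",
"    return flat @ flat.T / (height * width)\n",
"\n",
"\n",
"toy_features = torch.arange(48, dtype=torch.float64).reshape(3, 4, 4)\n",
"# Reverse the order of the 16 spatial positions in every channel.\n",
"toy_rearranged = toy_features.flatten(1).flip(-1).reshape(3, 4, 4)\n",
"\n",
"# The feature maps differ, but their Gram matrices agree.\n",
"print(torch.equal(toy_features, toy_rearranged))\n",
"print(torch.allclose(gram_matrix(toy_features), gram_matrix(toy_rearranged)))"
]
},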
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We set the target images for the optimization ``criterion``.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"perceptual_loss.set_content_image(content_image)\nperceptual_loss.set_style_image(style_image)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We perform the unguided NST and show the result.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"starting_point = \"content\"\ninput_image = get_input_image(starting_point, content_image=content_image)\n\noutput_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"show_image(output_image)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While the result is not completely unreasonable, the building has a strong blueish\ncast that looks unnatural. Since the optimization was unconstrained the color of the\nsky was used for the building. In the remainder of this example we will solve this by\ndividing the images in multiple separate regions.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Guided image optimization\n\nFor both the ``content_image`` and ``style_image`` we load regional ``guides`` and\nshow them.\n\n
Note
In ``pystiche`` a ``guide`` is a binary image in which the white pixels make up the\n region that is guided. Multiple ``guides`` can be combined into a ``segmentation``\n for a better overview. In a ``segmentation`` the regions are separated by color.\n You can use :func:`~pystiche.image.guides_to_segmentation` and\n :func:`~pystiche.image.segmentation_to_guides` to convert one format to the other.
\n\nNote
The guides used within this example were created manually. It is possible to\n generate them automatically :cite:`CZP+2018`, but this is outside the scope of\n ``pystiche``.
\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"content_guides = images[\"castle\"].guides.read(size=size, device=device)\ncontent_segmentation = guides_to_segmentation(content_guides)\nshow_image(content_segmentation, title=\"Content segmentation\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"style_guides = images[\"church\"].guides.read(size=size, device=device)\nstyle_segmentation = guides_to_segmentation(style_guides)\nshow_image(style_segmentation, title=\"Style segmentation\")"
]
},
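{
"cell_type": "markdown",
"metadata": {},
"source": [
"The relationship between ``guides`` and a ``segmentation`` can be sketched on toy data.\nThe cell below does not call :func:`~pystiche.image.guides_to_segmentation`; the region\ncolors and all ``toy_*`` names are assumptions made for this illustration, and the\ncolors ``pystiche`` picks may differ. Each binary guide paints its region in one color,\nand comparing the segmentation against that color recovers the guide.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# Two hypothetical binary guides of shape (height, width) = (2, 3).\n",
"toy_guides = {\n",
"    \"sky\": torch.tensor([[1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]),\n",
"    \"building\": torch.tensor([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]),\n",
"}\n",
"# Assumed RGB color per region; chosen freely for this sketch.\n",
"toy_colors = {\"sky\": (0.0, 0.0, 1.0), \"building\": (1.0, 0.0, 0.0)}\n",
"\n",
"# Guides -> segmentation: paint every guided pixel in its region color.\n",
"toy_segmentation = torch.zeros(3, 2, 3)\n",
"for name, guide in toy_guides.items():\n",
"    color = torch.tensor(toy_colors[name]).reshape(3, 1, 1)\n",
"    toy_segmentation += guide * color\n",
"\n",
"# Segmentation -> guides: a pixel belongs to a region if all channels match.\n",
"toy_recovered = {\n",
"    name: (toy_segmentation == torch.tensor(color).reshape(3, 1, 1)).all(dim=0).float()\n",
"    for name, color in toy_colors.items()\n",
"}\n",
"print(toy_recovered)"
]
},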
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``content_image`` is separated in three ``regions``: the ``\"building\"``, the\n``\"sky\"``, and the ``\"water\"``.\n\nNote
Since no water is present in the style image we reuse the ``\"sky\"`` for the\n ``\"water\"`` region.
\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"regions = (\"building\", \"sky\", \"water\")\n\nstyle_guides[\"water\"] = style_guides[\"sky\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the stylization should be performed for each region individually, we also need\nseparate losses. Within each region we use the same setup as before. Similar to\nhow a :class:`~pystiche.loss.MultiLayerEncodingLoss` bundles multiple\noperators acting on different layers a :class:`~pystiche.loss.MultiRegionLoss`\nbundles multiple losses acting in different regions.\n\nThe guiding is only needed for the ``style_loss`` since the ``content_loss`` by\ndefinition honors the position of the content during the optimization. Thus, the\npreviously defined ``content_loss`` is combined with the new regional ``style_loss``.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def get_region_op(region, region_weight):\n return loss.MultiLayerEncodingLoss(\n multi_layer_encoder, style_layers, get_style_op, score_weight=region_weight,\n )\n\n\nstyle_loss = loss.MultiRegionLoss(regions, get_region_op, score_weight=style_weight)\n\nperceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)\nprint(perceptual_loss)"
]
},
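{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to restrict a Gram-based style loss to a region, in the spirit of\n:cite:`GEB+2017`, is to mask the feature map with the guide before computing the Gram\nmatrix. The cell below is a minimal sketch under that assumption and is not taken from\n``pystiche``: ``guided_gram_matrix`` and the ``toy_*`` tensors are hypothetical, and the\nguide is assumed to already have the spatial size of the feature map. Features outside\nthe guided region then have no influence on the result.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\n",
"\n",
"\n",
"def guided_gram_matrix(features, guide):\n",
"    # Illustrative helper, not part of pystiche.\n",
"    # features: (channels, height, width); guide: (height, width) with entries 0 or 1.\n",
"    channels = features.shape[0]\n",
"    masked = (features * guide).reshape(channels, -1)\n",
"    # Normalize by the number of guided positions instead of all positions.\n",
"    return masked @ masked.T / guide.sum()\n",
"\n",
"\n",
"toy_features = torch.arange(32, dtype=torch.float64).reshape(2, 4, 4)\n",
"toy_guide = torch.zeros(4, 4, dtype=torch.float64)\n",
"toy_guide[:, :2] = 1.0  # guide only the left half\n",
"\n",
"toy_altered = toy_features.clone()\n",
"toy_altered[:, :, 2:] = -1.0  # change only the unguided right half\n",
"\n",
"# Changes outside the guided region leave the guided Gram matrix untouched.\n",
"print(\n",
"    torch.allclose(\n",
"        guided_gram_matrix(toy_features, toy_guide),\n",
"        guided_gram_matrix(toy_altered, toy_guide),\n",
"    )\n",
")"
]
},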
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``content_loss`` is unguided and thus the content image can be set as we did\nbefore. For the ``style_loss`` we use the same ``style_image`` for all regions and\nonly vary the guides.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"perceptual_loss.set_content_image(content_image)\n\nfor region in regions:\n perceptual_loss.set_style_image(\n style_image, guide=style_guides[region], region=region\n )\n perceptual_loss.set_content_guide(content_guides[region], region=region)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We rerun the optimization with the new constrained optimization ``criterion`` and\nshow the result.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"starting_point = \"content\"\ninput_image = get_input_image(starting_point, content_image=content_image)\n\noutput_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"show_image(output_image)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With regional constraints we successfully removed the blueish cast from the building\nwhich leads to an overall higher quality. Unfortunately, reusing the sky region for\nthe water did not work out too well: due to the vibrant color, the water looks\nunnatural.\n\nFortunately, this has an easy solution. Since we are already using separate losses\nfor each region we are not bound to use only a single ``style_image``: if required,\nwe can use a different ``style_image`` for each region.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Guided image optimization with multiple styles\n\nWe load a second style image that has water in it.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"second_style_image = images[\"cliff\"].read(size=size, device=device)\nshow_image(second_style_image, \"Second style image\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"second_style_guides = images[\"cliff\"].guides.read(size=size, device=device)\nshow_image(guides_to_segmentation(second_style_guides), \"Second style segmentation\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can reuse the previously defined criterion and only change the ``style_image`` and\n``style_guides`` in the ``\"water\"`` region.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"region = \"water\"\nperceptual_loss.set_style_image(\n second_style_image, guide=second_style_guides[region], region=region\n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we rerun the optimization again with the new constraints.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"starting_point = \"content\"\ninput_image = get_input_image(starting_point, content_image=content_image)\n\noutput_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"show_image(output_image)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compared to the two previous results we now achieved the highest quality.\nNevertheless, This approach has its downsides : since we are working with multiple\nimages in multiple distinct regions, the memory requirement is higher compared to the\nother approaches. Furthermore, compared to the unguided NST, the guides have to be\nprovided together with the for the content and style images.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}