Semantic image synthesis is a challenging task with many practical
applications. Although remarkable progress has been made in semantic image
synthesis with spatially-adaptive normalization, existing methods normalize
the feature activations only under coarse-level guidance (e.g., semantic class).
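As background for the coarse-level guidance described above, the following is a minimal NumPy sketch of a SPADE-style spatially-adaptive normalization layer. It is an illustration under simplifying assumptions, not the implementation of any published method: the modulation parameters gamma and beta are predicted by a single hypothetical 1x1 projection (`w_gamma`, `w_beta`) of a one-hot semantic label map, whereas practical models use small learned convolutional networks. The function name `spatially_adaptive_norm` is likewise hypothetical.

```python
import numpy as np

def spatially_adaptive_norm(x, guidance, w_gamma, w_beta, eps=1e-5):
    """SPADE-style normalization sketch (simplified, hypothetical).

    x:        feature activations, shape (N, C, H, W)
    guidance: guidance map, shape (N, K, H, W), e.g. a one-hot label map
    w_gamma, w_beta: hypothetical 1x1 projection weights, shape (C, K)
    """
    # 1) Parameter-free normalization with per-channel batch statistics.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # 2) Pixel-wise modulation parameters predicted from the guidance map;
    #    a 1x1 convolution is a per-pixel linear map over channels.
    gamma = np.einsum("ck,nkhw->nchw", w_gamma, guidance)
    beta = np.einsum("ck,nkhw->nchw", w_beta, guidance)
    # 3) Denormalize; with zero weights this returns x_hat unchanged.
    return x_hat * (1 + gamma) + beta

# Toy usage: two classes, left half = class 0, right half = class 1.
rng = np.random.default_rng(0)
N, C, H, W, K = 1, 4, 8, 8, 2
labels = np.zeros((H, W), dtype=int)
labels[:, W // 2:] = 1
onehot = np.eye(K)[labels].transpose(2, 0, 1)[None]  # shape (1, K, H, W)
x = rng.standard_normal((N, C, H, W))
w_g = rng.standard_normal((C, K))
w_b = rng.standard_normal((C, K))
out = spatially_adaptive_norm(x, onehot, w_g, w_b)
# With one-hot guidance, gamma takes only K distinct values per channel.
gamma = np.einsum("ck,nkhw->nchw", w_g, onehot)
print(out.shape, len(np.unique(gamma[0, 0])))  # (1, 4, 8, 8) 2
```

Because the guidance here is a one-hot class map, every pixel of a class receives the same gamma and beta, so the layer cannot distinguish, say, the wheel from the window of the same car. RESAIL instead derives the guidance from retrieved image content, which allows these parameters to vary within a region.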
However, different parts of a semantic object (e.g., the wheels and windows of
a car) differ considerably in structure and texture, so without fine-grained
guidance the synthesized results are often blurry. In this paper,
we propose a novel normalization module, termed REtrieval-based Spatially
AdaptIve normaLization (RESAIL), which introduces pixel-level fine-grained
guidance into the normalization architecture. Specifically, we first present a
retrieval paradigm that, for each test semantic mask, finds the content patch
in the training set that belongs to the same semantic class and has the most
similar shape. Then,
RESAIL uses the retrieved patch to guide the feature normalization of the
corresponding region, providing pixel-level fine-grained guidance and thereby
greatly mitigating blurry synthesis results. Moreover,
distorted ground-truth images are also used as alternatives to
retrieval-based guidance for feature normalization, which further benefits model
training and improves the visual quality of generated images. Experiments on
several challenging datasets show that RESAIL performs favorably against
state-of-the-art methods in terms of quantitative metrics, visual quality, and
subjective evaluation. The source code and pre-trained models will be made
publicly available.
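The retrieval step can be sketched as follows. The abstract does not specify how shape similarity is measured, so this sketch assumes one plausible choice: each mask is cropped to its bounding box, resized to a fixed 32x32 grid with nearest-neighbour sampling, and compared by intersection-over-union (IoU). The helper names (`shape_iou`, `retrieve_patch`, `disk`) and the in-memory `bank` of (class, mask, patch) triples are hypothetical; in practice a patch would be an RGB crop rather than a string.

```python
import numpy as np

def crop_to_bbox(mask):
    """Crop a non-empty 2-D boolean mask to its tight bounding box."""
    ys, xs = np.nonzero(mask)
    return mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def resize_nearest(mask, size):
    """Nearest-neighbour resize of a 2-D mask to (size, size)."""
    h, w = mask.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return mask[np.ix_(rows, cols)]

def shape_iou(a, b, size=32):
    """Assumed shape similarity: IoU after cropping and resizing both masks,
    which makes the score independent of object position and scale."""
    a = resize_nearest(crop_to_bbox(a), size)
    b = resize_nearest(crop_to_bbox(b), size)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def retrieve_patch(test_mask, test_class, bank):
    """Return the patch of the same class whose mask has the highest shape IoU.

    bank: hypothetical list of (class_id, mask, patch) triples from the
    training set; returns None if no training region has the requested class.
    """
    scored = [(shape_iou(test_mask, m), p) for c, m, p in bank if c == test_class]
    return max(scored, key=lambda s: s[0])[1] if scored else None

def disk(n, r):
    """Boolean disk of radius r centred in an n x n grid (toy mask)."""
    yy, xx = np.mgrid[:n, :n]
    c = (n - 1) / 2
    return (yy - c) ** 2 + (xx - c) ** 2 <= r ** 2

# Toy example: the test region is a disk-shaped mask of class 1.
test_mask = disk(40, 15)
triangle = np.tril(np.ones((20, 20), dtype=bool))
bank = [
    (1, triangle, "triangle_patch"),         # same class, different shape
    (1, disk(20, 8), "disk_patch"),          # same class, similar shape, smaller
    (2, disk(40, 15), "other_class_patch"),  # identical shape, wrong class
]
print(retrieve_patch(test_mask, 1, bank))  # disk_patch
```

Restricting candidates to the same class before scoring is what makes the identical-looking class-2 mask ineligible; the crop-and-resize normalization lets a smaller training region still match a larger test region of the same shape.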