Scanline VFX
Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: this https URL. Source code and model checkpoints are available on GitHub: this https URL.
29
The "Virtually Being" framework introduces a method for customizing camera-controllable video diffusion models using multi-view performance captures. It achieves superior multi-view identity preservation and precise 3D camera control, enabling high-fidelity character generation and interaction tailored for virtual production workflows.
3
Researchers developed Infinite-Resolution Integral Noise Warping, a technique that dramatically improves the efficiency of noise manipulation for video generation with image diffusion models. This method achieves an 8.0x speedup on GPU and 9.22x memory reduction compared to prior noise warping methods, while maintaining the statistical properties of noise essential for diffusion models.
7
While the LED panels used in virtual production systems can display vibrant imagery with a wide color gamut, they produce problematic color shifts when used as lighting due to their peaky spectral output from narrow-band red, green, and blue LEDs. In this work, we present an improved color calibration process for virtual production stages which ameliorates this color rendition problem while also passing through accurate in-camera background colors. We do this by optimizing linear color correction transformations for 1) the LED panel pixels visible in the field of view of the camera, 2) the pixels outside the field of view of the camera illuminating the subjects, and, as a post-process, 3) the pixel values recorded by the camera. The result is that footage shot in an RGB LED panel virtual production stage can exhibit more accurate skin tones and costume colors while still reproducing the desired colors of the in-camera background.
We present a unique system for large-scale, multi-performer, high resolution 4D volumetric capture providing realistic free-viewpoint video up to and including 4K resolution facial closeups. To achieve this, we employ a novel volumetric capture, reconstruction and rendering pipeline based on Dynamic Gaussian Splatting and Diffusion-based Detail Enhancement. We design our pipeline specifically to meet the demands of high-end media production. We employ two capture rigs: the Scene Rig, which captures multi-actor performances at a resolution which falls short of 4K production quality, and the Face Rig, which records high-fidelity single-actor facial detail to serve as a reference for detail enhancement. We first reconstruct dynamic performances from the Scene Rig using 4D Gaussian Splatting, incorporating new model designs and training strategies to improve reconstruction, dynamic range, and rendering quality. Then to render high-quality images for facial closeups, we introduce a diffusion-based detail enhancement model. This model is fine-tuned with high-fidelity data from the same actors recorded in the Face Rig. We train on paired data generated from low- and high-quality Gaussian Splatting (GS) models, using the low-quality input to match the quality of the Scene Rig, with the high-quality GS as ground truth. Our results demonstrate the effectiveness of this pipeline in bridging the gap between the scalable performance capture of a large-scale rig and the high-resolution standards required for film and media production.
There are no more papers matching your filters at the moment.