alphaXiv

History

Papers Benchmarks

Scanline VFX

7,937

06 Aug 2025

computer-science computer-vision-and-pattern-recognition generative-models

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Stanford University

University of Maryland

Stony Brook University Netflix Scanline VFX Netflix Eyeline Studios

Ryan Burgert

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: this https URL. Source code and model checkpoints are available on GitHub: this https URL.

16 Oct 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures

Netflix

HKUST Scanline VFX Eyeline Labs

The "Virtually Being" framework introduces a method for customizing camera-controllable video diffusion models using multi-view performance captures. It achieves superior multi-view identity preservation and precise 3D camera control, enabling high-fidelity character generation and interaction tailored for virtual production workflows.

126

02 Nov 2024

computer-science artificial-intelligence computer-vision-and-pattern-recognition

Infinite-Resolution Integral Noise Warping for Diffusion Models

Stanford University

Stony Brook University Netflix Scanline VFX Netflix Eyeline Studios

Ryan Burgert

Researchers developed Infinite-Resolution Integral Noise Warping, a technique that dramatically improves the efficiency of noise manipulation for video generation with image diffusion models. This method achieves an 8.0x speedup on GPU and 9.22x memory reduction compared to prior noise warping methods, while maintaining the statistical properties of noise essential for diffusion models.

01 Jul 2022

computer-science computer-vision-and-pattern-recognition graphics

Jointly Optimizing Color Rendition and In-Camera Backgrounds in an RGB Virtual Production Stage

Netflix Scanline VFX

While the LED panels used in virtual production systems can display vibrant imagery with a wide color gamut, they produce problematic color shifts when used as lighting due to their peaky spectral output from narrow-band red, green, and blue LEDs. In this work, we present an improved color calibration process for virtual production stages which ameliorates this color rendition problem while also passing through accurate in-camera background colors. We do this by optimizing linear color correction transformations for 1) the LED panel pixels visible in the field of view of the camera, 2) the pixels outside the field of view of the camera illuminating the subjects, and, as a post-process, 3) the pixel values recorded by the camera. The result is that footage shot in an RGB LED panel virtual production stage can exhibit more accurate skin tones and costume colors while still reproducing the desired colors of the in-camera background.

31 Oct 2025

computer-science graphics

Detail Enhanced Gaussian Splatting for Large-Scale Volumetric Capture

Scanline VFX

We present a unique system for large-scale, multi-performer, high resolution 4D volumetric capture providing realistic free-viewpoint video up to and including 4K resolution facial closeups. To achieve this, we employ a novel volumetric capture, reconstruction and rendering pipeline based on Dynamic Gaussian Splatting and Diffusion-based Detail Enhancement. We design our pipeline specifically to meet the demands of high-end media production. We employ two capture rigs: the Scene Rig, which captures multi-actor performances at a resolution which falls short of 4K production quality, and the Face Rig, which records high-fidelity single-actor facial detail to serve as a reference for detail enhancement. We first reconstruct dynamic performances from the Scene Rig using 4D Gaussian Splatting, incorporating new model designs and training strategies to improve reconstruction, dynamic range, and rendering quality. Then to render high-quality images for facial closeups, we introduce a diffusion-based detail enhancement model. This model is fine-tuned with high-fidelity data from the same actors recorded in the Face Rig. We train on paired data generated from low- and high-quality Gaussian Splatting (GS) models, using the low-quality input to match the quality of the Scene Rig, with the high-quality GS as ground truth. Our results demonstrate the effectiveness of this pipeline in bridging the gap between the scalable performance capture of a large-scale rig and the high-resolution standards required for film and media production.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode