C3G introduces a feed-forward framework that generates compact 3D representations using approximately 2,000 Gaussians from sparse, unposed multi-view images. This framework achieves competitive novel view synthesis and significantly enhances 3D scene understanding and multi-view correspondence by efficiently lifting uncompressed 2D semantic features into view-invariant 3D features, drastically reducing memory overhead by up to 15 times.
View blog