[Video showcase; per-scene Gaussian primitive counts: 32.19M, 1.09M, 2.65M, 47.59M.]
Our approach enables the training of a native large-scale 3D Gaussian Splatting (3DGS) model with over 1 billion Gaussian primitives. The videos showcase the reconstruction results on a 2.7 km² city dataset with 140,000 images. Even when zooming in on a specific location, the details remain rich and realistic.
In this work, we explore the possibility of training high-parameter 3D Gaussian Splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model-parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS with respect to primitive count and training resolution, regimes that were previously difficult to reach, and to surpass prior state-of-the-art reconstruction quality. We observe a clear positive trend: visual quality increases with the number of primitives. We also present the first attempt to train a 3DGS model with more than one billion primitives, on the full MatrixCity dataset, attaining promising visual quality.
Datasets of different sizes require varying levels of computational power and numbers of 3DGS primitives. Larger and higher-resolution datasets can no longer be trained on a single GPU, which limits the pursuit of scale and fidelity in 3DGS.
The training pipeline is shown in the figure above. We denote each subset as a sub-model and assign it to a separate GPU, while a central manager is responsible for managing the subspaces Sn. This manager also parses incoming rendering requests and distributes rendering tasks to the relevant sub-models. The computational results from all sub-models, Tk and Ck, are sent back to the central manager. These results can be represented as 2D maps with the same resolution as the target image, which consumes only minimal communication bandwidth. The central manager then completes the final rendering, calculates the loss, and sends the gradients back to each sub-model for parameter updates.
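As a rough sketch (not the authors' released code), the manager's merge step can be viewed as front-to-back alpha compositing of the per-subspace maps: each sub-model k returns an alpha-premultiplied color map Ck and a transmittance map Tk, and the manager accumulates them in depth order. The function name and the assumption that the lists arrive already sorted front-to-back along the viewing rays are ours for illustration.

```python
import numpy as np

def merge_submodel_outputs(colors, transmittances):
    """Composite per-subspace partial renders into a final image.

    colors:         list of (H, W, 3) arrays, C_k = alpha-premultiplied color
                    contributed by the Gaussians inside subspace k.
    transmittances: list of (H, W) arrays, T_k = fraction of light that
                    passes through subspace k unabsorbed.
    Both lists are assumed sorted front-to-back along the viewing rays.
    """
    H, W, _ = colors[0].shape
    final = np.zeros((H, W, 3))
    T_acc = np.ones((H, W))               # transmittance accumulated so far
    for C_k, T_k in zip(colors, transmittances):
        final += T_acc[..., None] * C_k   # subspace k's light, attenuated by
                                          # everything in front of it
        T_acc *= T_k                      # light surviving past subspace k
    return final, T_acc
```

Because only these (H, W) and (H, W, 3) maps cross GPU boundaries, the communication cost is independent of the number of Gaussians in each sub-model.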
Our distributed training approach enables the use of a large number of Gaussian primitives. We observe a strong positive correlation between the number of Gaussian primitives and the final model's PSNR. At similar primitive counts, our PSNR closely matches that of 3DGS trained on a single GPU; moreover, we can achieve higher PSNR simply by increasing the number of primitives.
Previous methods [2,3] skip the merging process between subspaces, implicitly assuming that a single ray is primarily influenced by one subspace and predominantly contributed to by Gaussians within that subspace. This assumption holds for bird's-eye-view datasets: due to the limited perspective variation, rays almost always pass downward through one, or at most two, subspaces. The figure above shows the frequency histogram of the number of subspaces hit by each ray. In more general datasets, such as street-view and indoor datasets, rays often intersect multiple subspaces, which undermines the basic assumption of previous works. Our distributed framework is computationally equivalent to the original 3DGS and does not rely on any assumptions about data distribution or perspective, making it a more general approach.
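The histogram statistic above can be reproduced with a standard slab-method ray/AABB intersection test, modeling each subspace as an axis-aligned box. The helper names below are illustrative, not from the paper's codebase.

```python
import numpy as np

def ray_hits_aabb(origin, direction, box_min, box_max, eps=1e-9):
    """Slab test: does the ray origin + t * direction (t >= 0) hit the box?"""
    # Avoid division by zero for axis-aligned ray directions.
    inv = 1.0 / np.where(np.abs(direction) < eps, eps, direction)
    t1 = (box_min - origin) * inv
    t2 = (box_max - origin) * inv
    t_near = np.max(np.minimum(t1, t2))   # latest entry across the three slabs
    t_far = np.min(np.maximum(t1, t2))    # earliest exit across the three slabs
    return t_far >= max(t_near, 0.0)

def count_subspace_hits(origin, direction, boxes):
    """Number of subspaces (AABBs) intersected by one camera ray."""
    return sum(ray_hits_aabb(origin, direction, lo, hi) for lo, hi in boxes)
```

For a downward bird's-eye ray this count is typically 1, while a horizontal street-view ray crossing several cells yields counts of 2 or more, matching the histogram's contrast between the two dataset types.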
The figure below shows merge tests with only two subspaces for different partition approaches. The cell-independent rendering used in previous works [2,3] can cause artifacts at subspace boundaries, whereas our approach renders the boundaries seamlessly.
@article{li2024retinags,
title={RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians},
author={Li, Bingling and Chen, Shengyi and Wang, Luchao and He, Kaimin and Yan, Sijie and Xiong, Yuanjun},
journal={arXiv preprint arXiv:2406.11836},
year={2024}
}
[1] Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G. (2023). 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics.
[2] Lin, J., Li, Z., Tang, X., Liu, J., Liu, S., Liu, J., Lu, Y., Wu, X., Xu, S., Yan, Y., & Yang, W. (2024). VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction. CVPR.
[3] Liu, Y., Guan, H., Luo, C., Fan, L., Peng, J., & Zhang, Z. (2024). CityGaussian: Real-Time High-Quality Large-Scale Scene Rendering with Gaussians. ECCV.