DepthGen: Monocular Depth Estimation using Diffusion Models

2023-03-01  Valar_Morghulis

Monocular Depth Estimation using Diffusion Models

Feb 2023

Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet

[Google Research]

https://arxiv.org/abs/2302.14816

https://depth-gen.github.io

We formulate monocular depth estimation using denoising diffusion models, inspired by their recent successes in high-fidelity image generation. To that end, we introduce innovations to address problems arising from noisy, incomplete depth maps in training data, including step-unrolled denoising diffusion, an L1 loss, and depth infilling during training. To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks. Despite the simplicity of the approach, with a generic loss and architecture, our DepthGen model achieves SOTA performance on the indoor NYU dataset, and near-SOTA results on the outdoor KITTI dataset. Further, with a multimodal posterior, DepthGen naturally represents depth ambiguity (e.g., from transparent surfaces), and its zero-shot performance, combined with depth imputation, enables a simple but effective text-to-3D pipeline.
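The three training tweaks named in the abstract (depth infilling, an L1 loss, and step-unrolled denoising diffusion) can all be pictured as small changes to a standard DDPM training step. Below is a minimal PyTorch sketch of one plausible reading; it is not the authors' code. The ε-prediction parameterization, the `model(x_t, t, rgb)` signature, the mean-value `infill_depth` helper, and the `alphas_cumprod` schedule are all illustrative assumptions.

```python
# Hedged sketch of a DDPM-style training step with the three tweaks
# described in the DepthGen abstract. All names are hypothetical.

import torch
import torch.nn.functional as F

def infill_depth(depth, valid_mask):
    # Fill missing depth with the mean of valid pixels (a simple stand-in;
    # the point is that the forward diffusion process needs complete maps).
    fill = (depth * valid_mask).sum() / valid_mask.sum().clamp(min=1)
    return torch.where(valid_mask.bool(), depth, fill)

def train_step(model, rgb, depth, valid_mask, alphas_cumprod, unroll=True):
    b = depth.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=depth.device)
    a = alphas_cumprod[t].view(b, 1, 1, 1)

    x0 = infill_depth(depth, valid_mask)       # complete the incomplete GT map
    eps = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps  # forward process q(x_t | x_0)

    if unroll:
        # Step-unrolled training: rebuild x_t from the model's own one-step
        # estimate of x_0, so training inputs better match what the sampler
        # actually produces at test time.
        with torch.no_grad():
            eps_hat = model(xt, t, rgb)
            x0_hat = (xt - (1 - a).sqrt() * eps_hat) / a.sqrt()
        eps = torch.randn_like(x0_hat)
        xt = a.sqrt() * x0_hat + (1 - a).sqrt() * eps

    eps_pred = model(xt, t, rgb)
    # L1 loss (more robust to noisy depth labels than the usual L2),
    # restricted to pixels with valid ground-truth depth.
    per_pixel = F.l1_loss(eps_pred, eps, reduction="none") * valid_mask
    return per_pixel.sum() / valid_mask.sum().clamp(min=1)
```

At test time, depth would be sampled by iterating the reverse process conditioned on the RGB image; drawing several such samples is what gives the multimodal posterior the abstract mentions, e.g. distinct depth hypotheses for transparent surfaces.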
