Skip to content
Paper Copilot™, originally my personal project, is now open to the public. I deeply appreciate your feedback and support.
twitter x github-circle reddit

Paper Copilot Paper Copilot™ Research Toolbox
  • Statistics
    • AI/ML
      • AAAI
        • 2025
      • ICLR
        • 2025
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
        • 2017
        • 2013
        • 2014
      • ICML
        • 2024
        • 2023
      • NeurIPS
        • 2024
          • Main Conference
          • Datasets & Benchmarks
          • Creative AI
          • High School Projects
        • 2023
          • Main Conference
          • Datasets & Benchmarks
        • 2022
          • Main Conference
          • Datasets & Benchmarks
        • 2021
          • Main Conference
          • Datasets & Benchmarks
      • UAI
        • 2024
    • Data Mining
      • KDD
        • 2024
          • Research Track
          • Applied Data Science Track
        • 2025
          • Research Track
          • Applied Data Science Track
    • Graphics
      • SIGGRAPH
      • SIGGRAPH Asia
    • Multimedia
      • ACMMM
        • 2024
    • NLP
      • ACL
        • 2024
      • COLM
        • 2024
      • EMNLP
        • 2024
        • 2023
    • Robotics
      • CoRL
        • 2024
        • 2023
        • 2022
        • 2021
      • ICRA
        • 2025
      • IROS
        • 2025
      • RSS
        • 2025
    • Vision
      • 3DV
        • 2025
      • CVPR
        • 2025
      • ECCV
        • 2024
      • ICCV
      • WACV
        • 2025
  • Accepted Papers
    • AI/ML
      • ICLR
        • 2025
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
        • 2017
        • 2014
        • 2013
      • NeurIPS
        • 2024
          • Main Conference
          • Dataset & Benchmark
        • 2023
          • Main Conference
          • Dataset & Benchmark
        • 2022
          • Main Conference
          • Dataset & Benchmark
        • 2021
          • Main Conference
          • Dataset & Benchmark
      • ICML
        • 2024
        • 2023
    • Graphics
      • SIGGRAPH
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
      • SIGGRAPH Asia
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
    • Vision
      • CVPR
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
        • 2017
        • 2016
        • 2015
        • 2014
        • 2013
      • ICCV
        • 2023
        • 2021
        • 2019
        • 2017
        • 2015
        • 2013
      • ECCV
        • 2024
        • 2022
        • 2020
        • 2018
      • WACV
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
  • Countdown
  • Map
    • 3D
    • 2D
  • Contact Us
    • About Us
    • Acknowledgment
    • Report Issues
  • twitter x github-circle reddit

Category: AIGC

Home » CS » AI » AIGC

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

Image-to-3D NeRF Pretrained LDM

Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, Hao Su

UC San Diego; UCLA; Cornell University; Zhejiang University; IIT Madras; Adobe Research

Portals
  • pdf
  • YouTube
  • Project
  • One-2-3-45
  • arXiv
  • Paperswithcode
Abstract

Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models but suffer from lengthy optimization time, 3D inconsistency results, and poor geometry. In this work, we propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. Given a single image, we first use a view-conditioned 2D diffusion model, Zero123, to generate multi-view images for the input view, and then aim to lift them up to 3D space. Since traditional reconstruction methods struggle with inconsistent multi-view predictions, we build our 3D reconstruction module upon an SDF-based generalizable neural surface reconstruction method and propose several critical training strategies to enable the reconstruction of 360-degree meshes. Without costly optimizations, our method reconstructs 3D shapes in significantly less time than existing methods. Moreover, our method favors better geometry, generates more 3D consistent results, and adheres more closely to the input image. We evaluate our approach on both synthetic data and in-the-wild images and demonstrate its superiority in terms of both mesh quality and runtime. In addition, our approach can seamlessly support the text-to-3D task by integrating with off-the-shelf text-to-image diffusion models.

Related Works

3D Generation Guided by 2D Prior Models; Single Image to 3D; Generalizable Neural Reconstruction

Comparisons

Zero-1-to-3, TensoRF, NeuS, Point-E, Shap-E, RealFusion, 3DFuse, GeoNeRF, SparseNeuS, DreamFusion

2023

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Image-to-3D Pretrained LDM Text-to-3D

Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng

Peking University; Nanyang Technological University; Baidu

Portals
  • pdf
  • Project
  • dreamgaussian
  • arXiv
  • Paperswithcode
Abstract

Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods.

Related Works

3D REPRESENTATIONS; TEXT-TO-3D GENERATION; IMAGE-TO-3D GENERATION

Comparisons

Zero-1-to-3, One-2-3-45, Shap-E, DreamFusion, Point-E

2023

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

Image-to-3D NeRF Pretrained LDM

Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

KAUST; Snap Inc.; University of Oxford

Portals
  • pdf
  • Project
  • Magic123
  • arXiv
  • Paperswithcode
Abstract

We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes generation from a single unposed image in the wild using both2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images.

Related Works

Multi-view 3D reconstruction; In-domain single-view 3D reconstruction; Zero-shot single-view 3D reconstruction

Comparisons

Shap-E, 3D Fuse, Neural Lift, Real Fusion

2023

Magic3D: High-Resolution Text-to-3D Content Creation

NeRF Pretrained LDM Text-to-3D

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

NVIDIA

Portals
  • pdf
  • YouTube
  • Project
  • arXiv
  • Paperswithcode
  • CVF
Abstract

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high quality 3D mesh models in 40 minutes, which is 2× faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show 61.7% raters to prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.

Related Works

Text-to-image generation; 3D generative models; Text-to-3D generation

Comparisons

DreamFusion

2023 CVPR

Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation

NeRF Pretrained LDM Text-to-3D

Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, Greg Shakhnarovich

TTI-Chicago; Purdue University

Portals
  • pdf
  • Project
  • sjc
  • arXiv
  • Paperswithcode
  • CVF
Abstract

A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.

Related Works

Neural radiance fields (NeRF); 2D-supervised 3D GANs; CLIP-guided, optimization-based 3D generative models; DreamFusion

Comparisons

DreamFusion

2023 CVPR

D-SDS: Debiasing Scores and Prompts of 2D Diffusionfor View-consistent Text-to-3D Generation

NeRF Pretrained LDM Text-to-3D
Error: Cannot create object