Category: ML


NeRF-Art: Text-Driven Neural Radiance Fields Stylization

NeRF Neural Style Transfer Scene Editing

Can Wang, Ruixiang Jiang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

City University of Hong Kong; The Hong Kong Polytechnic University; Snap Inc.; Netflix; Microsoft Cloud AI

Portals
  • pdf
  • YouTube
  • Project
  • NeRF-Art
  • arXiv
  • Paperswithcode
Abstract

As a powerful representation of 3D scenes, neural radiance fields (NeRF) enable high-quality novel view synthesis given a set of multi-view images. Editing NeRF, however, remains challenging, especially when simulating a text-guided style that alters both the appearance and the geometry simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a single text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to a new domain characterized by the desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with a directional constraint that simultaneously controls both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress cloudy artifacts and geometry noise when transforming the density field for geometry stylization. Through extensive experiments on various styles, our method is demonstrated to be effective and robust regarding both single-view stylization quality and cross-view consistency.
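
The two CLIP-space objectives named in the abstract (a directional constraint and a global-local contrastive term) can be pictured with a short PyTorch sketch. This is an illustration under stated assumptions, not the authors' code: all embeddings are assumed to be precomputed with a frozen CLIP encoder, the shapes are noted in comments, and the temperature is a placeholder value rather than the paper's setting.

```python
import torch
import torch.nn.functional as F

def directional_loss(e_img_styled, e_img_orig, e_txt_target, e_txt_source):
    # Directional constraint: the shift between stylized and original renders in
    # CLIP image space should point along the shift from source to target prompt.
    d_img = F.normalize(e_img_styled - e_img_orig, dim=-1)   # (B, D)
    d_txt = F.normalize(e_txt_target - e_txt_source, dim=-1)  # (1, D) or (B, D)
    return (1.0 - (d_img * d_txt).sum(dim=-1)).mean()

def contrastive_style_loss(e_patches, e_pos, e_negs, tau=0.07):
    # Global-local contrastive term: pull rendered patch (or full-image) embeddings
    # toward the target style prompt and away from a set of negative prompts.
    # e_patches: (B, D), e_pos: (1, D), e_negs: (N, D)
    pos = torch.exp(F.cosine_similarity(e_patches, e_pos, dim=-1) / tau)            # (B,)
    neg = torch.exp(F.cosine_similarity(e_patches.unsqueeze(1),
                                        e_negs.unsqueeze(0), dim=-1) / tau).sum(1)   # (B,)
    return -torch.log(pos / (pos + neg)).mean()
```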

Related Works

Neural Style Transfer on Images and Videos; Neural Stylization on Explicit 3D Representations; Neural Stylization on NeRF; Text-Driven Stylization

Comparisons

NeuS, VolSDF, CLIP-NeRF, DreamField, StyleGAN-NADA

2023 TVCG

ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields

NeRF Pretrained LDM Scene Editing

Jiahua Dong, Yu-Xiong Wang

University of Illinois Urbana-Champaign

Portals
  • pdf
  • VICA-NeRF
Abstract

We introduce ViCA-NeRF, a view-consistency-aware method for 3D editing with text instructions. In addition to the implicit NeRF modeling, our key insight is to exploit two sources of regularization that explicitly propagate the editing information across different views, thus ensuring multi-view consistency. As geometric regularization, we leverage the depth information derived from the NeRF model to establish image correspondence between different views. As learned regularization, we align the latent codes in the 2D diffusion model between edited and unedited images, enabling us to edit key views and propagate the update to the whole scene. Incorporating these two regularizations, our ViCA-NeRF framework consists of two stages. In the initial stage, we blend edits from different views to create a preliminary 3D edit. This is followed by a second stage of NeRF training that is dedicated to further refining the scene’s appearance. Experiments demonstrate that ViCA-NeRF provides more flexible, efficient (3 times faster) editing with higher levels of consistency and detail, compared with the state of the art.
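
The geometric regularization described above can be sketched as a depth-based reprojection of an edited key view into another view. The snippet below is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with shared intrinsics K, float tensors, 4x4 camera-to-world/world-to-camera poses, and it omits occlusion checks and the blending weights used to fuse multiple key views.

```python
import torch

def reproject_edit_coords(depth_src, K, src_to_world, world_to_tgt, H, W):
    # Pixel grid of the source (edited key) view, in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)     # (HW, 3)
    # Back-project with NeRF-rendered depth into the source camera frame, then to world.
    cam_pts = (torch.linalg.inv(K) @ pix.T).T * depth_src.reshape(-1, 1)        # (HW, 3)
    cam_pts_h = torch.cat([cam_pts, torch.ones(cam_pts.shape[0], 1)], dim=-1)   # (HW, 4)
    world_pts = (src_to_world @ cam_pts_h.T).T                                  # (HW, 4)
    # Project into the target view; the edited colors can then be copied onto
    # these pixel locations to propagate the edit across views.
    tgt_pts = (world_to_tgt @ world_pts.T).T[:, :3]                             # (HW, 3)
    uvz = (K @ tgt_pts.T).T
    uv = uvz[:, :2] / uvz[:, 2:3].clamp(min=1e-6)
    return uv.reshape(H, W, 2)
```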

Related Works

Text-to-image diffusion models for 2D editing; Implicit 3D Representation; 3D Generation; NeRF Editing

Comparisons

NeRF-Art, Instruct-NeRF2NeRF

2023 NeurIPS

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

Image-to-3D NeRF Pretrained LDM

Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, Hao Su

UC San Diego; UCLA; Cornell University; Zhejiang University; IIT Madras; Adobe Research

Portals
  • pdf
  • YouTube
  • Project
  • One-2-3-45
  • arXiv
  • Paperswithcode
Abstract

Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models but suffer from lengthy optimization time, 3D-inconsistent results, and poor geometry. In this work, we propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. Given a single image, we first use a view-conditioned 2D diffusion model, Zero123, to generate multi-view images for the input view, and then aim to lift them up to 3D space. Since traditional reconstruction methods struggle with inconsistent multi-view predictions, we build our 3D reconstruction module upon an SDF-based generalizable neural surface reconstruction method and propose several critical training strategies to enable the reconstruction of 360-degree meshes. Without costly optimizations, our method reconstructs 3D shapes in significantly less time than existing methods. Moreover, our method favors better geometry, generates more 3D-consistent results, and adheres more closely to the input image. We evaluate our approach on both synthetic data and in-the-wild images and demonstrate its superiority in terms of both mesh quality and runtime. In addition, our approach can seamlessly support the text-to-3D task by integrating with off-the-shelf text-to-image diffusion models.
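
The feed-forward pipeline the abstract describes can be summarized structurally. The sketch below is only an outline with placeholder callables (none of these names are the authors' API): a view-conditioned diffusion model stands in for Zero123, an SDF-based reconstructor for the generalizable surface network, and a mesh extractor for the final surface extraction step.

```python
def image_to_mesh(image, view_poses, novel_view_model, sdf_reconstructor, mesh_extractor):
    """Two-stage feed-forward pipeline (placeholder callables, not the authors' code):
    1) a view-conditioned 2D diffusion model predicts images of the object from
       new viewpoints given the single input view;
    2) a generalizable SDF-based reconstruction network fuses the possibly
       inconsistent predicted views into a signed distance field;
    3) a textured 360-degree mesh is extracted from the SDF.
    No per-shape optimization is run at inference time."""
    multi_views = [novel_view_model(image, pose) for pose in view_poses]   # stage 1
    sdf, appearance = sdf_reconstructor(image, multi_views, view_poses)    # stage 2
    return mesh_extractor(sdf, appearance)                                 # stage 3
```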

Related Works

3D Generation Guided by 2D Prior Models; Single Image to 3D; Generalizable Neural Reconstruction

Comparisons

Zero-1-to-3, TensoRF, NeuS, Point-E, Shap-E, RealFusion, 3DFuse, GeoNeRF, SparseNeuS, DreamFusion

2023

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Image-to-3D Pretrained LDM Text-to-3D

Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng

Peking University; Nanyang Technological University; Baidu

Portals
  • pdf
  • Project
  • dreamgaussian
  • arXiv
  • Paperswithcode
Abstract

Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with accompanying mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods.
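
Score distillation sampling (SDS), which the abstract builds on, can be stated in a few lines. This is a generic sketch under common DDPM-style conventions (an epsilon-prediction denoiser, cumulative alphas, and a simple weighting choice), not DreamGaussian's code; here `rendered` would be an image differentiably rendered from the 3D Gaussians, and `denoiser` is a placeholder for the frozen text-conditioned diffusion model.

```python
import torch

def sds_loss(rendered, denoiser, text_emb, alphas_cumprod, t):
    # Diffuse the render to timestep t, let the frozen denoiser predict the noise,
    # and use (predicted - true) noise as the gradient signal on the 3D parameters.
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t]
    x_t = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * noise
    with torch.no_grad():
        eps_pred = denoiser(x_t, t, text_emb)   # placeholder signature
    w = 1.0 - a_t                                # a common weighting choice
    grad = w * (eps_pred - noise)
    # Surrogate loss whose gradient w.r.t. `rendered` equals `grad` (the SDS update).
    return (grad.detach() * rendered).sum()
```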

Related Works

3D REPRESENTATIONS; TEXT-TO-3D GENERATION; IMAGE-TO-3D GENERATION

Comparisons

Zero-1-to-3, One-2-3-45, Shap-E, DreamFusion, Point-E

2023

InpaintNeRF360: Text-Guided 3D Inpainting on Unbounded Neural Radiance Fields

Image Inpainting NeRF Scene Editing

Dongqing Wang, Tong Zhang, Alaa Abboud, Sabine Süsstrunk

EPFL

Portals
  • pdf
  • arXiv
  • Paperswithcode
Abstract

Neural Radiance Fields (NeRF) can generate highly realistic novel views. However, editing 3D scenes represented by NeRF across 360-degree views, particularly removing objects while preserving geometric and photometric consistency, remains a challenging problem due to NeRF's implicit scene representation. In this paper, we propose InpaintNeRF360, a unified framework that utilizes natural language instructions as guidance for inpainting NeRF-based 3D scenes. Our approach employs a promptable segmentation model by generating multi-modal prompts from the encoded text for multiview segmentation. We apply depth-space warping to enforce viewing consistency in the segmentations, and further refine the inpainted NeRF model using perceptual priors to ensure visual plausibility. InpaintNeRF360 is capable of simultaneously removing multiple objects or modifying object appearance based on text instructions while synthesizing 3D viewing-consistent and photo-realistic inpainting. Through extensive experiments on both unbounded and frontal-facing scenes trained through NeRF, we demonstrate the effectiveness of our approach and showcase its potential to enhance the editability of implicit radiance fields.
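
The perceptual-prior refinement mentioned above can be sketched as a masked feature-space loss. This is an assumption-laden illustration rather than the paper's implementation: `feature_extractor` is a placeholder (e.g. a frozen VGG returning a list of feature maps), and the exact masking and weighting are illustrative.

```python
import torch
import torch.nn.functional as F

def masked_perceptual_loss(nerf_render, inpainted_target, mask, feature_extractor):
    # Inside the inpainted region, compare NeRF renders with the 2D-inpainted targets
    # in a perceptual feature space rather than per pixel, since hallucinated content
    # only needs to be plausible and view-consistent, not pixel-identical.
    feats_r = feature_extractor(nerf_render * mask)       # list of feature maps
    feats_t = feature_extractor(inpainted_target * mask)
    return sum(F.l1_loss(a, b) for a, b in zip(feats_r, feats_t))
```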

Related Works

Image Inpainting; Inpainting Neural Radiance Fields; Object Segmentation with 3D consistency; Text Instructed 3D Editing

Comparisons

Instruct-NeRF2NeRF, SPIn-NeRF

2023

Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

NeRF Scene Editing

Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

UC Berkeley

Portals
  • pdf
  • Project
  • instruct-nerf2n...
  • arXiv
  • Paperswithcode
  • CVF
Abstract

We propose a method for editing NeRF scenes with text-instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. We demonstrate that our proposed method is able to edit large-scale, real-world scenes, and is able to accomplish more realistic, targeted edits than prior work.
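
The alternation the abstract describes (edit the 2D inputs while optimizing the 3D scene) can be sketched as a simple loop. This is only an outline under assumed interfaces: `nerf.render`, `train_step`, the dataset dictionary keys, and `edit_model` (standing in for an image-conditioned diffusion editor such as InstructPix2Pix) are placeholders, and the editing schedule is illustrative.

```python
import random

def edit_scene(nerf, dataset, edit_model, instruction, n_iters, train_step, edit_every=10):
    # Alternate between (a) replacing one training image with an instruction-guided
    # edit of the current render of that view and (b) ordinary NeRF training steps,
    # so the 3D scene gradually absorbs the 2D edits while staying multi-view consistent.
    for it in range(n_iters):
        if it % edit_every == 0:
            idx = random.randrange(len(dataset))
            view = dataset[idx]
            render = nerf.render(view["camera"])
            # The editor is conditioned on the instruction and the original capture.
            view["image"] = edit_model(render, view["original_image"], instruction)
        train_step(nerf, dataset)
    return nerf
```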

Related Works

Physical Editing of NeRFs; Artistic Stylization of NeRFs; Generating 3D Content; Instruction as an Editing Interface

Comparisons

NeRF-Art

2023 ICCV

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields

NeRF Scene Editing

Ori Gordon, Omri Avrahami, Dani Lischinski

The Hebrew University of Jerusalem

Portals
  • pdf
  • Project
  • BlendNeRF
  • arXiv
  • Paperswithcode
  • CVF
Abstract

Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multiview consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.
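
The ROI-based blending idea can be pictured with a small field-composition sketch. This is an illustration only: `scene_field` and `object_field` are placeholder callables returning per-point (density, color), and a hard box-membership test is used here, whereas the paper describes a dedicated volumetric blending technique for seamless transitions.

```python
import torch

def blended_field(points, scene_field, object_field, roi_min, roi_max):
    # Query the newly synthesized field inside the user-specified 3D ROI box and the
    # original scene field outside it; volume-rendering the combined densities and
    # colors composites the generated object into the existing scene.
    inside = ((points >= roi_min) & (points <= roi_max)).all(dim=-1, keepdim=True)
    sigma_s, rgb_s = scene_field(points)      # original NeRF
    sigma_o, rgb_o = object_field(points)     # text/image-steered MLP inside the ROI
    sigma = torch.where(inside, sigma_o, sigma_s)
    rgb = torch.where(inside, rgb_o, rgb_s)
    return sigma, rgb
```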

Related Works

Neural Implicit Representations; NeRF 3D Generation; Editing NeRFs

Comparisons

Volumetric Disentanglement

2023 ICCV

BlendNeRF: 3D-aware Blending with Generative NeRFs

NeRF Scene Editing

Hyunsu Kim, Gayoung Lee, Yunjey Choi, Jin-Hwa Kim, Jun-Yan Zhu

NAVER AI Lab; SNU AIIS; CMU

Portals
  • pdf
  • YouTube
  • Project
  • BlendNeRF
  • arXiv
  • CVF
Abstract

Image blending aims to combine multiple images seamlessly. It remains challenging for existing 2D-based methods, especially when input images are misaligned due to differences in 3D camera poses and object shapes. To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRF), including two key components: 3D-aware alignment and 3D-aware blending. For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to generative NeRFs and then perform 3D local alignment for each part. To further leverage 3D information of the generative NeRF, we propose 3D-aware blending that directly blends images on the NeRF's latent representation space, rather than raw pixel space. Collectively, our method outperforms existing 2D baselines, as validated by extensive quantitative and qualitative evaluations with FFHQ and AFHQ-Cat.
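
The blending step can be sketched as a masked composite in the generator's feature/latent space rather than in pixel space. This is a minimal sketch under assumptions: `generator_tail` is a placeholder for the remaining layers of the generative NeRF, the mask is assumed to come from the 3D-aware model, and the preceding 3D-aware alignment (pose estimation plus local alignment) is assumed to have been applied to the reference already.

```python
import torch

def blend_in_latent_space(feat_original, feat_reference, mask, generator_tail):
    # Composite the two (already 3D-aligned) inputs in the generative NeRF's
    # feature/latent space using a foreground mask, then decode the blended
    # features with the rest of the generator instead of mixing raw pixels.
    blended = mask * feat_reference + (1.0 - mask) * feat_original
    return generator_tail(blended)
```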

Related Works

Image blending; 3D-aware generative models; 3D-aware image editing

Comparisons

Poisson Blending, Latent Composition, StyleGAN3, StyleMapGAN, SDEdit

2023 ICCV

SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field

NeRF Scene Editing Semantic Image Editing