Collaborative Control for Geometry-Conditioned PBR Image Generation

Shimon Vainer1,*, Mark Boss2,†, Mathias Parger1, Konstantin Kutsy1, Dante De Nigris1, Ciara Rowles1, Nicolas Perony1, Simon Donné1,*
1Unity 2Stability AI
*Equal Contributions †Core Technical Contributions

By tightly linking the PBR diffusion model with a frozen RGB model, we produce high-quality PBR images conditioned on geometry and prompts.

Abstract

Current 3D content generation builds on generative image models that output RGB only. Modern graphics pipelines, however, require physically-based rendering (PBR) material properties. We propose to model the PBR image distribution directly, sidestepping photometric inaccuracies in RGB generation and the inherent ambiguity in extracting PBR from RGB.

Existing approaches leveraging cross-modal finetuning are limited by the lack of data on one hand and the high dimensionality of the output modalities on the other: we overcome both challenges by keeping a frozen RGB model and tightly linking a newly trained PBR model using a novel cross-network communication paradigm.

As the base RGB model is fully frozen, the proposed method does not risk catastrophic forgetting during finetuning and remains compatible with techniques such as IP-Adapter pretrained for the base RGB model.

In an extensive experimental section, we validate our design choices, demonstrate robustness to data sparsity, and compare against existing paradigms.

Overview

Method

We establish a bidirectional Collaborative Control mechanism between the blue network (the frozen, pre-trained Stable Diffusion model) and the yellow network (the newly trained PBR model).
Zooming in on a single Collaborative Control transformer block.
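To make the idea of a bidirectional link more concrete, here is a minimal NumPy sketch of one plausible form of cross-network communication. It is not the implementation used for this work: the class name `CrossNetworkLink`, the concatenate-then-project design, the zero initialisation (borrowed from ControlNet-style adapters), and all dimensions are assumptions made for illustration only.

```python
import numpy as np


class CrossNetworkLink:
    """Hypothetical bidirectional link between two networks' hidden states."""

    def __init__(self, rgb_dim: int, pbr_dim: int):
        total = rgb_dim + pbr_dim
        # Assumption: zero-initialised projection, so the link starts as a
        # no-op and the frozen RGB network initially behaves as pretrained.
        self.weight = np.zeros((total, total))
        self.bias = np.zeros(total)
        self.rgb_dim = rgb_dim

    def __call__(self, rgb_h: np.ndarray, pbr_h: np.ndarray):
        # rgb_h: (tokens, rgb_dim), pbr_h: (tokens, pbr_dim)
        joint = np.concatenate([rgb_h, pbr_h], axis=-1)
        delta = joint @ self.weight + self.bias
        # Each network receives a residual update computed from both states.
        rgb_out = rgb_h + delta[..., : self.rgb_dim]
        pbr_out = pbr_h + delta[..., self.rgb_dim :]
        return rgb_out, pbr_out


rng = np.random.default_rng(0)
rgb_h = rng.normal(size=(16, 8))  # stand-in for frozen RGB features
pbr_h = rng.normal(size=(16, 6))  # stand-in for PBR features
link = CrossNetworkLink(rgb_dim=8, pbr_dim=6)
rgb_out, pbr_out = link(rgb_h, pbr_h)
```

In this sketch only the link's `weight` and `bias` (together with the PBR network) would be trained; the RGB network's own parameters are never touched, which is what keeps the base model frozen.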

Plug & Play

Since our method keeps the RGB model frozen, it remains compatible, in a plug & play fashion, with control methods pretrained for that base model. We have experimented with IP-Adapter; other control methods (T2I-Adapter, ControlNet, etc.) can be used in the same way.
This compatibility broadens the usability of our method in practical scenarios.

Examples

"A wooden chest with a steel lock"
The model generalizes to 2D exemplars even though the dataset does not contain any. The conditioning normal map is constant (\(z=1\)).
An example of our method combined with another controlling network (IP-Adapter).
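For readers who want to reproduce the flat conditioning used for the 2D exemplar above, the following sketch builds a constant normal map with \(z=1\). The function name `flat_normal_map` and the 8-bit encoding (mapping each component from [-1, 1] to [0, 255]) are assumptions based on the common normal-map convention; the encoding expected by the actual model may differ.

```python
import numpy as np


def flat_normal_map(height: int, width: int) -> np.ndarray:
    """Return an (H, W, 3) uint8 image encoding the constant normal (0, 0, 1)."""
    normals = np.zeros((height, width, 3), dtype=np.float32)
    normals[..., 2] = 1.0  # every pixel faces the camera: z = 1
    # Assumed convention: remap [-1, 1] to [0, 1], then scale to 8 bits.
    encoded = np.round((normals * 0.5 + 0.5) * 255.0)
    return encoded.astype(np.uint8)


image = flat_normal_map(512, 512)
```

Under this convention the result is the uniform light-blue image familiar from tangent-space normal maps.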

Interactive Demonstration

We provide an interactive demonstration of our method using Gradio. You can try it with your own meshes and prompts and see the results in real time.

How to use:

  1. Pick a mesh from the examples or upload your own.
  2. After positioning the camera, click "Capture Normal Map".
  3. Enter a prompt in the text box.
  4. Click "Generate PBR Materials" to see the generated PBR maps.

Example meshes are provided below.

BibTeX

@article{,
  author    = {Vainer, Shimon and Boss, Mark and Parger, Mathias and Kutsy, Konstantin and 
               De Nigris, Dante and Rowles, Ciara and Perony, Nicolas and Donn\'e, Simon},
  title     = {Collaborative Control for Geometry-Conditioned PBR Image Generation},
  journal   = {arXiv preprint arXiv:2402.05919},
  year      = {2024},
}