January 28, 2023



Nvidia showcases groundbreaking generative AI research at NeurIPS 2022


Nvidia showcased groundbreaking artificial intelligence (AI) innovations at NeurIPS 2022. The hardware giant continues to push the boundaries of technology in machine learning (ML), self-driving cars, robotics, graphics, simulation and more. 

NeurIPS 2022 gave awards in three categories: outstanding main track papers, outstanding datasets and benchmarks track papers, and the test of time paper. Nvidia won two awards this year for its AI research papers, one exploring diffusion-based generative AI models and the other on training generalist AI agents.

Nvidia also presented a series of AI advancements it had worked on over the past year. It released two papers, one on a new approach to lighting and one on 3D model creation, that follow up on its earlier work in 3D and generative AI.

“NeurIPS is a major conference in machine learning, and we see high value in participating in the show among other leaders in the field. We showcased 60+ research projects at the conference and were proud to have two papers honored with NeurIPS 2022 Awards for their contributions to machine learning,” Sanja Fidler, VP of AI research at Nvidia and a co-author of both the 3D MoMa and GET3D papers, told VentureBeat.



Synthetic data generation for images, text and video was a key theme of several Nvidia-authored papers. Other subjects covered included reinforcement learning, data gathering and augmentation, weather models and federated learning.

Nvidia unveils a new way of designing diffusion-based generative models 

Diffusion-based models have emerged as one of the most disruptive techniques in generative AI. Diffusion models have shown intriguing potential to achieve superior image sample quality compared to traditional methods such as GANs (generative adversarial networks). Nvidia researchers won an “outstanding main track paper” award for their work in diffusion model design, which suggests model design improvements based on an analysis of several diffusion models. 

Their paper, titled “Elucidating the design space of diffusion-based generative models,” breaks down the components of a diffusion model into a modular design, assisting developers in identifying processes that may be altered to improve the overall model’s performance. Nvidia claims that these suggested design modifications can dramatically improve diffusion models’ efficiency and quality. 

The methods defined in the paper are largely independent of other model components, such as network architecture and training details. The researchers first measured baseline results for several existing models in their original form, then re-expressed those models in a unified framework with a common formulation, and finally applied small changes that improved the results. This approach allowed the research team to evaluate individual design choices in isolation and to propose general improvements to the sampling process that apply across diffusion models.

The methods described in the paper also proved to be highly effective, allowing models to achieve record scores on benchmarks such as ImageNet-64 and CIFAR-10.
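To make the idea of a model-independent sampling process concrete, here is a minimal toy sketch. The article does not give the paper's formulas, so everything here is an assumption chosen for illustration: noise level is treated as the time variable, the sampler uses a second-order Heun step, and the noise schedule and its constants (`sigma_min`, `sigma_max`, `rho`) are illustrative defaults. The "network" is replaced by a closed-form denoiser for a one-dimensional Gaussian dataset, so the sketch runs without any training. The names `ideal_denoiser`, `noise_schedule` and `heun_sample` are hypothetical.

```python
import math
import random

SIGMA_DATA = 0.5  # assumed standard deviation of the toy 1-D Gaussian "dataset"


def ideal_denoiser(x, sigma):
    """Closed-form best denoiser when data ~ N(0, SIGMA_DATA^2) and noise ~ N(0, sigma^2)."""
    return x * SIGMA_DATA**2 / (SIGMA_DATA**2 + sigma**2)


def noise_schedule(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, followed by a final 0 (assumed schedule)."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    sigmas = [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]
    return sigmas + [0.0]


def heun_sample(x, denoiser, sigmas):
    """Integrate dx/dsigma = (x - denoiser(x, sigma)) / sigma from high noise to zero."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoiser(x, s_cur)) / s_cur
        x_euler = x + (s_next - s_cur) * d_cur  # first-order (Euler) proposal
        if s_next > 0:
            # Second-order correction: average the slopes at both ends of the step.
            d_next = (x_euler - denoiser(x_euler, s_next)) / s_next
            x = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)
        else:
            x = x_euler  # the slope is undefined at sigma = 0, so keep the Euler step
    return x


rng = random.Random(0)
sigmas = noise_schedule(64)
# Start from pure noise at the highest noise level and denoise step by step.
samples = [heun_sample(rng.gauss(0.0, sigmas[0]), ideal_denoiser, sigmas) for _ in range(2000)]
sample_std = math.sqrt(sum(v * v for v in samples) / len(samples))
print(f"std of generated samples: {sample_std:.3f} (target {SIGMA_DATA})")
```

The sampler only calls `denoiser(x, sigma)`, so a different denoiser could be swapped in without changing the sampling loop. That is the kind of modularity the paragraph above describes.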

Results of Nvidia’s architecture tested on various benchmarking datasets. Image Source: Nvidia

That said, the research team also noted that such advances in sample quality could amplify adverse societal effects when used in a large-scale system like DALL·E 2. These negative effects could include disinformation and the reinforcement of stereotypes and harmful biases. Moreover, training and sampling such diffusion models consumes a lot of electricity; Nvidia’s project consumed approximately 250 MWh on an in-house cluster of Nvidia V100s.

Generating complex 3D shapes from 2D images

Most tech giants are gearing up to showcase their metaverse capabilities, including Nvidia. Earlier this year, the company demonstrated how Omniverse could be the go-to platform for creating metaverse applications. The company has now developed a model that can generate high-fidelity 3D models from 2D images, further enhancing its metaverse tech stack. 

Named Nvidia GET3D (for its ability to generate explicit textured 3D meshes), the model is trained only on 2D images but can generate 3D shapes with intricate details and a high polygon count. It creates the shapes as triangle meshes, similar to a papier-mâché model, covered with a layer of textured material.

“The metaverse is made up of large, consistent virtual worlds. These virtual worlds need to be populated by 3D content — but there aren’t enough experts in the world to create the massive amount of content required by metaverse applications,” said Fidler. “GET3D is an early example of the kind of 3D generative AI we are creating to give users a diverse and scalable set of tools for content creation.”

Overview of GET3D architecture. Image Source: Nvidia

Moreover, the model generates these shapes in the same triangle mesh format used by popular 3D applications. This allows creative professionals to quickly import the assets into gaming engines, 3D modeling software and film renderers so they can start working on them. These AI-generated objects can populate 3D representations of buildings, outdoor locations or whole cities, as well as digital environments developed for the robotics, architecture and social media sectors.
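For readers unfamiliar with triangle meshes, the following minimal sketch shows what such an asset looks like as data: a list of vertex positions plus a list of triangles that index into it. It writes the mesh as Wavefront OBJ text, a plain-text format that many 3D tools can import. The article does not say which file format GET3D exports, so OBJ is an assumption here, and the `TriangleMesh` class is hypothetical. Texture coordinates and materials are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class TriangleMesh:
    """Hypothetical container: vertex positions plus triangles that index into them."""
    vertices: list  # [(x, y, z), ...]
    faces: list     # [(i, j, k), ...] zero-based vertex indices

    def to_obj(self) -> str:
        """Serialize as Wavefront OBJ text ('v' lines, then 'f' lines with 1-based indices)."""
        lines = [f"v {x} {y} {z}" for x, y, z in self.vertices]
        lines += [f"f {i + 1} {j + 1} {k + 1}" for i, j, k in self.faces]
        return "\n".join(lines) + "\n"


# A tetrahedron: the simplest closed triangle mesh (4 vertices, 4 triangles).
tetrahedron = TriangleMesh(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    faces=[(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)],
)
obj_text = tetrahedron.to_obj()
print(obj_text)
```

Saving `obj_text` to a file with an `.obj` extension should give something a typical modeling tool can open. A generated asset has the same structure with far more vertices and faces.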

According to Nvidia, prior 3D generative AI models were significantly limited in the level of detail they could produce; even the most sophisticated inverse-rendering algorithms could only construct 3D objects based on 2D photographs collected from multiple angles, requiring developers to build one 3D shape at a time.

Manually modeling a realistic 3D world is time- and resource-intensive. AI tools like GET3D can greatly speed up the 3D modeling process and allow artists to focus on what matters. For example, when running inference on a single Nvidia GPU, GET3D can produce about 20 shapes per second, operating like a generative adversarial network for 2D images while producing 3D objects.

The more extensive and diversified the training dataset, the more varied and detailed the output. The model was trained on Nvidia A100 Tensor Core GPUs, using one million 2D images of 3D shapes captured from several camera angles.

Once a GET3D-generated shape is exported to a graphics tool, artists can apply realistic lighting effects as the object moves or rotates in a scene. Developers can also use text prompts to give a shape a particular style by combining GET3D with another Nvidia AI tool, StyleGAN-NADA. For example, they might alter a rendered car to become a burnt car or a taxi, or convert an ordinary house into a haunted one.

According to the researchers, a future version of GET3D might incorporate camera pose estimation techniques. This would allow developers to train the model on real-world data rather than synthetic datasets. The model will also be updated to enable universal generation, which means that developers will be able to train GET3D on all types of 3D forms simultaneously rather than on one object category at a time.

Improving 3D rendering pipelines with lighting

At the most recent CVPR conference in New Orleans in June, Nvidia Research introduced 3D MoMa. Developers can use this inverse-rendering approach to generate 3D objects comprising three parts: a 3D mesh model, materials placed on the model, and lighting.

Since then, the team has made substantial progress in untangling materials and lighting from 3D objects, allowing artists to change AI-generated forms by switching materials or adjusting lighting as the item travels around a scene. Now presented at NeurIPS 2022, 3D MoMa relies on a more realistic shading model that uses Nvidia RTX GPU accelerated ray tracing.

Recent advances in differentiable rendering have enabled high-quality reconstruction of 3D scenes from multiview images. However, Nvidia says that most methods still rely on simple rendering algorithms such as prefiltered direct lighting or learned representations of irradiance. Nvidia’s 3D MoMa model incorporates Monte Carlo integration, an approach that substantially improves decomposition into shape, materials and lighting.

3D MoMa’s Monte Carlo integration. Image Source: Nvidia

Unfortunately, Monte Carlo integration provides estimates with significant noise, even at large sample counts, making gradient-based inverse rendering challenging. To address this, the development team incorporated multiple importance sampling and denoising in a novel inverse-rendering pipeline. Doing so substantially improved convergence and enabled gradient-based optimization at low sample counts. 
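The effect of multiple importance sampling on noise can be seen in a toy one-dimensional example. The sketch below is not the rendering integral used by 3D MoMa. The integrand, the two sampling densities and the function names are all assumptions chosen so that the true answer is known to be exactly 1. It combines two sampling strategies with the balance heuristic, a standard way of weighting samples in multiple importance sampling, and compares the spread of the resulting estimates against plain uniform sampling at the same low sample count.

```python
import math
import random
import statistics


def f(x):
    """Toy integrand; its integral over [0, 1] is exactly 1."""
    return 3.0 * x * x


def estimate_uniform(rng, n):
    """Plain Monte Carlo: average f over n uniform samples."""
    return sum(f(rng.random()) for _ in range(n)) / n


def estimate_mis(rng, n):
    """Multiple importance sampling with the balance heuristic.

    Strategy A samples uniformly (density 1); strategy B samples with
    density 2x (drawn as sqrt(u)). With the balance heuristic, every
    sample contributes f(x) / (n_a * p_a(x) + n_b * p_b(x)).
    """
    n_a = n // 2
    n_b = n - n_a

    def combined_density(x):
        return n_a * 1.0 + n_b * 2.0 * x

    total = 0.0
    for _ in range(n_a):
        x = rng.random()
        total += f(x) / combined_density(x)
    for _ in range(n_b):
        x = math.sqrt(rng.random())
        total += f(x) / combined_density(x)
    return total


rng = random.Random(0)
uniform_runs = [estimate_uniform(rng, 64) for _ in range(300)]
mis_runs = [estimate_mis(rng, 64) for _ in range(300)]
print("uniform: mean", statistics.mean(uniform_runs), "variance", statistics.pvariance(uniform_runs))
print("MIS:     mean", statistics.mean(mis_runs), "variance", statistics.pvariance(mis_runs))
```

Both estimators average to roughly the same value, but the combined estimator should spread less around it. Lower variance at a fixed sample count is what makes gradient-based optimization more practical at low sample counts.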

The 3D MoMa paper also presents an efficient method to jointly reconstruct geometry (explicit triangle meshes), materials and lighting, substantially improving material and light separation compared to previous work. Finally, Nvidia hypothesizes that denoising can become integral to high-quality inverse rendering pipelines.

Fidler highlighted the importance of lighting in a 3D environment and said that realistic lighting is crucial to a 3D scene. 

“By reconstructing the geometry and disentangling lighting effects from the material properties of objects, we can produce content that supports relighting effects and augmented reality (AR) — which is much more useful for creators, artists and engineers,” Fidler told VentureBeat. “With AI, we want to accelerate and generate these 3D objects by learning from a wide variety of images rather than manually creating every piece of content.”


3D MoMa achieves this. As a result, the content it produces can be directly imported into existing graphics software and used as building blocks for complex scenes. 

The 3D MoMa model does have limitations. They include a lack of efficient regularization of material specular parameters and reliance on a foreground segmentation mask. In addition, the researchers note in the paper that the approach is computationally intensive, requiring a high-end GPU for optimization runs.

In short, the paper proposes a Monte Carlo rendering method combined with variance-reduction techniques that is practical for multiview reconstruction of explicit triangular 3D models.

Nvidia’s future AI focus

Fidler said that Nvidia is very excited about generative AI, as the company believes that the technology will soon open up opportunities for more people to be creators.

“You’re already seeing generative AI, and our work within the field, being used to create amazing images and beautiful works of art,” she said. “Take Refik Anadol’s exhibition at the MoMA, for example, which utilizes Nvidia StyleGAN.”

Fidler said that other emerging domains Nvidia is currently working on are foundational models, self-supervised learning and the metaverse. 

“Foundational models can train on enormous, unlabeled datasets, which opens the door to more scalable approaches for solving a range of problems with AI. Similarly, self-supervised learning is aimed at learning from unlabeled data to reduce the need for human annotation, which can be a barrier to progress,” explained Fidler. 

“We also see many opportunities in gaming and the metaverse, using AI to generate content on the fly so that the experience is unique every time. In the near future, you’ll be able to use it for entire villages, landscapes and cities by assembling an example of an image to generate an entire 3D world.”
