Can Apple’s HDR Augmented Reality Environments Solve Reflections for Neural Rendering?

Apple’s strong and long-term investment in augmented reality technologies is gaining momentum this year, with a new range of developer tools for capturing and converting real-world objects into AR assets, and a growing industry conviction that dedicated AR glasses are coming to support the immersive experiences that this blizzard of R&D can enable.

Amid a raft of new information about Apple’s augmented reality efforts, a new paper from the company’s computer vision research division reveals a method of using 360-degree high dynamic range (HDR) panoramic images to provide scene-specific reflections and lighting for objects that are superimposed into augmented reality scenes.

Entitled HDR Environment Map Estimation for Real-Time Augmented Reality, the paper by Gowri Somanath, Apple computer vision research engineer, and Daniel Kurz, senior machine learning manager, proposes the dynamic creation of real-time HDR environments via a convolutional neural network (CNN) running in a mobile processing environment. The result is that reflective objects can literally mirror new and unseen environments on demand:

In Apple’s new AR object generation workflow, a pressure cooker is instantiated by photogrammetry together with its surrounding environment, resulting in convincing reflections that are not “baked” into the texture. Source: https://docs-assets.developer.apple.com/

The method, presented at CVPR 2021, takes a snapshot of the scene and uses the EnvMapNet CNN to estimate a visually complete panoramic HDR image, also known as a “light probe”.

The resulting map identifies strong light sources (outlined at the end of the animation above) and takes them into account when rendering virtual objects.
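To make the idea of "identifying strong light sources" in a light probe concrete, here is a minimal sketch in plain Python. It is not Apple's method: it assumes the probe is stored as an equirectangular image (row 0 pointing straight up, columns wrapping around in azimuth), and the function name `dominant_light_direction` and the `top_fraction` parameter are hypothetical. It simply averages the directions of the brightest pixels, weighted by their energy.

```python
import math

def dominant_light_direction(env_map, top_fraction=0.01):
    """Estimate the direction of the strongest light in an equirectangular HDR map.

    env_map: list of rows, each a list of (r, g, b) linear radiance tuples.
    Assumed convention: row 0 points straight up (+Y); columns wrap in azimuth.
    """
    height, width = len(env_map), len(env_map[0])
    samples = []
    for row in range(height):
        theta = (row + 0.5) / height * math.pi          # polar angle from +Y
        for col in range(width):
            phi = (col + 0.5) / width * 2.0 * math.pi   # azimuth
            r, g, b = env_map[row][col]
            lum = 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec. 709 luminance
            direction = (math.sin(theta) * math.cos(phi),
                         math.cos(theta),
                         math.sin(theta) * math.sin(phi))
            # sin(theta) compensates for polar pixels covering less solid angle
            samples.append((lum, lum * math.sin(theta), direction))
    # Keep only the brightest pixels, then average their directions by energy
    samples.sort(key=lambda s: s[0], reverse=True)
    keep = samples[:max(1, int(len(samples) * top_fraction))]
    total = [sum(w * d[i] for _, w, d in keep) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in total))
    if norm == 0.0:
        raise ValueError("environment map has no energy")
    return tuple(c / norm for c in total)
```

A renderer could use a direction like this to place a key light or cast a shadow for a virtual object. A map with several distinct light sources would need clustering rather than a single average.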

The architecture of EnvMapNet, which processes limited images into full-scene HDR light probes. Source: https://arxiv.org/pdf/2011.10687.pdf

The algorithm can run in under 9ms on an iPhone XS and is capable of rendering reflection-aware objects in real time, with directional error reduced by 50% compared to previous, differing approaches to the problem.
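A directional error of this kind is commonly expressed as the angle between a predicted light direction and a reference one. The sketch below shows that calculation under this assumption; the exact metric used in the paper may be defined differently, and the function name `angular_error_degrees` is hypothetical.

```python
import math

def angular_error_degrees(predicted, reference):
    """Angle in degrees between two 3D direction vectors (not necessarily unit length)."""
    dot = sum(p * r for p, r in zip(predicted, reference))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to guard against rounding pushing the cosine just outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.degrees(math.acos(cos_angle))
```

For example, a predicted direction of `(1, 0, 0)` against a reference of `(0, 1, 0)` is a quarter-turn apart, so the function returns an error of 90 degrees.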

Light probes

HDR lighting environments have been a factor in visual effects since high dynamic range images (invented in 1986) became a notable force thanks to advances in computer technology in the 1990s. Viewers of behind-the-scenes footage may have noticed the surreal presence on set of technicians wielding mirrored balls on sticks: reference images to be incorporated as environmental factors when reconstructing CGI elements for the scene.

Source: https://beforesandafters.com/

However, the use of chrome balls for reflection-mapping textures predates the 1990s, dating back to the 1983 SIGGRAPH paper Pyramidal Parametrics, which featured stills of a reflective CGI robot in a style that would become famous nearly a decade later thanks to the “liquid metal” effects of James Cameron’s Terminator 2: Judgment Day.

HDR environments in neural rendering?

Neural rendering provides the ability to generate photorealistic video from very sparse input, including raw segmentation maps.

Intel ISL segmentation > neural rendering of the image (2017). Source: https://awesomeopensource.com/project/CQFIO/PhotographicImageSynthesis

In May, Intel researchers revealed a new neural image synthesis initiative in which images from Grand Theft Auto V were used to generate photorealistic output based on German street image datasets.

Source: https://www.youtube.com/watch?v=0fhUJT21-bs

The challenge in developing neural rendering environments that can be adapted to various lighting conditions is to separate the content of the object from the environmental factors that affect it.

As it stands, reflections and anisotropic effects either remain functions of the footage in the original dataset (making them inflexible) or require the same type of schema that the Intel researchers use, which generates semi-photorealistic output from a crude (game) engine, performs segmentation on it, and then applies a style transfer from a “baked” dataset (such as the German Mapillary street view set used in the recent research).

In this neural rendering (images from GTA V are on the left), the vehicle in front exhibits convincing glare and even saturates the sensor of the fictitious virtual camera with reflections from the sun. But this lighting aspect is derived from the game's original footage, as the neural facets of the scene lack self-contained, self-referential lighting structures that can be changed.

Reflectance in NeRF

Imagery derived from Neural Radiance Fields (NeRF) faces a similar challenge. Although recent research on NeRF has made progress in separating the elements that make up a neural scene (e.g. the MIT/Google collaboration on NeRFactor), reflections remain an obstacle.

MIT and Google’s NeRFactor approach separates normals, visibility (shadows), texture, and local albedo, but it does not reflect a larger (or moving) environment, as it essentially exists in a vacuum. Source: https://arxiv.org/pdf/2106.01970.pdf

NeRF could solve this problem with the same type of HDR mapping that Apple uses. Each pixel in a neural radiance field is computed along a path from a virtual camera to the point where the “ray” can travel no further, similar to ray tracing in traditional CGI. Adding an HDR input to the calculation of that ray is a potential method of achieving true environmental reflectance, and is in effect an analog of CGI’s “global illumination” or radiosity rendering methods, in which a scene or an object is partially illuminated by its own reflections of the perceived environment.
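As a rough illustration of what "adding an HDR input" to a ray calculation can mean, the following sketch uses the classic environment-mapping approach from traditional CGI: mirror the view ray about the surface normal, then look up the HDR map in that direction. This is not NeRF's volumetric formulation, nor anything from Apple's paper. The function names, the equirectangular layout (row 0 pointing up), and the `reflectivity` blend factor are all assumptions made for the example.

```python
import math

def reflect(direction, normal):
    """Mirror a unit incoming direction about a unit normal: r = d - 2(d.n)n."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))

def sample_env_map(env_map, direction):
    """Nearest-pixel lookup in an equirectangular map (row 0 = +Y, columns wrap)."""
    x, y, z = direction
    length = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, y / length)))  # 0 at +Y, pi at -Y
    phi = math.atan2(z, x) % (2.0 * math.pi)            # azimuth in [0, 2*pi)
    height, width = len(env_map), len(env_map[0])
    row = min(height - 1, int(theta / math.pi * height))
    col = min(width - 1, int(phi / (2.0 * math.pi) * width))
    return env_map[row][col]

def shade(view_dir, normal, base_color, env_map, reflectivity=0.5):
    """Blend a surface colour with the environment seen along the mirrored ray."""
    reflected = sample_env_map(env_map, reflect(view_dir, normal))
    return tuple((1.0 - reflectivity) * c + reflectivity * e
                 for c, e in zip(base_color, reflected))
```

A camera looking straight down at an upward-facing surface would, under these assumptions, pick up whatever the map contains directly overhead. A neural pipeline would need an equivalent lookup inside its own per-ray colour computation.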

While it is certain that an HDR matrix will do nothing to alleviate NeRF’s notable computational burdens, much research in this area is currently focusing on that aspect of the processing pipeline. Inevitably, reflectance is one of the many factors waiting in the wings to fill up and challenge that newly optimized architecture. However, NeRF cannot reach its full potential as a methodology for synthesizing discrete neural images and videos without adopting a way to account for a surrounding environment.

Reflectance in Neural Rendering Pipelines

In a putative HDR version of the Intel GTA V neural rendering scenario, a single HDR could not support the dynamic reflections that need to be expressed in moving objects. For example, in order to see one’s own vehicle reflected in the vehicle in front as it pulls up to the lights, the vehicle in front could have its own animated HDR light probe, the resolution of which would gradually degrade as it recedes from the user’s point of view, becoming low-res and merely representative as it moves into the distance: a proximity-based LOD similar to “draw distance” delimiters in video games.
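The proximity-based level-of-detail idea can be sketched as a simple rule that halves a probe's resolution each time its distance from the viewer doubles. Everything here is hypothetical: the function name, the `near` and `far` thresholds, and the resolution limits are illustrative values rather than figures from Apple's or Intel's work.

```python
import math

def probe_resolution(distance, near=5.0, far=100.0, max_res=256, min_res=8):
    """Pick a power-of-two light-probe width that shrinks as the object recedes."""
    if distance <= near:
        return max_res          # close objects get the full-resolution probe
    if distance >= far:
        return min_res          # distant objects get a merely representative one
    # Halve the resolution each time the distance doubles beyond `near`
    halvings = int(math.log2(distance / near))
    return max(min_res, max_res >> halvings)
```

With these defaults, a vehicle 2 units away would get a 256-pixel probe, one 20 units away a 64-pixel probe, and anything beyond 100 units the 8-pixel minimum.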

The real potential of Apple’s work on HDR lighting and reflection maps is not that they are particularly innovative, since they build on previous work in general image synthesis and in AR scene development. Rather, the possible breakthrough lies in how severe local computing constraints have combined with Apple’s M-series machine learning hardware innovations to produce lightweight, low-latency HDR mapping designed to work with limited resources.

If this problem can be solved economically, the advent of semantic segmentation > photorealistic video synthesis could take a further step forward.

Source: https://docs-assets.developer.apple.com/

