Efficient Processing Paves The Way For High-Quality AR/VR

Sensors Insights by Yair Siegel

Fun & Games

The launch of Pokemon Go just over a year ago, in July 2016, was a critical moment in the evolution of augmented reality (AR). Apps such as Blippar had already brought AR concepts to the mass market. But Pokemon Go showed how the technology could reinvent a longstanding game and make it an overnight success.

Users seized on the ability to mix events in a virtual world with their explorations of the real world outside. Instead of being disconnected in a sea of texts, they discovered how life and technology could be reconnected. Since then, technology companies have reinvigorated their efforts to make AR ubiquitous and to open the door to a much wider range of applications enhanced with virtual reality (VR) technologies.


On The Phone

In the summer of this year, Google unveiled its ARCore framework. It is intended to bring AR and VR capabilities to a much wider range of Android-based devices than its earlier Project Tango. Apple, which had introduced its own ARKit in June, subsequently deployed it in the new generation of iPhones that make AR fundamental to their design: from face recognition for security to smart sports apps that project relevant statistics onto the screen next to any player the software detects.

Delivering applications that merge virtual action into reality demands more than high-performance graphics manipulation. The graphics must run alongside software that interprets detected motion and other environmental signals from an array of different sensors. Machine-learning techniques will underpin the recognition algorithms that link these two halves of the AR infrastructure.


Inside-Out Tracking

In some cases, the AR application will receive help from outside sources. In the retail environment, beacons will provide location data to AR-enhanced indoor-navigation apps. But for AR and VR to become ubiquitous on mobile devices, so-called inside-out tracking will be essential.

Figure 1

Inside-out tracking puts the mobile device in control. Relying on the sensors incorporated into the handset itself, inside-out tracking is more complex and challenging but has the huge advantage of working in most situations. Coupling appropriate sensors with suitably efficient on-board processing means that in most cases, objects can be detected and identified with all recognition being done on the edge device itself.

The first wave of AR-oriented devices relied on specialized sensors such as time-of-flight cameras that are optimized for range detection. More recent developments have put much more emphasis on applying existing sensors.

More advanced signal and vision processing algorithms work on the data the sensors provide to obtain extensive information on the user's surroundings. It is likely that two trends will emerge in parallel:

  1. AR will be deployed on existing devices with simple sensors.
  2. New devices based on more powerful hardware and dedicated sensors will emerge that deliver higher quality AR user experiences. Prices of emerging products will fall as adoption grows.

The result of these trends is to change the nature of the processing that sits inside the handset. It is not enough to rely on the brute horsepower of MIPS and MFLOPS available in multicore general-purpose processors and in graphics processors. The battery cannot sustain the long-term current levels needed to feed these energy-demanding subsystems. What is needed is a focus on efficient signal-processing architectures that are tuned for machine-learning and environment-sensing algorithms.
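The battery argument comes down to simple arithmetic: runtime is stored energy divided by average power draw. The sketch below illustrates that relationship only. The battery capacity and the two power figures are hypothetical values chosen for illustration, not measurements of any handset or processor, and the model ignores the display, radios and thermal throttling.

```python
def runtime_hours(battery_wh, average_power_w):
    """Hours a battery of the given energy lasts at a constant average power draw."""
    if average_power_w <= 0:
        raise ValueError("average_power_w must be positive")
    return battery_wh / average_power_w


# Hypothetical figures, chosen only to show the shape of the trade-off.
BATTERY_WH = 11.0     # assumed handset battery energy in watt-hours
BRUTE_FORCE_W = 4.0   # assumed sustained draw with CPU and GPU fully loaded
EFFICIENT_W = 1.0     # assumed draw with a tuned signal-processing engine

if __name__ == "__main__":
    print("brute force:", runtime_hours(BATTERY_WH, BRUTE_FORCE_W), "hours")
    print("efficient:  ", runtime_hours(BATTERY_WH, EFFICIENT_W), "hours")
```

Under these assumed numbers, cutting the sustained draw by a factor of four stretches the runtime by the same factor, which is why the choice of processing architecture matters more than raw throughput.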

CEVA is one company that has worked hard to bring deep learning and similar algorithms into the mobile arena. The highly parallelized architectures that result make careful use of memory, something that conventional CPUs and GPUs cannot easily achieve. Memory accesses by neural networks are as important to overall performance as processor throughput.
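One way to see why memory traffic can matter as much as processor throughput is a roofline-style estimate, in which a layer takes at least as long as the slower of its arithmetic and its data movement. The following is a minimal sketch of that general idea, not a description of CEVA's architecture. The function names, the layer size, and the peak-throughput and bandwidth figures are all assumptions made for illustration.

```python
def fully_connected_cost(inputs, outputs, bytes_per_value=1):
    """Return (multiply-accumulates, bytes moved) for one fully connected layer.

    Assumes every weight is read from memory once and each input and
    output activation is transferred once.
    """
    macs = inputs * outputs
    bytes_moved = (inputs * outputs + inputs + outputs) * bytes_per_value
    return macs, bytes_moved


def layer_time_estimate(macs, bytes_moved, peak_macs_per_s, bandwidth_bytes_per_s):
    """Roofline-style lower bound on layer time, plus which resource limits it."""
    compute_s = macs / peak_macs_per_s
    memory_s = bytes_moved / bandwidth_bytes_per_s
    bound = "memory" if memory_s > compute_s else "compute"
    return max(compute_s, memory_s), bound


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 layer with 8-bit weights and activations.
    macs, bytes_moved = fully_connected_cost(4096, 4096)
    # Hypothetical engine: 100 GMAC/s peak, 10 GB/s memory bandwidth.
    seconds, bound = layer_time_estimate(macs, bytes_moved, 100e9, 10e9)
    print(f"{seconds * 1e3:.3f} ms, {bound}-bound")
```

With these assumed figures the layer is limited by memory traffic rather than arithmetic, so adding more compute units would not speed it up. Keeping weights in local memory or reusing them across several inputs would.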


3D Rendering Is Key

The other side of efficient AR and VR support lies in rendering the 3D scenery. The artificial parts of the scene need to be as photorealistic as possible and to react to sudden changes in movement as the user turns and tilts the handset. This calls for extremely low latency in the software that communicates those changes to the 3D rendering engine.

If latency is too high, the virtual action becomes decoupled from the real scene. If the user is wearing a headset and watching an almost entirely virtual scene, delays in shifting the image as they move around can cause motion sickness.
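The effect of latency on how far the virtual content drifts from the real scene can be approximated as angular velocity multiplied by delay. The sketch below works through that arithmetic. The head-rotation speed, latency, resolution and field of view are hypothetical inputs, and the conversion to pixels assumes pixels are spread evenly across the field of view, which is only a rough approximation for real optics.

```python
def registration_error_deg(angular_velocity_deg_s, latency_ms):
    """Angle the view has turned by the time a frame rendered for the old pose is shown."""
    return angular_velocity_deg_s * latency_ms / 1000.0


def registration_error_px(angular_velocity_deg_s, latency_ms, h_res_px, h_fov_deg):
    """Same error expressed in display pixels, assuming uniform pixels per degree."""
    px_per_deg = h_res_px / h_fov_deg
    return registration_error_deg(angular_velocity_deg_s, latency_ms) * px_per_deg


if __name__ == "__main__":
    # Hypothetical case: head turning at 100 deg/s, 20 ms end-to-end delay,
    # 1440 pixels spread across a 90-degree horizontal field of view.
    print(registration_error_deg(100, 20), "degrees")
    print(registration_error_px(100, 20, 1440, 90), "pixels")
```

Because the error scales linearly with delay, halving the latency halves the visible slip between virtual and real content for the same head movement.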

Figure 2

Rapid sensor tracking can also help improve throughput in the rendering engine. Cameras can track the user's eye movements and tell the rendering engine where to concentrate most of its effort. This technique, known as foveated rendering, takes advantage of the fact that the eye resolves fine detail only in the small region at the center of the user's gaze. Surrounding elements can be rendered using far less detail.
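The idea can be sketched as a per-tile decision: measure how far each screen tile is from the gaze point and lower the shading resolution as that distance grows. The code below is an illustrative sketch only. The function names, the 5-degree and 15-degree thresholds, the three shading rates and the tile size are assumptions, not values taken from any particular headset or rendering engine.

```python
import math


def shading_rate(tile_center, gaze, px_per_deg, inner_deg=5.0, outer_deg=15.0):
    """Return 1 (full detail), 2 (half resolution) or 4 (quarter resolution).

    The rate depends on the angular distance between the tile and the gaze
    point; the thresholds are illustrative assumptions.
    """
    dx = tile_center[0] - gaze[0]
    dy = tile_center[1] - gaze[1]
    eccentricity_deg = math.hypot(dx, dy) / px_per_deg
    if eccentricity_deg <= inner_deg:
        return 1
    if eccentricity_deg <= outer_deg:
        return 2
    return 4


def shaded_fraction(width, height, tile, gaze, px_per_deg):
    """Fraction of pixels actually shaded compared with full-resolution rendering."""
    shaded = 0.0
    total = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            w = min(tile, width - tx)
            h = min(tile, height - ty)
            rate = shading_rate((tx + w / 2, ty + h / 2), gaze, px_per_deg)
            # A rate of n shades one sample per n x n block of pixels.
            shaded += (w * h) / (rate * rate)
            total += w * h
    return shaded / total


if __name__ == "__main__":
    # Hypothetical 1440 x 1440 eye buffer, 16 pixels per degree, gaze at the center.
    fraction = shaded_fraction(1440, 1440, 32, (720, 720), 16.0)
    print(f"shading work relative to full resolution: {fraction:.2f}")
```

The eye-tracking data only needs to supply the gaze coordinates each frame, which is why fast, low-power sensor processing directly reduces the load on the rendering pipeline.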

Again, high-speed signal and vision processing that can handle the data from many types of sensor provides the key. By making those engines more energy efficient than the 3D rendering pipelines themselves, companies like CEVA are helping to make high-quality AR and VR possible, and to do so without draining the battery in a matter of hours.


About the author

Yair Siegel is Director of Segment Marketing at CEVA, where he focuses on expanding CEVA’s low-power technology into new and exciting markets. His work centers on computer vision, deep learning, audio, voice and always-on technologies for mobile phones, AR/VR, and other consumer devices. Yair and his team collaborate with leading industry companies to bring these new technologies to market.