Voice-Capture Devices Grace IoT and Consumer Designs

XMOS will demonstrate its portfolio of VocalFusion true-stereo voice-capture solutions for far-field, voice-enabled stereo smart TVs, soundbars, and set-top boxes at Computex 2018 in June. The technology captures voice commands for use in Alexa, Google, DuerOS, and other voice-enabled artificial intelligence (AI) and internet of things (IoT) systems. The company will also preview its next-generation VocalSorcery blind-source separation technology.

 

The VocalFusion Stereo Dev Kit accurately captures voice commands from across the room, even in complex, noisy environments and while the same audio appliance is playing content at high volume. XMOS describes the development kit as the first far-field linear microphone array solution with stereo acoustic echo cancellation (AEC). It also supports configurable AEC latency: the AEC reference signals can be calibrated and the latency adjusted, which enables after-market far-field voice accessories for existing consumer electronics products. The kit is available as an Amazon Alexa Voice Service qualified solution and can also be used with Google, Baidu, or other voice-enabled AI systems.
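To make the latency calibration idea concrete: an echo canceller needs to know how far the microphone capture lags the playback reference it is trying to remove. The sketch below shows one generic way to estimate that lag, by sliding the reference against the captured signal and keeping the offset with the highest correlation. It is an illustration only, not XMOS's calibration procedure; the function names, the toy signals, and the 16-kHz sample rate are all assumptions made for the example.

```python
def estimate_delay_samples(reference, captured, max_delay):
    """Return the lag (in samples) at which `captured` best matches `reference`.

    Hypothetical helper: a brute-force cross-correlation search over
    lags 0..max_delay, not a method taken from the VocalFusion kit.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        # Only compare the samples that overlap at this lag.
        n = min(len(reference), len(captured) - lag)
        score = sum(reference[i] * captured[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def samples_to_ms(lag_samples, sample_rate_hz=16000):
    """Convert a lag in samples to milliseconds (16 kHz is an assumed rate)."""
    return 1000.0 * lag_samples / sample_rate_hz


# Toy data: the "captured" signal is the reference delayed by 3 samples
# and attenuated by half, standing in for an echo picked up by a microphone.
reference = [1.0, -2.0, 3.0, -1.0]
captured = [0.0, 0.0, 0.0] + [0.5 * x for x in reference]
lag = estimate_delay_samples(reference, captured, max_delay=3)
print(lag, samples_to_ms(lag))
```

In a real accessory the measured lag would depend on the audio path of the host product, which is presumably why the article stresses that the latency is configurable.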

VocalFusion voice processors perform voice digital signal processing (DSP), including a full-duplex acoustic echo canceller with barge-in capability, which lets users interrupt or pause a device that is playing music, and an adaptive beamformer that follows the person speaking. Dereverberation, automatic gain control, and noise suppression provide clear voice interaction even in noisy environments. The processor interfaces directly to four pulse-density modulation (PDM) microphones in a linear array with 33.33-mm inter-mic spacing, making it suitable for integration into flat screens and other products placed at the edge of a room.
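The array geometry quoted above is enough for some quick arithmetic. Four microphones at 33.33-mm spacing span roughly 100 mm end to end, and, assuming sound travels at about 343 m/s, half a wavelength equals the mic spacing at a little over 5 kHz. The sketch below works through those numbers and computes the per-microphone delays for a simple fixed delay-and-sum beam. This is a textbook stand-in chosen for illustration: the article does not disclose how the adaptive beamformer works, and the function names, the speed of sound, and the angle convention are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees C
MIC_COUNT = 4                # from the article
MIC_SPACING_M = 0.03333      # 33.33 mm, from the article


def array_aperture_m(count=MIC_COUNT, spacing=MIC_SPACING_M):
    """Distance between the first and last microphone."""
    return (count - 1) * spacing


def spatial_aliasing_limit_hz(spacing=MIC_SPACING_M, c=SPEED_OF_SOUND_M_S):
    """Frequency at which half a wavelength equals the mic spacing."""
    return c / (2.0 * spacing)


def steering_delays_s(angle_deg, count=MIC_COUNT, spacing=MIC_SPACING_M,
                      c=SPEED_OF_SOUND_M_S):
    """Delays to apply to each channel so a plane wave from angle_deg lines up.

    Assumed convention: 0 degrees is broadside (straight ahead of the array)
    and positive angles tilt toward the last microphone. The delays are
    shifted so the smallest one is zero.
    """
    theta = math.radians(angle_deg)
    raw = [i * spacing * math.sin(theta) / c for i in range(count)]
    earliest = min(raw)
    return [d - earliest for d in raw]


print(f"aperture: {array_aperture_m() * 1000:.2f} mm")
print(f"half-wavelength limit: {spatial_aliasing_limit_hz():.0f} Hz")
print("delays at 30 degrees (us):",
      [round(d * 1e6, 1) for d in steering_delays_s(30.0)])
```

A delay-and-sum beam applies these delays and averages the four channels; an adaptive beamformer updates its weights as the talker or the noise field changes.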

 

VocalSorcery is a blind-source separation technology that spatially identifies individual speakers or conversations within a crowded, noisy audio environment to optimize voice capture and input into speech recognition systems. It targets the so-called cocktail-party problem and opens up applications ranging from video and conference calls to automotive systems.

 

For more information, check out the VocalFusion data page and visit XMOS.