
Online class service provider

AI solution for advanced voice quality on a multi-user online lecture platform

The global pandemic era brought rapid growth in contactless technologies such as video conferencing solutions for corporate meetings, online video class platforms, video interviews and counseling, and other non-face-to-face business applications. The most stressful part of using these video conferencing solutions, however, is the deterioration of sound quality caused by background noise, echo, howling, and similar factors that distract participants.

An online video lecture service provider requested the development of sound quality enhancement technology that uses deep learning neural networks to eliminate quality degradation factors such as echo, reverberation, howling, and normal or abnormal background noise under various conditions (places, platforms), including simultaneous multi-access and single-terminal usage.


General-purpose voice quality enhancement algorithm


Voice quality enhancement algorithm based on a voice filter for shared terminals


Voice quality enhancement algorithm for high-performance terminals

Technological Challenges

AI processing speed by voice quality degradation factors

Measure the AI processing delay for each voice quality degradation factor (ambient noise, acoustic echo, howling, etc.) per analyzed audio frame, and achieve the under-40 ms delay required by the real-time tracks of the AEC (Acoustic Echo Cancellation) and DNS (Deep Noise Suppression) Challenges.

1
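A per-frame latency budget like this can be checked by timing the enhancement step frame by frame. The sketch below is illustrative only: the 10 ms frame length, 16 kHz sample rate, and the `enhance_frame` stand-in are assumptions, not the production model.

```python
import time
import numpy as np

SAMPLE_RATE = 16000          # assumed sample rate
FRAME_MS = 10                # assumed analysis frame length
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000
BUDGET_MS = 40.0             # real-time budget from the AEC/DNS Challenge tracks

def enhance_frame(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the per-frame enhancement model (assumption).
    A real system would run model inference here."""
    return frame * 0.5

def worst_case_latency_ms(n_frames: int = 100) -> float:
    """Return the worst observed per-frame processing delay in milliseconds."""
    rng = np.random.default_rng(0)
    worst = 0.0
    for _ in range(n_frames):
        frame = rng.standard_normal(FRAME_LEN).astype(np.float32)
        t0 = time.perf_counter()
        enhance_frame(frame)
        worst = max(worst, (time.perf_counter() - t0) * 1000.0)
    return worst

if __name__ == "__main__":
    print(f"worst-case frame latency: {worst_case_latency_ms():.3f} ms "
          f"(budget {BUDGET_MS} ms)")
```

In practice the worst case (not the average) is what matters for a real-time track, since a single late frame causes an audible glitch.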

Score 4.0 or above in subjective sound quality evaluation

Create world-leading sound quality improvement technology by targeting a score above 3.52, the top score of Microsoft's Deep Noise Suppression Challenge at INTERSPEECH 2020.

2

Test set generation

Generate test data for more than 20 open-microphone environments. One person speaks while the other microphones pick up that speech mixed with noise of various types and strengths (e.g., noise intensity drawn from a uniform distribution of 0~25 relative to the clean speech), and generate data for various open-microphone counts, such as 20, 30, and 40 microphones.

3
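The core of such test-set generation is mixing clean speech with noise at a randomly drawn signal-to-noise ratio. The sketch below assumes the 0~25 range above is an SNR range in dB; the signal shapes and the `numpy` synthesis are illustrative, not the provider's actual pipeline.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested SNR relative to `clean`."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain such that clean_power / (gain^2 * noise_power) == 10^(snr_db / 10)
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def make_test_clips(n_clips: int, n_samples: int = 16000, seed: int = 0):
    """Draw one SNR per clip uniformly from 0-25 dB and build noisy clips.
    White noise stands in for real speech/noise recordings (assumption)."""
    rng = np.random.default_rng(seed)
    clips = []
    for _ in range(n_clips):
        clean = rng.standard_normal(n_samples).astype(np.float32)
        noise = rng.standard_normal(n_samples).astype(np.float32)
        snr_db = rng.uniform(0.0, 25.0)
        clips.append((mix_at_snr(clean, noise, snr_db), snr_db))
    return clips
```

Because the gain is derived analytically, the achieved SNR of each mixture matches the drawn target exactly, which keeps the test-set labels trustworthy.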

Road Map

Create sound DB and echo / reverberation data generator

RNN-based speaker voice feature vector generation

Sound spectrogram filter modeling

Clustering

Development of an integrated echo / reverberation / howling noise elimination module

Verification and application
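The "sound spectrogram filter modeling" step in the road map above amounts to applying a time-frequency mask to a magnitude spectrogram. The sketch below uses a hand-crafted subtraction-style mask purely for illustration; the window size, hop, and mask floor are assumptions, and the actual system learns the filter with an RNN rather than computing it this way.

```python
import numpy as np

def magnitude_spectrogram(x: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Minimal STFT magnitude: Hann-windowed, overlapping FFT frames."""
    win = np.hanning(n_fft)
    frames = np.stack([x[i:i + n_fft] * win
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_mask(noisy_mag: np.ndarray, noise_mag: np.ndarray,
                  floor: float = 0.1) -> np.ndarray:
    """Subtraction-style mask from an estimated noise magnitude profile.
    Values near 1 keep a bin, values near `floor` suppress it."""
    est_clean = np.maximum(noisy_mag - noise_mag, 0.0)
    mask = est_clean / np.maximum(noisy_mag, 1e-8)
    return np.maximum(mask, floor)
```

A learned filter replaces `spectral_mask` with a network that predicts the mask per frame, which is what makes speaker-conditioned filtering (via the RNN speaker feature vectors) possible.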

Key Features


Noise suppression in sound input during multi-party video conference

1

Noise suppression in sound input during a multi-party video conference where multiple users participate through one microphone

2

Noise suppression in sound input during a multi-party video conference through high-performance smartphones

3

The Result

40

AI processing time per voice degradation factor (ms)

4.0

Subjective quality evaluation score for voice enhancement

Check out Ellexi's AI

Philo-S (Speech to Text): curious about the solution?