Online Tutoring

Online class service provider

AI solution for advanced voice quality on a multi-user online lecture platform

Development time

7 Months

Manpower

13 Professionals

The Brief

The global pandemic era brought rapid growth in the use of contact-less technologies such as video conferencing solutions for corporate meetings, online video class platforms, video interviews and counseling, and other non-face-to-face business applications. The most stressful part of using these video conferencing solutions, however, is the deterioration of sound quality caused by background noise, echo, howling, and similar factors that distract participants.

An online video lecture service provider requested the development of sound quality enhancement technology based on deep learning neural networks to eliminate quality degradation factors such as echo, reverberation, howling, and normal/abnormal background noise under various conditions (places, platforms), including simultaneous multi-access and single-terminal usage.

Multimedia

Service

1. General purpose voice quality enhancement algorithm

2. Voice quality enhancement algorithm based on a voice filter for shared terminals

3. Voice quality enhancement algorithm for high-performance terminals

Technological Challenges

AI processing speed for each voice quality degradation factor

Measure the AI processing delay for each voice quality degradation factor (ambient noise, acoustic echo, howling, etc.) based on the analysis of one audio frame, and achieve the under-40 ms delay required by the real-time tracks of the AEC (Acoustic Echo Cancellation) and DNS (Deep Noise Suppression) Challenges.
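
As a rough illustration of this measurement, the sketch below times a per-frame enhancement call against the 40 ms budget. The enhance_frame function, the 10 ms frame length, and the 16 kHz sample rate are placeholder assumptions for illustration only, not the project's actual model or configuration.

    import time
    import numpy as np

    FRAME_MS = 10            # assumed analysis frame length
    BUDGET_MS = 40           # real-time budget from the AEC/DNS Challenge real-time tracks
    SAMPLE_RATE = 16000      # assumed sample rate
    FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000

    def enhance_frame(frame: np.ndarray) -> np.ndarray:
        """Placeholder for the per-factor enhancement model (noise/echo/howling removal)."""
        return frame  # the real system would run a neural network here

    def measure_latency(num_frames: int = 1000) -> None:
        delays_ms = []
        for _ in range(num_frames):
            frame = np.random.randn(FRAME_LEN).astype(np.float32)   # dummy audio frame
            start = time.perf_counter()
            enhance_frame(frame)
            delays_ms.append((time.perf_counter() - start) * 1000.0)
        worst = max(delays_ms)
        print(f"avg {np.mean(delays_ms):.2f} ms, p95 {np.percentile(delays_ms, 95):.2f} ms, "
              f"max {worst:.2f} ms -> {'OK' if worst < BUDGET_MS else 'over budget'}")

    if __name__ == "__main__":
        measure_latency()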

Test set generation

Generate test data for more than 20 open-microphone environments: one person speaks while the other open microphones contribute noise of various types and strengths (e.g., pure noise whose level is synthesized from a uniform distribution of 0~25 relative to the average clean speech), and generate data for different numbers of open microphones such as 20, 30, and 40.
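
The sketch below shows one way such test data could be synthesized, assuming the 0~25 range is a signal-to-noise ratio in dB relative to the clean speech; random arrays stand in for real recordings, and mix_at_random_snr and build_open_mic_scene are illustrative names only.

    import numpy as np

    rng = np.random.default_rng(0)

    def mix_at_random_snr(clean: np.ndarray, noise: np.ndarray,
                          snr_range_db=(0.0, 25.0)) -> np.ndarray:
        """Mix noise into clean speech at an SNR drawn uniformly from snr_range_db."""
        snr_db = rng.uniform(*snr_range_db)
        noise = noise[: len(clean)]
        clean_power = np.mean(clean ** 2)
        noise_power = np.mean(noise ** 2) + 1e-12
        # Scale the noise so 10*log10(clean_power / scaled_noise_power) equals snr_db.
        scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
        return clean + scale * noise

    def build_open_mic_scene(clean: np.ndarray, noises: list,
                             num_open_mics: int = 20) -> list:
        """One talker microphone plus (num_open_mics - 1) noise-only open microphones."""
        channels = [mix_at_random_snr(clean, noises[0])]        # the speaking person's mic
        for i in range(1, num_open_mics):
            noise = noises[i % len(noises)][: len(clean)]
            channels.append(rng.uniform(0.1, 1.0) * noise)      # arbitrary level per open mic
        return channels

    # Dummy signals stand in for real recordings; sweep the open-mic counts from the brief.
    clean = rng.standard_normal(16000).astype(np.float32)
    noises = [rng.standard_normal(16000).astype(np.float32) for _ in range(5)]
    for n in (20, 30, 40):
        print(n, "mics ->", len(build_open_mic_scene(clean, noises, num_open_mics=n)), "channels")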

Score 4.0 points or above in subjective sound quality evaluation

Create the world’s best sound quality improvement technology by targeting a score higher than 3.52, the top score of Microsoft’s Deep Noise Suppression Challenge – INTERSPEECH 2020.
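
For reference, a subjective score of this kind is a mean opinion score (MOS): listeners rate each enhanced clip on a 1~5 scale, and the ratings are averaged per clip and then overall. A minimal sketch of that averaging with made-up ratings (not real evaluation data):

    import numpy as np

    # Hypothetical listener ratings (1-5 scale) for three enhanced clips.
    ratings = {
        "clip_001": [4, 4, 5, 4, 3],
        "clip_002": [4, 5, 4, 4, 4],
        "clip_003": [3, 4, 4, 5, 4],
    }

    per_clip_mos = {clip: np.mean(scores) for clip, scores in ratings.items()}
    overall_mos = float(np.mean(list(per_clip_mos.values())))
    print(f"overall MOS: {overall_mos:.2f} (target: 4.0 or above; DNS Challenge 2020 top score: 3.52)")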

Road Map

  • RNN-based speaker voice feature vector generation (see the sketch after this list)

  • Clustering

  • Verification and application
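
A minimal sketch of the first two steps above, assuming a GRU encoder over spectral frames and k-means clustering of the resulting embeddings; the architecture, feature dimensions, and number of speakers are placeholders rather than the delivered design.

    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    class SpeakerEncoder(nn.Module):
        """Placeholder RNN that maps a sequence of spectral frames to one speaker embedding."""
        def __init__(self, n_mels: int = 40, hidden: int = 128, emb_dim: int = 64):
            super().__init__()
            self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
            self.proj = nn.Linear(hidden, emb_dim)

        def forward(self, frames: torch.Tensor) -> torch.Tensor:   # frames: (batch, time, n_mels)
            _, last_hidden = self.rnn(frames)
            emb = self.proj(last_hidden[-1])                        # (batch, emb_dim)
            return torch.nn.functional.normalize(emb, dim=-1)       # unit-length embedding

    # Dummy utterances stand in for real features; clustering groups utterances by speaker.
    encoder = SpeakerEncoder()
    with torch.no_grad():
        utterances = torch.randn(32, 200, 40)                       # 32 utterances, 200 frames each
        embeddings = encoder(utterances).numpy()
    labels = KMeans(n_clusters=4, n_init=10).fit_predict(embeddings)  # assumed 4 speakers
    print(labels)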

  • Create sound DB and echo/reverberation data generator (see the sketch after this list)

  • Sound spectrogram filter modelling

  • Development of an integrated echo/reverberation/howling noise elimination module
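
A minimal sketch of the echo/reverberation data generator named in the first item above, assuming a synthetic exponentially decaying impulse response in place of measured room responses; the RT60 and signal lengths are arbitrary example values.

    import numpy as np

    def synthetic_rir(rt60_s: float = 0.4, sample_rate: int = 16000,
                      length_s: float = 0.5) -> np.ndarray:
        """Exponentially decaying noise as a crude stand-in for a measured room impulse response."""
        rng = np.random.default_rng(0)
        t = np.arange(int(length_s * sample_rate)) / sample_rate
        decay = np.exp(-6.9 * t / rt60_s)      # roughly 60 dB of decay over rt60_s seconds
        return rng.standard_normal(t.size) * decay

    def add_reverberation(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
        """Convolve clean speech with the impulse response to simulate echo/reverberation."""
        reverberant = np.convolve(clean, rir)[: len(clean)]
        return reverberant / (np.max(np.abs(reverberant)) + 1e-12)  # simple peak normalization

    # Dummy signal stands in for a clean speech recording from the sound DB.
    clean = np.random.default_rng(1).standard_normal(16000).astype(np.float32)
    reverberant = add_reverberation(clean, synthetic_rir())
    print(reverberant.shape)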

Key Features

  • Noise suppression in sound input during a multi-party video conference

  • Noise suppression in sound input during a multi-party video conference in which multiple users participate through one microphone

  • Noise suppression in sound input during a multi-party video conference through high-performance smartphones


The Result

AI processing time for each voice degradation factor (ms): 40

Subjective quality evaluation score for voice enhancement: 4.0