Online class service provider
AI solution for enhancing voice quality on a multi-user online lecture platform
The global pandemic era brought rapid growth in the use of contactless technologies such as video conferencing solutions for corporate meetings, online video class platforms, video interviews and counseling, and other non-face-to-face business applications. The most stressful part of using these video conferencing solutions, however, is the deterioration of sound quality caused by background noise, echo, howling, and other artifacts that distract participants.
An online video lecture service provider requested the development of sound quality enhancement technology that uses a deep learning neural network to eliminate quality degradation factors such as echo, reverberation, howling, and normal/abnormal background noise under various conditions (places, platforms), including simultaneous multi-access and single-terminal usage.
General purpose voice quality enhancement algorithm
Voice quality enhancement algorithm based on a voice filter for a shared terminal
Voice quality enhancement algorithm for high-performance terminal
AI processing speed by voice quality degradation factor
Measure the AI processing delay for each voice quality degradation factor (ambient noise, acoustic echo, howling, etc.) based on analysis of one audio frame, and achieve the under-40 ms delay required by the real-time tracks of the AEC (Acoustic Echo Cancellation) and DNS (Deep Noise Suppression) Challenges.
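As a sketch of how the per-frame delay requirement might be verified, the snippet below times a placeholder enhancement function on single audio frames. The frame length, sample rate, and the `enhance_frame` stub are illustrative assumptions, not the provider's actual model:

```python
import time

SAMPLE_RATE = 16000    # assumed sample rate
FRAME_MS = 10          # assumed frame length; real-time models typically use 10-20 ms frames
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000

def enhance_frame(frame):
    """Placeholder for a per-frame enhancement model (AEC, noise suppression, etc.)."""
    return [s * 0.9 for s in frame]   # stand-in DSP work

def measure_latency(n_frames=100):
    """Return the average wall-clock processing time per frame, in milliseconds."""
    frame = [0.0] * FRAME_SAMPLES
    start = time.perf_counter()
    for _ in range(n_frames):
        enhance_frame(frame)
    return (time.perf_counter() - start) / n_frames * 1000.0

avg_ms = measure_latency()
```

Averaging over many frames smooths out scheduler jitter; the averaged figure is what would be compared against the 40 ms real-time budget.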
Test set generation
Generate test data for more than 20 open-microphone environments: one person speaks while the other open microphones pick up noise of various types and strengths (e.g., pure noise mixed with clean speech at an SNR drawn from a uniform distribution of 0–25 dB), and generate data for different numbers of open microphones, such as 20, 30, and 40.
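The noise-mixing step described above can be sketched as follows. The helper names `mix_at_snr` and `make_test_clip` are hypothetical, and the uniform 0–25 dB SNR range follows the description above:

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it to `clean`."""
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(s * s for s in noise) / len(noise)
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]

def make_test_clip(clean, noise):
    """Draw the SNR uniformly from 0-25 dB, as in the test-set description above."""
    snr_db = random.uniform(0.0, 25.0)
    return mix_at_snr(clean, noise, snr_db), snr_db
```

Recording the drawn SNR alongside each clip lets the evaluation later break results down by noise strength.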
Score 4.0 or above in subjective sound quality evaluation
Create the world's best sound quality improvement technology by targeting scores higher than 3.52, the top score of Microsoft's Deep Noise Suppression Challenge at INTERSPEECH 2020.
RNN-based speaker voice feature vector generation
Verification and application
Create sound DB and echo/reverberation data generator
Sound spectrogram filter modelling
Development of an integrated echo/reverberation/howling noise elimination module
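As an illustration of the RNN-based speaker voice feature vector step listed above, here is a minimal sketch of a GRU cell whose mean-pooled hidden states form a speaker embedding. The weights are random stand-ins; a real system would load trained parameters, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUCell:
    """Minimal GRU cell with random stand-in weights (a trained model would supply real ones)."""
    def __init__(self, in_dim, hid_dim):
        def w(rows, cols):
            return rng.normal(0.0, 0.1, (rows, cols))
        self.Wz, self.Uz = w(hid_dim, in_dim), w(hid_dim, hid_dim)  # update gate
        self.Wr, self.Ur = w(hid_dim, in_dim), w(hid_dim, hid_dim)  # reset gate
        self.Wh, self.Uh = w(hid_dim, in_dim), w(hid_dim, hid_dim)  # candidate state

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)
        r = sigmoid(self.Wr @ x + self.Ur @ h)
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - z) * h + z * h_cand

def speaker_embedding(frames, cell, hid_dim):
    """Run per-frame features through the GRU, mean-pool the hidden states,
    and L2-normalize the result into a fixed-length speaker vector."""
    h = np.zeros(hid_dim)
    states = []
    for x in frames:
        h = cell.step(x, h)
        states.append(h)
    emb = np.mean(states, axis=0)
    return emb / np.linalg.norm(emb)
```

A fixed-length, normalized vector like this lets the system compare speakers by cosine similarity regardless of utterance length.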
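The spectrogram filter modelling step can likewise be sketched as a per-bin gain mask applied to a short-time spectrum. `apply_spectral_mask` and the toy mask below are illustrative stand-ins for what a trained network would predict, and the overlap-add reconstruction is a rough sketch rather than a perfectly normalized inverse transform:

```python
import numpy as np

def stft(signal, frame_len=512, hop=256):
    """Short-time FFT of a 1-D signal using a Hann analysis window."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def apply_spectral_mask(noisy, mask_fn, frame_len=512, hop=256):
    """Apply a per-bin gain mask (the 'spectrogram filter') and
    reconstruct the time signal by windowed overlap-add."""
    spec = stft(noisy, frame_len, hop)
    mask = mask_fn(np.abs(spec))          # a DNN would predict this per time-frequency bin
    filtered = spec * mask
    out = np.zeros(len(noisy))
    window = np.hanning(frame_len)
    for k, frame in enumerate(np.fft.irfft(filtered, n=frame_len, axis=1)):
        out[k * hop:k * hop + frame_len] += frame * window
    return out

def toy_mask(mag):
    """Stand-in soft gate: keep strong bins, attenuate weak ones."""
    return mag / (mag + np.median(mag))
```

The design point is that the network only has to estimate a bounded gain per time-frequency bin, which is easier to learn than predicting the clean waveform directly.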
Noise suppression in sound input during multi-party video conference
Noise suppression in sound input during multi-party video conferences where multiple users participate through one microphone
Noise suppression in sound input during multi-party video conferences on high-performance smartphones
AI processing time for each voice degradation factor (ms): 40
Subjective quality evaluation score for voice enhancement: 4.0