The ADS Achievements

October 17, 2022 | Blogs

The ADS achievements can be summarized as follows:

  • An acoustic vector sensor array was designed and developed for capturing audio signals
  • A cleaned, accurately labeled audio dataset was generated for the model’s training
  • A system able to recognize single and overlapping sound events with high accuracy was implemented
  • The system runs in real time on a Raspberry Pi 4B with the ReSpeaker Mic-Array

In more detail, public audio datasets (FSD50K, MIVIA) were collected and combined to build a suitable dataset for the model’s training. However, some of these audio samples were weakly labeled or carried wrong labels, which degraded the model’s overall performance. Therefore, unsupervised anomaly detection methods based on MFCC features were applied to clean the data and remove outliers. Either K-means or DBSCAN was chosen, depending on the distribution of each class. The following figures illustrate the result of the K-means algorithm on male speech and traffic noise audio samples. The samples assigned to cluster 0 were removed, as they were detected as outliers. The final models were trained on this cleaned dataset, which was found to increase accuracy.

K-means clustering – Male speech class
K-means clustering – Traffic noise class
An example of the first and second principal components (PCA) for the male speech and traffic noise classes. Five clusters were created, and the distance from the centroid was measured to remove outliers.
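The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each clip has already been reduced to one fixed-length MFCC vector (13 coefficients here, an assumed size), uses a plain NumPy K-means, and drops samples whose distance to their own centroid exceeds a percentile threshold. The function names, the 95th-percentile cut-off, and the synthetic data are all hypothetical.

```python
import numpy as np

def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct samples.
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(n_iter):
        # Distance of every sample to every centroid: shape (n_samples, k).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def keep_mask(features, k=5, percentile=95.0, seed=0):
    """Boolean mask: True for samples kept, False for suspected outliers."""
    features = np.asarray(features, dtype=float)
    centroids, labels = kmeans(features, k, seed=seed)
    # Distance of each sample to the centroid of its own cluster.
    dist_to_own = np.linalg.norm(features - centroids[labels], axis=1)
    threshold = np.percentile(dist_to_own, percentile)  # assumed cut-off
    return dist_to_own <= threshold

# Synthetic stand-in for the per-clip MFCC vectors of one class.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 13))
keep = keep_mask(features, k=5, percentile=95.0)
cleaned = features[keep]
```

A variant closer to the figures would discard one whole cluster (cluster 0 in the post) instead of thresholding distances; which rule fits best depends on the class distribution, as the text notes for the choice between K-means and DBSCAN.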

Furthermore, deep learning models were developed and trained on 11 classes for the sound event detection task. For this purpose, STFT magnitude spectrograms as well as mel-spectrogram energy features were extracted. DenseNet-121 and a custom 2D CNN model based on YAMNet were implemented for single and overlapping sound event detection, respectively. Moreover, for overlapping sound event detection, mixup augmentation with a 30% mixture was used. The final models were optimised to run on a Raspberry Pi 4B using the ReSpeaker Mic-Array and to provide detection results in real time.
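As a rough illustration of the STFT magnitude features mentioned above, the sketch below computes a magnitude spectrogram with NumPy only. The sample rate, frame length, hop size, and Hann window are assumptions chosen for the example; the post does not state the values used in the project, and the mel filterbank step is omitted.

```python
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=512):
    """Magnitude STFT with a Hann window. Returns shape (freq_bins, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real FFT of each frame; keep only the magnitude.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# One second of a 1 kHz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
spec = stft_magnitude(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 1024
```

With these settings each frequency bin is 15.625 Hz wide, so the tone's energy concentrates in the bin centred on 1000 Hz.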

Confusion matrix of single event detection model. The numbers 0-10 represent the following classes: aircraft, explosion, female speech, gunshot, male speech, screaming, siren, thunder, traffic noise, UAV noise, vehicle horn.
GUI for overlapping event detection. The top part shows the mel-spectrogram and the bottom part shows the target classes with a color-coded confidence score: the lighter the color (towards yellow), the more confident the system is.
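The mixup augmentation used for the overlapping event detector can be sketched as below. This is one possible reading of "30% mixture", namely a fixed mixing weight of 0.3 for the second sample; the post does not say whether the figure refers to the weight or to the share of mixed samples. The function names and array shapes are hypothetical.

```python
import numpy as np

def mixup_pair(x_a, y_a, x_b, y_b, mix=0.3):
    """Blend two examples: (1 - mix) of the first plus mix of the second."""
    x = (1.0 - mix) * x_a + mix * x_b
    y = (1.0 - mix) * y_a + mix * y_b  # labels are blended with the same weights
    return x, y

def mixup_batch(x, y, mix=0.3, seed=0):
    """Mix every example in a batch with a randomly chosen partner."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    return mixup_pair(x, y, x[perm], y[perm], mix)

# Four fake mel-spectrograms (64 mel bands x 96 frames) with one-hot labels
# over the 11 classes listed in the post.
rng = np.random.default_rng(1)
specs = rng.random((4, 64, 96))
labels = np.eye(11)[[0, 3, 5, 8]]
mixed_specs, mixed_labels = mixup_batch(specs, labels, mix=0.3)
```

Blending the labels together with the inputs gives the network soft targets for two classes at once, which suits training on overlapping events.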

Demo videos are available for both tasks.


Contact

Monica Florea
Administrative Coordinator

European Projects Department
SIMAVI
Soseaua Bucuresti-Ploiesti 73-81 COM
Bucuresti/ROMANIA


Çağlar Akman
Technical Coordinator

Command and Control Systems
HAVELSAN
Eskişehir Yolu 7 km
Ankara/TURKEY


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101019808.

© 2021 | TeamAware All Rights Reserved