The ADS achievements can be summarized as follows:
In more detail, public audio datasets (FSD50K, MIVIA) were collected and combined to form a suitable training dataset for the model. However, some of these audio samples were weakly labelled or carried wrong labels, which degraded the model's overall performance. Therefore, unsupervised anomaly detection methods based on MFCC features were applied to clean the data and remove outliers. Depending on each class's distribution, either the K-means or the DBSCAN algorithm was chosen. The figures below illustrate the result of the K-means algorithm on male speech and traffic noise audio samples; the samples assigned to cluster 0 were removed as detected outliers. The final models were trained on this cleaned dataset, which was shown to increase accuracy. A sketch of this cleaning step is given after this paragraph.
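As an illustration only, the following snippet sketches how MFCC-based outlier removal with K-means could look in practice. It assumes librosa for MFCC extraction and scikit-learn for clustering; the folder name, sample rate, MFCC count, and the rule for picking the outlier cluster are illustrative assumptions, not the project's actual configuration.

# Sketch of per-class MFCC-based outlier removal with K-means.
# Assumes librosa and scikit-learn; paths and parameters are illustrative.
import glob
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def mfcc_embedding(path, sr=16000, n_mfcc=20):
    """Load an audio clip and summarise it as mean/std of its MFCCs."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# One class at a time, e.g. all clips of a (hypothetical) "male_speech" folder.
files = sorted(glob.glob("data/male_speech/*.wav"))
X = StandardScaler().fit_transform([mfcc_embedding(f) for f in files])

# Two clusters: here the smaller one is treated as the outlier cluster and dropped.
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
outlier_cluster = np.argmin(np.bincount(labels))
clean_files = [f for f, lab in zip(files, labels) if lab != outlier_cluster]
print(f"kept {len(clean_files)} of {len(files)} clips")

For classes with less compact distributions, DBSCAN can be swapped in for K-means, in which case the points labelled -1 (noise) play the role of the outlier cluster.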
Furthermore, deep learning models were developed and trained on 11 classes for the sound event detection task. For this purpose, STFT spectrogram magnitude representations as well as mel-spectrogram energy features were extracted. A DenseNet-121 model and a custom 2D CNN based on YAMNet were implemented for single and overlapping sound event detection, respectively. In addition, for overlapping sound event detection, mixup augmentation with a 30% mixing ratio was used, as sketched below. The final models were optimised to run on a Raspberry Pi 4B using the ReSpeaker Mic-Array and provide detection results in real time.
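The short sketch below illustrates two of the ingredients described above: log mel-spectrogram feature extraction and mixup with a 30% mixing ratio for overlapping events. It assumes librosa and NumPy; the sample rate, mel settings, and multi-hot label format are assumptions made for this example only, not the project's exact pipeline.

# Sketch of log-mel feature extraction and 30% mixup for overlapping events.
# Assumes librosa and numpy; only the 0.3 ratio comes from the text above.
import numpy as np
import librosa

def log_mel(y, sr=16000, n_fft=1024, hop_length=512, n_mels=64):
    """Mel-spectrogram energies in dB, used as 2D CNN input features."""
    S = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)

def mixup_pair(y_a, labels_a, y_b, labels_b, ratio=0.3):
    """Mix 30% of clip B into clip A; the multi-hot label is the union."""
    n = min(len(y_a), len(y_b))
    mixed = (1.0 - ratio) * y_a[:n] + ratio * y_b[:n]
    return mixed, np.maximum(labels_a, labels_b)

# Example with two synthetic one-second clips and 11-class multi-hot labels.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clip_a, clip_b = np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t)
lab_a = np.zeros(11); lab_a[0] = 1    # e.g. class 0 active in clip A
lab_b = np.zeros(11); lab_b[3] = 1    # e.g. class 3 active in clip B
mixed, mixed_labels = mixup_pair(clip_a, lab_a, clip_b, lab_b)
features = log_mel(mixed, sr=sr)      # fed to the overlapping-event model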
Demo videos for both tasks are available below:
Contact
Monica Florea
Administrative Coordinator
European Projects Department
SIMAVI
Soseaua Bucuresti-Ploiesti 73-81 COM
Bucuresti/ROMANIA
Email:
Çağlar Akman
Technical Coordinator
Command and Control Systems
HAVELSAN
Eskişehir Yolu 7 km
Ankara/TURKEY
Email:
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101019808.