The following audio properties are considered:
MFCC: Mel-frequency cepstral coefficients, which represent the short-term power spectrum of a sound.
Flow Diagram
Datasets
Two different datasets were used:
1. RAVDESS
This dataset includes around 1500 audio files from 24 different actors (12 male and 12 female), who record short audio clips in 8 different emotions, coded as follows:
1 = neutral, 2 = calm, 3 = happy, 4 = sad, 5 = angry, 6 = fearful, 7 = disgust, 8 = surprised
Each audio file is named so that the 7th character identifies the emotion the file represents.
2. SAVEE
This dataset contains around 500 audio files recorded by 4 different male actors.
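As a rough illustration of reading the emotion label from a RAVDESS file name, the sketch below assumes the name is a series of hyphen-separated two-digit fields with the emotion code in the third field, which is how the published RAVDESS naming scheme is usually described. The function name `ravdess_emotion` and the example file name are hypothetical and not taken from this post.

```python
from pathlib import Path

# Emotion codes as listed for RAVDESS above.
EMOTIONS = {
    1: "neutral", 2: "calm", 3: "happy", 4: "sad",
    5: "angry", 6: "fearful", 7: "disgust", 8: "surprised",
}


def ravdess_emotion(filename):
    """Return the emotion label encoded in a RAVDESS-style file name.

    Assumption: the name consists of hyphen-separated two-digit fields
    and the third field holds the emotion code (01 to 08).
    """
    fields = Path(filename).stem.split("-")
    return EMOTIONS[int(fields[2])]


# Illustrative file name; the third field is 06.
print(ravdess_emotion("03-01-06-01-02-01-12.wav"))  # fearful
```

Under this assumed layout, the character at zero-based index 7 is the last digit of the emotion field, which may be what the "7th character" above refers to. Splitting on hyphens avoids depending on an exact character position.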
Feature Extraction
The next step is to extract features from the audio files, which help the model distinguish between them. For feature extraction we use the LibROSA library in Python, one of the libraries used for audio analysis.
- While extracting the features, all audio files are limited to 3 seconds so that each yields an equal number of features.
- The sampling rate of each file is doubled, keeping the sampling frequency constant, to obtain more features, which helps classify the audio files when the dataset is small.
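The two points above can be sketched with LibROSA roughly as follows. This is a minimal sketch under stated assumptions, not the code used for this project: "doubling the sampling rate" is read here as loading each file at twice LibROSA's default rate of 22050 Hz, `n_mfcc=13` is an illustrative choice, and the helper names `expected_frames` and `extract_mfcc` are hypothetical.

```python
DEFAULT_SR = 22050    # librosa's default sampling rate in Hz
CLIP_SECONDS = 3.0    # every clip is cut or padded to this length
HOP_LENGTH = 512      # librosa's default hop between analysis frames


def expected_frames(seconds, sr, hop_length=HOP_LENGTH):
    """Number of MFCC frames for a clip, assuming librosa's default centered framing."""
    return 1 + int(seconds * sr) // hop_length


def extract_mfcc(path, sr=2 * DEFAULT_SR, n_mfcc=13):
    """Load one file at a fixed length and return its MFCC matrix.

    Assumptions: sr=2*DEFAULT_SR is one reading of "doubling the sampling
    rate"; n_mfcc=13 is illustrative and not a value from the post.
    """
    # Imported here so expected_frames() works even without librosa installed.
    import librosa

    # Resample to `sr` and read at most CLIP_SECONDS of audio.
    y, _ = librosa.load(path, sr=sr, duration=CLIP_SECONDS)
    # Zero-pad shorter clips so every file has the same number of samples.
    y = librosa.util.fix_length(y, size=int(CLIP_SECONDS * sr))
    # Result has shape (n_mfcc, expected_frames(CLIP_SECONDS, sr)).
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=HOP_LENGTH)


# Frames per 3-second clip at the default rate and at double the rate.
print(expected_frames(CLIP_SECONDS, DEFAULT_SR))      # 130
print(expected_frames(CLIP_SECONDS, 2 * DEFAULT_SR))  # 259
```

With the hop length held fixed, loading at double the rate roughly doubles the number of frames per clip, which is one way a small dataset can yield more feature values per file. Calling `extract_mfcc("some_clip.wav")` on a real file (the path is a placeholder) would return an array with one row per coefficient and one column per frame.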
