jimmy/events (#6)

Co-authored-by: Jimmy Vargo <james@ayo.tokyo>
Reviewed-on: ayo/website#6
corentin 2024-04-09 16:43:42 +09:00 committed by Corentin
commit e3ddda951f
76 changed files with 564 additions and 1 deletions

# Training A Speech-to-Text Neural Network
Speech-To-Text Recurrent Neural Network (RNN)
### Displaying the Data
In order to inspect sample data from the dataset and confirm its topology, I added a few arguments to the main function.
We can run the Python script with the `display` argument to get a sample output of our original data. This includes all of its features: the transcription, the raw samples and their shape, the sample rate, duration, speaker ID, and more.
I also added a few optional flags for confirming the original data visually and audibly.
- `--waveform` will show a graph of the waveform, using Matplotlib
- `--spectrogram` will show a graph of the spectrogram (given by STFTs not MFCCs), using Librosa
- `--mfcc` will show a graph of the MFCC-based spectrogram, using Librosa
- `--play` will play the audio file
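The command and flags above can be sketched with `argparse`; this is a minimal, hypothetical reconstruction of the script's interface (the post doesn't show the actual parser, and the plotting/playback handlers are omitted):

```python
import argparse

def build_parser():
    """Build a CLI matching the arguments described in the post (assumed layout)."""
    parser = argparse.ArgumentParser(description="Inspect dataset samples")
    # Positional command: `display` prints a sample's features,
    # `read-mfcc` verifies the stored MFCC data (both named in the post).
    parser.add_argument("command", choices=["display", "read-mfcc"])
    # Optional flags for confirming the data visually and audibly.
    parser.add_argument("--waveform", action="store_true",
                        help="plot the raw waveform (Matplotlib)")
    parser.add_argument("--spectrogram", action="store_true",
                        help="plot an STFT spectrogram (Librosa)")
    parser.add_argument("--mfcc", action="store_true",
                        help="plot the MFCC-based spectrogram (Librosa)")
    parser.add_argument("--play", action="store_true",
                        help="play the audio file")
    return parser

# Example invocation: `python script.py display --waveform`
args = build_parser().parse_args(["display", "--waveform"])
```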
After running this, we now have our preprocessed data! We've transformed the dataset into usable MFCC data, stored alongside the extracted features in fast persistent storage.
Using the `read-mfcc` argument in the Python script, I can confirm that the processed data has been stored properly and is readable by our model in a useful topology.
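The store-and-verify round trip could look something like this sketch. It assumes (hypothetically) that each utterance is persisted as one pickled record holding its MFCC matrix plus the extracted features; a nested list stands in for the real MFCC array, and the function names are illustrative, not from the post:

```python
import os
import pickle
import tempfile

def write_record(path, mfcc, transcript, speaker_id, sample_rate):
    """Persist one utterance's MFCCs alongside its extracted features."""
    record = {
        "mfcc": mfcc,                  # frames x coefficients matrix
        "transcript": transcript,
        "speaker_id": speaker_id,
        "sample_rate": sample_rate,
    }
    with open(path, "wb") as f:
        pickle.dump(record, f)

def read_mfcc(path):
    """Read a record back and report the MFCC matrix's topology."""
    with open(path, "rb") as f:
        record = pickle.load(f)
    n_frames = len(record["mfcc"])
    n_coeffs = len(record["mfcc"][0])
    return record, (n_frames, n_coeffs)

# Dummy 4-frame, 13-coefficient MFCC matrix for one utterance.
path = os.path.join(tempfile.gettempdir(), "utt0001.pkl")
write_record(path, [[0.0] * 13 for _ in range(4)], "hello world", 19, 16000)
record, shape = read_mfcc(path)
```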
## Architecture
Input shape
- .
Layers
- GRU Layer
- GRU Layer
- Dense Layer
- Dropout Layer (to prevent overfitting)
- Dense Layer (softmax output)
Output shape
- .
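To make the layer stack concrete, here is a shape walk-through of the architecture above. The layer widths, frame count, and class count are illustrative assumptions (the post does not state them); the point is only how each layer transforms a `(time_steps, features)` MFCC input:

```python
def gru(shape, units):
    # A GRU returning full sequences keeps the time axis,
    # replacing the feature axis with its unit count.
    t, _ = shape
    return (t, units)

def dense(shape, units):
    # A Dense layer applied per time step maps features -> units.
    t, _ = shape
    return (t, units)

def dropout(shape):
    # Dropout zeroes activations during training; shape is unchanged.
    return shape

n_classes = 29          # e.g. 26 letters + space + apostrophe + blank (assumed)
shape = (100, 13)       # 100 frames of 13 MFCC coefficients (assumed)
shape = gru(shape, 128)             # GRU Layer
shape = gru(shape, 128)             # GRU Layer
shape = dense(shape, 64)            # Dense Layer
shape = dropout(shape)              # Dropout Layer (to prevent overfitting)
shape = dense(shape, n_classes)     # Dense Layer (softmax output)
```

The final shape is one probability distribution over the character classes per input frame, which is the form a CTC-style speech-to-text loss expects.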