Audio Preprocessing blog post (#4)

Added new blog post & draft outline of next one.

Reviewed-on: ayo/website#4
Co-authored-by: Jimmy Vargo <james@ayo.tokyo>
Co-committed-by: Jimmy Vargo <james@ayo.tokyo>
# Training A Speech-to-Text Neural Network
Speech-To-Text Recurrent Neural Network (RNN)
### Displaying the Data
In order to check out sample data from the dataset and confirm its shape, I added a few arguments to the main function.

We can run the Python script with the `display` argument to get a sample output of our original data. This includes all of the features, like the transcription, the raw samples and their shape, the sample rate, duration, speaker ID, and more.

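
As a rough sketch of what that branch might look like (the dataset's actual schema isn't shown here, so the field names are illustrative), the `display` path just grabs one record and prints its features:

```python
# Hypothetical sketch of the `display` branch. The real dataset schema
# isn't shown in this post, so these field names are illustrative.
def display_sample(sample):
    print("Transcription:", sample["transcription"])
    print("Raw samples:  ", sample["samples"])
    print("Shape:        ", sample["samples"].shape)
    print("Sample rate:  ", sample["sample_rate"])
    print("Duration (s): ", len(sample["samples"]) / sample["sample_rate"])
    print("Speaker ID:   ", sample["speaker_id"])
```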
I also added a few optional flags for confirming the original data visually and audibly (see the sketch after this list):

- `--waveform` will show a graph of the waveform, using Matplotlib
- `--spectrogram` will show a graph of the spectrogram (computed from STFTs, not MFCCs), using Librosa
- `--mfcc` will show a graph of the MFCCs, using Librosa
- `--play` will play the audio file

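
Here's a minimal sketch of how those flags might be wired up, assuming Librosa and Matplotlib for the plots; the flag names come from the post, but the plotting details and the `sounddevice` playback dependency are my assumptions:

```python
import argparse

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import sounddevice as sd  # assumed playback library; not named in the post


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("path", help="audio file to inspect")
    parser.add_argument("--waveform", action="store_true")
    parser.add_argument("--spectrogram", action="store_true")
    parser.add_argument("--mfcc", action="store_true")
    parser.add_argument("--play", action="store_true")
    args = parser.parse_args()

    # Load the raw samples at their native sample rate.
    samples, sr = librosa.load(args.path, sr=None)

    if args.waveform:
        librosa.display.waveshow(samples, sr=sr)
        plt.title("Waveform")
        plt.show()

    if args.spectrogram:
        # Short-time Fourier transform, converted to dB for display.
        stft_db = librosa.amplitude_to_db(np.abs(librosa.stft(samples)), ref=np.max)
        librosa.display.specshow(stft_db, sr=sr, x_axis="time", y_axis="log")
        plt.colorbar(format="%+2.0f dB")
        plt.title("Spectrogram (STFT)")
        plt.show()

    if args.mfcc:
        mfccs = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=13)
        librosa.display.specshow(mfccs, sr=sr, x_axis="time")
        plt.colorbar()
        plt.title("MFCCs")
        plt.show()

    if args.play:
        sd.play(samples, sr)
        sd.wait()


if __name__ == "__main__":
    main()
```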
After running this, we now have our preprocessed data! We've transformed the dataset into usable MFCC data, stored alongside the extracted features in fast, persistent storage.

Using the `read-mfcc` argument in the Python script, I can confirm that the processed data has been stored properly and is readable by our model in a shape that's actually useful.

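
The post doesn't pin down the storage format here, but as one possibility, a compressed NumPy archive per utterance keeps the MFCCs and extracted features together and reads back quickly. A sketch of the write path and the `read-mfcc` check under that assumption:

```python
import numpy as np
import librosa


def write_mfcc(audio_path, out_path, transcription, n_mfcc=13):
    # Compute MFCCs and persist them alongside the extracted features.
    samples, sr = librosa.load(audio_path, sr=None)
    mfccs = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)
    np.savez_compressed(out_path, mfccs=mfccs, sample_rate=sr,
                        transcription=transcription)


def read_mfcc(path):
    # Mirrors the `read-mfcc` argument: load a record and confirm
    # the stored array has the shape the model expects.
    data = np.load(path)
    print("MFCC shape:   ", data["mfccs"].shape)
    print("Sample rate:  ", int(data["sample_rate"]))
    print("Transcription:", str(data["transcription"]))
    return data["mfccs"]
```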
## Architecture

Input shape

- .

Layers

- GRU Layer
- GRU Layer
- Dense Layer
- Dropout Layer (to prevent overfitting)
- Dense Layer (softmax output)

Output shape

- .

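
Since the outline above leaves the shapes blank, here's a minimal Keras sketch of just the layer stack it names; Keras itself, the unit counts, the feature dimension, and the vocabulary size are placeholders of mine, not values from the post:

```python
import tensorflow as tf
from tensorflow.keras import layers

N_MFCC = 13     # MFCC coefficients per frame (placeholder)
N_CLASSES = 29  # output vocabulary size (placeholder)

model = tf.keras.Sequential([
    # (time_steps, n_mfcc); None allows variable-length utterances.
    layers.Input(shape=(None, N_MFCC)),
    layers.GRU(128, return_sequences=True),         # GRU Layer
    layers.GRU(128, return_sequences=True),         # GRU Layer
    layers.Dense(64, activation="relu"),            # Dense Layer
    layers.Dropout(0.3),                            # Dropout Layer (to prevent overfitting)
    layers.Dense(N_CLASSES, activation="softmax"),  # Dense Layer (softmax output)
])
model.summary()
```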