Audio Preprocessing blog post (#4)

Added new blog post & draft outline of next one.

Reviewed-on: ayo/website#4
Co-authored-by: Jimmy Vargo <james@ayo.tokyo>
Co-committed-by: Jimmy Vargo <james@ayo.tokyo>
# Training A Speech-to-Text Neural Network
Speech-To-Text Recurrent Neural Network (RNN)
### Displaying the Data
In order to check out sample data from the dataset and confirm its shape, I added a few arguments to the main function.

We can run the Python script with the `display` argument to get a sample output of our original data. This includes all of the features, like the transcription, the raw samples and their shape, the sample rate, duration, speaker ID, and more.

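
As a rough sketch of what that branch might look like (the dataset's actual schema isn't shown here, so the field names are illustrative), the `display` path just grabs one record and prints its features:

```python
# Hypothetical sketch of the `display` branch. The real dataset schema
# isn't shown in this post, so these field names are illustrative.
def display_sample(sample):
    print("Transcription:", sample["transcription"])
    print("Raw samples:  ", sample["samples"])
    print("Shape:        ", sample["samples"].shape)
    print("Sample rate:  ", sample["sample_rate"])
    print("Duration (s): ", len(sample["samples"]) / sample["sample_rate"])
    print("Speaker ID:   ", sample["speaker_id"])
```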
I also added a few optional flags for confirming the original data visually and audibly (see the sketch after this list):

- `--waveform` will show a graph of the waveform, using Matplotlib
- `--spectrogram` will show a graph of the spectrogram (computed from STFTs, not MFCCs), using Librosa
- `--mfcc` will show a graph of the MFCCs, using Librosa
- `--play` will play the audio file

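
Here's a minimal sketch of how those flags might be wired up, assuming Librosa and Matplotlib for the plots; the flag names come from the post, but the plotting details and the `sounddevice` playback dependency are my assumptions:

```python
import argparse

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import sounddevice as sd  # assumed playback library; not named in the post


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("path", help="audio file to inspect")
    parser.add_argument("--waveform", action="store_true")
    parser.add_argument("--spectrogram", action="store_true")
    parser.add_argument("--mfcc", action="store_true")
    parser.add_argument("--play", action="store_true")
    args = parser.parse_args()

    # Load the raw samples at their native sample rate.
    samples, sr = librosa.load(args.path, sr=None)

    if args.waveform:
        librosa.display.waveshow(samples, sr=sr)
        plt.title("Waveform")
        plt.show()

    if args.spectrogram:
        # Short-time Fourier transform, converted to dB for display.
        stft_db = librosa.amplitude_to_db(np.abs(librosa.stft(samples)), ref=np.max)
        librosa.display.specshow(stft_db, sr=sr, x_axis="time", y_axis="log")
        plt.colorbar(format="%+2.0f dB")
        plt.title("Spectrogram (STFT)")
        plt.show()

    if args.mfcc:
        mfccs = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=13)
        librosa.display.specshow(mfccs, sr=sr, x_axis="time")
        plt.colorbar()
        plt.title("MFCCs")
        plt.show()

    if args.play:
        sd.play(samples, sr)
        sd.wait()


if __name__ == "__main__":
    main()
```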
After running this, we now have our preprocessed data! We've transformed the dataset into usable MFCC data, stored alongside the extracted features in fast, persistent storage.

Using the `read-mfcc` argument in the Python script, I can confirm that the processed data has been stored properly and is readable by our model in a shape that's actually useful.

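
The post doesn't pin down the storage format here, but as one possibility, a compressed NumPy archive per utterance keeps the MFCCs and extracted features together and reads back quickly. A sketch of the write path and the `read-mfcc` check under that assumption:

```python
import numpy as np
import librosa


def write_mfcc(audio_path, out_path, transcription, n_mfcc=13):
    # Compute MFCCs and persist them alongside the extracted features.
    samples, sr = librosa.load(audio_path, sr=None)
    mfccs = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)
    np.savez_compressed(out_path, mfccs=mfccs, sample_rate=sr,
                        transcription=transcription)


def read_mfcc(path):
    # Mirrors the `read-mfcc` argument: load a record and confirm
    # the stored array has the shape the model expects.
    data = np.load(path)
    print("MFCC shape:   ", data["mfccs"].shape)
    print("Sample rate:  ", int(data["sample_rate"]))
    print("Transcription:", str(data["transcription"]))
    return data["mfccs"]
```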
## Architecture

Input shape

- .

Layers

- GRU Layer
- GRU Layer
- Dense Layer
- Dropout Layer (to prevent overfitting)
- Dense Layer (softmax output)

Output shape

- .

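
Since the outline above leaves the shapes blank, here's a minimal Keras sketch of just the layer stack it names; Keras itself, the unit counts, the feature dimension, and the vocabulary size are placeholders of mine, not values from the post:

```python
import tensorflow as tf
from tensorflow.keras import layers

N_MFCC = 13     # MFCC coefficients per frame (placeholder)
N_CLASSES = 29  # output vocabulary size (placeholder)

model = tf.keras.Sequential([
    # (time_steps, n_mfcc); None allows variable-length utterances.
    layers.Input(shape=(None, N_MFCC)),
    layers.GRU(128, return_sequences=True),         # GRU Layer
    layers.GRU(128, return_sequences=True),         # GRU Layer
    layers.Dense(64, activation="relu"),            # Dense Layer
    layers.Dropout(0.3),                            # Dropout Layer (to prevent overfitting)
    layers.Dense(N_CLASSES, activation="softmax"),  # Dense Layer (softmax output)
])
model.summary()
```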