jimmy/events (#6)

Co-authored-by: Jimmy Vargo <james@ayo.tokyo>
Reviewed-on: ayo/website#6
corentin 2024-04-09 16:43:42 +09:00 committed by Corentin
commit e3ddda951f
76 changed files with 564 additions and 1 deletions

# Training A Speech-to-Text Neural Network
Speech-To-Text Recurrent Neural Network (RNN)
### Displaying the Data
In order to inspect sample data from the dataset and confirm its topology, I added a few arguments to the main function.
We can run the Python script with the `display` argument to get a sample output of our original data. This includes all of its features: the transcription, the raw samples and their shape, the sample rate, duration, speaker ID, and more.
I also added a few optional flags for confirming the original data visually and audibly.
- `--waveform` will show a graph of the waveform, using Matplotlib
- `--spectrogram` will show a graph of the spectrogram (given by STFTs not MFCCs), using Librosa
- `--mfcc` will show a graph of the MFCC-based spectrogram, using Librosa
- `--play` will play the audio file
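The command and flags above can be sketched with `argparse`; this is a minimal, hypothetical reconstruction of the script's interface (the post doesn't show the actual parser, and the plotting/playback handlers are omitted):

```python
import argparse

def build_parser():
    """Build a CLI matching the arguments described in the post (assumed layout)."""
    parser = argparse.ArgumentParser(description="Inspect dataset samples")
    # Positional command: `display` prints a sample's features,
    # `read-mfcc` verifies the stored MFCC data (both named in the post).
    parser.add_argument("command", choices=["display", "read-mfcc"])
    # Optional flags for confirming the data visually and audibly.
    parser.add_argument("--waveform", action="store_true",
                        help="plot the raw waveform (Matplotlib)")
    parser.add_argument("--spectrogram", action="store_true",
                        help="plot an STFT spectrogram (Librosa)")
    parser.add_argument("--mfcc", action="store_true",
                        help="plot the MFCC-based spectrogram (Librosa)")
    parser.add_argument("--play", action="store_true",
                        help="play the audio file")
    return parser

# Example invocation: `python script.py display --waveform`
args = build_parser().parse_args(["display", "--waveform"])
```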
After running this, we now have our preprocessed data! We've transformed the dataset into usable MFCC data, stored alongside the extracted features in fast persistent storage.
Using the `read-mfcc` argument in the Python script, I can confirm that the processed data has been stored properly and is readable by our model in a useful topology.
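The store-and-verify round trip could look something like this sketch. It assumes (hypothetically) that each utterance is persisted as one pickled record holding its MFCC matrix plus the extracted features; a nested list stands in for the real MFCC array, and the function names are illustrative, not from the post:

```python
import os
import pickle
import tempfile

def write_record(path, mfcc, transcript, speaker_id, sample_rate):
    """Persist one utterance's MFCCs alongside its extracted features."""
    record = {
        "mfcc": mfcc,                  # frames x coefficients matrix
        "transcript": transcript,
        "speaker_id": speaker_id,
        "sample_rate": sample_rate,
    }
    with open(path, "wb") as f:
        pickle.dump(record, f)

def read_mfcc(path):
    """Read a record back and report the MFCC matrix's topology."""
    with open(path, "rb") as f:
        record = pickle.load(f)
    n_frames = len(record["mfcc"])
    n_coeffs = len(record["mfcc"][0])
    return record, (n_frames, n_coeffs)

# Dummy 4-frame, 13-coefficient MFCC matrix for one utterance.
path = os.path.join(tempfile.gettempdir(), "utt0001.pkl")
write_record(path, [[0.0] * 13 for _ in range(4)], "hello world", 19, 16000)
record, shape = read_mfcc(path)
```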
## Architecture
Input shape
- .
Layers
- GRU Layer
- GRU Layer
- Dense Layer
- Dropout Layer (to prevent overfitting)
- Dense Layer (softmax output)
Output shape
- .
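To make the layer stack concrete, here is a shape walk-through of the architecture above. The layer widths, frame count, and class count are illustrative assumptions (the post does not state them); the point is only how each layer transforms a `(time_steps, features)` MFCC input:

```python
def gru(shape, units):
    # A GRU returning full sequences keeps the time axis,
    # replacing the feature axis with its unit count.
    t, _ = shape
    return (t, units)

def dense(shape, units):
    # A Dense layer applied per time step maps features -> units.
    t, _ = shape
    return (t, units)

def dropout(shape):
    # Dropout zeroes activations during training; shape is unchanged.
    return shape

n_classes = 29          # e.g. 26 letters + space + apostrophe + blank (assumed)
shape = (100, 13)       # 100 frames of 13 MFCC coefficients (assumed)
shape = gru(shape, 128)             # GRU Layer
shape = gru(shape, 128)             # GRU Layer
shape = dense(shape, 64)            # Dense Layer
shape = dropout(shape)              # Dropout Layer (to prevent overfitting)
shape = dense(shape, n_classes)     # Dense Layer (softmax output)
```

The final shape is one probability distribution over the character classes per input frame, which is the form a CTC-style speech-to-text loss expects.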