AAI_2025_Capstone_Chronicles_Combined
Training Procedure
The deep learning model is trained on computed mel spectrograms together with our engineered MIR features, and learns to predict multi-label perceptual timbre qualities such as bright, dark, distorted, percussive, and long-release. The dataset is split into training, validation, and test sets with an 80-10-10 ratio, and underrepresented labels are oversampled to balance class distributions. The model is trained for up to 30 epochs with binary cross-entropy loss and the Adam optimizer, using a learning rate chosen through empirical evaluation. Training and validation losses are monitored throughout, and early stopping is employed to reduce overfitting.
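The training loop described above can be sketched as follows. To keep the example self-contained it uses NumPy with a single linear layer on synthetic feature vectors, a hand-rolled Adam update, and a patience-based early-stopping rule; the feature dimensions, label count, and hyperparameters here are illustrative assumptions, not the project's actual network or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for flattened spectrogram + MIR feature vectors and
# five binary timbre labels (illustrative shapes, not the project's).
X = rng.normal(size=(200, 32))
true_W = rng.normal(size=(32, 5))
Y = (X @ true_W > 0).astype(float)

# 80-10-10 split (test portion omitted from the loop for brevity)
n = len(X)
tr, va = int(0.8 * n), int(0.9 * n)
X_tr, Y_tr = X[:tr], Y[:tr]
X_va, Y_va = X[tr:va], Y[tr:va]

def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy over all samples and labels."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Single linear layer trained with Adam on BCE loss.
W, b = np.zeros((32, 5)), np.zeros(5)
mW, vW = np.zeros_like(W), np.zeros_like(W)
mb, vb = np.zeros_like(b), np.zeros_like(b)
lr, b1, b2, eps = 1e-2, 0.9, 0.999, 1e-8
best_val, patience, wait = np.inf, 5, 0

for epoch in range(1, 31):                      # up to 30 epochs
    p = sigmoid(X_tr @ W + b)
    gW = X_tr.T @ (p - Y_tr) / len(X_tr)        # BCE gradient w.r.t. logits
    gb = np.mean(p - Y_tr, axis=0)

    # Adam moment updates with bias correction
    mW = b1 * mW + (1 - b1) * gW
    vW = b2 * vW + (1 - b2) * gW ** 2
    mb = b1 * mb + (1 - b1) * gb
    vb = b2 * vb + (1 - b2) * gb ** 2
    W -= lr * (mW / (1 - b1 ** epoch)) / (np.sqrt(vW / (1 - b2 ** epoch)) + eps)
    b -= lr * (mb / (1 - b1 ** epoch)) / (np.sqrt(vb / (1 - b2 ** epoch)) + eps)

    # Monitor validation loss; stop when it stops improving.
    val_loss = bce(sigmoid(X_va @ W + b), Y_va)
    if val_loss < best_val - 1e-4:
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:                    # early stopping
            break
```

In a real pipeline the linear layer would be replaced by the deep network, the full-batch gradient step by mini-batch updates, and the synthetic arrays by the actual spectrogram and MIR feature tensors; the split, loss, optimizer, and early-stopping structure carry over unchanged.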
Results
The results demonstrate how well the two timbre-representation approaches, the PCA-based model and the deep Audio Spectrogram Transformer plus CRNN model, learned meaningful structure for our similarity-search task. Evaluation combines training curves, qualitative inspection of nearest neighbors, and interpretation of the timbral structure observed in each embedding space.
PCA Model Results
The PCA model provides a clean baseline for evaluating timbre representation. After dimensionality reduction, the first several principal components capture a substantial proportion of the variance in the engineered descriptor set. For example, spectral centroid, bandwidth, and