AAI_2025_Capstone_Chronicles_Combined


Training Procedure

The deep learning model is trained on computed mel spectrograms together with our engineered MIR features, and learns to predict multi-label perceptual timbre qualities such as bright, dark, distorted, percussive, and long-release. The dataset is split into training, validation, and test sets using an 80-10-10 ratio, and oversampling is applied to underrepresented labels to balance class distributions. The model is trained for up to 30 epochs using binary cross-entropy loss and the Adam optimizer, with a learning rate chosen through empirical evaluation. Training and validation losses are monitored throughout, and early stopping is employed to reduce overfitting.
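The main ingredients of this procedure, the 80-10-10 split, label oversampling, the binary cross-entropy objective, and early stopping on validation loss, can be sketched as follows. This is a minimal NumPy illustration rather than the project's actual implementation; the helper names (`split_indices`, `oversample_minority`, `EarlyStopper`) are assumptions for the example.

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle sample indices and split them 80-10-10 into train/val/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def oversample_minority(X, Y, seed=0):
    """Duplicate samples carrying underrepresented labels until each label's
    positive count matches the most frequent label's count.
    Y is a binary multi-label matrix of shape (n_samples, n_labels)."""
    rng = np.random.default_rng(seed)
    counts = Y.sum(axis=0)
    target = counts.max()
    extra = []
    for j, c in enumerate(counts):
        pos = np.flatnonzero(Y[:, j])
        if 0 < c < target:
            extra.append(rng.choice(pos, size=int(target - c), replace=True))
    if extra:
        picks = np.concatenate(extra)
        X, Y = np.vstack([X, X[picks]]), np.vstack([Y, Y[picks]])
    return X, Y

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all labels (predictions in (0, 1))."""
    p = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

class EarlyStopper:
    """Stop training once validation loss fails to improve for
    `patience` consecutive epochs."""
    def __init__(self, patience=3):
        self.best, self.bad, self.patience = np.inf, 0, patience

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0
        else:
            self.bad += 1
        return self.bad >= self.patience  # True means stop training
```

In a framework such as PyTorch or Keras, the loss and optimizer would come from the library, but the split, oversampling, and stopping logic follow the same shape.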

Results

The results demonstrate how well the two timbre-representation approaches, the PCA-based model and the deep Audio Spectrogram Transformer plus CRNN model, learned meaningful structure for our similarity search task. Evaluation includes training curves, qualitative inspection of nearest neighbors, and interpretation of timbral structures observed in the embedding spaces.
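The qualitative nearest-neighbor inspection mentioned above amounts to ranking sounds by similarity in the learned embedding space. A minimal sketch, assuming embeddings are stored as rows of a matrix and using cosine similarity (the `nearest_neighbors` helper is hypothetical, not code from the project):

```python
import numpy as np

def nearest_neighbors(embeddings, query_idx, k=5):
    """Return indices of the k embeddings closest (by cosine similarity)
    to the query embedding, excluding the query itself."""
    # L2-normalize rows so the dot product equals cosine similarity.
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ E[query_idx]
    order = np.argsort(-sims)  # descending similarity
    return [int(i) for i in order if i != query_idx][:k]
```

Listening to the top-k neighbors of a probe sound, and checking whether they share its perceptual qualities (bright, percussive, and so on), is one practical way to judge whether an embedding space captures timbre.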

PCA Model Results

The PCA model provides a clean baseline for evaluating timbre representation. After dimensionality reduction, the first several principal components capture a substantial proportion of the variance in the engineered descriptor set. For example, spectral centroid, bandwidth, and

