AAI_2025_Capstone_Chronicles_Combined

Interpretation of PCA Performance

The PCA model demonstrates that classical MIR descriptors encode a substantial amount of timbral structure. Features related to brightness, noisiness, and envelope behavior map cleanly onto principal components, which supports the idea that traditional spectral and temporal descriptors capture the primary axes of timbre variation. PCA also performs well for categories that involve broad spectral balance or simple decay characteristics.

PCA has limitations, however. It combines descriptors linearly, which may not align with perceptual distinctions that involve subtle textural variations, evolving harmonics, or nonlinear amplitude behavior. These limitations support the hypothesis that handcrafted descriptors alone cannot represent the full richness of timbre.

Interpretation of Deep Model Performance

The deep model produces more expressive embeddings that reflect a wider variety of timbral cues. The learned representations capture relationships across frequency and time that are difficult to encode with handcrafted descriptors. The transformer architecture identifies long-range interactions, while the CRNN captures local time-frequency patterns. This combination leads to perceptually coherent retrieval behavior.

The model performs well for common descriptors but struggles with rare classes. This outcome suggests that additional training strategies, such as balanced sampling or alternative loss functions, may improve multi-label performance. Despite this weakness, the embedding consistently retrieves sounds that listeners would consider timbrally similar, which supports the hypothesis that deep models can capture perceptual structure relevant to creative tasks.
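One of the alternative loss functions suggested for rare descriptor classes could be a class-weighted binary cross-entropy. The sketch below is illustrative only and is not the training setup used in this project: the function names `positive_class_weights` and `weighted_bce`, the cap `max_weight`, and the toy label matrix are all assumptions introduced here. It weights the positive term of each label by its negative-to-positive ratio, so that missing a rare descriptor costs more than missing a common one.

```python
import numpy as np


def positive_class_weights(labels, max_weight=50.0):
    """Per-label weight = (#negatives / #positives), capped at max_weight.

    `labels` is a binary matrix of shape (n_examples, n_labels).
    Labels with no positive examples receive max_weight (an assumption).
    """
    labels = np.asarray(labels, dtype=np.float64)
    n_pos = labels.sum(axis=0)
    n_neg = labels.shape[0] - n_pos
    ratio = n_neg / np.maximum(n_pos, 1.0)  # avoids division by zero
    weights = np.where(n_pos > 0, ratio, max_weight)
    return np.minimum(weights, max_weight)


def weighted_bce(logits, labels, pos_weight):
    """Mean multi-label binary cross-entropy with a weighted positive term."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # Numerically stable log(sigmoid(x)) and log(1 - sigmoid(x)).
    log_p = -np.logaddexp(0.0, -logits)
    log_not_p = -np.logaddexp(0.0, logits)
    loss = -(pos_weight * labels * log_p + (1.0 - labels) * log_not_p)
    return float(loss.mean())


# Toy data (hypothetical): label 0 is common, label 1 is rare.
labels = np.array([[1, 0], [1, 0], [1, 1], [0, 0]])
pos_weight = positive_class_weights(labels)
logits = np.zeros(labels.shape)  # an uninformed model: p = 0.5 everywhere

print("pos_weight:", pos_weight)
print("weighted BCE:", weighted_bce(logits, labels, pos_weight))
```

In a PyTorch training loop, a comparable effect is available through the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`. Whether this helps on the rare timbre descriptors discussed here would need to be checked empirically.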
