AAI_2025_Capstone_Chronicles_Combined


Collectively, these capture patterns that classical descriptors may not fully represent, such as evolving textures, nonlinear spectral changes, and subtle envelope variations. Figure 6 illustrates our model architecture.

Figure 6

Deep learning architecture for learning a shared timbre embedding

Note: This fused embedding forms the learned timbre space that is later indexed with FAISS for search-by-example retrieval.
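As a rough illustration of search-by-example retrieval over a learned timbre space, the sketch below ranks a small library of embeddings by cosine similarity to a query embedding. It deliberately uses an exhaustive pure-Python scan in place of a FAISS index, and the similarity metric, the function names, and the four-dimensional toy vectors are all assumptions made for illustration; the source does not specify the index type, metric, or embedding size.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search_by_example(query, library, k=3):
    """Return the k library entries most similar to the query embedding.

    Exhaustive scan (hypothetical stand-in for an indexed search):
    every stored embedding is compared against the query.
    """
    q = l2_normalize(query)
    scored = []
    for name, embedding in library.items():
        e = l2_normalize(embedding)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, name))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy embeddings, invented for illustration only.
library = {
    "cello_sustain": [0.9, 0.1, 0.0, 0.2],
    "viola_sustain": [0.8, 0.2, 0.1, 0.2],
    "snare_hit": [0.0, 0.9, 0.7, 0.1],
    "hihat_closed": [0.1, 0.8, 0.9, 0.0],
}

# The query plays the role of the "example" sound's embedding.
results = search_by_example([0.9, 0.1, 0.0, 0.25], library, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

An exhaustive scan like this is only practical for small libraries; an approximate or indexed search is the usual choice once the collection grows.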

Embedding Extraction

The embedding extraction process produces two types of timbre vectors for each audio sample:

1. PCA-based embeddings, which are low-dimensional and easily interpretable.
2. Deep learned embeddings, which are higher-dimensional and capture nonlinear relationships.
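The PCA-based vectors can be sketched as follows. This is a minimal example under stated assumptions, not the pipeline used in this work: it assumes each audio sample is already summarized as a row of classical descriptors, standardizes the columns, and projects onto the leading principal components via NumPy's singular value decomposition. The function name `pca_embed`, the choice of two components, and the random placeholder data are hypothetical.

```python
import numpy as np


def pca_embed(features, n_components=2):
    """Project a (samples x descriptors) matrix onto its top principal components.

    Returns the low-dimensional embeddings and the fraction of variance
    explained by each retained component.
    """
    X = np.asarray(features, dtype=float)

    # Standardize each descriptor so no single scale dominates (assumption).
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # avoid division by zero for constant columns
    Z = (X - mean) / std

    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]

    embeddings = Z @ components.T
    explained = (S ** 2) / np.sum(S ** 2)
    return embeddings, explained[:n_components]


# Placeholder data: 50 samples x 6 descriptors, with two correlated columns.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 6))
descriptors[:, 1] = 2.0 * descriptors[:, 0] + 0.1 * descriptors[:, 1]

pca_vectors, explained_ratio = pca_embed(descriptors, n_components=2)
print(pca_vectors.shape)
```

Because each retained axis is a linear combination of the input descriptors, the rows of `components` can be inspected directly to see which descriptors drive each dimension, which is what makes this type of embedding easy to interpret.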
