AAI_2025_Capstone_Chronicles_Combined
Collectively, these capture patterns that classical descriptors may not fully represent, such as evolving textures, nonlinear spectral changes, and subtle envelope variations. Figure 6 illustrates our model architecture.
Figure 6
Deep learning architecture for learning a shared timbre embedding
Note: This fused embedding forms the learned timbre space that is later indexed with FAISS for search-by-example retrieval.
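To make the search-by-example step concrete, the following is a minimal sketch of nearest-neighbor retrieval over a set of embeddings. It is not the project's FAISS index: it uses a brute-force NumPy search over unit-normalized vectors as a stand-in, which should return the same ranking as an exact inner-product index would on the same normalized data. The embedding size, the random data, and the function names `normalize_rows` and `search_by_example` are assumptions made for illustration.

```python
import numpy as np

def normalize_rows(x):
    """Scale each row to unit L2 norm so the inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against all-zero rows

def search_by_example(library, query, k=3):
    """Return indices and cosine similarities of the k library rows closest to the query."""
    lib = normalize_rows(np.asarray(library, dtype=np.float32))
    q = normalize_rows(np.asarray(query, dtype=np.float32).reshape(1, -1))[0]
    scores = lib @ q                      # cosine similarity to every library item
    top = np.argsort(-scores)[:k]         # highest similarity first
    return top, scores[top]

# Hypothetical data: 100 audio samples, each with a 16-dimensional fused embedding.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))

# Query with a slightly perturbed copy of sample 42 to mimic "a sound like this one".
query = embeddings[42] + 0.01 * rng.normal(size=16)
indices, sims = search_by_example(embeddings, query, k=3)
```

With a library of this size brute force is sufficient; a dedicated index mainly matters once the collection grows large enough that scanning every vector per query becomes slow.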
Embedding Extraction
The embedding extraction process produces two types of timbre vectors for each audio sample:
1. PCA-based embeddings, which are low-dimensional and easily interpretable.
2. Deep learned embeddings, which are higher-dimensional and capture nonlinear relationships.
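The PCA-based embedding in the list above can be sketched as a projection of per-sample descriptor vectors onto their leading principal axes. This is only an illustration under stated assumptions: the descriptor matrix is random placeholder data, the number of components is arbitrary, the features are assumed to already be on comparable scales, and `pca_embed` is a hypothetical helper rather than the code used in this project.

```python
import numpy as np

def pca_embed(features, n_components=2):
    """Project feature rows onto the top principal axes of the centered data."""
    x = np.asarray(features, dtype=np.float64)
    centered = x - x.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    embedding = centered @ components.T
    # Fraction of total variance carried by each retained axis.
    explained = (s ** 2) / np.sum(s ** 2)
    return embedding, components, explained[:n_components]

# Hypothetical input: 50 audio samples described by 8 classical descriptors each.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 8))

pca_vectors, axes, ratio = pca_embed(descriptors, n_components=3)
```

Each retained axis is a linear combination of the original descriptors, which is what makes this representation straightforward to interpret; the deep embedding trades that transparency for the ability to model nonlinear structure.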