AAI_2025_Capstone_Chronicles_Combined


Collectively, these capture patterns that classical descriptors may not fully represent, such as evolving textures, nonlinear spectral changes, and subtle envelope variations. Figure 6 illustrates our model architecture.

Figure 6

Deep learning architecture for learning a shared timbre embedding

Note: This fused embedding forms the learned timbre space that is later indexed with FAISS for search-by-example retrieval.
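To make the search-by-example idea concrete, the following is a minimal sketch of nearest-neighbor retrieval over embedding vectors. It is not the FAISS index used in this work: it substitutes a brute-force cosine-similarity scan in plain Python, which returns the same ranking an exact inner-product index would give on normalized vectors. The function names, the toy three-dimensional vectors, and the sample labels are all hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def search_by_example(query, index, k=3):
    """Return the k entries of `index` most similar to `query`.

    `index` maps a sample name to its embedding. Similarity is the cosine
    of the angle between vectors, computed as a dot product of unit vectors.
    This brute-force scan is an illustrative stand-in for a FAISS index.
    """
    q = l2_normalize(query)
    scored = []
    for name, vec in index.items():
        v = l2_normalize(vec)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, name))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Hypothetical toy embeddings; real fused embeddings would be much larger.
toy_index = {
    "bright_pluck": [0.9, 0.1, 0.0],
    "soft_pad": [0.1, 0.9, 0.2],
    "noisy_hit": [0.0, 0.2, 0.9],
}

results = search_by_example([0.8, 0.2, 0.1], toy_index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A linear scan like this is adequate only for small collections; a dedicated index becomes worthwhile as the number of samples grows.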

Embedding Extraction

The embedding extraction process produces two types of timbre vectors for each audio sample:

1. PCA-based embeddings, which are low-dimensional and easily interpretable.
2. Deep learned embeddings, which are higher-dimensional and capture nonlinear relationships.
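As an illustration of the first embedding type, the sketch below projects a feature matrix onto its leading principal components using NumPy's SVD. It is a generic PCA under stated assumptions, not the exact pipeline used here: the input is assumed to be a matrix with one row of descriptors per audio sample, the random data stands in for real features, and the function name `pca_embed` and the choice of three components are hypothetical.

```python
import numpy as np


def pca_embed(features, n_components=2):
    """Project rows of `features` onto the top principal components.

    Returns the low-dimensional embeddings and the fraction of total
    variance explained by each retained component.
    """
    X = np.asarray(features, dtype=float)
    # Center each feature so the components describe variance, not offset.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return Xc @ components.T, explained[:n_components]


# Placeholder data: 50 samples with 8 descriptors each (assumed shapes).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 8))

embeddings, explained_ratio = pca_embed(descriptors, n_components=3)
print(embeddings.shape)
print(explained_ratio)
```

Each PCA axis is a fixed linear combination of the input descriptors, which is what makes this embedding easy to interpret. The deep learned embedding trades that transparency for the ability to model nonlinear relationships.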
