AAI_2025_Capstone_Chronicles_Combined


Qualitative nearest-neighbor inspection shows that the deep model consistently retrieves timbre-related neighbors. Listening tests confirm that these neighbors are perceptually similar in timbre, even across instrument classes, echoing the retrieval behavior illustrated in Figure 11.

FAISS Retrieval Performance

Similarity search through FAISS proves efficient and scalable. The deep embedding generally yields more perceptually coherent neighbor sets, especially when filtering on complex timbral properties. PCA retrieval tends to reflect broad categories, such as instrument class or dark-versus-bright quality, whereas deep retrieval captures more nuanced relationships, including evolving patterns within a given sound file. Comparing the two search-by-example methods makes the advantage of learned representations for timbre similarity clear: deep embeddings return results that align more closely with how listeners perceive sound, rather than simply reflecting categorical tags or metadata. This suggests that timbre similarity is not just computable but learnable in ways that resonate with human hearing.
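As a minimal sketch of the search-by-example step, the following NumPy code reproduces the brute-force squared-L2 nearest-neighbor search that a flat FAISS index (`faiss.IndexFlatL2`) performs over an embedding matrix. The array names, dimensions, and query construction here are illustrative, not taken from the project's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimensionality (illustrative; real embeddings are larger)
library = rng.standard_normal((100, d)).astype("float32")  # stand-in embedding table
query = library[42:43] + 0.01  # a query embedding very close to item 42

def search_l2(xb, xq, k):
    """Return the k nearest rows of xb to each row of xq by squared L2 distance,
    mirroring what faiss.IndexFlatL2.search computes."""
    dists = ((xb[None, :, :] - xq[:, None, :]) ** 2).sum(-1)
    idx = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx

D, I = search_l2(library, query, k=5)
print(I[0][0])  # → 42: the perturbed source item is the top neighbor
```

A real FAISS index replaces the O(N·d) brute-force loop above with optimized (and optionally approximate) search structures, but the returned distances and indices have the same meaning.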

Discussion

Here we interpret the model’s performance, consider whether the results support the central hypothesis, identify unexpected outcomes, and relate the findings to real-world use cases. The system’s goal is to retrieve sounds that share perceptually meaningful timbral relationships, and the results suggest that the PCA-based and deep embedding approaches contribute complementary strengths.

