AAI_2025_Capstone_Chronicles_Combined

SoundSearch: A Machine Learning System for Timbre-Based Audio Retrieval

How do we find a specific sound in a sea of unlabeled audio? This work addresses that problem as a search-by-example task using machine learning (ML). When a user provides an example sound, the system retrieves other sounds with the closest timbral resemblance. Timbre, broadly understood as the set of auditory perceptual attributes that allow listeners to distinguish one sound from another beyond pitch and amplitude, is central to this problem yet remains difficult to define precisely (Beauchamp, 2007). Timbral search is further complicated by the fact that timbre resides in higher-dimensional acoustic representations, particularly for abstract perceptual and acoustic qualities (Peeters et al., 2011).

For audio professionals and creatives who work with timbre, this ML project provides a path toward fast and intuitive sound retrieval within their workflows. Today, composers, producers, and sound designers often spend considerable time auditioning thousands of unorganized and untagged audio files across numerous folders, a practice that Music Information Retrieval (MIR) research has recognized as inefficient and disruptive to creative flow (Humphrey et al., 2012). Our ML system aims to streamline this by using retrieval methods similar in spirit to Shazam's example-based search, although our objective is timbral similarity rather than exact sonic identification (Wang, 2003). By learning characteristics of timbre directly from audio data, the system can provide close matches quickly in ways that traditional tags or labels cannot.

Although similar timbres are difficult to quantify, audio contains spectral information such as frequency, amplitude, and phase relationships, along with evolving temporal patterns, that ML systems can learn to represent and compare. A similarity engine that captures these patterns
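To make the idea of comparing spectral patterns concrete, the sketch below (an illustration, not the authors' actual system) computes a crude timbre descriptor, the frame-averaged, unit-normalized magnitude spectrum, and ranks a tiny synthetic sound library by cosine similarity to a query. All signal names and parameter values here are assumptions chosen for the example; every tone shares the same fundamental (220 Hz), so the ranking reflects harmonic content (timbre) rather than pitch.

```python
import numpy as np

def spectral_descriptor(signal, n_fft=2048, hop=512):
    """Average magnitude spectrum across windowed frames: a crude timbre fingerprint."""
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(signal[s:s + n_fft] * window))
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.mean(frames, axis=0)
    return spec / (np.linalg.norm(spec) + 1e-12)  # unit-normalize

def cosine_similarity(a, b):
    # Descriptors are unit vectors, so the dot product is the cosine similarity.
    return float(np.dot(a, b))

# Synthetic "library": 220 Hz tones whose harmonic amplitudes differ (i.e. different timbres).
sr = 22050
t = np.arange(sr) / sr  # one second of samples

def tone(f0, harmonics):
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, a in enumerate(harmonics))

library = {
    "pure_sine": tone(220, [1.0]),
    "bright_saw_like": tone(220, [1.0, 0.5, 0.33, 0.25, 0.2]),
    "hollow_square_like": tone(220, [1.0, 0.0, 0.33, 0.0, 0.2]),
}
query = tone(220, [1.0, 0.45, 0.3, 0.2, 0.18])  # saw-ish harmonic profile

q = spectral_descriptor(query)
ranked = sorted(library,
                key=lambda name: -cosine_similarity(q, spectral_descriptor(library[name])))
print(ranked[0])  # the timbre closest to the query
```

A learned system would replace the hand-crafted descriptor with an embedding trained on audio data, but the retrieval step, nearest neighbors under a similarity measure, has the same shape.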
