AAI_2025_Capstone_Chronicles_Combined
can retrieve sounds with shared perceptual qualities even when they differ in non-timbral qualities such as pitch or loudness. For example, a model should understand that a violin and a cello share more timbral characteristics than a cello and a flute, even though each instrument occupies a distinct musical role. Our project provides nearest-neighbor sounds within an interactive 2D exploration grid, along with metadata and filtering capabilities for all audio files. Future iterations can extend this functionality to longer audio formats, more complex timbres, and full music files across multiple audio contexts.

Background and Related Work

Understanding timbre has long been a central challenge in psychoacoustics and digital audio research (Beauchamp, 2007). Foundational work, such as Beauchamp's Analysis, Synthesis, and Perception of Musical Sounds, traces decades of research on decomposing sounds into meaningful components such as sinusoidal partials, temporal envelopes, transient structures, and noise layers. These components are foundational to traditional sound analysis and are used to represent how listeners perceive timbre. Classical approaches rely heavily on Fourier-based methods, including the short-time Fourier transform (STFT) and filterbank analyses (Beauchamp, 2007; Peeters et al., 2011). Existing tools support extracting timbral qualities such as brightness, which corresponds to the concentration of energy in higher-frequency partials; sharpness, which relates to the weighted presence of upper harmonics; attack duration, which measures how quickly a sound reaches its peak amplitude; noisiness, which reflects the amount of stochastic or non-harmonic energy; and spectral variability, which describes how the spectral content changes over time (Peeters et al., 2011). Despite the presence of these tools, researchers
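To make one of these descriptors concrete, the sketch below computes a common brightness proxy, the spectral centroid, from STFT frame magnitudes. This is an illustrative implementation, not the project's actual feature extractor: the function name, frame length, and hop size are assumptions, and the two synthesized tones simply differ in how quickly harmonic energy rolls off with partial number.

```python
import numpy as np

def spectral_centroid(signal, sr, frame_len=2048, hop=512):
    """Mean spectral centroid in Hz across STFT frames.

    The centroid is the magnitude-weighted average frequency of a
    frame's spectrum; more energy in upper partials pulls it higher,
    which is why it is often used as a brightness proxy.
    """
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroids = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        if mag.sum() > 0:
            centroids.append((freqs * mag).sum() / mag.sum())
    return float(np.mean(centroids))

sr = 22050
t = np.arange(sr) / sr  # one second of audio
f0 = 220.0
# "Dull" tone: harmonic amplitudes fall off as 1/k^2 (energy concentrated low).
dull = sum(np.sin(2 * np.pi * f0 * k * t) / k**2 for k in range(1, 11))
# "Bright" tone: slower 1/k roll-off leaves more energy in upper partials.
bright = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 11))

print(spectral_centroid(dull, sr), spectral_centroid(bright, sr))
```

Under this setup the brighter tone yields the higher centroid, matching the intuition that brightness tracks the concentration of energy in higher-frequency partials.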