AAI_2025_Capstone_Chronicles_Combined

Learned Embedding Space

The deep timbre embeddings reveal structure that is both complex and perceptually meaningful. Sustained, warm, pad-like tones consistently occupy one region of the learned space, while short, bright, percussive sounds with sharp attacks form distinct clusters elsewhere. Figure 11 visualizes this separation: after the embeddings of 4,000 sounds are projected into two dimensions and filtered by these perceptual qualities, regions of the space become recognizable as coherent timbral neighborhoods rather than random scatter. This suggests that the deep learning model organizes audio in ways that mirror how listeners perceive similarity.

Figure 11

2-D embedding of 4,000 audio clips with perceptual filters applied

Note: The left view displays smooth, warm pad-like sounds, while the right emphasizes bright, attack-driven percussive tones.
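
The projection-and-filter procedure behind a figure of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the text does not name the projection method, so PCA (computed with NumPy's SVD) stands in for it, and the embeddings, the `warmth` and `brightness` scores, and the 0.5 threshold are all synthetic, hypothetical placeholders.

```python
import numpy as np


def project_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings onto their first two principal components (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def perceptual_mask(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask selecting clips whose perceptual score meets the threshold."""
    return scores >= threshold


# Synthetic stand-in data (assumption): 4,000 clips with 64-dimensional embeddings,
# half "pad-like" and half "percussive", offset from each other along one axis.
rng = np.random.default_rng(0)
n_clips, dim = 4000, 64
is_pad = np.arange(n_clips) < n_clips // 2
offset = np.zeros(dim)
offset[0] = 3.0
embeddings = rng.normal(size=(n_clips, dim)) + np.where(is_pad[:, None], offset, -offset)

# Hypothetical perceptual descriptors in roughly [0, 1]; real scores would come
# from audio features or listener ratings, which the text does not specify.
warmth = is_pad.astype(float) + 0.1 * rng.normal(size=n_clips)
brightness = (~is_pad).astype(float) + 0.1 * rng.normal(size=n_clips)

points = project_to_2d(embeddings)
warm_view = points[perceptual_mask(warmth, 0.5)]        # candidate "left view"
bright_view = points[perceptual_mask(brightness, 0.5)]  # candidate "right view"
```

Plotting `warm_view` and `bright_view` as two scatter panels would give a layout comparable to the two views described in the note. A nonlinear projection such as t-SNE or UMAP could replace PCA if the neighborhoods are not linearly separable.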
