AAI_2025_Capstone_Chronicles_Combined

Table of Contents
Spring 2025
Summer 2025
Fall 2025
Background Information
Experimental Methods
EfficientNet Model Training Methodology
Hybrid CNN Model Training Methodology
Multi-task vs Single Tasks Combination
Results/Conclusion
Appendix A
References
Draw, Detect, Navigate: Transforming Doodles into Actionable Navigation Plans and Beyond
Abstract
Introduction
Background Information
Data Summary
Experimental Methods
Results
Conclusion
References
1.) Introduction
2.) Background Information
3.) Data Summary
4.) Experimental Methods
5.) Results
5.1) Parametric Analysis
5.1.1) GPU1 Swarm Sweep Results
5.1.1.1) Small Network
5.1.1.2) Large Network
5.1.2) CPU2 Swarm Sweep Results
5.1.2.1) Small Network
5.1.2.2) Large Network
5.2) MNIST Ensemble Model
5.3) Time Series Stock Prediction
6.) Conclusion
7.) References
We aim to identify behavioral and environmental factors associated with mental health conditions and use AI to predict individuals at risk of experiencing mental illness. By analyzing personal, social, and occupational attributes such as work-related stress, isolation, and economic hardship, we develop a supervised machine learning model that identifies early risk indicators to support timely intervention. The core question is: can we build an AI model that accurately predicts individuals at risk of mental health conditions using survey-based attributes?
Data Summary
Background and Model Selection
Experimental Methods
Results and Conclusion
References
Introduction
Data Summary
Features and Data Characteristics
Data Preparation
Variable Relevance and Relationships for Clustering
Background
Existing Applications of Deep Clustering
Experimental Methods
DEC Model Overview
LSTM-DEC Based Model
LSTM-DEC Model Optimization
Final Model Optimization
Human Evaluation
Alternative Models Overview
Alternative Models Optimization
Results
Conclusions
Appendix A: LLM Interpretation of DEC Clusters
Appendix B: Faceting Example
Appendix C: Prompt Library
Introduction
Dataset Summary & Exploratory Data Analysis (EDA)
Background Study
Experimental Methods
Results
References
ABSTRACT
KEYWORDS
1 Introduction
2 Data Summary
3 Literature Review
In preparation for model implementation, we created a data-loading pipeline that each of our models follows. It loads the same curated data splits for carefully balanced train, validation, and test sets. Of our total sample of 28,868 images, we created a split of 70% train (20,207 images), 15% validation (4,330 images), and 15% test (4,331 images). The models we chose to explore begin with a baseline CNN built from scratch, followed by a DETR model and a Faster R-CNN model.
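The shared split can be sketched as a small helper. The file names and random seed below are illustrative placeholders, not the authors' actual pipeline; only the 70/15/15 ratio and counts come from the text.

```python
import random

def split_dataset(paths, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle the image list once, then carve out fixed train/val/test
    splits so every model sees the same curated partitions."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]  # remainder, ~15%
    return train, val, test

# With 28,868 items this yields 20,207 / 4,330 / 4,331 as reported.
items = [f"img_{i}.png" for i in range(28_868)]
train, val, test = split_dataset(items)
```

Fixing the seed once and reusing the resulting lists (rather than re-splitting per model) is what keeps the comparison between the CNN, DETR, and Faster R-CNN fair.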
4.1 CNN
To establish a clear starting point for the cervical spine fracture detection task, we implemented a simple convolutional neural network. The model takes grayscale CT images resized to 256 by 256 pixels. These images pass through three convolutional layers, each followed by a rectified linear activation and a max-pooling step. Together, these layers allow the model to learn important visual features such as edges, bone contours, and changes in texture that may indicate a fracture. As the image moves deeper through the network, the extracted features become more abstract and informative. After this feature-extraction stage, the output is flattened and passed into two fully connected layers, with a dropout layer included to reduce overfitting. The final layer produces two values corresponding to the model's confidence that the image represents a normal cervical spine or one with a fracture.
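A minimal PyTorch sketch of the architecture described above. The channel widths, kernel sizes, and dropout rate are assumptions for illustration; only the overall shape (three conv/ReLU/pool stages, flatten, two fully connected layers with dropout, two output logits) comes from the text.

```python
import torch
import torch.nn as nn

class SimpleFractureCNN(nn.Module):
    """Baseline CNN: three conv/ReLU/max-pool stages over a 1-channel
    256x256 input, then flatten -> FC -> dropout -> FC with two logits
    (normal vs. fracture). Layer sizes are illustrative assumptions."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # grayscale input
            nn.ReLU(),
            nn.MaxPool2d(2),   # 256 -> 128
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 64 -> 32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 32 * 32, 128),
            nn.ReLU(),
            nn.Dropout(0.5),                  # reduces overfitting
            nn.Linear(128, num_classes),      # two confidence values
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = SimpleFractureCNN()(torch.randn(2, 1, 256, 256))
```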
4.2 DETR
We implemented a pre-trained Detection Transformer (DETR) model for fracture detection. The model we selected has a ResNet-50 convolutional backbone followed by a transformer encoder-decoder architecture; we used a pre-trained version from the Hugging Face transformers library (Carion et al., 2020). Our images were preprocessed to normalize pixel intensities, which produced 1-channel grayscale images. Because the pre-trained ResNet expects 3-channel RGB input, we adapted the input pipeline to duplicate the grayscale channel three times. And because the backbone was originally trained on the COCO dataset with 91 classes, we overrode the number of target classes to 2 (fracture or no-fracture).
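The grayscale-to-RGB adaptation amounts to replicating the single channel. A small sketch of that step, assuming channel-first arrays; the function name is illustrative:

```python
import numpy as np

def to_three_channel(gray: np.ndarray) -> np.ndarray:
    """Replicate a 1-channel grayscale image across three channels so it
    matches the RGB input the pre-trained ResNet-50 backbone expects.
    Assumes channel-first layout: (1, H, W) -> (3, H, W)."""
    assert gray.shape[0] == 1, "expected channel-first grayscale input"
    return np.repeat(gray, 3, axis=0)

rgb = to_three_channel(np.zeros((1, 256, 256), dtype=np.float32))
```

Duplicating the channel (rather than re-training the first conv layer) keeps the pre-trained backbone weights usable unchanged, at the cost of redundant input channels.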
4.3 Faster R-CNN
5 Results
5.1 CNN
5.2 Faster R-CNN
5.3 DETR
5.4 Model Results Comparison
6 Conclusion
ACKNOWLEDGMENTS
KEYWORDS
1 Introduction
2 Data Summary
3 Literature Review
4 Methodology
5 Results
6 Conclusion
ACKNOWLEDGMENTS
SoundSearch: A Machine Learning System for Timbre Based Audio Retrieval
Kevin Pooler
University of San Diego
Master of Science in Applied Artificial Intelligence
AAI 590: Applied AI Capstone
Professor Anna Marbut
December 8, 2025
Understanding timbre has long been a central challenge in psychoacoustics and digital audio research (Beauchamp, 2007). Foundational work, such as Beauchamp's Analysis, Synthesis, and Perception of Musical Sounds, traces decades of effort to decompose sounds into meaningful components such as sinusoidal partials, temporal envelopes, transient structures, and noise layers. These components are foundational to traditional sound analysis and are used to represent how listeners perceive timbre. Classical approaches rely heavily on Fourier-based methods, including the short-time Fourier transform (STFT) and filterbank analyses (Beauchamp, 2007; Peeters et al., 2011). Existing tools support extracting timbral qualities such as brightness, which corresponds to the concentration of energy in higher-frequency partials; sharpness, which relates to the weighted presence of upper harmonics; attack duration, which measures how quickly a sound reaches its peak amplitude; and noisiness, which reflects the amount of stochastic or …
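Brightness, for example, is commonly approximated by the spectral centroid: the magnitude-weighted mean frequency of the spectrum. A minimal sketch, using a single-frame FFT in place of the STFT-based analyses cited above (windowing and framing are omitted for brevity, and this is one common proxy, not the specific measure any of the cited works defines):

```python
import numpy as np

def spectral_centroid(signal: np.ndarray, sr: int) -> float:
    """Brightness proxy: magnitude-weighted mean frequency (Hz) of the
    signal's spectrum. Energy concentrated in higher partials pulls the
    centroid upward, matching the intuition of a 'brighter' sound."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * mags) / np.sum(mags))

sr = 16_000
t = np.arange(sr) / sr
# For a pure 440 Hz sine, the centroid sits at the tone's frequency.
centroid = spectral_centroid(np.sin(2 * np.pi * 440.0 * t), sr)
```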
Dataset
Exploratory Data Analysis
Methods
Feature Engineering
Audio Spectrogram Transformer and CRNN Model
Embedding Extraction
Results
Learned Embedding Space
FAISS Retrieval Performance
Interpretation of PCA Performance
Interpretation of Deep Model Performance
Unexpected Findings
Implications for Real-World Use
Future Work
ABSTRACT
KEYWORDS
1 Introduction
2 Data Summary
3 Literature Review
4 Methodology
5 Results
6 Conclusion
Acknowledgments
