AAI_2025_Capstone_Chronicles_Combined

Table of Contents
Spring 2025
Summer 2025
Fall 2025
Background Information
Experimental Methods
EfficientNet Model Training Methodology
Hybrid CNN Model Training Methodology
Multi-task vs Single Tasks Combination
Results/Conclusion
Appendix A
References
Draw, Detect, Navigate: Transforming Doodles into Actionable Navigation Plans and Beyond
Abstract
Introduction
Background Information
Data Summary
Experimental Methods
Results
Conclusion
References
1.) Introduction
2.) Background Information
3.) Data Summary
4.) Experimental Methods
5.) Results
5.1) Parametric Analysis
5.1.1) GPU1 Swarm Sweep Results
5.1.1.1) Small Network
5.1.1.2) Large Network
5.1.2) CPU2 Swarm Sweep Results
5.1.2.1) Small Network
5.1.2.2) Large Network
5.2) MNIST Ensemble Model
5.3) Time Series Stock Prediction
6.) Conclusion
7.) References
We aim to identify behavioral and environmental factors associated with mental health conditions and use AI to predict individuals at risk of experiencing mental illness. By analyzing personal, social, and occupational attributes such as work-related stress, isolation, and economic hardship, we develop a supervised machine learning model that identifies early risk indicators to support timely intervention. The core question is: can we build an AI model that accurately predicts individuals at risk of mental health conditions using survey-based attributes?
Data Summary
Background and Model Selection
Experimental Methods
Results and Conclusion
References
Introduction
Data Summary
Features and Data Characteristics
Data Preparation
Variable Relevance and Relationships for Clustering
Background
Existing Applications of Deep Clustering
Experimental Methods
DEC Model Overview
LSTM-DEC Based Model
LSTM-DEC Model Optimization
Final Model Optimization
Human Evaluation
Alternative Models Overview
Alternative Models Optimization
Results
Conclusions
Appendix A: LLM Interpretation of DEC Clusters
Appendix B: Faceting Example
Appendix C: Prompt Library
Introduction
Dataset Summary & Exploratory Data Analysis (EDA)
Background Study
Experimental Methods
Results
References
ABSTRACT
KEYWORDS
1 Introduction
2 Data Summary
3 Literature Review
In preparation for model implementation, we created a data-loading pipeline that each of our models follows. It loads the same curated data splits for carefully balanced train, validation, and test sets. Of our total sample of 28,868 images, we created a split of 70% train (20,207 images), 15% validation (4,330 images), and 15% test (4,331 images). The models we chose to explore begin with a baseline CNN built from scratch, followed by a DETR model and a Faster R-CNN model.
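The shared split can be sketched as a small helper. The file names and random seed below are illustrative placeholders, not the authors' actual pipeline; only the 70/15/15 ratio and counts come from the text.

```python
import random

def split_dataset(paths, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle the image list once, then carve out fixed train/val/test
    splits so every model sees the same curated partitions."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]  # remainder, ~15%
    return train, val, test

# With 28,868 items this yields 20,207 / 4,330 / 4,331 as reported.
items = [f"img_{i}.png" for i in range(28_868)]
train, val, test = split_dataset(items)
```

Fixing the seed once and reusing the resulting lists (rather than re-splitting per model) is what keeps the comparison between the CNN, DETR, and Faster R-CNN fair.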
4.1 CNN
To establish a clear starting point for the cervical spine fracture detection task, we implemented a simple convolutional neural network. The model takes grayscale CT images resized to 256 by 256 pixels. These images pass through three convolutional layers, each followed by a rectified linear activation and a max-pooling step. Together, these layers allow the model to learn important visual features such as edges, bone contours, and changes in texture that may indicate a fracture. As the image moves deeper through the network, the extracted features become more abstract and informative. After this feature-extraction stage, the output is flattened and passed into two fully connected layers, with a dropout layer included to reduce overfitting. The final layer produces two values corresponding to the model's confidence that the image represents a normal cervical spine or one with a fracture.
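A minimal PyTorch sketch of the architecture described above. The channel widths, kernel sizes, and dropout rate are assumptions for illustration; only the overall shape (three conv/ReLU/pool stages, flatten, two fully connected layers with dropout, two output logits) comes from the text.

```python
import torch
import torch.nn as nn

class SimpleFractureCNN(nn.Module):
    """Baseline CNN: three conv/ReLU/max-pool stages over a 1-channel
    256x256 input, then flatten -> FC -> dropout -> FC with two logits
    (normal vs. fracture). Layer sizes are illustrative assumptions."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # grayscale input
            nn.ReLU(),
            nn.MaxPool2d(2),   # 256 -> 128
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 64 -> 32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 32 * 32, 128),
            nn.ReLU(),
            nn.Dropout(0.5),                  # reduces overfitting
            nn.Linear(128, num_classes),      # two confidence values
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = SimpleFractureCNN()(torch.randn(2, 1, 256, 256))
```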
4.2 DETR
We implemented a pre-trained Detection Transformer (DETR) model for fracture detection. The model we selected has a ResNet-50 convolutional backbone followed by a transformer encoder-decoder architecture; we used a pre-trained version from the Hugging Face transformers library (Carion et al., 2020). Our images were preprocessed to normalize pixel intensities, which produced 1-channel grayscale images. Because the pre-trained ResNet expects 3-channel RGB input, we adapted the input pipeline to duplicate the grayscale channel three times. And because the backbone was originally trained on the COCO dataset with 91 classes, we overrode the number of target classes to 2 (fracture or no-fracture).
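The grayscale-to-RGB adaptation amounts to replicating the single channel. A small sketch of that step, assuming channel-first arrays; the function name is illustrative:

```python
import numpy as np

def to_three_channel(gray: np.ndarray) -> np.ndarray:
    """Replicate a 1-channel grayscale image across three channels so it
    matches the RGB input the pre-trained ResNet-50 backbone expects.
    Assumes channel-first layout: (1, H, W) -> (3, H, W)."""
    assert gray.shape[0] == 1, "expected channel-first grayscale input"
    return np.repeat(gray, 3, axis=0)

rgb = to_three_channel(np.zeros((1, 256, 256), dtype=np.float32))
```

Duplicating the channel (rather than re-training the first conv layer) keeps the pre-trained backbone weights usable unchanged, at the cost of redundant input channels.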
4.3 Faster R-CNN
5 Results
5.1 CNN
5.2 Faster R-CNN
5.3 DETR
5.4 Model Results Comparison
6 Conclusion
ACKNOWLEDGMENTS
KEYWORDS
1 Introduction
2 Data Summary
3 Literature Review
4 Methodology
5 Results
6 Conclusion
ACKNOWLEDGMENTS
SoundSearch: A Machine Learning System for Timbre Based Audio Retrieval
Kevin Pooler
University of San Diego
Master of Science in Applied Artificial Intelligence
AAI 590: Applied AI Capstone
Professor Anna Marbut
December 8, 2025
Understanding timbre has long been a central challenge in psychoacoustics and digital audio research (Beauchamp, 2007). Foundational work, such as Beauchamp's Analysis, Synthesis, and Perception of Musical Sounds, traces decades of effort to decompose sounds into meaningful components such as sinusoidal partials, temporal envelopes, transient structures, and noise layers. These components are foundational to traditional sound analysis and are used to represent how listeners perceive timbre. Classical approaches rely heavily on Fourier-based methods, including the short-time Fourier transform (STFT) and filterbank analyses (Beauchamp, 2007; Peeters et al., 2011). Existing tools support extracting timbral qualities such as brightness, which corresponds to the concentration of energy in higher-frequency partials; sharpness, which relates to the weighted presence of upper harmonics; attack duration, which measures how quickly a sound reaches its peak amplitude; and noisiness, which reflects the amount of stochastic or …
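Brightness, for example, is commonly approximated by the spectral centroid: the magnitude-weighted mean frequency of the spectrum. A minimal sketch, using a single-frame FFT in place of the STFT-based analyses cited above (windowing and framing are omitted for brevity, and this is one common proxy, not the specific measure any of the cited works defines):

```python
import numpy as np

def spectral_centroid(signal: np.ndarray, sr: int) -> float:
    """Brightness proxy: magnitude-weighted mean frequency (Hz) of the
    signal's spectrum. Energy concentrated in higher partials pulls the
    centroid upward, matching the intuition of a 'brighter' sound."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * mags) / np.sum(mags))

sr = 16_000
t = np.arange(sr) / sr
# For a pure 440 Hz sine, the centroid sits at the tone's frequency.
centroid = spectral_centroid(np.sin(2 * np.pi * 440.0 * t), sr)
```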
Dataset
Exploratory Data Analysis
Methods
Feature Engineering
Audio Spectrogram Transformer and CRNN Model
Embedding Extraction
Results
Learned Embedding Space
FAISS Retrieval Performance
Interpretation of PCA Performance
Interpretation of Deep Model Performance
Unexpected Findings
Implications for Real-World Use
Future Work
ABSTRACT
KEYWORDS
1 Introduction
2 Data Summary
3 Literature Review
4 Methodology
5 Results
6 Conclusion
Acknowledgments
