| Table of Contents | 3 |
| Spring 2025 | 3 |
| Summer 2025 | 3 |
| Fall 2025 | 3 |
| Background Information | 13 |
| Experimental Methods | 14 |
| EfficientNet Model Training Methodology | 15 |
| Hybrid CNN Model Training Methodology | 16 |
| Multi-task vs Single Tasks Combination | 17 |
| Results/Conclusion | 18 |
| Appendix A | 24 |
| References | 26 |
| Draw, Detect, Navigate: Transforming Doodles into Actionable Navigation Plans and Beyond | 28 |
| Abstract | 29 |
| Introduction | 31 |
| Background Information | 32 |
| Data Summary | 35 |
| Experimental Methods | 36 |
| Results | 40 |
| Conclusion | 43 |
| References | 47 |
| 1.) Introduction | 126 |
| 2.) Background Information | 127 |
| 3.) Data Summary | 131 |
| 4.) Experimental Methods | 133 |
| 5.) Results | 136 |
| 5.1) Parametric Analysis | 136 |
| 5.1.1) GPU1 Swarm Sweep Results | 136 |
| 5.1.1.1) Small Network | 137 |
| 5.1.1.2) Large Network | 137 |
| 5.1.2) CPU2 Swarm Sweep Results | 138 |
| 5.1.2.1) Small Network | 138 |
| 5.1.2.2) Large Network | 139 |
| 5.2) MNIST Ensemble Model | 140 |
| 5.3) Time Series Stock Prediction | 143 |
| 6.) Conclusion | 144 |
| 7.) References | 145 |
| We aim to identify behavioral and environmental factors associated with mental health conditions and to use AI to predict which individuals are at risk of mental illness. By analyzing personal, social, and occupational attributes such as work-related stress, isolation, and economic hardship, we develop a supervised machine learning model that identifies early risk indicators to support timely intervention. The core question is: Can we build an AI model that accurately predicts individuals at risk of mental health conditions using survey-based attributes? | 203 |
| Data Summary | 204 |
| Background and Model Selection | 208 |
| Experimental Methods | 211 |
| Results and Conclusion | 217 |
| References | 223 |
| Introduction | 227 |
| Data Summary | 229 |
| Features and Data Characteristics | 229 |
| Data Preparation | 230 |
| Variable Relevance and Relationships for Clustering | 230 |
| Background | 232 |
| Existing Applications of Deep Clustering | 235 |
| Experimental Methods | 235 |
| DEC Model Overview | 235 |
| LSTM-DEC Based Model | 236 |
| LSTM-DEC Model Optimization | 237 |
| Final Model Optimization | 237 |
| Human Evaluation | 239 |
| Alternative Models Overview | 239 |
| Alternative Models Optimization | 240 |
| Results | 240 |
| Conclusions | 243 |
| Appendix A: LLM Interpretation of DEC Clusters | 247 |
| Appendix B: Faceting Example | 248 |
| Appendix C: Prompt Library | 250 |
| Introduction | 279 |
| Dataset Summary & Exploratory Data Analysis (EDA) | 281 |
| Background Study | 284 |
| Experimental Methods | 286 |
| Results | 289 |
| References | 300 |
| ABSTRACT | 304 |
| KEYWORDS | 304 |
| 1 Introduction | 304 |
| 2 Data Summary | 305 |
| 3 Literature Review | 307 |
| To prepare for model implementation, we created a data loading pipeline that each of our models follows. It loads the same curated data splits, giving carefully balanced train, validation, and test sets. Of our total sample of 28,868 images, we used a split of 70% train (20,207 images), 15% validation (4,330 images), and 15% test (4,331 images). The models we explore are a baseline CNN built from scratch, a DETR model, and a Faster R-CNN model. | 308 |
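The reported split counts follow directly from the stated percentages; a minimal sketch (the rounding convention of assigning the integer remainder to the test set is an assumption that happens to reproduce the reported numbers):

```python
# Reproduce the 70/15/15 split counts for 28,868 images.
total = 28_868
n_train = int(total * 0.70)        # 20,207
n_val = int(total * 0.15)          # 4,330
n_test = total - n_train - n_val   # 4,331 (remainder keeps the totals exact)
```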
| 4.1 CNN | 308 |
| To establish a clear starting point for the cervical spine fracture detection task, we implemented a simple convolutional neural network. The model takes grayscale CT images resized to 256 by 256 pixels. These images pass through three convolutional layers, each followed by a rectified linear activation and a max pooling step. Together, these layers let the model learn important visual features such as edges, bone contours, and changes in texture that may indicate a fracture; as the image moves deeper through the network, the extracted features become more abstract and informative. After this feature extraction stage, the output is flattened and passed through two fully connected layers, with a dropout layer included to reduce overfitting. The final layer produces two values corresponding to the model’s confidence that the image shows a normal cervical spine or one with a fracture. | 308 |
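The architecture described above can be sketched in PyTorch. The channel widths (16/32/64), hidden size (128), and dropout rate are illustrative assumptions, not the authors’ exact settings; only the overall shape (three conv/ReLU/pool blocks, flatten, two fully connected layers with dropout, two output logits) follows the text:

```python
import torch
import torch.nn as nn

class SimpleFractureCNN(nn.Module):
    """Baseline sketch: 3 conv blocks -> flatten -> 2 FC layers -> 2 logits."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # each block: conv (padding keeps spatial size), ReLU, 2x2 max pool
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 256 -> 128
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 128 -> 64
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 32 * 32, 128), nn.ReLU(),
            nn.Dropout(0.5),            # regularization against overfitting
            nn.Linear(128, 2),          # logits: normal vs. fracture
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```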
| 4.2 DETR | 308 |
| We implemented a pre-trained Detection Transformer (DETR) model for fracture detection. The model we selected has a ResNet-50 convolutional backbone followed by a transformer encoder-decoder architecture, and we used a pre-trained version from the Hugging Face transformers library (Carion et al., 2020). Our images were preprocessed to normalize pixel intensities, yielding 1-channel grayscale images. Because the pre-trained ResNet expects 3-channel RGB input, we adapted the input pipeline to duplicate the grayscale channel three times. And because the backbone was originally trained on the COCO dataset with 91 classes, we overrode the number of target classes to 2 (fracture or no fracture). | 308 |
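The grayscale-to-RGB adaptation amounts to repeating the single channel three times; a minimal sketch, assuming channels-last NumPy arrays (the authors’ actual pipeline may operate on tensors instead):

```python
import numpy as np

def to_three_channel(gray):
    """Duplicate a 1-channel CT slice into 3 identical channels so it
    matches the RGB input expected by the pretrained ResNet-50 backbone."""
    if gray.ndim == 2:                  # (H, W) -> (H, W, 1)
        gray = gray[..., np.newaxis]
    return np.repeat(gray, 3, axis=-1)  # (H, W, 1) -> (H, W, 3)
```

Replacing the 91-class COCO head is commonly done when loading the checkpoint, e.g. by passing `num_labels=2` together with `ignore_mismatched_sizes=True` to `DetrForObjectDetection.from_pretrained`; whether the authors used exactly this mechanism is an assumption.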
| 4.3 Faster R-CNN | 309 |
| 5 Results | 310 |
| 5.1 CNN | 310 |
| 5.2 Faster R-CNN | 311 |
| 5.3 DETR | 311 |
| 5.4Model Results Comparison | 312 |
| 6 Conclusion | 313 |
| ACKNOWLEDGMENTS | 313 |
| KEYWORDS | 317 |
| 1 Introduction | 317 |
| 2 Data Summary | 319 |
| 3 Literature Review | 322 |
| 4 Methodology | 323 |
| 5 Results | 324 |
| 6 Conclusion | 329 |
| ACKNOWLEDGMENTS | 330 |
| SoundSearch: A Machine Learning System for Timbre Based Audio Retrieval | 333 |
| Kevin Pooler | 333 |
| University of San Diego | 333 |
| Master of Science in Applied Artificial Intelligence | 333 |
| AAI 590: Applied AI Capstone | 333 |
| Professor Anna Marbut | 333 |
| December 8, 2025 | 333 |
| SoundSearch: A Machine Learning System for Timbre Based Audio Retrieval | 334 |
| Understanding timbre has long been a central challenge in psychoacoustics and digital audio research (Beauchamp, 2007). Foundational texts such as Beauchamp’s Analysis, Synthesis, and Perception of Musical Sounds trace decades of research on decomposing sounds into meaningful components such as sinusoidal partials, temporal envelopes, transient structures, and noise layers. These components are foundational to traditional sound analysis and are used to represent how listeners perceive timbre. Classical approaches rely heavily on Fourier-based methods, including the short-time Fourier transform (STFT) and filterbank analyses (Beauchamp, 2007; Peeters et al., 2011). Existing tools support extracting timbral qualities such as brightness, which corresponds to the concentration of energy in higher-frequency partials; sharpness, which relates to the weighted presence of upper harmonics; attack duration, which measures how quickly a sound reaches its peak amplitude; noisiness, which reflects the amount of stochastic or | 335 |
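As a small illustration of one such descriptor, a brightness-style measure can be computed as the energy-weighted mean frequency (spectral centroid) of an STFT magnitude. This is a generic sketch, not the paper’s implementation; the window size, hop, and Hann window are arbitrary choices:

```python
import numpy as np

def spectral_centroid(signal, sr, n_fft=1024, hop=512):
    """Average spectral centroid (Hz) over STFT frames: a simple brightness proxy."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft, hop)]
    mags = np.abs(np.fft.rfft(np.asarray(frames), axis=1))   # (frames, bins)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)               # bin center frequencies
    # energy-weighted mean frequency per frame, averaged over frames
    return float(np.mean(mags @ freqs / (mags.sum(axis=1) + 1e-12)))
```

For a pure tone the centroid sits at the tone’s frequency; brighter sounds, with more energy in upper partials, push it higher.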
| Dataset | 337 |
| Exploratory Data Analysis | 339 |
| Methods | 340 |
| Feature Engineering | 341 |
| Audio Spectrogram Transformer and CRNN Model | 343 |
| Embedding Extraction | 344 |
| Results | 346 |
| Learned Embedding Space | 350 |
| FAISS Retrieval Performance | 351 |
| Interpretation of PCA Performance | 352 |
| Interpretation of Deep Model Performance | 352 |
| Unexpected Findings | 353 |
| Implications for Real-World Use | 353 |
| Future Work | 353 |
| ABSTRACT | 398 |
| KEYWORDS | 399 |
| 1 Introduction | 399 |
| 2 Data Summary | 400 |
| 3 Literature Review | 403 |
| 4 Methodology | 406 |
| 5 Results | 407 |
| 6 Conclusion | 409 |
| Acknowledgments | 410 |