
generalized model that can be utilized in various conversational contexts beyond retail. Unlike SignBot, which translates the video feed into potential comments or sentences for the user to select, our project takes the recognized sentences and feeds them into a natural language model to generate textual responses, which are then converted into ASL images for users to visualize.
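
To make this flow concrete, the following is a minimal sketch of the response-generation step only, assuming the Hugging Face transformers library and the off-the-shelf google/flan-t5-base checkpoint rather than our fine-tuned weights; the input sentence and the omitted ASL-rendering step are illustrative placeholders.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    # Sentence assembled from the sign interpreter's predictions (assumed example).
    recognized = "Hello, how are you today?"
    inputs = tokenizer(recognized, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # The textual response would then be rendered letter by letter as ASL alphabet
    # images for the user to visualize (rendering step omitted here).
    print(response)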

Machine Learning Methods

Choosing the right model architectures for both the sign language interpreter and the chatbot to integrate into our final application was a crucial step during project development. The reasons why we selected a CNN model for the sign language interpreter and fine-tuned a Flan-T5-Base model for the chatbot are provided in the following subsections.

Sign Language Interpreter

The CNN model seemed to us an appropriate choice for our sign language interpreter due to its exceptional ability to handle image data, making it well-suited for the ASL alphabet dataset. CNNs excel at hierarchical feature extraction through the use of convolutional and pooling layers, allowing the model to learn intricate patterns and spatial dependencies in the ASL images effectively. This hierarchical learning enables the model to identify essential features, such as edges, shapes, and textures, which are crucial for accurate sign recognition.
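
As an illustration of this layered design, the following is a minimal Keras sketch of such a classifier; the 64x64 RGB input size, the specific layer widths, and the 29 output classes are assumptions for demonstration, not our exact configuration.

    from tensorflow.keras import layers, models

    # Stacked convolution + pooling blocks extract increasingly abstract features.
    model = models.Sequential([
        layers.Input(shape=(64, 64, 3)),
        layers.Conv2D(32, (3, 3), activation="relu"),    # low-level features (edges)
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),    # mid-level features (shapes)
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),   # higher-level features (textures)
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(29, activation="softmax"),          # one class per ASL sign
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.summary()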

Another advantage of CNNs is the use of weight sharing in convolutional layers, which significantly reduces the number of parameters compared to traditional fully connected neural networks. This reduction in parameters makes CNNs computationally efficient.
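
A back-of-the-envelope comparison makes this concrete; the 64x64x3 input size and layer widths below are assumed purely for illustration.

    # Rough parameter-count comparison illustrating weight sharing (64x64x3 input assumed).
    h, w, c = 64, 64, 3

    # A 3x3 convolution with 32 filters reuses the same small kernels at every
    # spatial position, so its parameter count is independent of image size.
    conv_params = (3 * 3 * c) * 32 + 32        # 896 weights and biases

    # A fully connected layer from the flattened image to just 128 units must
    # learn a separate weight for every input pixel and channel.
    dense_params = (h * w * c) * 128 + 128     # 1,572,992 weights and biases

    print(conv_params, dense_params)

Even at this modest input size, the single fully connected layer requires roughly 1.6 million parameters, while the convolutional layer requires fewer than a thousand.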

Common drawbacks of CNNs, like their tendency to overfit, can usually be mitigated by implementing data augmentation techniques, such as random rotation, shifting, shearing, zooming, and flipping, in the ImageDataGenerator for the training set. Additionally, to enhance
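
A minimal sketch of such an augmentation setup, assuming the Keras ImageDataGenerator API; the transform ranges and directory path are illustrative assumptions rather than our tuned values.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Augmentation is applied only to the training set.
    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,
        rotation_range=15,         # random rotation
        width_shift_range=0.1,     # random horizontal shift
        height_shift_range=0.1,    # random vertical shift
        shear_range=0.1,           # random shearing
        zoom_range=0.1,            # random zooming
        horizontal_flip=True,      # random flipping
    )

    train_generator = train_datagen.flow_from_directory(
        "data/asl_alphabet/train",   # hypothetical directory layout
        target_size=(64, 64),
        batch_size=32,
        class_mode="categorical",
    )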
