generalized model that can be utilized in various conversational contexts beyond retail. Unlike
SignBot, which translates the video feed into potential comments or sentences for the user to
select, our project takes the sentences and feeds them into a natural language model to generate
textual responses, which are then converted into ASL images for users to visualize.
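As an illustration of this generation step, the sketch below passes an assembled sentence to a sequence-to-sequence model, such as the Flan-T5-Base described in the next section, through the Hugging Face Transformers library; the prompt wording and generation settings are illustrative assumptions rather than our actual pipeline code, and the conversion of the reply into ASL images is omitted.

```python
# Illustrative sketch only: passing a sentence recognized from the video feed
# to a Flan-T5-Base model (via Hugging Face Transformers) to generate a textual
# reply. Prompt format and generation settings are assumptions, not our code.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Sentence assembled from the sign language interpreter's predictions
user_sentence = "where is the customer service desk"
prompt = f"Respond helpfully to the user: {user_sentence}"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
reply = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The reply would then be rendered as a sequence of ASL images (not shown here).
print(reply)
```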
Machine Learning Methods
Choosing the right model architectures for both the sign language interpreter and the
chatbot to integrate into our final application was a crucial step during project development. The
reasons why we selected a CNN model for the sign language interpreter and fine-tuned a Flan
T5-Base model for the chatbot are provided in the following subsections.
Sign Language Interpreter
The CNN model seemed an appropriate choice for our sign language
interpreter due to its exceptional ability to handle image data, making it well-suited for the ASL
alphabet dataset. CNNs excel at hierarchical feature extraction through the use of convolutional
and pooling layers, allowing the model to learn intricate patterns and spatial dependencies in the
ASL images effectively. This hierarchical learning enables the model to identify essential
features, such as edges, shapes, and textures, which are crucial for accurate sign recognition.
Another advantage of CNNs is the use of weight sharing in convolutional layers, which
significantly reduces the number of parameters compared to traditional fully connected neural
networks. This reduction in parameters makes CNNs computationally efficient.
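To make this concrete, below is a minimal sketch of the kind of CNN described here, built with Keras: stacked convolutional and max-pooling layers followed by a small dense classifier head. The input resolution, layer widths, and the 29-class output (the 26 letters plus the "space", "delete", and "nothing" classes found in the common ASL alphabet dataset) are assumptions for illustration rather than the exact configuration of our final model.

```python
# Minimal sketch of a CNN of the kind described above, using Keras.
# Input size (64x64 grayscale) and 29 output classes are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    # Convolutional blocks: shared kernels extract edges, shapes, and textures
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    # Classifier head mapping pooled features to ASL alphabet classes
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(29, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # the modest parameter count reflects weight sharing
```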
Common drawbacks of CNNs, such as their tendency to overfit, can usually be mitigated by
implementing data augmentation techniques, including random rotation, shifting, shearing,
zooming, and flipping, in the ImageDataGenerator for the training set.
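A minimal sketch of such an augmentation setup is shown below; the specific parameter ranges, image size, and directory layout are illustrative assumptions rather than our exact training configuration.

```python
# Illustrative sketch of the augmentation setup described above, using Keras'
# ImageDataGenerator; the ranges and directory path are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,          # normalize pixel values
    rotation_range=15,          # random rotation
    width_shift_range=0.1,      # random horizontal shifting
    height_shift_range=0.1,     # random vertical shifting
    shear_range=0.1,            # random shearing
    zoom_range=0.1,             # random zooming
    horizontal_flip=True,       # random flipping
)

# Hypothetical directory of ASL alphabet training images, one folder per class
train_generator = train_datagen.flow_from_directory(
    "data/asl_alphabet_train",
    target_size=(64, 64),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=32,
)
```

Additionally, to enhance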