M.S. AAI Capstone Chronicles 2024
A.S.LINGUIST
Conversations Dataset for ChatBot, n.d.) and was employed to fine-tune a Flan-T5-Base chatbot
model (google/flan-t5-base, n.d.).
Putting everything together, we created an application capable of interacting with both
models. A webcam captures the user's sign-language gestures, which are then passed to the
CNN model to classify each sign. The recognized signs are combined into a string of text that is
fed to the chatbot model, which in turn returns a response for the user to see. As an
additional option, our application can also display the chatbot's
responses as a succession of ASL images. In short, this project can be thought of as an
interactive chatbot and a two-way interpreter, from sign language to text and vice versa.
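A minimal sketch of one stage of this pipeline is shown below: turning a sequence of per-frame CNN class labels into the text string fed to the chatbot. The label names (`"space"`, `"del"`, `"nothing"`) and the function name `assemble_text` are assumptions for illustration, not the project's actual implementation.

```python
# Hypothetical sketch: assemble per-frame CNN predictions into chatbot input.
# Assumes the 29-class ASL alphabet label set: the 26 letters plus
# "space", "del" (delete) and "nothing".

LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["space", "del", "nothing"]

def assemble_text(predicted_labels):
    """Turn a sequence of per-frame class labels into chatbot input text."""
    chars = []
    for label in predicted_labels:
        if label == "space":
            chars.append(" ")
        elif label == "del":
            if chars:
                chars.pop()          # undo the last recognized character
        elif label == "nothing":
            continue                 # no sign detected in this frame; skip it
        else:
            chars.append(label.lower())
    return "".join(chars)

# Example: signing H, I, an empty frame, space, O, K yields "hi ok".
print(assemble_text(["H", "I", "nothing", "space", "O", "K"]))  # → hi ok
```

In a real run, each label would come from the CNN's prediction on a webcam frame rather than a hard-coded list.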
Data Summary
ASL Alphabet Dataset
The ASL alphabet dataset used to train the CNN model consists of 29 unique folders
of images, each image averaging about 12-13 KB in size. Of these, 26 folders correspond to the
letters of the English alphabet and the remaining three represent "space," "delete," and "nothing."
Figure 1 shows one example image for each of the 29 possible classes in the training set. The
dataset appears balanced, with 3,000 images per label and no missing data, which suggests no
major data collection biases.
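A balance check like the one described above can be sketched as follows; the Kaggle-style layout (one subfolder per class) and the function names are assumptions for illustration.

```python
# Hypothetical sketch: verify class balance in a folder-per-class image dataset.
from collections import Counter
from pathlib import Path

def class_counts(dataset_root):
    """Count image files under each class folder.

    Assumes one subfolder per class, e.g.
    asl_alphabet_train/A/*.jpg ... asl_alphabet_train/nothing/*.jpg
    """
    root = Path(dataset_root)
    return Counter({
        folder.name: sum(1 for f in folder.iterdir() if f.is_file())
        for folder in root.iterdir() if folder.is_dir()
    })

def is_balanced(counts):
    """True when every class has the same number of images."""
    return len(set(counts.values())) <= 1

# Usage (path is hypothetical):
# counts = class_counts("asl_alphabet_train")
# print(counts, is_balanced(counts))
```

For the dataset described here, each of the 29 classes would report 3,000 images and `is_balanced` would return `True`.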
The variables extracted from the ASL alphabet dataset, mean pixel intensity and standard
deviation, relate directly to the project's goal of classifying ASL sign images. Mean pixel
intensity measures the overall brightness of an image, while standard deviation
measures the variability of its pixel values. As shown in Figure 2, the correlation analysis
of the ASL alphabet dataset revealed a weak positive correlation between mean pixel intensity
and standard deviation. This relationship tells us that, while there is some dependency, the