M.S. AAI Capstone Chronicles 2024
A.S.LINGUIST
Figure 2
Correlation between mean pixel intensity and standard deviation for the images of the ASL
alphabet dataset
Conversations Dataset
The conversations dataset used to fine-tune the pre-trained chatbot model is stored as a
CSV file with two string-typed columns, one for questions and one for answers. It contains
3725 questions and answers, with no missing values.
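The checks described above (row count, missing values) and the kind of word and character counts reported for this dataset can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the column names `question` and `answer`, the helper names, and the two-row sample are hypothetical, and the sample standard deviation is used because the text does not say which estimator was applied.

```python
import csv
import io
import statistics


def load_pairs(handle, question_col="question", answer_col="answer"):
    """Read (question, answer) rows from an open CSV file handle.

    The column names are assumptions; adjust them to the real header.
    """
    reader = csv.DictReader(handle)
    return [(row[question_col], row[answer_col]) for row in reader]


def count_missing(pairs):
    """Count rows where either field is absent or blank."""
    return sum(
        1 for q, a in pairs
        if not (q and q.strip()) or not (a and a.strip())
    )


def length_summary(texts):
    """Mean and sample standard deviation of word and character counts."""
    words = [len(t.split()) for t in texts]
    chars = [len(t) for t in texts]
    return {
        "words": (statistics.mean(words), statistics.stdev(words)),
        "chars": (statistics.mean(chars), statistics.stdev(chars)),
    }


# Tiny made-up sample standing in for the real file (not from the dataset).
sample = io.StringIO(
    "question,answer\n"
    "How are you?,I am fine\n"
    "What is your name?,My name is Bot\n"
)
pairs = load_pairs(sample)
questions = [q for q, _ in pairs]
answers = [a for _, a in pairs]

print("rows:", len(pairs), "rows with missing values:", count_missing(pairs))
print("questions:", length_summary(questions))
print("answers:", length_summary(answers))
```

To run this on the actual file, the `io.StringIO` sample would be replaced by a handle from `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV. Words are split on whitespace here, so punctuation attached to a word counts as part of it.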
As illustrated in the boxplots of Figure 3, questions and answers contain 6.35±2.80 and
6.52±2.91 words and 31.26±13.95 and 32.21±14.54 characters, respectively. The 20 most
common words in questions and answers, together with their counts, are shown in the bar
plots of Figure 4. Since we did not remove stopwords from the list of questions and answers,
it is reasonable to find that