M.S. AAI Capstone Chronicles 2024

A.S.LINGUIST


Figure 2

Correlation between mean pixel intensity and standard deviation for the images of the ASL alphabet dataset
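The relationship in Figure 2 can be quantified with a short sketch like the one below. It is not the authors' code. It assumes each image is grayscale and flattened into a list of intensity values, uses the population standard deviation per image, and summarizes the relationship with the Pearson correlation coefficient. The report does not state any of these choices.

```python
import math
from statistics import mean, pstdev


def image_stats(pixels):
    """Return (mean intensity, population standard deviation) of a flat list of pixel values."""
    return mean(pixels), pstdev(pixels)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def intensity_std_correlation(images):
    """Correlate per-image mean intensity with per-image standard deviation."""
    stats = [image_stats(img) for img in images]
    means = [m for m, _ in stats]
    stds = [s for _, s in stats]
    return pearson(means, stds)


# Toy example: three 2-pixel "images" whose spread grows with brightness.
toy_images = [[0, 10], [0, 20], [0, 30]]
print(round(intensity_std_correlation(toy_images), 6))  # 1.0
```

A value near +1 would mean brighter images also tend to have more contrast. For the real dataset the pixel lists would come from decoded image files, which this sketch leaves out.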

Conversations Dataset

The conversations dataset used to fine-tune the pre-trained chatbot model is stored as a CSV file with two columns, one for questions and one for answers, both of string type. It contains 3725 question-answer pairs and no missing values.
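The dataset checks described here, together with the word counts, character counts, and top-20 word frequencies shown in Figures 3 and 4, could be reproduced along the lines of the sketch below. It uses only the Python standard library and is not the authors' code. The header names `question` and `answer` are hypothetical, and lowercased whitespace tokenization and the sample standard deviation are assumptions, so results may differ slightly from the reported figures.

```python
import csv
from collections import Counter
from statistics import mean, stdev

# Hypothetical header names; the report does not give the CSV column names.
QUESTION_COL = "question"
ANSWER_COL = "answer"


def load_conversations(lines):
    """Parse question/answer pairs from CSV lines; raise ValueError on any missing value."""
    pairs = []
    for i, row in enumerate(csv.DictReader(lines), start=1):
        question = (row.get(QUESTION_COL) or "").strip()
        answer = (row.get(ANSWER_COL) or "").strip()
        if not question or not answer:
            raise ValueError(f"record {i} has a missing question or answer")
        pairs.append((question, answer))
    return pairs


def length_stats(texts):
    """Return ((mean, std) of word counts, (mean, std) of character counts).

    Uses the sample standard deviation, so at least two texts are required.
    """
    words = [len(t.split()) for t in texts]
    chars = [len(t) for t in texts]
    return (mean(words), stdev(words)), (mean(chars), stdev(chars))


def top_words(texts, n=20):
    """Return the n most common lowercased whitespace tokens; stopwords are kept."""
    counts = Counter(word for t in texts for word in t.lower().split())
    return counts.most_common(n)


# Usage, assuming a local file named conversations.csv (hypothetical path):
# with open("conversations.csv", newline="", encoding="utf-8") as f:
#     pairs = load_conversations(f)
# questions = [q for q, _ in pairs]
# print(length_stats(questions), top_words(questions))
```

Because stopwords are not filtered in `top_words`, function words such as "the" or "you" would be expected to dominate the counts.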

As the boxplots in Figure 3 illustrate, questions and answers contain 6.35 ± 2.80 and 6.52 ± 2.91 words, and 31.26 ± 13.95 and 32.21 ± 14.54 characters, respectively. The bar plots in Figure 4 show the 20 most common words in the questions and answers, along with their frequencies. Since we did not remove stopwords from the questions and answers, it is reasonable to find that
