AAI_2025_Capstone_Chronicles_Combined
delivering targeted feedback. The dataset highlights the inherent challenge that AI models face in maintaining mathematical accuracy within conversational contexts. Presented in JSON format, the dataset is well suited for training models to comprehend and categorize conversational patterns in an educational setting. The core variables in this dataset include:
• Expected Result: A binary categorical string ("Answer Accepted" or "Answer Not Accepted") indicating whether the tutor should correct a student’s mathematical claim.
• MathLevel: A categorical variable that describes the educational level of the math problem discussed within the dialogue.
• Data: Contains the complete conversational transcripts between the student and the tutor.
• Test ID: A numerical variable serving as a unique identifier for each dialogue entry.
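To make the record structure concrete, the following is a minimal sketch of reading such a dataset in Python. The field names come from the variable list above. The sample values, the top-level list layout, the assumption that 'Data' holds the transcript as a single string, and the helper name load_records are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical sample record. Field names follow the variable list;
# the values and the list-of-objects layout are assumptions.
raw = """
[
  {
    "Test ID": 1,
    "MathLevel": "Algebra",
    "Expected Result": "Answer Not Accepted",
    "Data": "Student: I think 2x + 3 = 7 gives x = 5. Tutor: Let's check that step by step."
  }
]
"""

# The two label strings described for 'Expected Result'.
VALID_LABELS = {"Answer Accepted", "Answer Not Accepted"}


def load_records(text):
    """Parse the JSON text and check that every label is one of the two expected strings."""
    records = json.loads(text)
    for rec in records:
        if rec["Expected Result"] not in VALID_LABELS:
            raise ValueError(f"Unexpected label in record {rec['Test ID']}")
    return records


records = load_records(raw)
```

Validating the label on load is a design choice in this sketch, not a step described in the source; it simply surfaces any entry whose label deviates from the two documented values.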
Data Preparation
The dataset contains no missing or duplicate entries. However, mathematical expressions are represented inconsistently across the dialogues because of varied user input styles and JSON encoding. This issue does not compromise the integrity of the data, but it can affect tokenization. To mitigate this risk, special characters were removed. Additionally, the conversation text underwent standard text preprocessing, including stop-word removal and lower-casing, to normalize it for exploratory data analysis. For model training, the conversation text was left in its original, unmodified state, because modern embedding models and large language models can make effective use of the full context of the text.
Variable Relevance and Relationships for Clustering
For unsupervised clustering, the ‘Expected Result’ variable was not directly relevant to our project goal, because it is intended for evaluating LLM accuracy rather than for discovering intrinsic clusters in student behavior. The conversational transcripts contained within the ‘Data’ variable served as the primary input for developing our clustering model.
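The text normalization described under Data Preparation can be sketched as follows. This is an illustrative sketch rather than the project's actual pipeline: the stop-word list, the regular expression used to define "special characters", and the function name clean_for_eda are all assumptions, since the source does not specify them. The sketch keeps the raw transcript alongside the cleaned tokens, reflecting the statement that the unmodified text is what feeds the model.

```python
import re

# Small illustrative stop-word list (assumption: the source does not say which list was used).
STOP_WORDS = {"a", "an", "the", "is", "are", "i", "to", "of", "and", "that", "it", "in"}


def clean_for_eda(text):
    """Lower-case, strip special characters, and drop stop words for exploratory analysis."""
    lowered = text.lower()
    # Assumption: "special characters" means anything other than letters, digits, and whitespace.
    no_special = re.sub(r"[^a-z0-9\s]", " ", lowered)
    tokens = no_special.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


raw_text = "Student: I think 2x + 3 = 7 gives x = 5."

# Cleaned tokens are used only for exploratory data analysis.
eda_tokens = clean_for_eda(raw_text)

# The unmodified transcript is kept as the model input.
model_input = raw_text
```

Note that stripping operators such as "+" and "=" discards mathematical structure, which is acceptable for word-frequency style exploration but is one reason to keep the original text for embedding and clustering.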