M.S. AAI Capstone Chronicles 2024
EDUCATION ASSISTANCE THROUGH A.I.
Our model attempts to address all these concerns in a way that is similar to, but distinct from, Dr. Tarshizi's OpenAI-based solution. Our model will also use a RAG vector store, but it will leverage metadata tagging to create what are essentially seven stores. The RAG stores are intended to make answers to assignment and quiz questions harder to reach while still providing as much help as possible with the given course material. Our model will also rely on significant prompt engineering, which should allow it to formulate more appropriate responses. Finally, because we are using Llama 2 as our baseline, we can fine-tune the model, which should provide better results than a RAG vector store alone. The flexibility of what we can do with Llama 2 is a significant advantage over the closed nature of the GPT-4 model. The biggest disadvantage is that the paid OpenAI service runs on much larger back-end infrastructure, which allows its model to be considerably larger and more responsive.

According to Buhl (2023), both training and fine-tuning are important to model development. In this case, Llama 2 was trained on a large mix of data from publicly available sources (Touvron et al., 2023). This means that the model can answer a wide range of questions in general, but its knowledge of any particular area may lack depth. We have added three additional datasets to help overcome these limitations of the base model. We expand on this in the following section.

Data Summary

The first of our three datasets consists of class data provided by the Program Director of Applied Data Science, Ebrahim Tarshizi, PhD, MBA. This dataset contains various PDF files of course material, such as the syllabus, lectures, assignment questions, assignment solution keys, and the textbook.
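To illustrate how metadata tagging can partition a single vector store into several logical stores, the following minimal sketch tags each chunk with a document type and restricts retrieval to an allowed set of tags. It is not our implementation: the tag names, the three-dimensional toy embeddings, and the `Chunk` and `retrieve` names are assumptions made purely for illustration, and the document types are only loosely modeled on the course material listed above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One piece of course text with its embedding and metadata tags."""
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store, query_embedding, allowed_types, k=2):
    """Return the k most similar chunks whose doc_type tag is allowed."""
    # Filtering on metadata first makes one physical store behave like
    # several separate stores, one per tag value.
    candidates = [c for c in store if c.metadata.get("doc_type") in allowed_types]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy store; the embeddings are hand-written stand-ins for real vectors.
store = [
    Chunk("Lecture notes on linear regression.", [0.9, 0.1, 0.0], {"doc_type": "lecture"}),
    Chunk("Solution key: the slope is 2.5.", [0.95, 0.05, 0.0], {"doc_type": "solution_key"}),
    Chunk("Syllabus: grading policy.", [0.0, 0.2, 0.9], {"doc_type": "syllabus"}),
]

# Hypothetical set of tags a student-facing query may search.
STUDENT_VISIBLE = {"lecture", "syllabus", "textbook"}

results = retrieve(store, [1.0, 0.0, 0.0], STUDENT_VISIBLE, k=2)
for chunk in results:
    print(chunk.metadata["doc_type"], "->", chunk.text)
```

In this sketch the solution-key chunk is the closest match to the query, yet it is never returned because its tag is outside the allowed set. This is the sense in which tagging can limit access to answers while leaving the rest of the material searchable.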
Some documents, however, were poorly formatted, so PDF readers such as PyPDF (n.d.) extracted seemingly meaningless numbers and characters from them. Extensive manual data cleaning was therefore necessary to rectify these formatting errors. Given the oddly formatted documents, data cleaning involved a lot of manual work