AAI_2025_Capstone_Chronicles_Combined
ResolveAI
LLM with RAG Results
The final step in building our app was evaluating the chatbot component, which is based on OpenAI's GPT-3.5 Turbo with a ChromaDB vector database for retrieval-augmented generation (RAG). We focused on tuning one variable each for the vector database and the LLM. For the vector database, we examined the impact of the top-K setting on serving the most relevant question-and-answer pairs to the model. For the LLM itself, we evaluated the effect of model temperature. Both variables were evaluated using ROUGE scores, comparing the generated model outputs against known correct answers in the test dataset.
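ROUGE-L measures the longest common subsequence (LCS) between a generated answer and a reference answer. The report does not specify which scoring library was used (in practice a package such as `rouge-score` would typically compute this); as a minimal, self-contained sketch, the ROUGE-L F1 score can be computed as:

```python
def lcs_len(a: list, b: list) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, a candidate that exactly matches the reference scores 1.0, while partial subsequence overlap yields a proportionally lower F1.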
When comparing the ROUGE scores we placed more weight on ROUGE-L, as this indicated how well our chatbot matched the style of the answers in the test dataset. Scores tended to be highest for two parameter sets: top-K = 1 with temperature = 0.0, and top-K = 5 with temperature = 0.7. Subjectively, the latter tended to produce more natural-sounding responses, so we selected a top-K retrieval value of 5 and a temperature of 0.7 for our final model, which was then hosted on Hugging Face.
Observations and Optimizations
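The parameter sweep described above can be sketched as follows. Here `generate_answer` is a hypothetical stand-in for the real ChromaDB-retrieval plus GPT-3.5 Turbo pipeline, and `score` stands in for the ROUGE-L comparison; the grid values mirror the settings discussed in the evaluation:

```python
from itertools import product

def best_params(test_set, generate_answer, score):
    """Sweep top-K and temperature, returning the best pair by mean score.

    test_set: list of (question, reference_answer) pairs.
    generate_answer(question, top_k, temperature): hypothetical RAG pipeline.
    score(candidate, reference): similarity metric, e.g. ROUGE-L F1.
    """
    grid = product([1, 3, 5], [0.0, 0.7, 1.0])  # assumed candidate values
    results = {}
    for top_k, temp in grid:
        scores = [score(generate_answer(q, top_k, temp), ref)
                  for q, ref in test_set]
        results[(top_k, temp)] = sum(scores) / len(scores)
    best = max(results, key=results.get)
    return best, results
```

A sweep like this produces a mean score per (top-K, temperature) pair, from which the highest-scoring configuration is selected.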