Fine-Tuned Flan-T5-Base Chatbot Model

The Flan-T5-Base model can be used out of the box without fine-tuning, but we chose to fine-tune it on our conversation dataset, following the guide by Bhandare (n.d.), to obtain a model better suited to our project needs.

Instead of following the traditional full fine-tuning approach, we prepared the model for k-bit training using PEFT (Parameter-Efficient Fine-Tuning), a technique applicable to large language models (Bhandare, n.d.). PEFT modifies only a small subset of the model's parameters rather than the entire model, offering advantages in computational cost, final model size, and inference speed. Among the different PEFT techniques, we applied LoRA (Low-Rank Adaptation), which accelerates fine-tuning of large models while conserving memory (Bhandare, n.d.). Given the model's size, these steps proved crucial to prevent system crashes. After creating the PEFT configuration, we wrapped the model in PEFT and ran the trainer. We then saved the tuned model, reloaded it into our environment, and created a pipeline with the Hugging Face Transformers library to facilitate integration into our chatbot system.
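
The sketch below illustrates this k-bit preparation and LoRA wrapping step with the Hugging Face peft library; the 8-bit quantization setting and the specific LoRA hyperparameters (r, lora_alpha, lora_dropout, target_modules) are illustrative assumptions rather than configuration taken from our experiments. The trainer, saving, and pipeline steps are sketched after the next paragraph.

# Minimal sketch of k-bit preparation and LoRA wrapping for Flan-T5-Base.
# Assumes an 8-bit quantized base model (requires bitsandbytes and a CUDA GPU);
# the LoRA hyperparameters below are illustrative, not reported values.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

base_model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model with 8-bit weights to reduce memory use.
model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Prepare the quantized model for k-bit training.
model = prepare_model_for_kbit_training(model)

# LoRA configuration: train small low-rank adapters instead of the full weights.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # rank of the low-rank update matrices (assumed)
    lora_alpha=32,              # scaling factor (assumed)
    lora_dropout=0.05,          # dropout on the adapter layers (assumed)
    target_modules=["q", "v"],  # T5 attention query/value projections (assumed)
)

# Wrap the model so only the LoRA parameters are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the small trainable fraction of the weights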

We used 80% of the question/answer pairs for training and 10% for validation. We set the learning rate to 1e-3, the number of epochs to 5, and the initial batch size to 8, which is automatically lowered if out-of-memory issues occur. Each epoch was split into steps, and loss was the metric we used to monitor model performance during training.
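
The sketch below illustrates this training setup with a Seq2SeqTrainer: learning rate 1e-3, 5 epochs, and an initial batch size of 8 that is automatically lowered on out-of-memory errors via auto_find_batch_size, with validation loss reported each epoch. The dataset variables, output path, and logging interval are assumed for illustration; the final lines show one way of reloading the saved adapter into a Transformers pipeline.

# Hedged sketch of the trainer setup and of reloading the tuned adapter.
# The PEFT-wrapped `model` and the tokenized 80%/10% splits are assumed to
# exist from the previous steps; paths and the example query are hypothetical.
from peft import PeftModel
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    pipeline,
)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-lora-chatbot",  # hypothetical output path
    learning_rate=1e-3,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    auto_find_batch_size=True,      # lower the batch size on CUDA out-of-memory
    evaluation_strategy="epoch",    # report validation loss each epoch
    logging_strategy="steps",
    logging_steps=50,               # assumed logging interval
    save_strategy="epoch",
)

trainer = Seq2SeqTrainer(
    model=model,                        # the PEFT-wrapped model from the previous sketch
    args=training_args,
    train_dataset=tokenized_train,      # assumed: tokenized 80% training split
    eval_dataset=tokenized_validation,  # assumed: tokenized 10% validation split
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)

trainer.train()
trainer.save_model("flan-t5-base-lora-chatbot")  # persist the tuned adapter weights

# Reload the adapter on a full-precision base model, merge it for inference,
# and expose the result through a Transformers pipeline for the chatbot.
base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tuned = PeftModel.from_pretrained(base, "flan-t5-base-lora-chatbot").merge_and_unload()
chatbot = pipeline("text2text-generation", model=tuned, tokenizer=tokenizer)
print(chatbot("example user question")[0]["generated_text"])

With auto_find_batch_size enabled, the Trainer relies on Hugging Face Accelerate to catch out-of-memory errors, halve the batch size, and retry, which matches the automatic batch-size lowering described above.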
