Fine-Tuned Flan-T5-Base Chatbot Model
The Flan-T5-Base model can be used out of the box without fine-tuning, but we chose to
fine-tune it on our conversations dataset, following the guide by Bhandare (n.d.), to obtain a
model that better serves our project needs.
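For reference, the minimal sketch below shows how the out-of-the-box Flan-T5-Base model can be loaded through the Hugging Face Transformers library; the sample question is purely illustrative and not drawn from our dataset.

```python
# Minimal sketch: loading Flan-T5-Base out of the box with Hugging Face Transformers.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Quick check of the un-tuned model on a sample question (illustrative only).
inputs = tokenizer("What topics does the chatbot cover?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```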
Instead of following the traditional full fine-tuning approach, we prepared the model for
k-bit training using PEFT (Parameter-Efficient Fine-Tuning), a technique applicable to large
language models (Bhandare, n.d.). PEFT updates only a small subset of model parameters
rather than the entire model, with clear advantages in computational cost, final model size,
and inference speed. Among the available PEFT techniques, we applied LoRA (Low-Rank
Adaptation), which accelerates fine-tuning of large models while conserving memory
(Bhandare, n.d.). Given the model's size, these steps proved crucial to preventing system
crashes. After creating the PEFT configuration, we wrapped the model in PEFT and ran the
trainer. We then saved the tuned model, reloaded it into our environment, and created a
pipeline with the Hugging Face Transformers library to ease integration into our chatbot
system.
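The sketch below outlines this PEFT/LoRA workflow under stated assumptions: the LoRA hyperparameters (r, lora_alpha, lora_dropout, target_modules) and the output directory name are illustrative choices rather than the exact values used in our project.

```python
# Sketch of the PEFT/LoRA setup; hyperparameters and paths are illustrative assumptions.
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Prepare the model for k-bit training and attach LoRA adapters.
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # rank of the low-rank update matrices (assumed value)
    lora_alpha=32,              # scaling factor for the adapter output (assumed value)
    lora_dropout=0.05,
    target_modules=["q", "v"],  # attention projections in the T5 blocks
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction of weights is trainable

# ... training with the Hugging Face Trainer happens here (see the next sketch) ...

# Save the tuned adapters, then reload the model and expose it as a pipeline.
peft_model.save_pretrained("flan-t5-base-chatbot-lora")
merged_model = peft_model.merge_and_unload()  # fold the adapters back into the base weights
chatbot = pipeline("text2text-generation", model=merged_model, tokenizer=tokenizer)
```

Targeting only the query and value projections is a common LoRA default for T5-style models; it keeps the number of trainable parameters small while still adapting the attention layers.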
We used 80% of the question/answer pairs for training and 10% for validation. We set a
learning rate of 1e-3, a total of 5 epochs, and an initial batch size of 8, which is lowered
automatically if out-of-memory errors occur. Each epoch was split into steps, and the loss was
the metric we used to monitor model performance during training.
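A hedged sketch of this training setup is shown below; it builds on the peft_model and tokenizer from the previous sketch, and the dataset variables (train_dataset, eval_dataset), output directory, and logging/evaluation step counts are placeholders rather than our exact configuration.

```python
# Sketch of the training setup with the hyperparameters reported above;
# train_dataset / eval_dataset are assumed to be tokenized question/answer splits.
from transformers import DataCollatorForSeq2Seq, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="flan-t5-base-chatbot-checkpoints",  # placeholder path
    learning_rate=1e-3,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    auto_find_batch_size=True,      # lowers the batch size automatically on OOM errors
    evaluation_strategy="steps",    # track training/validation loss every eval_steps
    eval_steps=50,                  # assumed value
    logging_steps=50,               # assumed value
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,    # 80% of the question/answer pairs
    eval_dataset=eval_dataset,      # 10% held out for validation
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=peft_model),
)
trainer.train()
```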