M.S. AAI Capstone Chronicles 2024
A.S.LINGUIST
Flan-T5-Base, published by Google researchers in 2022, is an improved version of T5 that was
fine-tuned on more than 1,000 additional tasks with the same number of parameters (google/flan-t5-base, n.d.).
It is an encoder-decoder model, suitable for applications like chat/dialogue summarization
(Keita, 2023), which is in line with our project goal to provide users with quick answers to
general questions through a chatbot.
Thus, considering the numerous advantages of a pre-trained model, the good results obtained
by Bhandare (n.d.) with the Flan-T5-Base model on our conversations dataset, and its ease of
implementation, we decided to fine-tune Flan-T5-Base to create our chatbot.
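As a minimal sketch of what fine-tuning this checkpoint involves, the snippet below shows an instruction-style prompt wrapper and the standard loading of Flan-T5-Base with the Hugging Face `transformers` library. The prompt template and the `build_prompt` helper are illustrative assumptions, not the authors' exact code.

```python
def build_prompt(dialogue: str) -> str:
    """Wrap a conversation in an instruction-style prompt, as Flan-T5
    expects instruction-formatted inputs. The template is an assumption."""
    return f"Answer the user's question based on the dialogue:\n{dialogue}\nAnswer:"


if __name__ == "__main__":
    # Requires the `transformers` library and network access to download
    # the google/flan-t5-base checkpoint (~1 GB).
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    inputs = tokenizer(build_prompt("User: What are your opening hours?"),
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

From here, fine-tuning typically means wrapping the prompt/answer pairs in a dataset and training with a sequence-to-sequence trainer; the specifics depend on the conversations dataset used.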
Detailed Experimental Methods
CNN Model
The CNN model consists of one input layer, four convolutional layers, and three dense
layers, plus one output layer. The number of filters in the convolutional layers
increases from 16 near the input layer to 128 near the first dense layer, while the number
of units decreases from 512 in the first dense layer to 128 in the last one before the output layer.
Thus, we imposed an ascending number of filters in the convolutional layers and a descending
number of units in the dense layers. The ascending number of filters allows the model to
progressively capture more complex patterns starting from raw pixel data, while the descending
number of units in the dense layers leads to a gradual removal of the least important image
features before reaching the last dense layer.
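The layout described above can be sketched in Keras as follows. Since the text does not state the input shape, the pooling scheme, the middle dense layer's width, or the number of output classes, the values below (128x128 RGB inputs, max pooling after each convolution, a 256-unit middle dense layer, 10 classes) are assumptions for illustration only.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(input_shape=(128, 128, 3), num_classes=10):
    """Sketch of the described CNN: four conv layers with ascending filters
    (16 -> 32 -> 64 -> 128) and three dense layers with descending units
    (512 -> 256 -> 128), followed by a softmax output layer."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        # Ascending filter counts capture progressively more complex patterns.
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),  # pooling is an assumption, not stated in the text
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.Flatten(),
        # Descending unit counts gradually discard less important features.
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),  # middle width is an assumption
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Calling `build_cnn().summary()` prints the resulting layer stack and parameter counts.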
both convolutional layers and dense layers a “relu” activation function was adopted, while for
the output layer a “softmax” function was selected. The “relu” activation function is widely used
in CNNs as it introduces non-linearity, thereby allowing the model to learn complex relationships