M.S. AAI Capstone Chronicles 2024

A.S.LINGUIST


Flan-T5-Base, published by Google researchers in 2022, is an improved version of T5 that can perform more than 1,000 additional tasks with the same number of parameters (google/flan-t5-base, n.d.). It is an encoder-decoder model suited to applications such as chat/dialogue summarization (Keita, 2023), which aligns with our project goal of providing users with quick answers to general questions through a chatbot.

Thus, considering the numerous advantages of using a pre-trained model, the good results obtained by Bhandare (n.d.) with the Flan-T5-Base model and our conversations dataset, and its ease of implementation, we decided to fine-tune the Flan-T5-Base model to create our chatbot.
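As a minimal sketch of this starting point, the pre-trained checkpoint can be loaded for seq2seq fine-tuning with the Hugging Face Transformers library; the helper name below is our own, and no training hyperparameters from the paper are assumed:

```python
def load_model_and_tokenizer(name="google/flan-t5-base"):
    """Load the Flan-T5-Base checkpoint for seq2seq fine-tuning.

    Requires `pip install transformers`; the import is deferred so the
    sketch can be inspected without the library installed.
    """
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return tokenizer, model
```

The returned model and tokenizer can then be passed to any standard seq2seq training loop or trainer on the conversations dataset.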

Detailed Experimental Methods

CNN Model

The CNN model consists of one input layer, four convolutional layers, and three dense layers, plus one output layer. The number of filters in the convolutional layers increases from 16 near the input layer to 128 near the first dense layer, while the number of units decreases from 512 in the first dense layer to 128 in the last one before the output layer. Thus, we imposed an ascending number of filters and a descending number of units moving forward through the convolutional and dense layers, respectively. While the ascending number of filters in the convolutional layers allows the model to progressively capture more complex patterns starting from raw pixel data, the descending number of units in the dense layers leads to a gradual removal of the least important image features before reaching the last dense layer.
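This architecture can be sketched in Keras as follows; the kernel sizes, pooling layers, intermediate filter and unit counts (32, 64, 256), input shape, and number of classes are illustrative assumptions not stated in the text:

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_cnn(input_shape=(128, 128, 3), num_classes=10):
    """Sketch of the described CNN: four conv layers with ascending
    filters (16 -> 128) and three dense layers with descending units
    (512 -> 128). input_shape and num_classes are assumptions."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Four convolutional layers, filters ascending from 16 to 128
        layers.Conv2D(16, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.Flatten(),
        # Three dense layers, units descending from 512 to 128
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        # Output layer producing class probabilities
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```

The filter progression 16, 32, 64, 128 follows the common doubling convention and is consistent with the stated endpoints, though the intermediate values are not given in the text.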

For both the convolutional and dense layers a “relu” activation function was adopted, while for the output layer a “softmax” function was selected. The “relu” activation function is widely used in CNNs as it introduces non-linearity, thereby allowing the model to learn complex relationships
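The two activation functions can be stated directly; this is a minimal NumPy sketch of the standard definitions, not code from the paper:

```python
import numpy as np


def relu(x):
    # ReLU zeroes out negative inputs, introducing the non-linearity
    # that lets stacked layers model complex relationships.
    return np.maximum(0.0, x)


def softmax(logits):
    # Subtract the max logit for numerical stability, then normalize
    # the exponentials into a probability distribution over classes.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()
```

ReLU is applied elementwise in the hidden layers, while softmax maps the output layer's logits to class probabilities that sum to one.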

