The transformer model processes input with a shape of (1608, 7) or (168, 7) per sample: 1608 is the total number of measured hours in the dataset, while 168 corresponds to one week when the full dataset is divided into weekly sub-datasets. The 7 features are the maximum, minimum, and average voltage, plus four temporal features: hour of day, day of month, month, and day of week.

The model is built around a custom Transformer block with two components. The first is a Multi-Head Attention mechanism with 3 heads, which lets the model attend to different parts of the input sequence simultaneously across its heads, enhancing its ability to capture a variety of sequence relationships. The second is a Feed-Forward Network (FFN) consisting of two dense layers: the first projects the data to an inner dimension of 32, giving the model capacity to learn more complex features, and is followed by a LeakyReLU activation, which helps maintain gradient flow during training, especially for small negative values; the second projects the data back to the embedding dimension of 64.

After processing through the Transformer blocks, the data is pooled by a Global Average Pooling layer that averages over the sequence's temporal dimension, condensing the sequence information into a single vector per feature. A dense layer with 20 neurons then adds learning capacity before the final classification layer, which consists of one neuron per class (262 in total) and employs a softmax activation to output a probability distribution over the class labels. Throughout the model, dropout layers reduce overfitting by randomly omitting a fraction of neuron activations during training at the specified rate.
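To make the architecture concrete, the following Keras sketch assembles the pieces described above. It is a minimal sketch under stated assumptions, not the authors' exact implementation: the input projection to the 64-dimensional embedding, the residual connections with layer normalization, the dropout rate, and the absence of an activation on the 20-unit dense layer are assumptions; the head count, FFN dimensions, pooling strategy, and output layer follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hyperparameters taken from the text; DROPOUT_RATE is an assumption.
SEQ_LEN = 168        # hours per weekly sample (1608 for the full dataset)
N_FEATURES = 7       # voltage max/min/avg + hour, day of month, month, day of week
EMBED_DIM = 64       # embedding dimension referenced in the text
NUM_HEADS = 3        # attention heads per Transformer block
FFN_DIM = 32         # inner dimension of the feed-forward network
NUM_CLASSES = 262    # number of topology classes
DROPOUT_RATE = 0.1   # assumed; the text does not state the rate

def transformer_block(x, embed_dim=EMBED_DIM, num_heads=NUM_HEADS,
                      ffn_dim=FFN_DIM, rate=DROPOUT_RATE):
    """One Transformer block: multi-head self-attention + LeakyReLU FFN."""
    # Multi-head self-attention over the temporal dimension.
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(x, x)
    attn = layers.Dropout(rate)(attn)
    x = layers.LayerNormalization()(x + attn)   # residual connection (assumed)

    # FFN: project to inner dim 32, LeakyReLU, project back to embed dim 64.
    ffn = layers.Dense(ffn_dim)(x)
    ffn = layers.LeakyReLU()(ffn)
    ffn = layers.Dense(embed_dim)(ffn)
    ffn = layers.Dropout(rate)(ffn)
    return layers.LayerNormalization()(x + ffn)  # residual connection (assumed)

inputs = layers.Input(shape=(SEQ_LEN, N_FEATURES))
# Project the 7 raw features into the 64-dim embedding space (assumed step).
x = layers.Dense(EMBED_DIM)(inputs)
x = transformer_block(x)
# Average over the temporal dimension: one vector per feature channel.
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(DROPOUT_RATE)(x)
x = layers.Dense(20)(x)                          # extra capacity; activation not specified
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Under these assumptions, passing a batch of shape (batch, 168, 7) yields a (batch, 262) probability distribution; swapping SEQ_LEN to 1608 covers the full-dataset variant, since the attention and pooling layers are length-agnostic.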
