M.S. AAI Capstone Chronicles 2024
Table 1
Note: Table of Statistics for Stop Words by Label.
These statistics indicate that the extra granularity provided by including stop words could have predictive power for the team’s models. This discovery was one of the strongest relationships identified during EDA, and it supports limiting certain preprocessing steps to suit this specific problem. Because of this, the team decided to keep stop words, as well as punctuation and capitalization, during tokenization. Because the chosen dataset came with limited features, additional features such as word count, perplexity, and sentiment polarity were also engineered for further consideration. A list of the engineered features is displayed in Figure 2.
Figure 2
Note: List Containing Engineered Features
Background Information
For this problem, the end goal is to create an application that can accurately differentiate between human-written and AI-generated text. Several models and applications created to date accomplish this with varied approaches.
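
As a concrete illustration of the kind of input such an application might use, the following is a minimal sketch of how per-label stop-word statistics and a few simple engineered features (word count, stop-word ratio, punctuation and capitalization counts) could be computed. It is not the team's actual pipeline: the function names, the tiny `STOP_WORDS` set, and the sample labels are all assumptions made for illustration. Perplexity and sentiment polarity are omitted because they require external models.

```python
import string
from collections import defaultdict
from statistics import mean

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real project would likely use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "that"}


def engineer_features(text: str) -> dict:
    """Compute a few simple features while leaving the raw text untouched."""
    # Lowercase and strip punctuation only for the stop-word lookup;
    # punctuation and capitalization are still counted from the raw text.
    words = [t.strip(string.punctuation).lower() for t in text.split()]
    words = [w for w in words if w]
    stop_count = sum(w in STOP_WORDS for w in words)
    return {
        "word_count": len(words),
        "stop_word_count": stop_count,
        "stop_word_ratio": stop_count / len(words) if words else 0.0,
        "punctuation_count": sum(ch in string.punctuation for ch in text),
        "capital_count": sum(ch.isupper() for ch in text),
    }


def stop_word_stats_by_label(samples):
    """Summarize stop-word counts per label from (text, label) pairs."""
    counts = defaultdict(list)
    for text, label in samples:
        counts[label].append(engineer_features(text)["stop_word_count"])
    return {
        label: {"mean": mean(c), "min": min(c), "max": max(c)}
        for label, c in counts.items()
    }


if __name__ == "__main__":
    # Made-up sample texts and labels, not data from the study.
    samples = [
        ("The cat sat in the hat.", "human"),
        ("It is a test.", "ai"),
        ("Dogs bark.", "human"),
    ]
    print(stop_word_stats_by_label(samples))
```

Comparing the per-label summaries is one simple way to check whether stop-word usage differs enough between classes to justify keeping stop words during preprocessing.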