ADS Capstone Chronicles Revised


label. BERTScore considers the context in which words appear, unlike the ROUGE score, which is based on word overlap and ignores word order. Finally, BERTScore captures the thematic structure of the data and better indicates the relationships between the topic labels.

6.1.5 LDA topic modeling

Due to the data quality issues mentioned in section 6.1.1, we could not isolate climate-related topics from other political subjects. The LDA model left over 65% of the validation set unlabeled, which made it difficult to use as a reference for BERTScore analysis of our LLM.

6.1.6 Sentiment analysis

Sentiment analysis of polarity and subjectivity revealed overall positive and objective reporting of climate change issues. However, because the text also contains unrelated subjects, the sentiment results for climate-related issues remain inconclusive.

6.1.7 Short Buffer Reduces Performance Between API Calls

Our results suggest that shorter delays between successive API calls to the GPT-3.5-turbo model produce higher variability in the model’s output. We used a 3-second buffer when iteratively calling the GPT model. However, these results were less coherent than those obtained when allowing 30 seconds or more between API calls.

6.2 Conclusion

GPT-3.5-turbo is a viable method for applying topic labels to news snippets. By identifying the high similarity between the topic labels, we can refine our approach to improve the overall coherence of generated topics.
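The buffered calling pattern discussed in section 6.1.7 can be sketched as follows. This is a minimal illustration and not the project's actual code: `label_snippets`, `label_fn`, `buffer_seconds`, and `sleep_fn` are hypothetical names, and `label_fn` stands in for whatever function sends one snippet to the GPT-3.5-turbo API and returns its topic label. The 30-second default mirrors the longer buffer reported above.

```python
import time
from typing import Callable, Iterable, List


def label_snippets(
    snippets: Iterable[str],
    label_fn: Callable[[str], str],
    buffer_seconds: float = 30.0,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Label each snippet, pausing buffer_seconds between consecutive calls.

    label_fn is a placeholder (assumption) for the function that calls the
    model API; sleep_fn is injectable so the loop can be tested without waiting.
    """
    labels: List[str] = []
    for index, snippet in enumerate(snippets):
        if index > 0:
            # Pause only between calls, not before the first one.
            sleep_fn(buffer_seconds)
        labels.append(label_fn(snippet))
    return labels
```

Passing `buffer_seconds=3.0` would reproduce the shorter buffer for comparison. Keeping the API call behind `label_fn` lets the pacing logic be checked separately from the model itself.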

6.3 Recommended Next Steps

Many snippets contain multiple subjects. Therefore, it is imperative to devise a method to isolate the climate sentences and remove sentences containing nonclimate subjects from the snippets. In addition, it would be worthwhile to build a climate-specific dataset from the same sources using a larger window to capture the full context of the conversation, as some climate discussions were cut short by the window-size cutoff. Removing nonclimate subjects would also be an important step before repeating the topic modeling. Finally, as previously discussed, sentiment analysis was not viable because the subjects of the news snippets were uncertain. Therefore, we should reanalyze sentiment after removing the sentences unrelated to the climate phrases.

REFERENCES

American Association for the Advancement of Science. (2009, December 4). AAAS reaffirms statements on climate change and integrity. https://www.aaas.org/news/aaas-reaffirms-statements-climate-change-and-integrity

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, P., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020, July 22). Language models are few-shot learners. ArXiv. https://doi.org/10.48550/arXiv.2005.14165

Depoux, A., Hémono, M., Puig-Malet, S., Pédron, R., & Flahault, A. (2017). Communicating climate change and health in the media. Public Health Reviews, 38, Article 7. https://publichealthreviews.biomedcentral.com/articles/10.1186/s40985-016-0044-1

