ADS Capstone Chronicles Revised
4.2 Data Pre-processing
The overall approach is based on text mining and natural language processing methodologies. Data pre-processing includes folding to lowercase, removal of stopwords, removal of punctuation, and tokenization. In some applications, climate phrases were also removed for further analysis.

4.3 Sentiment Analysis
Sentiment analysis was done by determining the polarity and subjectivity of each snippet using the TextBlob API. TextBlob polarity scores range from -1.0 to 1.0, where -1.0 indicates negative sentiment and 1.0 indicates positive sentiment. Subjectivity is calculated as a float between 0.0 and 1.0, with 0.0 indicating very objective and 1.0 indicating very subjective. Overall, reporting on climate change issues is predominantly positive and objective across all four stations (see Tables 1 and 2).

Table 2 Polarity by Station
Station    Polarity   Count   Proportion
BBC News   positive   17533   0.77
           negative    5160   0.23
CNN        positive   14729   0.79
           negative    4011   0.21
FOX News   positive   17767   0.75
           negative    6070   0.25
MSNBC      positive   20108   0.79
           negative    5485   0.21
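The scoring and tabulation described in Section 4.3 can be sketched roughly as follows. This is an illustrative sketch, not the authors' code. The text does not state how scores were mapped to labels, so the cut-offs used here (polarity below 0 is negative, subjectivity above 0.5 is subjective) are assumptions, and the function names are hypothetical. Only the `TextBlob(text).sentiment` call comes from the TextBlob library itself.

```python
from __future__ import annotations


def score_snippet(text: str) -> tuple[float, float]:
    """Return (polarity, subjectivity) for one snippet using TextBlob."""
    # Imported here so the helpers below can be used without TextBlob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def polarity_label(polarity: float) -> str:
    """ASSUMED rule: scores below 0 are negative, all others positive."""
    return "negative" if polarity < 0 else "positive"


def subjectivity_label(subjectivity: float) -> str:
    """ASSUMED rule: scores above 0.5 are subjective, all others objective."""
    return "subjective" if subjectivity > 0.5 else "objective"


def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert label counts for one station into proportions (2 decimals)."""
    total = sum(counts.values())
    return {label: round(n / total, 2) for label, n in counts.items()}


# Counts for BBC News taken from Table 2.
bbc_polarity = {"positive": 17533, "negative": 5160}
print(proportions(bbc_polarity))  # {'positive': 0.77, 'negative': 0.23}
```

Applying `proportions` to each station's counts gives the Proportion column of the tables; for example, 17533 / (17533 + 5160) rounds to 0.77 for BBC News.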
4.4 Exploratory Data Analysis
We calculated the frequency of phrases and plotted the distributions of n-grams using a bag-of-words representation, as it helps with classifying the news snippets’ main ideas. We applied the out-of-the-box algorithm provided in scikit-learn’s (n.d.) CountVectorizer class. We began by including all climate bigram phrases and stopwords, since the dataset was built by identifying news articles containing the following bigram phrases: climate change, global warming, climate crisis, greenhouse gas, greenhouse gasses, and carbon tax. The first phrase, climate change, dominates the distribution both before and after stopwords are removed (occurring 75,000 times), and its dominance becomes even more pronounced after their removal. Next, we removed the climate bigram phrases, because every news snippet is known to contain at least one of them. By removing the climate bigrams, we uncovered more diverse phrases and obtained a more
Table 1 Subjectivity by Station
Station    Subjectivity   Count   Proportion
BBC News   objective      15160   0.67
           subjective      7533   0.33
CNN        objective      12320   0.66
           subjective      6420   0.34
FOX News   objective      16938   0.71
           subjective      6899   0.29
MSNBC      objective      16959   0.66
           subjective      8634   0.34
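The pre-processing steps of Section 4.2 and the bigram counting of Section 4.4 can be illustrated with a small standard-library sketch. This is a stand-in for scikit-learn's CountVectorizer, not the authors' code. The stopword list is a deliberately tiny, hypothetical one, and the function names and example snippets are invented for illustration. The climate phrases are the search phrases listed in Section 4.4.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}

# Search phrases listed in Section 4.4.
CLIMATE_PHRASES = [
    "climate change", "global warming", "climate crisis",
    "greenhouse gas", "greenhouse gasses", "carbon tax",
]


def preprocess(text, remove_climate_phrases=False):
    """Lowercase, strip punctuation, optionally drop climate phrases,
    tokenize on whitespace, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)   # punctuation becomes a space
    text = " ".join(text.split())          # normalize whitespace
    if remove_climate_phrases:
        for phrase in CLIMATE_PHRASES:
            # Word boundaries keep the match from cutting into longer words.
            text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def bigram_counts(snippets, remove_climate_phrases=False):
    """Count bigrams over all snippets (a bag-of-words style tally)."""
    counts = Counter()
    for snippet in snippets:
        tokens = preprocess(snippet, remove_climate_phrases)
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts


# Invented example snippets, for illustration only.
snippets = [
    "Climate change is the defining issue of our time.",
    "The senator called climate change a hoax and opposed the carbon tax.",
]
print(bigram_counts(snippets).most_common(1))
print(bigram_counts(snippets, remove_climate_phrases=True).most_common(3))
```

With the climate phrases left in, the search phrase itself tops the tally; with them removed, the remaining bigrams reflect what else the snippets discuss, which is the effect described in Section 4.4.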