ADS Capstone Chronicles Revised


4.2 Data Pre-processing

The overall approach is based on text mining and natural language processing methodologies. Data processing includes folding to lowercase, removal of stopwords, removal of punctuation, and tokenization. In some applications, climate phrases were also removed for further analysis.

4.3 Sentiment Analysis

Sentiment analysis was done by determining the polarity and subjectivity of each snippet using the TextBlob API. TextBlob polarity scores range from -1.0 to 1.0, where -1.0 indicates negative sentiment and 1.0 indicates positive sentiment. Subjectivity is calculated as a float between 0.0 and 1.0, with 0.0 indicating very objective and 1.0 indicating very subjective. Overall, reporting on climate change issues is generally positive and objective across all four stations (see Tables 1 and 2).

Table 2 Polarity by Station

Station     Polarity   Count   Proportion
BBC News    positive   17533   0.77
            negative    5160   0.23
CNN         positive   14729   0.79
            negative    4011   0.21
FOX News    positive   17767   0.75
            negative    6070   0.25
MSNBC       positive   20108   0.79
            negative    5485   0.21
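The proportions in Table 2 follow directly from the counts: each station's positive and negative counts are divided by that station's total. The arithmetic can be sketched in a few lines of Python using the counts from Table 2. The names `polarity_counts` and `proportions` are illustrative and are not taken from the original analysis.

```python
# Counts copied from Table 2; all names here are illustrative only.
polarity_counts = {
    "BBC News": {"positive": 17533, "negative": 5160},
    "CNN": {"positive": 14729, "negative": 4011},
    "FOX News": {"positive": 17767, "negative": 6070},
    "MSNBC": {"positive": 20108, "negative": 5485},
}


def proportions(counts):
    """Return each label's share of the station total, rounded to 2 places."""
    total = sum(counts.values())
    return {label: round(n / total, 2) for label, n in counts.items()}


polarity_proportions = {
    station: proportions(counts) for station, counts in polarity_counts.items()
}
```

For BBC News, for example, 17533 / (17533 + 5160) rounds to 0.77, which matches the tabulated value.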

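The pre-processing steps listed in Section 4.2 (lowercasing, punctuation removal, tokenization, stopword removal, and the optional removal of climate phrases) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the stopword set is a tiny illustrative sample, tokenization is a simple whitespace split, the order of the steps is a choice made for this sketch, and the names `preprocess`, `STOPWORDS`, and `CLIMATE_PHRASES` are hypothetical. The phrase list is the set of search phrases given in Section 4.4.

```python
import string

# Tiny illustrative stopword sample (assumption); a real run would use a full list.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "on", "that"}

# Search phrases named in Section 4.4, sorted longest first so that
# "greenhouse gasses" is removed before "greenhouse gas".
CLIMATE_PHRASES = sorted(
    ["climate change", "global warming", "climate crisis",
     "greenhouse gas", "greenhouse gasses", "carbon tax"],
    key=len, reverse=True,
)

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(snippet, drop_climate_phrases=False):
    """Lowercase, strip punctuation, optionally drop climate phrases,
    tokenize on whitespace, and remove stopwords."""
    text = snippet.lower().translate(_PUNCT_TABLE)
    if drop_climate_phrases:
        # Plain substring replacement; good enough for a sketch.
        for phrase in CLIMATE_PHRASES:
            text = text.replace(phrase, " ")
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


example = "The senator said that climate change is a threat to the economy."
tokens_all = preprocess(example)
tokens_no_climate = preprocess(example, drop_climate_phrases=True)
```

With the climate phrases kept, the example snippet reduces to the tokens senator, said, climate, change, threat, economy. With `drop_climate_phrases=True`, the tokens climate and change are removed as well.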
4.4 Exploratory Data Analysis

We calculated phrase frequencies and plotted the distributions of n-grams using a bag-of-words representation, as it helps with classifying the main ideas of the news snippets. We applied the out-of-the-box algorithm provided in scikit-learn’s (n.d.) CountVectorizer class. We began by including all climate bigram phrases and stopwords, as the dataset was built by identifying news articles containing the following bigram phrases: climate change, global warming, climate crisis, greenhouse gas, greenhouse gasses, and carbon tax. The first term, climate change, dominates the distribution both before and after stopwords are removed (occurring 75,000 times) and becomes even more prevalent after their removal. Next, we removed the climate bigram phrases, because each news snippet is guaranteed to contain at least one of them. By removing the climate bigrams, we uncovered more diverse phrases and obtained a more

Table 1 Subjectivity by Station

Station     Subjectivity   Count   Proportion
BBC News    objective      15160   0.67
            subjective      7533   0.33
CNN         objective      12320   0.66
            subjective      6420   0.34
FOX News    objective      16938   0.71
            subjective      6899   0.29
MSNBC       objective      16959   0.66
            subjective      8634   0.34
