View on GitHub

Trump In the News

Ryan Brady, Kathleen Zhen, Shirley Chew, Noa Shadmon

Question: How is Trump portrayed in different news sources?

To begin we split up popular news sites based on their overall political stance. The three different categories are:

Our Analysis

To answer our question, we web scraped the article titles from the 9 news sources’ politics page and/or opinion page over the span of this quarter. To see how we did it, check out our code. Once we web scraped the titles into a text file, we filtered out the titles based on the following key words:

whitehouse, trump, conway, sessions, pence, president, tillerson, devos, flynn, kushner, carson, department, preibus, bannon, spicer, and miller.

Once the titles were filtered, we wanted to see whether the more conservative news outlets used a positive tone in their titles about Donald Trump, whether the liberal news outlets used a negative tone when discussing Trump, and whether the news outlets that deem neither stance have neutral a tone.

To do this, we needed to use two essential Python packages:

  • NLTK
  • TextBlob
Both of these packages can be used for natural language processing analysis.

To see how we created all of the graphs below for our analysis, check out our code.

We used TextBlob to find the overall polarity (how positive/negative the words used are) and subjective the titles are. We predicted that the liberal news sources would be subjective and negative, the conservative news sources be subjective and positive, and the others be neutral and not subjective.

Our findings can be seen in the plots below. The first plot shows the polarity of the titles for each news source.

The liberal news outlets have a wide range of values for polarity, where most of the article titles are neutral, but overall seem to skew slightly towards positive article titles rather than to negative ones. The conservative news outlets also have many neutral titles, with slightly more positive article titles. This is especially true when we look at the boxplot for Breitbart, we can see that the majority of the boxplot range is positive. The "other" news sources seem to be more negative than positive, but they indeed have more neutral article titles.

The next plot showcases the subjectivity of each news outlet.

Each boxplot tells us how subjective the articles for each news source is, subjectivity is a number scaled from 0 to 1, where 0 is very objective and 1 is very subjective. The degree of subjectivity of article titles is important because it tells us how biased each of the websites are. From the boxplot, we can see that none of the news outlets are very subjective, but they have varying degrees of subjectivity. The least subjective news outlet is Wall Street Journal and the most subjective news outlets are Washington Post and Breitbart. Wall Street Journal was found to be more objective, and they also had more neutral and slightly negative article titles. Washington Post and Breitbart were found to be subjective, and they also have some of the more positive article titles in relation to the Trump administration. This does not completely match our assumption that The Washington Post would have more neutral to slightly negative news articles, because it is one of the more liberal news outlets (against Trump), while one of Breitbart’s old editors, is the current White House Chief Strategist (supporters).

After looking at the polarity and subjectivity of the article titles, we broke down the titles into noun phrases to further examine how the grouped news sources portrayed Trump and his administration. A noun phrase contains a noun and any words that modify it, this helps contextualize how Trump’s administration may be portrayed. Below is a table of percentage of polarity within noun phrases. The percentage was used because we had a various amount of noun phrases for different news source.


From the above table, we can see that the conservative news outlets has a higher positive than negative percentage which is expected. The other and liberal news source has a mix of negative and positive noun phrases but the percentages are pretty similar. Now we can see how the polarity for noun phrases perform in each news outlet category.


From the barplot above, conservative news outlets show consistency as their overall noun phrase polarity is more positive then negative. Other and liberal news outlets also show more positive noun phrase polarity than negative. An explanation for this is that noun phrases out of content with the whole article title may not show negativity or positivity accurately.



Comparing All Titles from News Sources

Since there did not seem to be a large difference between the polarity and subjectivity of the titles for the different kinds of news sources, we wanted to examine how similar the titles are from each other to see if Conservative news sources, Liberal news sources, and "Other" news sources report on similar topics. We applied pairwise comparisons of the article titles using the scikit-learn Python package. Our results are shown in the table below:


Comparing the titles for liberal, conservative, and other news sources for the most similar titles, the most similarities occur within groups rather than between groups. Liberal news sources share the most similar titles with other liberal news sources Conservative news sources never share a similar article title with liberal news sources. Although Conservative news sources and Liberal news sources are both mostly neutral to slightly positive when addressing the Trump administration, the way that the news is reported or the type of news reported may be vastly different. As shown in the table, conservative news sources and liberal news also share similar titles to "other" news sources. This implies that "other" news sources cover similar news to liberal and conservative news.



Word frequencies: What events each grouped news source frequently reports on

Breaking down the article title's one step further, we looked at the frequency of the words used. First, we removed the names related to Trump's Administration and stop words such as "the" and "that". Then using the Counter() function from the "collections" package will return the most commonly used words. We plotted the top 20 words used per category below:




At first sight, we see that conservative and other news outlets have more nouns than verbs. Liberal news outlets have more verbs which could mean the titles are referring to events or sayings that the Trump Administration is involved, whereas more nouns are firm statements of events and topics that involve the Trump Administration. Conservative news outlets like to cover topics about taxes and budget. Other news outlets top content revolves around healthcare. Liberal news outlet maintain a balance between talking about the budget and healthcare. Some common words across all three categories of news outlets are Obama, travel, ban.



The Media's Ideology and Its Perception of Trump

In this section we observe how the viewer base's ideology of different websites affect their reporting on Trump. In order to get ideology scores we webscraped the Pew Reports website. We obtained the tables first and then split the tables using regular expressions. After creating the data frames we used Pew's scoring ideology scoring criterion: [-8.5 for extreme left, -4.5 for moderate left, 0 for neutral, 4.5 for moderate right, and 8.5 for extreme left]. Since we did not have Pew's survey data we first summed the rows, creating a total percentage of people that trust the news source. These numbers will represent the amount of people who actually believe the news source. We then divided each indivual value from the summed row allowing us to see the percentage of the makeup of people who trust this source. We then multiplied these valus by their respective group in the data frame and then summed across to get that news paper's average viewer base. These values will not necessarily be the same as Pew's actual scores due to Pew having ranges for each value between -10 and 10. However; this data was not openly available and in our case we thought this would give us a rough estimate of Pew's actual scores. Next, we obtained the size of viewership of each news source by summing the original trust percentages. This gave us a value that would illustrate how many people have actually trust of the news source. We then used this value to represent the size of people who trust the website, and this variable is illustrated in the plots below as the size of the dot.

In this plot we look at Pew's Ideology Score versus how Trump is represented in the media. In the plot we see a slight trend in news corporation's positive sentiment towards Trump and their viewership ideology. The most obvious examples of this trend include the Economist and Breitbart. For this plot we transformed our sentiment variable by multiplying it by two and subtracting one in order to make positive sentiments positive and negate negative sentiments. While both are on opposite ends of political ideology, they both seemingly represent their audience's view of Trump. In the next plot we take a look at a random sample of the webpages articles sentiments against a random sample of a newspapers regular title sentiment.

In the plot above we look at the distribution of the random sampled sentiment and each news organization's sentiment towards Trump. This plot illustrates how most of the News sites have a slight negative sentiment towards Trump's administration than they have on average.

In this plot we subtract the average sentiment of website articles from their sentiment of Trump. We see that some wesites like the Washington Post and ABC News do not different strongly on their typical title rhetoric when talking about the current Executive Administration. However, the Economist, New York Times and Huffingoton Post all have covered the Trump Administration with increased scrutiny. On the right side of the political spectrum, Breitbart has reported on Trump with increased positivity. Fox on the other hand has actually reported on the current administration with slightly increased Scrutiny.

Confounding Variables

Our results are slightly different than what we expected. We thought there would be a huge difference in polarity levels between liberal and conservative news outlets, when in fact this was not the case.
This is due to two main reason.
  • Not enough data
  • Having a big enough sample size is usually very important when conducting data. While we did have a good amount of article titles, we did not web scrape enough from each news outlet which led to some biased results. The table below shows the number of titles scraped for each news source.


    We clearly did not scrape enough titles for The Economist and Breitbart. This is largely due to the fact that the front page of their politics section didn't have as many titles and the front page was not updated as often/had different articles each day.

  • Sarcastic Titles
  • While TextBlob and NLTK are great for tokenizing words and getting an overall sentiment analysis, they do not take account language and sarcasm. We found that some of the titles that were deemed as positive were actually negative. These articles had a sarcastic undertone that the packages could not decipher. An example of a title that was marked as positive but was actually poking fun at Trump is shown below.


    Clearly the second sentence is sarcastic and should be counted as negative but since we couldn't automate checking for sarcasm we left it as is.

Conclusion

In our project we illustrated how most news outlets title contain mostly neutral titles. The vast majority of all websites have similar neutral titles. However; we also saw sentiment splits in Conservative, Neutral, and Liberal news sites. Liberal sites tend to report on similar issues, likewise Conservative publications and Neutral publications create similar titles to other publications with similar ideological sentiments. Thus we see a slights split on reporting. When looking at subjectivity of the publications we found that most of the titles are objective with a few outliers. Finally we observed the contrast in Political Ideology of viewers and the publication’s coverage of Trump. From this we witnessed that there seems to be an upward trend in the relationship between ideology and sentiment towards Trump. Surprising though was Fox did not report on Trump favorably, whereas Breitbart covered Trump significantly more favorably than any other source. Thus we will need to obtain more data to conclude that Conservative sites report on Trump more favorably than Liberal ones. We can make note, that websites like Breitbart, recently corroborated by President Trump carry a much more positive sentiment reporting on the Executive Branch. We need more data and a better understanding of websites such as Drudge and Mother Jones to see the extremities of extreme partisan websites and their title sentiment towards Trump. What we can see is that major news sources report the news in a similar fashion, while showing slight favoritism to their viewers ideology.

BACK TO TOP