2020-09-01: DNC vs RNC pulses - Quantifying news attention for the DNC & RNC with StoryGraph

Figure 1: Illustration of the level of attention given to the Democratic National Convention (DNC) story by news organizations, measured with StoryGraph's longitudinal data. The y-axis represents the average degree (attention score) of the Connected Components (CCs) that represent the DNC story and other stories (e.g., Mail voting - green diamond) that occurred between August 17 and August 21, 2020 (x-axis). The four annotated peaks of the DNC story (median average degree: 11.88) correspond with discussions surrounding Michelle Obama, Joe Biden, Kamala Harris, and Joe Biden. The subscripts below the peaks represent the document frequencies of the bigram annotations. For example, "Michelle Obama" occurred 31 times in the text of the CC documents.

The Democratic Party held the Democratic National Convention (DNC) from August 17, 2020 to August 20, 2020. The following week, from August 24 to August 27, the Republican Party held the Republican National Convention (RNC). Ahead of the 2020 US Presidential Election, the conventions offered both parties the opportunity to grab national attention as they introduced their respective presidential candidates.

It is customary for both parties to claim they had the better convention or the higher ratings. Accordingly, multiple news reports (from the Washington Post, NPR, and Forbes) claim the Democrats had higher TV ratings. Similar to the TV ratings method of quantifying attention for a particular TV event, we used StoryGraph to quantify the level of attention news organizations gave to both conventions. StoryGraph has been running for three years and is well-suited for this task: every 10 minutes, it extracts news text from 17 left, center, and right news organizations and generates a news similarity graph that helps quantify attention. If you are not already familiar with StoryGraph, before proceeding, read the following blogposts about quantifying attention (Tech Report) in 2018 and 2019.

Clustering StoryGraph's News Stories

Why cluster news stories?

Each StoryGraph news story is represented by a Connected Component (CC) consisting of multiple nodes (news articles). Since StoryGraph collects news articles every 10 minutes, the same news story is often a member of multiple different graphs. For example, the story about the Release of the Mueller Report on April 18, 2019 is present in 144 different news similarity graphs as 144 different connected components. Since we represent news stories as CCs, it is desirable to cluster all 144 connected components (news stories) as the same Release of the Mueller Report story. This means the version of the story published at 11:28 AM EDT, its left neighbor published at 11:17 AM EDT, its right neighbor published at 11:39 AM EDT, and all remaining CCs have to be clustered as the same story, as illustrated by Figure 2.

Figure 2: Clustering involves grouping similar connected components from different graphs collected at different times under the same group label. This figure, about the Release of the Mueller Report story, shows the clustering of different versions of the story published at different times (e.g., 11:17 AM EDT, 11:28 AM EDT, and 11:39 AM EDT).

How we clustered news stories (a highly summarized version)

We clustered news stories by grouping their connected components: two connected components were placed in the same group when they shared a common set of links and a common set of topics. Topics were represented by top Sumgrams (conjoined ngrams).
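The grouping idea can be sketched in a few lines of Python. This is only an illustration and not StoryGraph's actual implementation: the use of Jaccard similarity, both thresholds, the `links`/`topics` schema, and the toy data are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_components(components, min_link_sim=0.3, min_topic_sim=0.3):
    """Group connected components that overlap in BOTH links and topics.

    components: list of dicts with 'links' and 'topics' sets
                (hypothetical schema, not StoryGraph's format).
    The thresholds are illustrative assumptions.
    Returns a sorted list of clusters, each a list of component indices.
    """
    parent = list(range(len(components)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(components)), 2):
        a, b = components[i], components[j]
        if (jaccard(a["links"], b["links"]) >= min_link_sim
                and jaccard(a["topics"], b["topics"]) >= min_topic_sim):
            parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(components)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data (made up): two snapshots of one story and one unrelated story.
components = [
    {"links": {"a.example/1", "b.example/2", "c.example/3"},
     "topics": {"mueller report", "attorney general"}},
    {"links": {"a.example/1", "b.example/2", "d.example/4"},
     "topics": {"mueller report", "special counsel"}},
    {"links": {"e.example/5", "f.example/6"},
     "topics": {"unrelated story", "other topic"}},
]
clusters = cluster_components(components)
print(clusters)
```

Union-find is used here so that similarity is transitive: if snapshot A matches B and B matches C, all three end up in the same story even when A and C no longer share enough links.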

Quantifying news attention for both conventions: DNC vs RNC pulses - Methodology
Figure 3: Illustration of the level of attention given to the Republican National Convention (RNC) story by news organizations, measured with StoryGraph's longitudinal data. The y-axis represents the average degree (attention score) of the connected components that represent the RNC story and other stories (e.g., Hurricane Laura - black square) that occurred between August 24 and August 28, 2020 (x-axis). The four annotated peaks of the RNC story (median average degree: 7.66) correspond with discussions surrounding Nikki Haley, Melania Trump, Mike Pence, and President Trump. Unlike the DNC, the RNC shared the spotlight with concurrent news stories (Kenosha Protests/Shooting and Hurricane Laura). The subscripts represent the document frequencies of the bigram annotations. For example, "Nikki Haley" occurred 12 times in the text of the CC documents.

To quantify the level of attention given to the DNC and RNC we took the following three steps. 

First, we extracted 720 graphs for the DNC (August 17 to August 21, 2020) and 719 graphs for the RNC (August 24 to August 28, 2020). Even though each convention lasted four days, we collected graphs for five days to account for the delay in reporting the news; news stories about an event can be reported after the event occurs. For example, the dotted vertical lines in Figures 1 and 3 mark the end of the DNC (August 20, 2020) and RNC (August 27, 2020), and the peaks after the dotted vertical lines illustrate the delay in reporting the news.
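As a quick arithmetic check on these counts, the sketch below computes how many 10-minute snapshots fit in a time window. It assumes the window starts at midnight and covers five full days, which the text does not state explicitly; the function name is ours. The text does not say why the RNC window has 719 graphs rather than 720, and the sketch does not try to explain it.

```python
from datetime import datetime, timedelta


def expected_graph_count(start, end, interval_minutes=10):
    """Number of fixed-interval snapshots in the half-open window [start, end)."""
    return (end - start) // timedelta(minutes=interval_minutes)


# DNC collection window: August 17 through August 21, 2020 (end is exclusive).
dnc_graphs = expected_graph_count(datetime(2020, 8, 17), datetime(2020, 8, 22))

# A single day, as in the Mueller Report example (April 18, 2019).
one_day_graphs = expected_graph_count(datetime(2019, 4, 18), datetime(2019, 4, 19))

print(dnc_graphs, one_day_graphs)
```

The one-day figure is consistent with the 144 graphs mentioned for the Release of the Mueller Report story, and five such days give the 720 graphs collected for the DNC.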

Second, we clustered all the connected components in all the graphs for the DNC and RNC into their respective stories. For the DNC, 517 out of 720 graphs (about 72%) contained connected components that belonged to the Democratic National Convention story; for the Republican National Convention story, it was 474 out of 719 graphs (about 66%). Our clustering method is not without flaws. For example, giant clusters of news stories tend to attract noisy news stories. This can happen when news articles discuss multiple different topics and could therefore be assigned multiple story labels.

Third, we visualized the stories by plotting (Figures 1, 3, and 4) the average degree of their connected components (y-axis) over time (x-axis). The average degree serves as our proxy for quantifying the level of attention given to news stories. The peaks of the figures were annotated with the most frequent PERSON entity (out of 10 ngrams). This was obtained by using Sumgram to extract conjoined ngrams from the text of the annotated connected components. For example, the bigram "Michelle Obama" occurred 31 times in Figure 1, while "Nikki Haley" occurred 12 times in Figure 3.
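The two quantities used in this step can be illustrated with a minimal sketch. It assumes the standard graph-theoretic definition of average degree for an undirected graph (2|E|/|V|) and counts document frequency by simple case-insensitive substring matching; it does not use Sumgram, and the function names and toy data are hypothetical.

```python
def average_degree(edges):
    """Average degree of an undirected graph: 2 * |E| / |V|.

    edges: list of (node, node) pairs; nodes are inferred from the edges.
    """
    nodes = {node for edge in edges for node in edge}
    return 2 * len(edges) / len(nodes) if nodes else 0.0


def document_frequency(bigram, documents):
    """Number of documents whose text contains the bigram (case-insensitive)."""
    needle = bigram.lower()
    return sum(1 for text in documents if needle in text.lower())


# Toy connected component (made up): four articles, all similar to each other.
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d")]
score = average_degree(edges)

# Toy article texts (made up) for annotating a peak.
docs = [
    "Michelle Obama opened the convention on Monday night.",
    "Analysts reacted to the speech by Michelle Obama.",
    "Concerns about mail voting continued this week.",
]
frequency = document_frequency("Michelle Obama", docs)

print(score, frequency)
```

A connected component whose articles are all similar to one another is densely linked and gets a high average degree, which is why the measure can act as a proxy for how many outlets are covering the same story at the same time.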

Quantifying news attention for both conventions: DNC vs RNC pulses - Observations

Figures 1 and 3 illustrate the level of attention (average degree of CC) given to the DNC and RNC, respectively, by the 17 news organizations in StoryGraph's longitudinal news dataset. According to StoryGraph, the DNC received more attention than the RNC: the four annotated DNC peaks have a median average degree of 11.88, compared with 7.66 for the RNC peaks. The DNC story features four prominent peaks in which "Michelle Obama," "Joe Biden," "Kamala Harris," and "Joe Biden" were the most frequent bigrams. The peaks and their respective top bigrams (with the exception of the first occurrence of "Joe Biden") correspond with the order (Michelle Obama - Monday, Kamala Harris - Wednesday, Joe Biden - Thursday) of DNC primetime speakers.

Similar to the DNC story, the RNC story (Figure 3) features four prominent peaks in which "Nikki Haley," "Melania Trump," "Mike Pence," and "President Trump" were the most frequent bigrams. These correspond with the order (Nikki Haley - Monday, Melania Trump - Tuesday, Mike Pence - Wednesday, President Trump - Thursday) of RNC primetime speakers.

The DNC story did not have to share the spotlight with other stories such as the House's vote on funding for the USPS, concerns about Mail voting ahead of the general elections amidst a pandemic, and the Senate hearing of Postmaster General Louis DeJoy. Unlike the DNC story, the RNC story (Figure 3) competed for attention with other stories reported concurrently, such as Kellyanne Conway's departure from the White House and George Conway's withdrawal from the Lincoln Project, the protests and subsequent shooting in Kenosha, Wisconsin following the police shooting of Jacob Blake, and Hurricane Laura.

Figure 4: Combined plots for the Democratic (Figure 1) and Republican (Figure 3) National Conventions with the time (x-axis) normalized (aligned by week), since the two events occurred on different days. Superimposing both plots more easily illustrates that the DNC received more attention (blue line has taller peaks) than the RNC.

Figure 4, which superimposes the DNC and RNC plots on a normalized (Monday to Friday) timeline, amplifies the findings discussed above. This third StoryGraph study aligns with our vision for adding context to graphs, where we do not see graphs as isolated entities but as components of a continuum.
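One plausible way to align the two weeks on a shared timeline is to replace each timestamp with the time elapsed since the start of its week. The sketch below shows that idea only; it is an assumption about how such a normalization could be done, not a description of how Figure 4 was produced, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def week_offset(ts):
    """Time elapsed since 00:00 on the Monday of the week containing ts."""
    monday = (ts - timedelta(days=ts.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return ts - monday


# Hypothetical timestamps one week apart: Monday night of each convention.
dnc_point = datetime(2020, 8, 17, 22, 30)
rnc_point = datetime(2020, 8, 24, 22, 30)

dnc_offset = week_offset(dnc_point)
rnc_offset = week_offset(rnc_point)
print(dnc_offset, rnc_offset)
```

After this transformation, points from both weeks that fall at the same weekday and time get the same x-value, so the two curves can be drawn on one axis.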

-- Alexander C. Nwala (@acnwala, @storygraphbot)


