2020-01-04: 365 dots in 2019 - top news stories of 2019

Fig. 1: 365 dots in 2019 - News stories for 365 days in 2019. Each dot represents the average degree of the Giant Connected Component (GCC) with the largest average degree across all the 144 story graphs for a given day. The x-axis represents time, the y-axis represents the average degree of the GCC.
In March 2019 I published "365 dots in 2018" where I presented the top stories for each day in 2018 according to StoryGraph. Now that 2019 is over, it is natural to ask: what were the top news stories of 2019? News organizations often publish "the year's top stories" or a "year in review" (e.g., CNN, CBS, FoxNews), but the selection criteria are not always made explicit. The closest thing to a selection criterion I have seen from news organizations is a list of their most viewed (or most popular) news stories. But this criterion cannot be checked by ordinary users, who have no access to the private traffic statistics of news articles. As I mentioned previously, we consider specifying the selection criteria important for two reasons. First, presenting the criteria opens them to critique and helps alleviate concerns of bias. Second, the criteria are inherently valuable because they can be reused on a different collection. For example, one could apply the same process to find the top news stories in a different country.

StoryGraph's criterion for a "top story" is a high average degree of a connected component in a graph generated by computing the similarity between news articles based on the entities they mention. The articles are extracted from the RSS feeds of 17 US news sources across the partisanship spectrum (left, center, and right). The code is available and the algorithm is described elsewhere. Essentially, the more news organizations use the same entities (e.g., people, locations, organizations) in their reporting, the more important the story. A graph is constructed in which each node is a news article represented by the entities extracted from it, and an edge between two articles indicates a high degree of similarity between them. A connected component represents a news story (e.g., the Mueller Report), and the average degree of that connected component (GCC avg. deg.) is its attention score.
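The idea above can be illustrated with a minimal sketch. This is not StoryGraph's actual code: the Jaccard similarity measure, the 0.3 threshold, the function names, and the toy articles are all assumptions chosen for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two entity sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(articles, threshold=0.3):
    """articles: dict of article id -> set of entities. Returns an adjacency dict.

    The threshold is a hypothetical value, not StoryGraph's setting.
    """
    adj = {aid: set() for aid in articles}
    for u, v in combinations(articles, 2):
        if jaccard(articles[u], articles[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps


def average_degree(adj, comp):
    """Average number of edges per node inside one connected component."""
    return sum(len(adj[n]) for n in comp) / len(comp)


def top_component(adj):
    """Connected component with the highest average degree, plus its score."""
    comps = connected_components(adj)
    if not comps:
        return None, 0.0
    best = max(comps, key=lambda c: average_degree(adj, c))
    return best, average_degree(adj, best)


# Toy articles (hypothetical), each reduced to its extracted entities.
articles = {
    "a1": {"robert mueller", "william barr", "donald trump"},
    "a2": {"robert mueller", "william barr", "congress"},
    "a3": {"robert mueller", "william barr", "donald trump", "congress"},
    "a4": {"nancy pelosi", "impeachment"},
    "a5": {"nancy pelosi", "impeachment", "ukraine"},
    "a6": {"el paso"},
}
story, score = top_component(build_graph(articles))
print(sorted(story), score)  # ['a1', 'a2', 'a3'] 2.0
```

In this toy example the three Mueller articles share enough entities to form a triangle (average degree 2.0), which outranks the two-article impeachment component (average degree 1.0) and the isolated article.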

The top news stories of 2019
The table below shows that the top 10 news stories (extracted from Fig. 1) of 2019 were clustered around three primary stories:

  1. The Mueller Report (Ranks 1st, 4th, 6th, and 8th),
  2. The impeachment inquiry against President Trump (Ranks 2nd and 3rd), and 
  3. The 2019 Democratic debates (Ranks 5th, 7th, and 10th).

Rank | Date (MM-DD) | News Story | GCC Avg. Deg.
1 | 03-24 | AG William Barr releases Mueller Report's principal conclusions | 22.93
2 | 09-24 | House Speaker Pelosi announces formal impeachment inquiry | 18.60
3 | 11-19/20 | Impeachment inquiry public testimony (Tie: 11-19, 11-20) | 18.18
4 | 01-19 | Mueller: BuzzFeed Report 'Not Accurate' | 17.19
5 | 07-31 | Second Democratic debates | 15.39
6 | 07-24 | Robert Mueller's testimony at Congress | 15.05
7 | 09-13 | Third Democratic debates | 14.37
8 | 05-01 | AG Barr and Robert Mueller split on obstruction | 14.36
9 | 04-08 | Homeland Security Chief Kirstjen Nielsen resigns | 13.43
10 | 12-20 | Sixth Democratic debates | 13.33

Stories surrounding the release of the Mueller Report (red dots in Fig. 1) received the most attention in 2019. On March 22, 2019, Robert Mueller submitted his report to AG William Barr (GCC avg. degree: 18.72). Two days later, AG William Barr released his summary (principal conclusions) of the report. This story received the most attention (GCC avg. degree: 22.93) in 2019. AG William Barr's principal conclusions of the Mueller Report were received with skepticism by the Democrats, who claimed the conclusions were highly favorable to President Trump. In contrast, the Republicans claimed the summary exonerated the President of any wrongdoing.
The next top story in 2019 (blue dots in Fig. 1) with GCC average degree of 18.60 was Speaker Nancy Pelosi's announcement of an official impeachment inquiry (September 24, 2019) four days after the whistleblower's report. Similarly, at rank three (green dots in Fig. 1) were stories chronicling the public testimonies of the impeachment inquiry.
Similar to 2018, President Trump was a dominant figure in the 2019 news discourse. As shown in Fig. 1, out of the 365 days, "Trump" was included in the title representing the story graphs 193 (~52%) times (vs. 54% in 2018).

Fig. 1 consists of 365 dots. Each dot represents a single news graph out of 144 candidates: the one whose connected component has the highest average degree for that day. Selecting only one connected component per day (out of 144) is needed to avoid plotting 52,560 (144 x 365) dots, but it discards a great deal of information (news stories) for the sake of compression. The need for a method of summarizing the news of the year without discarding too many news articles led me to apply sumgram to summarize the news of 2019.
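The per-day selection described above can be sketched as follows. This is an illustrative sketch rather than the code behind Fig. 1: the function names and graph identifiers are hypothetical, the three highest scores are taken from the table above, and the lower scores are made-up placeholders.

```python
# One graph every 10 minutes gives 144 candidate graphs per day.
GRAPHS_PER_DAY = 24 * 60 // 10
assert GRAPHS_PER_DAY * 365 == 52_560


def daily_dots(scores):
    """Keep, for each day, the graph with the highest GCC average degree.

    scores: iterable of (date, graph_id, gcc_avg_degree) tuples.
    Returns a dict of date -> (graph_id, gcc_avg_degree).
    """
    best = {}
    for day, graph_id, avg_deg in scores:
        if day not in best or avg_deg > best[day][1]:
            best[day] = (graph_id, avg_deg)
    return best


def top_days(dots, k=10):
    """Rank the daily dots by GCC average degree, highest first."""
    return sorted(dots.items(), key=lambda kv: kv[1][1], reverse=True)[:k]


# Hypothetical input: graph ids and the two lower scores are placeholders.
scores = [
    ("2019-03-24", "graph-017", 22.93),
    ("2019-03-24", "graph-101", 9.10),
    ("2019-09-24", "graph-120", 18.60),
    ("2019-09-24", "graph-003", 7.25),
    ("2019-07-31", "graph-140", 15.39),
]
for day, (graph_id, avg_deg) in top_days(daily_dots(scores), k=3):
    print(day, graph_id, avg_deg)
```

Each day contributes exactly one dot, so every other graph generated that day (and every other story in it) is dropped from the plot.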

60 Sumgrams in 2019
Fig. 2: Summary of the top news stories in 2019 according to sumgram. List of five top sumgrams (n = 2) generated for each month in 2019 from the 2019 StoryGraph dataset. The red text highlights the base ngrams. Key: # - Rank, DF - Document Frequency, DFR - Document Frequency Rate
Fig. 2 consists of the list of top five sumgrams generated by processing the StoryGraph 2019 news dataset with base_ngram = 2, and removal of these stop words: "2019 read, abc news, apr 2019, april 2019, associated press, aug 2019, august 2019, com, dec 2019, december 2019, donald trump, feb 2019, february 2019, fox news, getty images, jan 2019, january 2019, jul 2019, july 2019, jun 2019, june 2019, last month, last week, last year, mar 2019, march 2019, may 2019, new york, nov 2019, november 2019, oct 2019, october 2019, pic, pm et, president donald, president donald trump, president trump, president trump’s, said statement, send whatsapp, sep 2019, september 2019, sign up, trump administration, trump said, twitter, united states, washington post, white house, york times."
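To make the DF and DFR columns and the effect of the stop-word list concrete, here is a minimal sketch of the base-ngram step only. It is not sumgram itself (sumgram additionally conjoins base ngrams into longer phrases, and its tokenization and stop-word handling may differ); the function names and the three toy documents are assumptions for illustration.

```python
import re
from collections import Counter


def bigrams(text):
    """Set of distinct lowercase bigrams in one document."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def top_bigrams(docs, stop_ngrams=frozenset(), k=5):
    """Top-k bigrams as (ngram, DF, DFR) tuples.

    DF  = number of documents containing the bigram.
    DFR = DF divided by the total number of documents.
    Bigrams listed in stop_ngrams are ignored.
    """
    df = Counter()
    for doc in docs:
        df.update(bigrams(doc) - set(stop_ngrams))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return [(ngram, count, count / len(docs)) for ngram, count in ranked]


# Toy documents (hypothetical), not taken from the StoryGraph dataset.
docs = [
    "President Trump said the partial government shutdown will continue",
    "The partial government shutdown enters a third week, President Trump said",
    "Democrats reject border wall funding as government shutdown drags on",
]
stop = {"president trump", "trump said"}

for ngram, df, dfr in top_bigrams(docs, stop_ngrams=stop, k=3):
    print(f"{ngram}: DF={df}, DFR={dfr:.2f}")
```

Without the stop list, "president trump" and "trump said" tie with "partial government" at DF = 2 in this toy collection and take two of the top five slots; with the stop list they are removed and other bigrams move up.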

Recall that President Trump was a dominant figure (mentioned in the titles of the story graphs in Fig. 1 about 52% of the time). Consequently, Fig. 2 was generated by treating the terms in the above list that are associated with "Trump" (e.g., donald trump, president donald, president donald trump, etc.) as stop words. This was done in order to give other salient sumgrams a chance of appearing in the top five sumgrams instead of being crowded out by the highly popular terms associated with "Trump." However, Fig. 3 below was generated without treating terms associated with "Trump" as stop words.

Fig. 3: Summary of top news stories of 2019 according to sumgram.  List of five top sumgrams (n = 2) generated for each month in 2019 from the 2019 StoryGraph dataset WITHOUT treating terms associated with "Trump" as stopwords unlike Fig. 2. Consequently, across all months in 2019, "president donald trump," was the top sumgram. The red text highlights the base ngrams. Key: # - Rank, DF - Document Frequency, DFR - Document Frequency Rate
Below I highlight my observations from the summary (Fig. 2) of the news cycle in 2019 according to sumgram, grouped by months.

JANUARY to FEBRUARY - The border wall and the partial government shutdown
The top sumgram of January, "the partial government shutdown," highlights the budget fight between President Trump and the House Democrats over funding for the President's "border wall" (Fig. 2, January, Rank 2), which led to the partial government shutdown that began on December 22, 2018. The "border wall" sumgram in February signals that the partial government shutdown story lingered into February even though the 35-day shutdown (the longest in US history) ended on January 25, 2019.

MARCH - The Mueller Report and AG Barr's principal conclusions
Stories surrounding the release of "special counsel robert mueller's" (Fig. 2, March, Rank 1) report and "attorney general william barr's" (Fig. 2, March, Rank 2) release of the principal conclusions of the report dominated the news cycle in March 2019.

APRIL to MAY - The Mueller Report and Biden announces his candidacy for President
Mueller Report stories, which began dominating the news cycle in March 2019, continued to dominate it into April, but they shared the spotlight with stories reporting the Biden presidential candidacy following his announcement on April 25, 2019. Joe Biden remained a constant fixture in the news cycle from April to December 2019. However, it is important to state that before September, Joe Biden was probably mentioned mainly because of his status as a top-tier candidate in the Democratic field. From September 2019, the context of his mentions changed because of his involvement in President Trump's call with the Ukrainian President, which led to the whistleblower's report that precipitated the impeachment inquiry.

JUNE to JULY - The Democratic candidates and Alexandria Ocasio-Cortez
The 2020 US Democratic candidates such as bernie sanders (Fig. 2, June Rank 4) were in the June - July news spotlight of 2019. They shared the spotlight with Congresswoman alexandria ocasio-cortez (Fig. 2, July Rank 3) who received considerable attention from the media in July for different stories such as the green new deal, and a secret Facebook group of current and former Border Patrol members that consisted of posts that demeaned the Congresswoman.

AUGUST - The El Paso mass shooting
The tragic El Paso mass shooting (Fig. 2, August, Rank 5) dominated the August 2019 news cycle.

SEPTEMBER to DECEMBER - Impeachment inquiry announcement, public testimonies, and the impeachment of President Donald J. Trump
September through December chronicled the various stages of the impeachment inquiry and eventually the impeachment of President Trump. On September 24, 2019, House Speaker Nancy Pelosi announced the start of an official impeachment inquiry (Fig. 2, September Rank 4). Next, the November 2019 news cycle was dominated by the public impeachment hearings (Fig. 2, November Rank 5), with the eventual passing of the articles [of] impeachment (Fig. 2, December Rank 1).

StoryGraph has been generating news similarity graphs at 10-minute intervals since August 2017. A single graph file (e.g., this impeachment inquiry graph generated on September 24, 2019) includes the URL of the news articles, plaintext, entities, publication dates, etc. This post only reports an investigation into identifying the news stories that received significant attention in 2019. But there is still the opportunity for further study and we welcome any such initiatives.

-- Alexander C. Nwala (@acnwala)