Saturday, October 22, 2016

2016-10-13: Dodging The Memory Hole 2016 Trip Report (#dtmh2016)


Dodging the Memory Hole 2016, held at UCLA's Charles Young Research Library in Los Angeles California, was a two-day event to discuss and highlight potential solutions to the issue of preserving born-digital news. Organized by Edward McCain (digital curator of journalism at the Donald W. Reynolds Journalism Institute and University of Missouri Libraries) this event brought together technologists, archivists, librarians, journalists and fourteen graduate students who had won travel scholarships for attendance.  Among the attendees were four members of the WS-DL group (l-r): Mat KellyJohn BerlinDr. Michael Nelson, and Shawn Jones.


Day 1 (October 13, 2016)

Day one started off at 9am with Edward McCain welcoming everyone to the event and then turning it over to Ginny Steel, UCLA University Librarian, for opening remarks.
In the opening remarks, Ginny reflected on her career as a lifelong librarian, the evolution of printed news to digital and in closing she summarized the role archiving has to play in the digital-born news era.
After opening remarks, Edward McCain went over the goals and sponsors of the event before transitioning to the first speaker Hjalmar Gislason.


In the talk, Hjalmar touched on issues concerning the amount of data currently being generated, how to determine context about data and the importance of if and that data lost due to not knowing if it is important could mean losing someone's life work. Hjalmar ended his talk with two takeaway points: "There is more to news archiving than the web: there is mobile content" and "Television news is also content that is important to save".

After a short break, panel one which consisted of Chris Freeland, Matt Weber, Laura Wrubel, and moderator Ana Krahmer addressed the question of "Why Save Online News".

Matt Weber started off the discussion by talking about the interactions between web archives and news media. Stating that digital only media has no offline surrogate and how it is becoming increasingly difficult to do anything but look at it now as it exists. Following Mat Weber were Laura Wrubel and Chris Freeland who both talked about the large share Twitter has in online news.  Laura Wrubel brought up that in 2011 journalists primarily used Twitter to direct people to articles rather than for conversation. Chris Freeland stated that Twitter the primary source of information during the Ferguson protests in St. Louis and that the local news outlets were far behind in reporting the organic story as it happened.
Following panel one was Tim Groeling (professor and former chair of the UCLA Department of Communication Studies) giving presentation one entitled "NewsScape: Preserving TV News".

The NewsScape project is currently migrating analog recordings of TV news to digital for archival lead by Tim Groesling.  The collection contains recording dating back to 1950's and is the largest collection of TV news and public affairs programs containing a mix of U-matic, Betamax, and VHS tapes.


Currently, the project is working its way through the collections tapes completing 36k hours of encoding this year. Tim Groeling pointed out that VHS despite being the newest tapes are the most threatened.
After lunch, the attendees were broken up into fifteen groups for the first of two breakout sessions. Each group was tasked with formulating three things that could be included in a national agenda for news preservation and to come up with a project to advance the practice of online news preservation.

Each group sent up one person who briefly went over what they had come up with. Despite the diverse background of the attendees at dtmh2016 the ideas that each group came up with had a lot in common:
  • A list of tools/technologies for archiving (awesome memento)
  • Identifying broken links in new articles 
  • Increase awareness of how much or how little is archived
  • Work with news organization to increase their involvement in archiving 
  • More meetups, events, hackathons that bring together technologists
    with journalists and librarians  
The final speaker of the day was Clifford Lynch giving a talk entitled "Born-digital news preservation in perspective".
In his talk, Clifford Lynch spoke about problems that plague news preservation such as link rot and the need for multiple archives.

He also spoke on the need to preserve other kinds of media like data dumps and that archival record keeping goes hand in hand with journalism.
After his talk was over Edward McCain gave final remarks for day one and transitioned us to reception for the scholarship winners. The scholarship winners purposed projects (to be completed by December 2016) that would aid in digital news preservation and of these students three were WS-DL members (Shawn JonesMat KellyJohn Berlin).

Day 2 (October 14, 2016)

Day two of dodging the memory hole 2016 began with Sharon Farb welcoming us back.

Followed by the first presentation of the day by our very own Dr. Nelson titled "Summarizing archival collections using storytelling techniques"


The presentation highlighted the work done by Yasmin AlNoamany in her doctoral dissertation, in particular, The Dark and Stormy Archives (DSA) Framework.
Up next was Pulitzer prize winning journalist Peter Arnett who presented "Writing The First Draft of History - and Saving It!" talking about his experiences while covering the Vietnam War and how he saved the Associated Presses Saigon office archives.
Following Perter Arnett was the second to last panel of dtmh2016 Kiss your app goodbye: the fragility of data journalism featuring Ben Welsh, Regina Roberts, Meredith Broussard and moderated by Martin Klein.


Meredith Broussard spoke about how archiving of news apps has become difficult as their content does not live in a single place.
Ben Welsh was up next speaking about the work he has done at the LA Times Data Desk.
In his talk, he stressed the need for more tools to be made that allowed people like himself to make archiving and viewing of archived news content easier.
Following Ben Welsh was Regina Roberts who spoke about the work done at Standford for archiving and adding context to the data sets that live beside the codebases of research projects.
The last panel of dtmh2016 "The future of the past: modernizing The New York Times archive" featured members of the technology team at the New York Times Evan Sandhaus, Jane Cotler, and Sophia Van Valkenburg with moderator Edward McCain.

Evan Sandhause presented the New York Times own take on the wayback machine called TimesMachine. The TimesMachine allows users to view the microfilm archive of The New York Times.
Sophia Van Valkenburg spoke about how the New York Times was transitioning its news archives into a more modern system.
After Sophia Valkenburg, was Jan Cotler who spoke about the gotchas encountered during the migration process. Most notable of the gotchas was that the way in which the articles were viewed (i.e, visual aesthetics) was not preserved in the migration process in favor of a "better user experience" and that in migrating to the new system links to the old pages would no longer work.
Lightning rounds were up next.

Mark Grahm of the Internet Archive was up first with a presentation on the wayback machine and how later this year it would be getting site search.
Jefferson Bailey also of the Internet Archive spoke on the continual efforts at the Internet Archive to get the web archives into the hands of researchers.
Terry Britt spoke about how social media over time establishes "collective memory".
Katherine Boss presented "Challenges facing the preservation of born-digital news applications" and how they end up in dependency hell.
Eva Revear presented a tool to discover frameworks and software used for news apps
Cynthia Joyce talked about a book on Hurricane Katrina and its use of archived news coverage of the storm.
Jennifer Younger presented the work being done by the Catholic News Archive.
Kalev Leetaru talked about the work he and the gdeltproject  are doing in web archival.
The last presentation of the event was by Kate Zwaard titled "Technology and community Why we need partners, collaborators, and friends".

Kate Zwaard talked about the success of web archival events such as the recent Collections as Data and Archives Unleashed 2.0 held at the Library of Congress.
The web archive collection at the Library of Congress.
How they are putting Jupyter notebooks on top of database dumps.
And the diverse skill sets required for librarians of today.
The final breakout sessions of dtmh2016 consisted of four topic discussions.

Jefferson Bailey's session, Web Archiving For News, was an informal breakout where he asked the attendants about collaboration between the Archive and other organizations. A notable response was from the NYTimes representative Evan Sandhaus with a counter question about whether organizations or archives should be responsible for the preservation of news content. Jefferson Bailey responded that he wished organizations were more active in practicing self-archiving. Others responded with their organizations or ones they knew about approaches to self-archiving.

Ben Welsh's session, News Apps, discussed issues archiving news apps which are online web applications providing rich data experiences. An example app to illustrate this was California's War Dead which was archived by the Internet Archive but with diminished functionality. In spite of this "success", Ben Welsh brought up the difficulty in preserving the full experience of the app as web crawlers only interact with client side code, not server side which is required. To address this issue, he suggested solutions such as the python library django-backery for producing flat, static versions of news apps based on database queries. These static versions can be more easily archived while still providing a fuller experience when replayed.
Eric Weig's session, Working with CMS, started out with him sharing his experience of migrating one the Univeristy of Kentucky Libraries Special Collections Research Center newspaper sites cms from a local data center using sixteen cpus to a less powerful cloud-based solution using only two cpus. One of the biggest performance increases came when he switched from dynamically generating pages to serving static html pages. Generating the static html pages for the eighty-two thousand issues contained in this cms took only three hours on the two cpu cloud-based solution. After sharing this experience the rest of the time was used to hear from the audience about their experiences using cms and an impromptu roundtable discussion on cms.

Kalev Leetaru's session, The GDELT Project: A Look Inside The World's Largest Initiative To Understand And Archive The World's News, was a more in depth version of the lightning talk he gave. Kalev Leetaru shared experiences that The GDELT Project had with archival crawling of non-English language news sites, his work with the Internet Archive on monitoring news feeds and broadcasts, the untapped opportunities for exploration of Internet Archive and A Vision Of The Role and Future Of Web Archives. He also shared two questions he is currently pondering: "Why are archives checking certain news organizations more than others?" and "How do we preserve GeoIP generated content especially in non-western news sites?".
The last speaker of dtmh2016 was Katherine Skinner with Alignment and Reciprocity. In her speech Katherine Skinner called for volunteers to carry out some of the actions mentioned at dtmh2016 and reflected on the past two days.
Closing out dtmh2016 was Edward McCain who thanked everyone for coming and expressed how enjoyable this event was especially with the graduate students and Todd Grappone's closing remarks. In the closing remarks, Todd Grappone reminded attendees of the pressing problems in news archival and how they require both academic and software solutions.
Video recordings of DTMH2016 can be found on the Reynolds Journalism Institute's Facebook pageChris Aldrich recorded audio along with a transcription of days one and two. NPR's Research, Archive & Data Strategy team created a storify page of tweets covering topics they found interesting.

-- John Berlin 

2 comments:

  1. This is a great summary of the two days, John. Thanks for joining us (ODU rocks) and I look forward to seeing how we move born-digital news preservation to the next phase.

    ReplyDelete
  2. Fantastic John! One other useful resource I put together was a Twitter list of everyone who tweeted about #DtMH2016 just before/during/after the conference. Hopefully it can help keep many of us in touch after-the-fact. #DtMH2016 accounts on Twitter.

    ReplyDelete