2016-10-13: Dodging The Memory Hole 2016 Trip Report (#dtmh2016)
Dodging the Memory Hole 2016, held at UCLA's Charles Young Research Library in Los Angeles, California, was a two-day event to discuss and highlight potential solutions to the issue of preserving born-digital news. Organized by Edward McCain (digital curator of journalism at the Donald W. Reynolds Journalism Institute and University of Missouri Libraries), this event brought together technologists, archivists, librarians, journalists and fourteen graduate students who had won travel scholarships for attendance. Among the attendees were four members of the WS-DL group: Mat Kelly, John Berlin, Dr. Michael Nelson, and Shawn Jones.
The event was made possible by support from the Reynolds Journalism Institute, Journalism Digital News Archive (JDNA), UCLA Library, the Educopia Institute and the Institute of Museum and Library Services (IMLS).
Day 1 (October 13, 2016)
Day one started off at 9am with Edward McCain welcoming everyone to the event and then turning it over to Ginny Steel, UCLA University Librarian, for opening remarks.
@RJIJDNA @UCLA @vsteel Saving Online #News #Legal #Technical #Policy #dtmh2016 #digitalmemory #freeexpression #historicalrecord #infoaccess pic.twitter.com/D5WFalhrWR - Sharon E. Farb (@FarbThink) October 13, 2016
In her opening remarks, Ginny reflected on her career as a lifelong librarian and on the evolution of news from print to digital, and in closing she summarized the role archiving has to play in the era of born-digital news.
After the opening remarks, Edward McCain went over the goals and sponsors of the event before transitioning to the first speaker, Hjalmar Gislason.
Hjalmar Gislason's talk was entitled "Digital Salvage Operations: What's worth saving?"
#DtMH2016 @hjalli: The fundamental question: Do you want to save everything or do you want to get rid of everything? - ChrisAldrich (@ChrisAldrich) October 13, 2016
In the talk, Hjalmar touched on the amount of data currently being generated, how to determine the context of data, and the risk that data discarded because no one knew whether it was important could amount to losing someone's life's work. Hjalmar ended his talk with two takeaway points: "There is more to news archiving than the web: there is mobile content" and "Television news is also content that is important to save".
@hjalli Keynote #dtmh2016 #Digital #Salvage What #News Worth Saving? Not enough to save #stories. #Context is Everything #authenticity #1984 pic.twitter.com/vH82BZeD4F- Sharon E. Farb (@FarbThink) October 13, 2016
After a short break, panel one, which consisted of Chris Freeland, Matt Weber, Laura Wrubel, and moderator Ana Krahmer, addressed the question of "Why Save Online News".
Next Up. Why #Save #Online #News? Challenge #Access #Online #News Post Event @chrisfreeland @liblaura @docmattweber #dtmh2016 #localnews pic.twitter.com/KTpg7z4NwB- Sharon E. Farb (@FarbThink) October 13, 2016
Matt Weber started off the discussion by talking about the interactions between web archives and news media, stating that digital-only media has no offline surrogate and that it is becoming increasingly difficult to do anything but look at it as it exists now. Following Matt Weber were Laura Wrubel and Chris Freeland, who both talked about the large share Twitter has in online news. Laura Wrubel brought up that in 2011 journalists primarily used Twitter to direct people to articles rather than for conversation. Chris Freeland stated that Twitter was the primary source of information during the Ferguson protests in St. Louis and that the local news outlets were far behind in reporting the organic story as it happened.
#dtmh2016 @docmattweber "I don't think you'll ever convince publishers at scale to donate their economic property to memory institutions." - Kate Zwaard (@kzwa) October 13, 2016
Following panel one was Tim Groeling (professor and former chair of the UCLA Department of Communication Studies), giving presentation one, entitled "NewsScape: Preserving TV News".
The NewsScape project, led by Tim Groeling, is currently migrating analog recordings of TV news to digital for archival. The collection contains recordings dating back to the 1950s and is the largest collection of TV news and public affairs programs, containing a mix of U-matic, Betamax, and VHS tapes.
Currently, the project is working its way through the collection's tapes, completing 36k hours of encoding this year. Tim Groeling pointed out that the VHS tapes, despite being the newest, are the most threatened.
#DtMH2016 Tim Groeling: We use a layer of dead VCR's over our good VCR's to prevent RF interference and audio buzzing. :) - ChrisAldrich (@ChrisAldrich) October 13, 2016
After lunch, the attendees were broken up into fifteen groups for the first of two breakout sessions. Each group was tasked with formulating three things that could be included in a national agenda for news preservation and with coming up with a project to advance the practice of online news preservation.
Each group sent up one person who briefly went over what they had come up with. Despite the diverse background of the attendees at dtmh2016 the ideas that each group came up with had a lot in common:
- A list of tools/technologies for archiving (awesome memento)
- Identifying broken links in new articles
- Increase awareness of how much or how little is archived
- Work with news organization to increase their involvement in archiving
- More meetups, events, hackathons that bring together technologists with journalists and librarians
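As a rough illustration of the "identifying broken links" idea from the list above, here is a minimal Python sketch. It is not a tool that was presented at the event; the function names (`extract_links`, `http_status`, `find_broken_links`) are made up for this example, and it assumes a link counts as broken when the request fails or the server answers with a 4xx/5xx status. The status lookup is passed in as a parameter so it can be swapped for a cached or archive-aware check.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute http(s) href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(article_html):
    """Return the absolute links found in an article's HTML, in document order."""
    collector = LinkCollector()
    collector.feed(article_html)
    return collector.links


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None


def find_broken_links(article_html, fetch_status=http_status):
    """Return (url, status) pairs for links that fail or answer with 4xx/5xx."""
    broken = []
    for url in extract_links(article_html):
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Some servers reject HEAD requests, so a real checker would probably fall back to GET before declaring a link dead; that is left out here to keep the sketch short.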
Dr. Clifford Lynch #dtmh2016 speaking about problems of scholarly journals, a topic very important to Phittle pic.twitter.com/ZWE6SFEXir— Phittle (@ThePhittle) October 13, 2016
In his talk, Clifford Lynch spoke about problems that plague news preservation such as link rot and the need for multiple archives.
#DtMH2016 Clifford Lynch: The material on lots of links (as sources) disappears after a short period of time.— ChrisAldrich (@ChrisAldrich) October 13, 2016
Clifford Lynch of @cni_org at #dtmh2016: "We have this mythology that @internetarchive archives the web. ... It's not a total solution."— Ben Welsh (@palewire) October 13, 2016
He also spoke on the need to preserve other kinds of media like data dumps and that archival record keeping goes hand in hand with journalism.
Who preserves the data dumps? Who preserves the PDFs and reports? No one has really stepped up. -Clifford Lynch #dtmh2016— P. Kim Bui (@kimbui) October 13, 2016
"Responsible journalism implies a strong permanent record of that work." -- Clifford Lynch #dtmh2016 - Kate Zwaard (@kzwa) October 13, 2016
After his talk was over, Edward McCain gave final remarks for day one and transitioned us to a reception for the scholarship winners. The scholarship winners proposed projects (to be completed by December 2016) that would aid in digital news preservation, and three of these students were WS-DL members (Shawn Jones, Mat Kelly, John Berlin).
#dtmh2016 Introducing Amazing #GradStudents #Scholars #UCLA #Saving #Online #News #Students=Future pic.twitter.com/WGKFNrHs3O- Sharon E. Farb (@FarbThink) October 13, 2016
2017-04-07 edit: WSDL Scholarship Recipients Project Reports Are Featured on RJI Online
Shawn Jones: Indicators that tweeting may improve the detection of news articles for web archives
Mat Kelly: Investigation and implementation of tools to allow web archivists to accomplish three goals to independently archive content, explained in detail in the report
John Berlin: Twitter Feed Monitoring and Automatic Archival Through WAIL
Day 2 (October 14, 2016)
Day two of Dodging the Memory Hole 2016 began with Sharon Farb welcoming us back.
@FarbThink greeting #dtmh2016 participants on start of day 2 @UCLA_library #savenews pic.twitter.com/2u353dLi9R— Edward McCain (@e_mccain) October 14, 2016
@FarbThink talks about human rights and the role journalism and journalists play. It's critical that we preserve that work #dtmh2016— Todd Grappone (@liber8er) October 14, 2016
Next came the first presentation of the day, given by our very own Dr. Nelson and titled "Summarizing archival collections using storytelling techniques".
#dtmh2016 @phonedude_mln presents work with @yasmina_anwar and @weiglemc on "summarizing archival collections using storytelling techniques" pic.twitter.com/pW38yRrYs0 - Shawn M. Jones (@shawnmjones) October 14, 2016
The presentation highlighted the work done by Yasmin AlNoamany in her doctoral dissertation, in particular the Dark and Stormy Archives (DSA) Framework.
#dtmh2016 @phonedude_mln details the Dark and Stormy Archives (DSA) framework for storytelling with archives pic.twitter.com/cpF6pBB2kQ - Shawn M. Jones (@shawnmjones) October 14, 2016
Up next was Pulitzer Prize-winning journalist Peter Arnett, who presented "Writing The First Draft of History - and Saving It!", talking about his experiences while covering the Vietnam War and how he saved the Associated Press's Saigon office archives.
Peter Arnett talks about being a journalist covering the Vietnam War and censorship #dtmh2016 pic.twitter.com/GgKL3NOMs4 - Todd Grappone (@liber8er) October 14, 2016
Following Peter Arnett was the second-to-last panel of dtmh2016, "Kiss your app goodbye: the fragility of data journalism", featuring Ben Welsh, Regina Roberts, and Meredith Broussard and moderated by Martin Klein.
Meredith Broussard spoke about how archiving of news apps has become difficult as their content does not live in a single place.
#DtMH2016 @merbroussard: News apps don't live in any of the CMSs. They're bespoke and live on a separate data server.— ChrisAldrich (@ChrisAldrich) October 14, 2016
This is even more complicated with news apps, which are dynamic and separate from the web CMS @merbroussard #dtmh2016 - Kate Zwaard (@kzwa) October 14, 2016
Ben Welsh was up next, speaking about the work he has done at the LA Times Data Desk.
In his talk, he stressed the need for more tools that make archiving, and the viewing of archived news content, easier for people like himself.
Following Ben Welsh was Regina Roberts, who spoke about the work done at Stanford on archiving and adding context to the data sets that live beside the codebases of research projects.
#dtmh2016 Regina Lee Roberts on preservation and sharing of big data at Stanford pic.twitter.com/1aVaMiX2mR— Shawn M. Jones (@shawnmjones) October 14, 2016
#dtmh2016 Regina Lee Roberts talks about creating BLDR (big local data repository) at Stanford pic.twitter.com/X9VQO8yRhC - Shawn M. Jones (@shawnmjones) October 14, 2016
The last panel of dtmh2016, "The future of the past: modernizing The New York Times archive", featured members of the technology team at the New York Times, Evan Sandhaus, Jane Cotler, and Sophia Van Valkenburg, with moderator Edward McCain.
Evan Sandhaus presented the New York Times' own take on the Wayback Machine, called TimesMachine. TimesMachine allows users to view the microfilm archive of The New York Times.
#dtmh2016 @kansandhaus introduces the @nytimes TimesMachine of scans and metadata from microfilm https://t.co/ExRZGkdqfm pic.twitter.com/x3bpvZ0Yut - Shawn M. Jones (@shawnmjones) October 14, 2016
Sophia Van Valkenburg spoke about how the New York Times was transitioning its news archives into a more modern system.
#dtmh2016 Sophia van Valkenburg demonstrates flowchart for converting legacy born digital articles @nytimes into format used by current CMS pic.twitter.com/XqlHAFGBwa - Shawn M. Jones (@shawnmjones) October 14, 2016
After Sophia Van Valkenburg was Jane Cotler, who spoke about the gotchas encountered during the migration process. The most notable were that the way in which the articles were viewed (i.e., their visual aesthetics) was not preserved in the migration, in favor of a "better user experience", and that after migrating to the new system links to the old pages would no longer work.
#dtmh2016 Jane Cotler mentioned decommissioning old URLs and how this can lead to link rot for those linking to @nytimes— Shawn M. Jones (@shawnmjones) October 14, 2016
#DtMH2016 @janecotler: We made the decision of taking out data we had in lieu of making a better user experience for missing sections.— ChrisAldrich (@ChrisAldrich) October 14, 2016
#dtmh2016 @kansandhaus "much easier to preserve print journalism because it is not a nexus of content and software" - Shawn M. Jones (@shawnmjones) October 14, 2016
Lightning rounds were up next.
Mark Graham of the Internet Archive was up first with a presentation on the Wayback Machine and how it would be getting site search later this year.
#dtmh2016 @MarkGraham from @internetarchive discussed "save page now" @internetarchive, upcoming site search, and more - Shawn M. Jones (@shawnmjones) October 14, 2016
Jefferson Bailey, also of the Internet Archive, spoke on the continual efforts at the Internet Archive to get web archives into the hands of researchers.
#dtmh2016 @jefferson_bail on "trying to get web archives into the hands of researchers" pic.twitter.com/TKu9s6MU8b— Shawn M. Jones (@shawnmjones) October 14, 2016
#dtmh2016 @jefferson_bail is talking about derivative data sets for researchers, extracting metadata from collections into WAT, LGA, WANE - Shawn M. Jones (@shawnmjones) October 14, 2016
Terry Britt spoke about how social media over time establishes "collective memory".
On episodic and mediated memory. Journalists are responsible for mediated memory. #dtmh2016 pic.twitter.com/hLQH1EFBAg - P. Kim Bui (@kimbui) October 14, 2016
Katherine Boss presented "Challenges facing the preservation of born-digital news applications" and how they end up in dependency hell.
Lightning talk from Katherine Boss and Meredith Broussard #dtmh2016 pic.twitter.com/EbKxqrfE8j - John Berlin (@johnaberlin) October 14, 2016
Eva Revear presented a tool to discover frameworks and software used for news apps.
#dtmh2016 @erevear presents a survey tool to discover frameworks and software used for news apps pic.twitter.com/LdIdtpnO8q - Shawn M. Jones (@shawnmjones) October 14, 2016
Cynthia Joyce talked about a book on Hurricane Katrina and its use of archived news coverage of the storm.
#dtmh2016 @cynthiajoyce talks about Hurricane Katrina and a book of the curated experiences of those who covered the storm pic.twitter.com/hw0u0yuMAh - Shawn M. Jones (@shawnmjones) October 14, 2016
Jennifer Younger presented the work being done by the Catholic News Archive.
Kalev Leetaru talked about the work he and the GDELT Project are doing in web archival.
An overview of the loss of journalistic content. #dtmh2016 pic.twitter.com/HKc6OU8Muy - P. Kim Bui (@kimbui) October 14, 2016
The last presentation of the event was by Kate Zwaard, titled "Technology and community: Why we need partners, collaborators, and friends".
Kate Zwaard talked about the success of web archival events such as the recent Collections as Data and Archives Unleashed 2.0 held at the Library of Congress.
#dtmh2016 @kzwa mentioned the success of #archivesunleashed @librarycongress earlier this year pic.twitter.com/YkjY0IoHEJ - Shawn M. Jones (@shawnmjones) October 14, 2016
She covered the web archive collection at the Library of Congress.
#dtmh2016 @kzwa talking about web archive @librarycongress https://t.co/oNRWXpLv7Q, which is #memento compliant pic.twitter.com/yE09UWNY2B - Shawn M. Jones (@shawnmjones) October 14, 2016
She described how they are putting Jupyter notebooks on top of database dumps.
#dtmh2016 @kzwa talks about saving #Jupyter notebooks and database dumps https://t.co/k4cZJues1Q pic.twitter.com/LQa2DPE0C4 - Shawn M. Jones (@shawnmjones) October 14, 2016
And she discussed the diverse skill sets required of today's librarians.
"It's like physicists in the '50s." @kzwa of @librarycongress talks about wide range of skill sets necessary for librarians #dtmh2016 pic.twitter.com/chM22UCuAJ - JDNA (@RJIJDNA) October 14, 2016
The final breakout sessions of dtmh2016 consisted of four topic discussions.
Jefferson Bailey's session, Web Archiving For News, was an informal breakout where he asked the attendees about collaboration between the Archive and other organizations. A notable response came from the NYTimes representative Evan Sandhaus, who asked in return whether news organizations or archives should be responsible for the preservation of news content. Jefferson Bailey responded that he wished organizations were more active in practicing self-archiving. Others described approaches to self-archiving at their own organizations or at ones they knew of.
Ben Welsh's session, News Apps, discussed the issues in archiving news apps, which are online web applications providing rich data experiences. An example app used to illustrate this was California's War Dead, which was archived by the Internet Archive but with diminished functionality. In spite of this "success", Ben Welsh brought up the difficulty of preserving the full experience of the app, as web crawlers only interact with client-side code and not with the server-side code the app requires. To address this issue, he suggested solutions such as the Python library django-bakery, which produces flat, static versions of news apps based on database queries. These static versions can be more easily archived while still providing a fuller experience when replayed.
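To make the "baking" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use django-bakery's API; it only illustrates the general technique of turning database query results into flat HTML files that a crawler can capture. The `profiles` table, its columns, and the `bake_pages` function are hypothetical names invented for this example, and the sketch assumes each `slug` is already safe to use as a file name.

```python
import html
import sqlite3
from pathlib import Path


def bake_pages(connection, build_dir):
    """Write one flat HTML file per row of the hypothetical `profiles` table."""
    build_dir = Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = connection.execute(
        "SELECT slug, name, summary FROM profiles ORDER BY slug"
    )
    for slug, name, summary in rows:
        # Escape database values so they cannot break the generated markup.
        page = (
            "<!DOCTYPE html>\n"
            f"<html><head><title>{html.escape(name)}</title></head>\n"
            f"<body><h1>{html.escape(name)}</h1>"
            f"<p>{html.escape(summary)}</p></body></html>\n"
        )
        # Assumption: slug is a safe file name (no path separators).
        path = build_dir / f"{slug}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

Once the pages exist as files, they can be served by any static web server, and no server-side code has to run for an archived copy to replay correctly.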
Ben Welsh @palewire focusing on how his news app for @latimes is made at #dtmh2016 at @UCLA_library ssavenews pic.twitter.com/5Rk71tc5B9 - Edward McCain (@e_mccain) October 14, 2016
Eric Weig's session, Working with CMS, started out with him sharing his experience of migrating the CMS of one of the University of Kentucky Libraries Special Collections Research Center's newspaper sites from a local data center using sixteen CPUs to a less powerful cloud-based solution using only two CPUs. One of the biggest performance increases came when he switched from dynamically generating pages to serving static HTML pages. Generating the static HTML pages for the eighty-two thousand issues contained in this CMS took only three hours on the two-CPU cloud-based solution. After he shared this experience, the rest of the time was used to hear from the audience about their experiences using CMSs and for an impromptu roundtable discussion on CMSs.
Kalev Leetaru's session, The GDELT Project: A Look Inside The World's Largest Initiative To Understand And Archive The World's News, was a more in-depth version of the lightning talk he gave. Kalev Leetaru shared experiences that The GDELT Project had with archival crawling of non-English language news sites, his work with the Internet Archive on monitoring news feeds and broadcasts, the untapped opportunities for exploration of the Internet Archive, and A Vision Of The Role and Future Of Web Archives. He also shared two questions he is currently pondering: "Why are archives checking certain news organizations more than others?" and "How do we preserve GeoIP generated content especially in non-western news sites?".
The last speaker of dtmh2016 was Katherine Skinner with "Alignment and Reciprocity". In her speech, Katherine Skinner called for volunteers to carry out some of the actions mentioned at dtmh2016 and reflected on the past two days.
Katherine Skinner from @Educopia talks about Alignment and Reciprocity at #dtmh2016 @UCLA_library #savenews pic.twitter.com/iOM3UUKQHF - Edward McCain (@e_mccain) October 14, 2016
Closing out dtmh2016 was Edward McCain, who thanked everyone for coming and expressed how enjoyable the event was, especially with the graduate students, followed by Todd Grappone's closing remarks. In the closing remarks, Todd Grappone reminded attendees of the pressing problems in news archival and how they require both academic and software solutions.
I'm sad about the end of #dtmh2016; it was good to meet everyone; lots of good experiences pic.twitter.com/bQX6DUXxrx - Shawn M. Jones (@shawnmjones) October 14, 2016
Video recordings of DTMH2016 can be found on the Reynolds Journalism Institute's Facebook page. Chris Aldrich recorded audio along with a transcription of days one and two. NPR's Research, Archive & Data Strategy team created a storify page of tweets covering topics they found interesting.
-- John Berlin
2017-04-07 edit:
JDNA has created a recap page for the conference.
Three talks have been transcribed, each accompanied by a video.
2017-07-24 Update: The final reports for John, Shawn, Mat, and others are available on rjionline.org:
New! @johnaberlin's project seeks to extend WAIL w/ ability to monitor a user’s Twitter feed for automatic archival: https://t.co/9Hb6x6JwOL— JDNA (@RJIJDNA) April 4, 2017
Paper exploring social media sharing of news & how quickly those articles are found by web crawlers by @shawnmjones: https://t.co/lI9TOOv8b9— JDNA (@RJIJDNA) March 7, 2017
#DTMH2016 report by Mat Kelly (@machawk1) of @ODUnow on tools needed for personal web archivists. Free PDF: https://t.co/BggwTXeA5o pic.twitter.com/sp4E11OvIi— JDNA (@RJIJDNA) April 11, 2017
This is a great summary of the two days, John. Thanks for joining us (ODU rocks) and I look forward to seeing how we move born-digital news preservation to the next phase.
Fantastic John! One other useful resource I put together was a Twitter list of everyone who tweeted about #DtMH2016 just before/during/after the conference. Hopefully it can help keep many of us in touch after-the-fact. #DtMH2016 accounts on Twitter.