2020-08-17: Web Archiving and Digital Libraries (WADL) Workshop 2020 Trip Report




The Web Archiving and Digital Libraries 2020 Workshop (#WADL2020) was held on August 5, 2020. Due to the COVID-19 pandemic, the workshop took place virtually. As in previous years, it was organized by Dr. Zhiwu Xie, Dr. Martin Klein, and Dr. Edward A. Fox. The Web Science and Digital Libraries Research Group (WSDL) at Old Dominion University contributed several presentations from different members of the group.

Tian Xia from the Renmin University of China gave a keynote talk titled “The practice and inspiration of Web Archiving in China”. His talk covered different web archiving initiatives in China, their practices for web archiving, and challenges they faced. The challenges he highlighted included difficulties between providers and archiving agents, short term utilization vs. long term preservation, and a lack of best practices. He also discussed their efforts on institutional Web archiving. The goal was to make it easy to deploy as well as easy to transfer.

We, Abigail Mabe and Dhruv Patel, presented our talk “TMVis: Visualizing Webpage Changes Over Time”. The presentation highlighted TimeMap Visualization, an archival thumbnail visualization server. We started by discussing how web archives hold thousands of snapshots of webpages at different points in time and how it is impossible to view all of these snapshots. Also, we discussed the importance of visualizing webpages and how it gives us an understanding of events at the time of the webpage change. We then walked through how to use the service and showed all of the different visualizations.

TMVis: Visualizing Webpage Changes Over Time

Next, Kai Naumann from Landesarchiv Baden-Württemberg presented his talk “125 Databases for the Year 2080”. He started by introducing Landesarchiv Baden-Württemberg, which serves as a key research infrastructure, saves many kinds of records, preserves them, and makes them accessible. He then posed the challenge: How do you preserve 125 databases for the year 2080 and onwards? The databases must be prepared so that they can be used in a variety of ways in 2080. The rest of the presentation described possible storage solutions: CSV, XML, disk image, and Docker image. Another proposed solution was the use of web crawlers. He went on to compare the cost, as well as the pros and cons, of each of these solutions.
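To make the CSV option a little more concrete, here is a minimal sketch of what dumping a database to CSV could look like. It is not taken from the talk: the use of SQLite, the function name `export_tables_to_csv`, and the sample `records` table are all our own assumptions for illustration.

```python
import csv
import io
import sqlite3


def export_tables_to_csv(conn):
    """Return {table_name: csv_text} for every user table in a SQLite database.

    Hypothetical helper for illustration only; not the speaker's workflow.
    """
    exported = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names cannot break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        buffer = io.StringIO(newline="")
        writer = csv.writer(buffer, lineterminator="\n")
        # The header row comes from the column names reported by the cursor.
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)
        exported[table] = buffer.getvalue()
    return exported


def demo():
    # Assumed sample data: a tiny in-memory database with one table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO records (title) VALUES (?)",
        [("Deed",), ("Map, 1890",)],
    )
    return export_tables_to_csv(conn)
```

A plain CSV dump like this keeps the values and column names but drops column types, keys, and constraints, which is one reason to weigh it against the other storage options.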

Shawn M. Jones presented two different talks at WADL. In the first, he introduced a service that combines multiple tools he has developed over the past few years. This service is called SHARI, which is short for StoryGraph Hypercane ArchiveNow Raintale Integration. SHARI gathers different stories throughout the day and identifies the biggest story of the day, which is posted on GitHub Pages at DSA Puddles. Follow @StormyArchives to learn more about different storytelling tools and services.

SHARI - An Integration of Tools to Visualize the Day

Shawn M. Jones then presented his second talk, titled “MementoEmbed and Raintale for Web Archive Storytelling”. MementoEmbed is a tool that generates surrogates for mementos. Raintale is a tool used to generate social media stories from groups of mementos. This tool offers story generation in various formats such as HTML, Markdown, MediaWiki, and more. Surrogates generated by MementoEmbed are used by Raintale as visual representations of each memento.

MementoEmbed and Raintale for Web Archive Storytelling

Next was the Invited Panel “Making, Using, and Exploring Web Archives: Tales from Scholars & Practitioners”. Vicky Steeves from New York University moderated this session. 

On the panel were Alexander Nwala from Old Dominion University, Genevieve Milliken from New York University, Emily Maemura from the University of Toronto, Karen Hanson from Portico, and Meghan Lyon from the Library of Congress. Alexander talked about the tools he created during his PhD work, such as Local Memory Project, I Can Haz Memento, StoryGraph, sumgram, and more. Next, Meghan Lyon talked about her work in the New York Art Resources Consortium, which consisted of exploring maintenance of the consortium’s Archive-It collections and developing a web archive for a practicing artist. Genevieve Milliken discussed her work on a project titled “Engaging the Web Archive”, which examines the ways web archives can be a useful tool for researchers and the general public. Emily Maemura then talked about her research on scholarly uses of web archived data, which consisted of examining the creation and documentation of three different web archive collections. Her talk focused on research described in the article titled “If these crawls could talk: Studying and documenting web archives provenance”. Finally, Karen Hanson shared her web archiving work on e-books, which focused on identifying preservable aspects at scale using current tools and producing guidelines that authors and publishers can follow to make their work more preservable. You can read more in her blog post titled “Enhancing Services to Preserve New Forms of Scholarship”.

Ben O’Brien from the National Library of New Zealand presented “Improving the Quality of Web Harvests Using Web Curator Tool”. The Web Curator Tool (WCT) uses a variety of tools for creating, analyzing, and replaying web archives. Version 2 of the tool was released at the end of 2018, and version 3 is currently being tested, with release planned for the end of August. For future versions, they plan to integrate the WCT with more crawlers and preservation systems. You can download it or visit the GitHub repository from the WCT main page.

The final event on the agenda for #WADL2020 was an open discussion session to wrap up the workshop. All attendees were able to participate and offer comments on this year's event or recommendations for future workshops. It was a great experience to take part in this workshop as both attendees and presenters. Although we could not meet in person this year, we were all able to participate virtually.

—Abigail (@abigail_mabe) and Dhruv (@dhruv_282)
