2020-08-17: Web Archiving and Digital Libraries (WADL) Workshop 2020 Trip Report

The Web Archiving and Digital Libraries Workshop 2020 (#WADL2020) was held on August 5, 2020. Due to the COVID-19 pandemic, the workshop took place virtually. As in previous years, it was organized by Dr. Zhiwu Xie, Dr. Martin Klein, and Dr. Edward A. Fox. The Web Science and Digital Libraries Research Group (WSDL) at Old Dominion University contributed several presentations from different members of the group.

@edwardafox and Zhiwu Xie from @virginia_tech starting off the Workshop 6: "Web Archiving and Digital libraries" #WADL2020 #JCDL2020. The workshop started with participants introducing themselves. @WebSciDL pic.twitter.com/UtJnffpBuL
— Yasith Jayawardana (@yasithmilinda) August 5, 2020

Tian Xia from the Renmin University of China gave a keynote talk titled “The practice and inspiration of Web Archiving in China”. His talk covered different web archiving initiatives in China, their practices for web archiving, and challenges they faced. The challenges he highlighted included difficulties between providers and archiving agents, short term utilization vs. long term preservation, and a lack of best practices. He also discussed their efforts on institutional Web archiving. The goal was to make it easy to deploy as well as easy to transfer.

Tian Xia from School of Information Resource Management, Renmin University of China is giving the invited talk “The practice and inspiration of Web Archiving in China” at #WADL2020 #JCDL2020 pic.twitter.com/7d72FNadao
— Shawn M. Jones (@shawnmjones) August 5, 2020

Tian Xia is now covering cooperation in #webarchiving and the efforts with making use of #webarchives, like search, more accurate capture, #visualization, topic analysis, and more.#WebArchiveWednesday #WADL2020 #JCDL2020 pic.twitter.com/dcKF42CEIn
— Shawn M. Jones (@shawnmjones) August 5, 2020

We, Abigail Mabe and Dhruv Patel, presented our talk “TMVis: Visualizing Webpage Changes Over Time”. The presentation highlighted TimeMap Visualization (TMVis), an archival thumbnail visualization server. We started by discussing how web archives hold thousands of snapshots of webpages at different points in time and how it is impossible to view all of these snapshots. We also discussed why visualizing webpages is important: it gives us an understanding of events at the time of each webpage change. We then walked through how to use the service and showed all of the different visualizations.

TMVis: Visualizing Webpage Changes Over Time

.@abigail_mabe is now demonstrating the interface of TMVis. TMVis supports mementos from @internetarchive @archiveitorg @ArquivoWeb_PT #WADL2020 #JCDL2020 #WebArchiveWednesday pic.twitter.com/kjLE02WuqC
— Shawn M. Jones (@shawnmjones) August 5, 2020

#WADL2020 #JCDL2020 #WebArchiveWednesday
More information on TMVis.

Demo: https://t.co/1E1ccn7ynT
Video: https://t.co/qIsEnXkPEz
Blog: https://t.co/hRij22YDAS
Tech report: https://t.co/lviEVQ3cuS
GitHub: https://t.co/PctXuB0HXB
Supported by: https://t.co/qsDyMdBwHs
— Shawn M. Jones (@shawnmjones) August 5, 2020

Next, Kai Naumann from Landesarchiv Baden-Württemberg presented his talk “125 Databases for the Year 2080”. He started by describing Landesarchiv Baden-Württemberg as a key research infrastructure that saves many kinds of records, preserves them, and makes them accessible. He posed the challenge: how do you preserve 125 databases for the year 2080 and beyond? The databases must be prepared so that they can be used in a variety of ways in 2080. The rest of the presentation described possible storage solutions: CSV, XML, disk image, and Docker image. Another proposed solution was the use of web crawlers. He then compared the cost, as well as the pros and cons, of each of these solutions.

.@Naumann_Kai from Landesarchiv Baden-Württemberg is now presenting "125 Databases for the Year 2080" #WADL2020 #JCDL2020 pic.twitter.com/vsARG37Nz7
— Shawn M. Jones (@shawnmjones) August 5, 2020

Potential solution:
* CSV+ - least difficult, but complicated in the beginning
* XML
* Disk image - use emulation of client/server hardware and OS
* Docker image
All need preservation of handbooks/tutorials to reuse the UI and commands.#WADL2020 #JCDL2020 #WebArchiveWednesday pic.twitter.com/tzDNXJWU8J
— Shawn M. Jones (@shawnmjones) August 5, 2020

Shawn M. Jones presented two different talks at WADL. In the first, he introduced a service that integrates several tools, including ones he has developed over the past few years. This service is called SHARI, which is short for StoryGraph Hypercane ArchiveNow Raintale Integration. SHARI gathers different stories throughout the day and identifies the biggest story of the day, which is then posted on GitHub Pages at DSA Puddles. Follow @StormyArchives to learn more about different storytelling tools and services.

@shawnmjones presenting "SHARI – An Integration of Tools to Visualize the Story of the Day" at #WADL2020 #JCDL2020 @WebSciDL @oducs

S -StoryGraph
H -Hypercane
A - ArchiveNow
R - Raintale
I - Integration pic.twitter.com/te19c4ksyK
— Yasith Jayawardana (@yasithmilinda) August 5, 2020

@WebSciDL @shawnmjones presents SHARI at #WADL2020
Workshop: https://t.co/1P0Sp3o17T
SHARI detects and visualizes @storygraphbot biggest news story of the day

Report: https://t.co/jOsMkNoS44
Slides: https://t.co/DZ0aYqB339

#JCDL2020 pic.twitter.com/8VgiC05Do1
— Alexander C. Nwala (@acnwala) August 5, 2020

SHARI - An Integration of Tools to Visualize the Day

Shawn M. Jones then presented his second talk, titled “MementoEmbed and Raintale for Web Archive Storytelling”. MementoEmbed is a tool that generates surrogates for mementos. Raintale is a tool that generates social media stories from groups of mementos, and it can output stories in various formats such as HTML, Markdown, MediaWiki, and more. Raintale uses the surrogates generated by MementoEmbed as visual representations of each memento.

@shawnmjones presenting "MementoEmbed and Raintale for Web Archive Storytelling" at #WADL2020 #JCDL2020 @WebSciDL @oducs

"surrogates provide a visual summary of the content behind a URI" pic.twitter.com/UfeP8UgLng
— Yasith Jayawardana (@yasithmilinda) August 5, 2020

Shawn Jones (@shawnmjones) is giving a great, detailed presentation at #wadl2020 on MementoEmbed and Railtale for storytelling and representation with web archives.#webarchiving

• https://t.co/fE9KWHvlB7
• https://t.co/x5Q2yZZJKm pic.twitter.com/4OPGMYXgmX
— Mat Kelly (@machawk1) August 5, 2020

MementoEmbed and Raintale for Web Archive Storytelling

Next was the Invited Panel “Making, Using, and Exploring Web Archives: Tales from Scholars & Practitioners”. Vicky Steeves from New York University moderated this session.

Psyched to be moderating an invited panel for WADL 2020 (https://t.co/63LzH9Cokb) TODAY!!

"Making, Using, and Exploring Web Archives: Tales from Scholars & Practitioners" with @gen_milliken @acnwala @karenhansn @emilymaemura & Meghan Lyon (LOC)

3-4pm Eastern time! DM for link~
— Vicky Steeves (joinmastodon.org) (@VickySteeves) August 5, 2020

On the panel were Alexander Nwala from Old Dominion University, Genevieve Milliken from New York University, Emily Maemura from the University of Toronto, Karen Hanson from Portico, and Meghan Lyon from the Library of Congress. Alexander talked about the tools he created during his PhD work, such as Local Memory Project, I Can Haz Memento, StoryGraph, sumgram, and more. Next, Meghan Lyon talked about her work in the New York Art Resources Consortium, which consisted of exploring maintenance of the consortium’s Archive-It collections and developing a web archive for a practicing artist. Genevieve Milliken discussed her work on a project titled “Engaging the Web Archive”, which examines the ways web archives can be a useful tool for researchers and the general public. The next panelist was Emily Maemura, who talked about her research on scholarly uses of web archived data, which consisted of examining the creation and documentation of three different web archive collections. Her talk focused on research described in the article titled "If these crawls could talk: Studying and documenting web archives provenance". Finally, Karen Hanson shared her web archiving work on e-books, which focused on identifying preservable aspects at scale using current tools and producing guidelines that authors and publishers can follow to make their work more preservable. You can read more in her blog post titled "Enhancing Services to Preserve New Forms of Scholarship".

#JCDL2020 #WADL2020 #WebArchiveWednesday @acnwala presented many of the resources he created during his PhD work:
* https://t.co/IpaUqU6azb
* https://t.co/uCvpiSIOtJ
* https://t.co/j2YRfLsSa6
* https://t.co/1XJ1XeB6e6
* https://t.co/ndk9dWun6u
but this is an incomplete list... pic.twitter.com/yJkQ2LraV8
— Shawn M. Jones (@shawnmjones) August 5, 2020

.@gen_milliken summarized her work with #webarchives

Refs:
* https://t.co/rbrQbWW9Vq
* https://t.co/LUmnAOFL57
* https://t.co/oVI7EHjmJ7
* https://t.co/RRCFQk3EdZ #JCDL2020 #WADL2020 #WebArchiveWednesday pic.twitter.com/MtAeGdkltZ
— Shawn M. Jones (@shawnmjones) August 5, 2020

.@karenhansn recapped her #webarchiving work with e-books

Refs: https://t.co/GsL3PcqznM #JCDL2020 #WADL2020 #WebArchiveWednesday pic.twitter.com/8of6mXvpgq
— Shawn M. Jones (@shawnmjones) August 5, 2020

Ben O’Brien from the National Library of New Zealand presented “Improving the Quality of Web Harvests Using Web Curator Tool”. The Web Curator Tool (WCT) uses a variety of tools for creating, analyzing, and replaying web archives. Version 2 of the tool was released at the end of 2018, and version 3 is currently being tested, with release planned for the end of August. For future versions, they plan to integrate the WCT with more crawlers and preservation systems. You can download it or visit the GitHub repository from the WCT main page.

.@ob1_ben_ob: WCT uses a variety of tools for creating, replaying, and analyzing your #webarchives #WADL2020 #JCDL2020 #WebArchiveWednesday

Links:
* https://t.co/pWymSauRre
* https://t.co/hLNDbgPslK
* https://t.co/HlQhbcHFTc
* https://t.co/PEeo3mggOH
* https://t.co/Y7kQlLiREo pic.twitter.com/YJ5vF5qQV8
— Shawn M. Jones (@shawnmjones) August 5, 2020

And #WADL2020 is done... Thanks to chairs @zxie @edwardafox @mart1nkle1n
Thanks to the program committee:@justinfbrunelle @artlibrariannyc @JoshuaFinnell @AndreaGoethals Lauren Ko @fmccown @phonedude_mln @risse691 @nullhandle @docmattweber @weiglemc @liblaura #JCDL2020 #WADL2021 pic.twitter.com/qBeafRPQHZ
— Shawn M. Jones (@shawnmjones) August 5, 2020

The final event on the agenda for #WADL2020 was an open discussion session to wrap up the workshop. All attendees were able to participate and offer comments on this year's workshop or recommendations for future ones. It was a great experience to take part in this workshop as both attendees and presenters. Although we could not meet in person this year, we were all able to participate virtually.

—Abigail (@abigail_mabe) and Dhruv (@dhruv_282)
