2017-03-02: National Symposium on Web Archiving Interoperability Trip Report

The National Symposium on Web Archiving Interoperability was held February 21-22, 2017 at The Internet Archive in San Francisco, CA. The symposium was held as part of the IMLS-funded "WASAPI" project, which is researching "web archiving systems APIs". The participants are Internet Archive’s Archive-It, Stanford University Libraries (DLSS and LOCKSS), University of North Texas, and Rutgers University. There were nearly 50 attendees from a variety of international institutions.

Jefferson Bailey and Nicholas Taylor began the day with a review of the WASAPI project: "Building API-Based Web Archiving Systems and Services". They also led a discussion about soliciting usage scenarios and feedback from potential users (see the results from their 2016 survey). You can track the WASAPI developments at their GitHub repo, where they have the WASAPI Data Transfer API General Specification (for the transfer of WARC files, WAT files, etc.), reference implementations, and other items.

.@nullhandle presenting the use cases that motivate WASAPI initiative. #webarchiving pic.twitter.com/FixnLqEnnB
— Justin Littman (@justin_littman) February 21, 2017

After a break, we had a series of short presentations:

Debbie Kempe, NYARC, "NYARC’s Opensearch API Integration"
Greg Wiedeman, SUNY Albany, "Automating Web Archives Records in ASpace" (see also his blog post from 2016-10-18).
Stephen Abrams, CDL, "Cobweb"
Michael Nelson, ODU, "Web Archiving Activities of ODU’s Web Science and Digital Library Research Group" (slides also embedded below)

Web Archiving Activities of ODU’s Web Science and Digital Library Research Group (@WebSciDL) from Michael Nelson

I knew our group had been busy, but I could not help but be impressed with my recent, albeit extremely brief, catalog of our activities. I put the focus on tools and services we had created in support of our research, which led to interesting questions from Tom Cramer and others about the role of tool production for PhD students. It's something Michele and I struggle with frequently: everyone enjoys when their tools are popular and useful to others, but student success should not be predicated on the popularity and uptake of the tools. Some tools are simply more applicable to a wider audience than others, which does not mean they are more or less suitable for the research purposes for which they were created.

The day closed with a social and a lot of informal meetings in the lobby of the Internet Archive.

.@brewster_kahle kicks off day two of the #webArchiving interoperability symposium - in the beautiful sunny @internetarchive great hall. pic.twitter.com/ndlKt9CekF
— Ian Milligan (@ianmilligan1) February 22, 2017

The second day began with a keynote from Brewster Kahle and a tour of the Internet Archive itself. It was my second tour of the IA, but it is always enjoyable. After a break we had three presentations:

Tom Cramer, Stanford University, (a talk about collaboration, but I can't find the slides)
Dallas Pillen, Bentley Historical Library, "ArchivesSpace-Archivematica-DSpace Workflow Integration"
Nicholas Taylor, Stanford University, (a talk about new developments in LOCKSS, but I can't find the slides)

We then had breakout sessions about collaboration goals, API expectations, and the impact of interoperability. The breakout session I attended was only moderately successful, producing two concurrent discussions that were informative but did not produce much in the way of tangible outputs. The other sessions were more productive and had materials to report back to the symposium at large.

#WebArchiving in action! pic.twitter.com/7dvsRl95Ut
— Archive-It (@archiveitorg) February 23, 2017

After lunch, the day resumed with some WASAPI transfer demos. My notes show only David Rosenthal (Stanford) giving a live LOCKSS demo of using the WASAPI API, but there may have been more. That led to three more presentations:

Justin Littman, GWU, Social Feed Manager (I can't find the slides)
Ilya Kreymer and Mark Beasley, Rhizome, "Webrecorder Interoperability"
Ian Milligan (Waterloo) and Nick Ruest (York), "Warcbase: Using Scalable Web Analytics to Analyze Canadian Collections En Masse"

I've seen demos and presentations about Social Feed Manager several times, and although our group has yet to use it, it looks like a great tool. Justin has also done a good job providing several pre-built collections (contact him for details). Ilya's presentation was tremendous, highlighting such interesting features as mixed archive integration (including localhost and otherwise "private" archives), import of collections from other archives (e.g., Archive-It), augmenting missing resources from the live web (I noted it should check other archives for the desired datetime to avoid zombies), and "curated archives", which appear to be similar to Twitter Moments or Storify stories, but for archived pages (see Yasmin AlNoamany's recent dissertation for similar research in this area). Ilya and his group are doing really great stuff with webrecorder.io. Ian's and Nick's presentation was excellent as always, and highlighted the work they're doing with Warcbase.

Then we had another round of breakouts, although I don't have good notes about their contents. I spent a lot of this time talking with Mark Graham and other folks. The final round of presentations included:

Matt Weber, Rutgers - Archives Unleashed
Martin Klein, "Web Archive Interoperability with Memento"

.@mart1nkle1n presenting #memento #webarchiving pic.twitter.com/GUREfN7mJ5
— Michael L. Nelson (@phonedude_mln) February 23, 2017

Matt's presentation was about the two Archives Unleashed hackathons (see the @WebSciDL trip reports for the first and second Archives Unleashed hackathons). The third hackathon immediately followed this symposium (on Thursday & Friday, February 23-24), and the fourth hackathon has been announced for this summer at the British Library.

Martin's presentation touched on familiar topics such as the Time Travel Memento service, the Memento for Chrome extension, and Robust Links (demoable in our December 2015 D-Lib paper "Reminiscing About 15 Years of Interoperability Efforts"). This was a good presentation to end the day with, since Memento is the first and de facto standard for web archive interoperability (see the quick intro or RFC 7089). Memento does not address bulk upload or download of WARCs, WATs, etc., but it does define linkage between mementos (i.e., archived pages), their live web counterparts ("original resources"), lists of available mementos for an original resource ("TimeMaps"), and resources that do content negotiation in the dimension of datetime in order to direct you to the best available memento ("TimeGates").
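For readers who have not used Memento, here is a minimal Python sketch of the two pieces a client handles in the datetime negotiation described above: building the `Accept-Datetime` request header that is sent to a TimeGate, and reading the `Link` response header that names the original resource, TimeMap, and mementos. The header names and link relation types come from RFC 7089; the hosts and URIs in the sample are placeholders I made up for illustration, the function names are my own, and the parser is deliberately simplified (it assumes quoted parameter values and is not a full RFC 8288 parser). No network requests are made.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# One link-value: <URI> followed by its parameters, up to the next "<".
LINK_VALUE = re.compile(r'<([^>]+)>([^<]*)')
# One quoted parameter, e.g. rel="memento" or datetime="Tue, 11 Sep 2001 ...".
PARAM = re.compile(r'(\w+)="([^"]*)"')


def accept_datetime_header(when):
    """Format a timezone-aware datetime as an Accept-Datetime value (GMT)."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def parse_link_header(value):
    """Group the entries of a Link header by relation type.

    Returns a dict mapping each rel (e.g. "original", "timemap", "memento")
    to a list of dicts holding the target URI and its parameters.
    """
    links = {}
    for uri, params_text in LINK_VALUE.findall(value):
        params = dict(PARAM.findall(params_text))
        # A single rel attribute may carry several space-separated types.
        for rel in params.get("rel", "").split():
            links.setdefault(rel, []).append({"uri": uri, **params})
    return links


if __name__ == "__main__":
    # Hypothetical response header from a TimeGate or memento.
    sample = (
        '<http://a.example.org/>; rel="original", '
        '<http://archive.example.net/timemap/http://a.example.org/>; '
        'rel="timemap"; type="application/link-format", '
        '<http://archive.example.net/web/20010911203610/http://a.example.org/>; '
        'rel="first memento"; datetime="Tue, 11 Sep 2001 20:36:10 GMT"'
    )
    wanted = datetime(2001, 9, 11, 20, 36, 10, tzinfo=timezone.utc)
    print("Accept-Datetime:", accept_datetime_header(wanted))
    for rel, targets in parse_link_header(sample).items():
        print(rel, "->", [t["uri"] for t in targets])
```

In a real exchange the client would send the `Accept-Datetime` value in a GET or HEAD request to a TimeGate, follow the redirect to the selected memento, and then inspect that response's `Memento-Datetime` and `Link` headers.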

The hashtag was "#webarchiving", but that's a general hashtag so the tweets from that event will quickly become lost. Some are embedded above, but I've put the bulk of the symposium's tweets in this Twitter moment. There was also a Slack channel.

Overall this was an important and welcome event. There was less focus than I expected on the WASAPI APIs themselves, but perhaps I'm in the minority in enjoying digging through APIs. The WASAPI effort still seems to be in data reception mode, actively soliciting requirements and use cases. I thought there would be more demos from the WASAPI team, but I understand these things are difficult to bootstrap. The symposium was useful in getting many of the main players in the web archiving community together for technical interchange, especially since I'll miss the postponed IIPC General Assembly this year. Thanks to Jefferson, Lori, and everyone at the Internet Archive who helped host us, and thanks to the IMLS for funding this critical activity.

--Michael

"No one knows what to do with WARCs, so we got WAIL" #wail @WebSciDL @phonedude_mln #webarchiving pic.twitter.com/3qi2AKrntI
— Martin Klein (@mart1nkle1n) February 22, 2017
