Friday, May 20, 2011

2011-05-20: Report on the 2011 IIPC General Assembly

I spent the week of May 9--13 at the KB in The Hague, the Netherlands, for the 2011 IIPC General Assembly. Joining me there was Rob Sanderson of LANL. Rob had attended the 2010 GA in Singapore, but this was my first IIPC GA and I learned a great deal.

The first day was open to the public as a special session entitled "Out of the Box: Building and Using Web Archive Collections". I missed most of it because I was napping after arriving on the morning of May 9. Fortunately, Inge Angevaare prepared a comprehensive summary of the first day. I believe presentations and a video of highlights from the first day will be available from the IIPC site shortly.

The next three days were spent in the IIPC plenary and working groups. Rob gave a high-level Memento status report on Tuesday, and Rob and I gave a more detailed tutorial later in the day.

Wednesday and Thursday were largely spent with the Access Working Group, discussing a pilot project that would use Memento to harvest web page metadata from the archives of various IIPC member national libraries and re-expose it to the public. The goal is to have a large-scale, working demo of Memento aggregating the metadata about IIPC members' archives in time for the 2012 GA (to be held in Washington, DC).

One of the things I learned at the IIPC is that many national libraries are archiving their national top-level domains (e.g., BNF archiving *.fr web sites), but rather restrictive intellectual property laws prevent the libraries from opening their archives off-site (in other words, you have to travel to the BNF to view their *.fr archives). I suppose I had been spoiled by the relatively unencumbered approach afforded to the Internet Archive. Of course, we'd like to see these archives completely opened in the future, but the ability to advertise their contents in a machine-readable manner is a good first step.

Friday featured an excellent hands-on tutorial, led by Brad, Aaron, and Vinay from the Internet Archive (bios), about processing WARC, CDX, and WAT files using Hadoop and Pig. Vinay provided a page that gathers all the appropriate links into one place (the data files were distributed via thumb drive).

Rob left on Saturday morning, passing Herbert on the train as Herbert arrived for his two-month visit to DANS. Herbert and I spent the day catching up while touring The Hague and Delft.

Inge also blogged about the closing of the conference ("Memento Sparks Optimism at Closing of IIPC 2011"), and the Twitter hashtag was "#iipc". I'll update this entry when additional information from IIPC is posted.