The International Internet Preservation Consortium (IIPC) General Assembly 2014 (#iipcGA14) was hosted by the Bibliothèque nationale de France (BnF) in Paris. Although the GA ran the entire week (May 19 -- May 23), I was only able to attend May 20 & 21. It looks like I missed some good material on the first day, including keynotes from Wendy Hall and Wolfgang Nejdl, and a presentation from Common Crawl. Martin Klein also presented an overview of the Hiberlink project, as well as the "mset attribute" that we are working on with the people from Harvard.
I arrived after lunch on May 20, in time for a really strong session on "Harvesting and access: technical updates", featuring talks about Solr indexing (Andy Jackson et al.) (Andy's slides), deduplicating content in WARCs (Kristinn Sigurðsson), Heritrix updates (Kris Carpenter), and Open Wayback (Helen Hockx). Within WS-DL, we haven't really done much with Solr in our projects or classes and that's a shortcoming we should address soon.
The morning of May 21 began with presentations from Helen Hockx and Gildas Illien about creating IIPC-branded collections (essentially continuing the Olympics collections available so far), followed by breakout sessions to discuss the legal and technical issues regarding such collections (guess which one is the more problematic!). Although all considered this an interesting direction for IIPC to pursue, I'm not sure we made much progress on how to proceed.
After lunch, I gave my presentation in a session that included status updates about the KB's web archives (Anna Rademakers (slides)) and the Internet Memory Foundation (Leïla Medjkoune and Florent Carpentier (slides)). My talk established the metaphor of web archives as "cluttered attics, garages, and basements" and then discussed profiling web archives, both to better perform query routing at the Memento Aggregator and to provide an interchange format and mechanism for coordinating IIPC crawling and coverage activities, including the contents of dark archives.
The day ended with a session about archiving Dutch public TV (Lotte Belice Baltussen (slides)) and crawling & archiving RSS feeds (Kristinn Sigurðsson (slides)). The GA closed out with public workshops on Thursday and Friday, but I was already well into my homeward-bound ordeal during those days.
As always, the IIPC GA was filled with informative sessions and a collaborative spirit. It was great catching up with old friends, and especially good to see WS-DL alumni Martin Klein (LANL) and Ahmed AlSum (Stanford). Unfortunately, it is probably one of the last events at which we'll see Kris Carpenter since she is transitioning out of the Internet Archive. I regret that my schedule did not allow me to attend the entire GA. Although it is not quite official yet, it looks like the 2015 GA will be held at/near Stanford.
N.B. I will update the narrative above with links to the slides as they become available.
2014-05-27 Update: A mostly complete set of presentations is now available.
2014-06-18 Update: Blog posts about the IIPC GA from Ahmed AlSum and Nicholas Taylor.
2014-07-28 Update: The BnF has posted some of the videos from the GA.