9°C & wind 0 m/s! Our host @Landsbokasafn & @kristsi seem to have booked an early spring for #iipcga16. Thank you! pic.twitter.com/jvyORkzsRo— IIPC (@NetPreserve) April 12, 2016
The 2016 IIPC General Assembly and the separate-but-related IIPC Web Archiving Conference 2016 were held in Reykjavík, Iceland, April 11-15, with the former being open to IIPC members only and the latter open to the public. Unfortunately, my trip report will be incomplete since I had to leave midday on Wednesday. The first day was primarily given to IIPC business: introducing the new officers, covering project status, budgets, new bylaws, etc. Jason gave a brief overview of our IIPC-funded Web Archive Profiling Via Sampling Project, which is now coming to a close. In addition to the resources and deliverables linked from the IIPC project page, Sawood Alam has developed the MemGator Memento Aggregator and the CDXJ format for serializing CDX files in JSON. We welcome feedback on both. I'd also like to repeat our request for web archiving logs so we can better model request patterns.
We had an introduction and Q&A from the Steering Committee members that worked well (I believe this was the first time this format had been used). The day closed with updates from Alex Thurman & Abbie Grotke about the collaborative collections, Sara Aubry about the proposed WARC 1.1 format, and Andy Jackson on "Building Tools to Archive the Modern Web".
Unfortunately, Day 2 began with dual and triple tracks, which forced hard decisions about what to attend when all the sessions were good. I began in the session with Andy Jackson, "Building Better Tools, Together", in which he covered the benefits of open source development. The following session had David Rosenthal, Nicholas Taylor, and Jefferson Bailey covering the IMLS-funded web archiving API project. The result of the session was a Google doc that contained the essence of the discussion. Their slides "Building API-Based Web Archiving Systems and Services" are now available (2016-05-02 edit).
After lunch, I presented in the session "Harvesting Tools", with Jefferson Bailey and Youssef Eldakar. Jefferson gave a preview of brozzler, a crawling package that combines real Chrome browsers with warcprox for capturing all resources. Youssef gave a demo of visualizing Heritrix crawls. My talk closed the session and was based on Justin's work on crawling deferred representations and descendants (see the iPres 2015 paper and 2016 tech report for more information about these concepts, as well as Justin's PhD summary post).
The final session was by Martin Klein, Andrea Goethals, and Stephen Abrams on their plans for a submission to IMLS for nominating and coordinating seed URIs for crawls.
Wednesday began the IIPC Web Archiving Conference, and it kicked off with a keynote from Iceland's own Hjálmar Gíslason, most recently at DataMarket. He started off the keynote by defining the progression of "big data":
Drawing from his current position and previous positions, he made a number of interesting observations regarding what is worth archiving. Although "hoarding isn't a strategy", we frequently don't know in advance what will be valuable (e.g., the NY Times 1927 article that said "commercial use in doubt" regarding television). His slides aren't posted yet, but hopefully soon.
After that was a joint presentation from Vint Cerf and Rick Whitt from Google, which is now an IIPC member (!). Vint rightly noted that the IIPC crowd didn't need the usual background material he typically provides (cf. DSHR's and my reaction to his 2015 AAAS talk). Rick focused on potential roles for Google in the IIPC and web archiving in general:
potential roles for @google wrt #digitalpreservation shared by @richardswhitt: convener, financier, vendor, lobbyist, advocate #iipcWAC16— Nicholas Taylor (@nullhandle) April 13, 2016
Vint was only able to be there for part of the day on Wednesday, but Rick was there the whole time. Rick was careful to stress that Google was there to learn and assess, not to try to steer or dominate the community. However, it is fair to say that the IIPC members that I spoke to were all very excited about Google's recognition of web archiving, even if no specific strategy or plan is adopted. The Q&A after their presentation was quite lively and could have gone on much longer. Brewster Kahle then moderated a panel about web archiving from the perspective of National Libraries with: Helen Hockx-Yu (IA, formerly of the British Library), Steve Knight (New Zealand), and Paul Koerbin (Australia).
I had to leave after lunch, so I missed the remainder of the conference. Rounding out Wednesday were David Rosenthal's "The Architecture of Emulation on the Web", Ilya Kreymer & Dragan Espenschied presenting on oldweb.today (netcapsule github), Thomas Liebetraut talking about emulation (bw_FLA), and Matthew S. Weber and Ian Milligan talking about their Hackathons (Canada in March, US in June). Brewster concluded the day with a keynote "20 Years of Web Archiving – What Do We Do Now?" He previewed a really cool experimental interface for the Wayback Machine:
Woah. I’m mousing over these. Finding major changes, seeing who crawled them. This is BIG. #iipcWAC16 pic.twitter.com/WV9BssXSI4— Ian Milligan (@ianmilligan1) April 13, 2016
I won't even try to summarize Thursday's sessions, and Friday consisted of a couple of different workshops. The Twitter hashtags were #iipcGA16 and #iipcWAC16, respectively. Ed Summers has a nice page summarizing all the tweets for both events. Kristinn Sigurðsson, who did a great job organizing the event, has a summary blog post for the event, and Peter Webster has a nice reflection piece about "What do we need to know about the archived web?" based on what he learned at IIPC. I'll add more posts about the event as I discover them.
As always, the IIPC meeting was excellent -- I highly encourage you to attend if you are at all interested in web archiving. Next year's IIPC General Assembly and Web Archiving Conference will be in Lisbon, Portugal, in late March.
2016-04-25 edit: Blog post (in French) from the new IIPC chair, Emmanuelle Bermès.
2016-05-02 edit: Blog post from Patrick Galligan.
2016-05-12 edit: A "lessons learned" post about organizing the event from Kristinn Sigurðsson.
2016-05-14 edit: A broad perspectives blog post from Nicholas Taylor.