Thursday, July 21, 2011

2011-07-25: NDSA/NDIIPP Partner Meetup 2011 Trip Report

The NDSA/NDIIPP (@ndiipp) Partner Meetup took place July 19-21 at the Hyatt Regency Washington on Capitol Hill in Washington, DC. Technical and non-technical attendees joined together to form an aggregated consortium of archivists, librarians, digital media specialists and concerned parties. Three representatives from the ODU Web Sciences and Digital Libraries group attended to make archivists aware of tools they had developed to accomplish the common goal of web archiving.
WS-DL’s Contributions to the NDSA/NDIIPP Meetup
Mat Kelly presented the Mozilla Firefox add-on Archive Facebook to a breakout group of presentations specifically targeting web archiving. The redesigned and re-architected add-on allows a user to archive the content of his/her Facebook account with the result being truly WYSIWYG versus Facebook’s native offerings of a content dump.
Vivens Ndatinya showed the workings of a tool he is currently building with his presentation, “Creating Persistent Links to YouTube Music Videos”. The software serves as a medium between a user and YouTube where, if a music video has been deleted or removed, the proxy will search for a comparable or official substitute and seamlessly forward the user to the resource for which he/she was looking.
Michael Nelson presented "How Much of the Web is Archived?", which was also presented at JCDL 2011. By examining links on DMOZ, delicious, bit.ly and search engines and cross-referencing them with various archives, the authors estimated how likely a URI is to be archived and answered the title question with, "It depends on the source of the URIs".

The Speakers
Martha Anderson (@MarthaBunton), the director of program management for the National Digital Information Infrastructure and Preservation Program (NDIIPP), exclaimed that “We are growing!” in her introductory presentation, citing the increasing number of members in the group and the greater breadth of the members’ specializations. She introduced the theme of the conference, "Make It Work", and stated that the conference’s 3 days were broken up by the respective keywords of “Open”, in that all presenters were committed to openness; “Solve”, where all speakers presented studies on creative approaches toward solving their problems; and “Connect”, which had a focus on community building and relationships.
Tim O’Reilly (@timoreilly), founder and CEO of O’Reilly Media Inc., kicked off the list of speakers, providing insightful one-liners such as “Forgetting makes room for new things”, “Design more systems that have their own memory” and “We’re engaged in the wholesale destruction of our history”. He listed two of his past failures where his process of archiving could have been improved:
In 1993 he created one of the first websites but neglected to archive it. “Things that turn out to be historic”, he stated, “aren’t deemed to be historical at the time” – a theme that reverberated through many other presentations.
His second past failure of preservation was in 1998 when he attended the inaugural Open Source Summit, where the term “Open Source” was officially born. Learning from his 1993 failure, he diligently built an archive and linked to all of the relevant content but neglected to archive the linked content in depth, which meant that the information coupled with his coverage was no longer available when later accessed.
O’Reilly rhetorically queried the audience, “What kind of tools do we need in the everyday practice of the digital world to encourage preservation?” He stated that we have to consider the widely divergent scenarios if we are to archive effectively. He reiterated that the tools we have should be adapted to make it more likely that archived content would survive when things went awry. “What matters?”, Tim asked, again referencing his two failures and answering his own question. He emphasized that our current perspective of what matters is temporally subjective and that we are likely neglecting to archive collections we now consider trivial.
To close up, Tim emphasized that there should be an exception in copyright for the sake of archiving so that our past will be preserved.
Yancey Strickler (@ystrickler) came on next to speak about his project Kickstarter, a funding platform for creative projects. Kickstarter works on an all-or-nothing approach to fund-raising where users can offer monetary support for projects they believe worthwhile with no commitment if the project fails to get funded. Yancey spoke of a tipping point in the funding process, where a large majority of projects reach, and sometimes exceed, their threshold after attaining 30% of their goal. Those that donate to the project are forbidden from being rewarded with equity but the fundee usually provides something priceless in return, like a photo for a donor from a project where a girl wished to sail around the world or the ability to be first to purchase a potentially popular iPod accessory that failed to get traditional backing.
Kickstarter takes only a very small cut (5% of the raised funds) to remain sustainable but only receives it if the project gets funded. With this, Kickstarter and the projects both grow. “One day”, Yancey said, “we’ll hopefully be a cultural institution”.
Michael Edson (@mpedson) of the Smithsonian Institution came on next after a short break with his presentation “Let us go boldly into the present”. Michael emphasized that the time to archive is now and that “today is the future that all of the visionaries wrote about.” To do so, he gave five “design patterns” that we should exhibit to assure that the present is archived:
  1. Extra-terrestrial Space Auditor is a concept best depicted by an extraterrestrial that examines an organization, blind to its current workings, and provokes the organization to do a self-analysis as to whether it is performing as it should in terms of business practices, HR, etc., its operation having potentially been skewed by the baggage of the last epoch.
  2. On Ramp and Loading Docks encourages the mindset that successful preservation is not about building infrastructure but rather creating movement.
  3. Edge to Core suggests that the best work is done on the fringes of an institution where subject matter experts exist. “An organization”, Michael said, “should develop a process that brings in and bootstraps these experts so their ideas can scale.”
  4. Self Awareness about organization change patterns states that there are predictable miscommunications and general crankiness in an organization between innovators and management.
  5. Focus on the mission was Michael’s observation that of the 80 to 90 organizations that he had spoken to in the last few years, the ones that were not suffering in their pursuit of worth knew the outcomes they wanted in society.
After Michael, Aaron Presnall (blog) of the Jefferson Institute came on to speak about “Tools for informing public decision-making”. A continuing project of his was to assist those at the National Archives of Serbia in archiving their documents. Many of these documents are of great interest, as they document the recent struggles and secession of the country and have immediate application (such as implication for war crimes) if preserved. Using the tools available, some unconventional, Aaron assisted those interested in moving from the dissolving stacks of papers to a digital form. He then built a management tool and genericized the tool to allow it to be reused in instances beyond Serbia. He has since been queried by Bosnia, who wishes to do the same as Serbia and because of the generic setup Aaron has created, the information Bosnia has to offer will not be lost.
With Aaron being the last presenter for the day, Abby Rumsey moderated a panel discussion/Q&A with all of the Day One speakers, first hoping to address Martha's question, "How do we make it work?" She first asked Aaron how to connect the demand for archiving with the supply of skill and whether something needs to be in place to make these connections easier. He replied with the need to communicate the success of individual cases to a much broader audience, convey the lessons learned and establish best practices for performing such an archiving session. He admitted that it's difficult "to make archiving sexy" but popularized projects such as History Pin get people thinking and both energize and popularize the task of archiving.
Tim O'Reilly expounded on Aaron's reply, referencing a collection of railway edition books from the 1880s that were bound by people that found the works both valuable and beautiful. "When some individual finds something that would otherwise be disposable and finds it beautiful and a keepsake", Tim said, "that's a wonderful impulse for preservation". He continued, "When we allow things to be reused by individuals, it really appeals to value of fair use." He went on to speak about how intellectual property fights against preservation and said that what we can do to preserve things of value is to give them more freedom.
Abby then questioned Michael Edson about how his approach of Edge-to-Core has had an impact on The Smithsonian. Michael gave the example of how the Smithsonian handled the inception of the world-wide web with no business process in-place. "Because the institution took a decentralized approach to managing content and ideas", he said, "there was no existing infrastructure to make order out of the web. It's been a series of opportunistic efforts to pick the pieces of the low hanging fruit and bring them to the center of the organization to achieve scalability and a greater impact."
Yancey was then asked, "How do you get something where the connections are so profoundly personal into something that really scales to the level we think about with digital preservation?", citing Wikipedia's scaling issues. Yancey alluded to Wikipedia's moderation challenges in terms of curation with, "What happened if I'm a guy that knows a lot about a topic you're concerned with archiving and I decide to reach out and tell you everything I know and all of the ways to be wrong? What do I get to contribute? Do I have any voice whatsoever?" Aaron replied with, "Exactly, that's a tremendous challenge and whether 80% of time you're right, 20% of time you could be fundamentally, deeply, troublingly wrong."
The Q&A was followed up with a reception accompanied by 30-or-so poster displays. Of particular interest to the WS-DL members was the Ace Audit Manager and Integrity Management System, an integrity auditing system for archives, which would prove useful in both the Memento and Archive Facebook projects. This closed out day one.
Day two started with a presentation from Helen Hockx-Yu from the British Library. "In the UK", she said, "there are two archives - The UK web archive and the UK Government Web Archive." She added that there was pending legislation that would limit the viewing of archives to on-site within the library. "Web archiving in the UK", she said, "is only 10 years old at the British library - much younger than Internet Archive." One notable part of the collection, which she said the British Library found accidentally, is the oldest archived website - that of the British Library's website from 1995, which was found stored away on a library's server.
Tricia Cruse of the California Digital Library spoke next about "Curation approaches in a public university system", stating "We're seeing an ever-increasing amount and degree of diversity of content. While our budgets were going down, we have had to do more with less." She also spoke of EZID, a system for users to create unique identifiers for their archived content; UC3 Merritt, a place where collaboration for researchers can happen and data can be stored and shared; and Digital Curation for Excel (DCXL), an open source Microsoft Excel add-in that makes versioning, archiving and applying unique identifiers easier when working in Excel.
Jack Brighton of WILL, a radio/television station in Illinois, spoke of "Archiving at Web Speed". Jack spoke of his efforts in preserving the station's broadcasts using PBCore and emphasized the need to adapt the archiving process to make it as painless as possible for those that did not necessarily see the value of the content at the current time.

Ben Vershbow (@subsublibrary) from the New York Public Library finished up the first session of the day with his presentation, "Bringing in the Crowd". He cited a project his group created, "What's on the Menu?", which was a crowd-sourced effort to transcribe old menus. He believes that there is an untapped reservoir of time and through crowd coordination and building datasets, people will be willing to devote their time for free.
Subsequent to Ben's presentation, the crowd broke up into three groups for workshops. The three topics of the workshops were "And the winner is..: How does a community recognize achievement?", "Tales from the crypt: What are the emerging practices of large scale storage" and "Special Interest Session: Web Archiving: Pecha Kucha and discussion of emerging topics in Web archiving". Because Vivens and Mat presented at the last of the three, the WS-DL members attended and participated in the third session.
Presentations resumed after the breakout session with the theme of Open Source Tools and Community. The first presentation was by MacKenzie Smith (website) of MIT with "Exhibit3@MIT: Lessons learned from 10 years of the Simile Project for building library open source software". MacKenzie stated that "Everybody's a curator" and "If we're creating these tools for the public, how can we assure that these tools will flow into the organizations, as many die? When you're doing a project that's open source", she continued, "you need to design for that community from the beginning." MacKenzie went on to make three points: metrics should be used to gauge the chance of success of an open source project; a project is more likely to be sustainable if it has an audience "outside of this room" (i.e. outside of the archiving community); and maintenance of the code has to be done by those that are committed, not just casual developers.
Sharon Leon (website) of George Mason University then presented "Omeka: from digital exhibits to web publishing platform". Omeka is a plug-in based Content Management System (CMS) modeled on WordPress that emphasizes extensibility. Sharon repeatedly emphasized the openness of the platform and that her group "specifically fights against Flash for re-use", as wrapping content in a Flash-based application limits access to the content within. She also mentioned that in developing a grant-funded open source project, one should not spend all of the funds on the development of the project but rather should put funds toward workshops, outreach and marketing of the product.
Michele Kimpton (website) spoke of ways to go beyond grant funding once it's exhausted with "Building and sustaining open source communities through the life cycle: Dspace, Fedora and DuraCloud case studies". Her group has created a write-up on the Meetup.
Following Michele was another breakout session of concurrent workshops with the topics of "Tools at risk", "I can haz standardz" and "Developing cutting-edge internship programs in digital preservation: What are the essential elements?". The WS-DL group attended "I can haz standardz", which disappointingly was more about the difficulties non-technical attendees had in building a tool for data management than about the standards themselves. As the group were all of technical mind, this was clearly the wrong workshop of the three to attend.
After another short break was a third set of concurrent workshops: "Digital preservation in a box: What are the key resources for digital preservation and education and outreach?", "Slaying the dragons: What is at risk and how do we rescue it?" and "The Challenge challenge: What are ways we can spark digital preservation innovation". The WS-DL group attended the third of the three. There, the attendees were broken into groups with each group being tasked to discuss a single topic in-depth with varying concerns in each group. Unlike the previous workshop, one topic was specifically technical - that of investigating how one assures archive integrity from a host and how to go about performing an audit on the collections stored. The WS-DL group along with Michelle Gallinger (@mgallinger), Professor Micah Beck (website), Mike Smorul (@msmorul) and a couple of others devised the Storage Ping concept, which would require those that host collections to enable a client-induced check on the server's collection integrity.
Day 3 started out with an introduction by Martha Anderson and then followed with the first presenter, David Rosenthal (website) of Stanford University on "Cloud Storage for LOCKSS Boxes". LOCKSS (Lots Of Copies Keeps Stuff Safe) boxes are dedicated computers with local storage that communicate with each other and repair any damage to the data. David discussed challenges of speed he encountered when developing his system and conveyed a method of assuring the integrity of data, and its continued existence on a remote server, by prepending a nonce. He has recently been working with students at Carnegie Mellon University to develop a crawling process that he described as being "a pretty robust approach to form filling." He also expressed some difficulty he has had in the past with archiving AJAX-based content but emphasized that his archiving process was different than others', as he does not use Heritrix, the crawler used by The Internet Archive.
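To make the nonce idea concrete, here is a minimal sketch of how such a check could work. The talk only mentioned prepending a nonce; the choice of SHA-256, the `RemoteStore` class and the `audit` function are assumptions made for illustration and are not taken from LOCKSS itself. The auditor picks a fresh random nonce, asks the remote side to hash the nonce followed by the content, and compares the answer with the hash computed from its own copy. Because the nonce changes every time, the remote side cannot answer from a stored hash and must actually hold the bytes.

```python
import hashlib
import hmac
import secrets


def challenge_digest(nonce: bytes, content: bytes) -> str:
    """Hash the nonce followed by the content (i.e. the nonce is prepended)."""
    return hashlib.sha256(nonce + content).hexdigest()


class RemoteStore:
    """Hypothetical stand-in for a remote server holding a copy of the content."""

    def __init__(self, content: bytes) -> None:
        self._content = content

    def respond(self, nonce: bytes) -> str:
        # An honest server must read its stored bytes to answer a fresh nonce.
        return challenge_digest(nonce, self._content)


def audit(local_copy: bytes, remote: RemoteStore) -> bool:
    """Return True if the remote copy matches the local copy for a fresh nonce."""
    nonce = secrets.token_bytes(16)  # new random challenge for every audit
    expected = challenge_digest(nonce, local_copy)
    return hmac.compare_digest(expected, remote.respond(nonce))
```

In this sketch, `audit(data, RemoteStore(data))` succeeds, while a remote copy that differs by even one byte fails the check.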
After David, Cal Lee (website) of UNC Chapel Hill analyzed the four NDIIPP State projects:
  1. Persistent Digital Archives and Library System (PeDALS) by Arizona
  2. A Model Technological and Social Architecture for the Preservation of State Government Digital Information by Minnesota Historical Society
  3. Geomap (GIS data, headed by the North Carolina Center for Geographic Information and Analysis)
  4. Multi-state Preservation Consortium by Washington State Archives
The questions he asked about each project included:
  • What are the main factors that drove the project in the first place?
  • What brought these about?
  • Who was involved and why?
  • What were the activities they engaged in before this?

Following Cal was Robert Horton from the Minnesota Historical Society, who presented a slide-less report of his NDIIPP-sponsored project. Robert spoke of a soon-to-be-enacted uniform law for the preservation, authentication and access to electronic legislative records. The legislation will define the required usage of digital signatures to sign all legislative content online.
Peter Krogh (@peterkrogh) of the American Society of Media Photographers spoke next with, "Extending the reach of www.dpBestflow.org". Peter had been investigating means of collaboration and methods to get people to archive by conveying the task of archiving in a way that will appeal to the would-be archivist.
After a break, summaries of the 2010 DPA finalists sponsored by the Library of Congress were presented. WS-DL's own Michael L. Nelson (website) reported on the Memento project (joint work with Herbert Van de Sompel (who gave the original presentation in London in December 2010) and Robert Sanderson of LANL), which was referenced multiple times by other presenters throughout the meetup. Dr. Nelson stated that there is currently a disconnect in viewing web archives, as there is no seamless way to go between the past and the present. Memento overcomes "being stuck in the perpetual now" by leveraging content that currently exists in the web archives and provides a bi-directional means to view different versions of a web site on-the-fly. Michael stated that Memento does not create web archives but rather puts the notion of time onto the web.
Following Michael's presentation was Fran Berman (website) of Rensselaer Polytechnic Institute with "Economics and Digital Preservation", a final report of the Blue Ribbon Task Force (BRTF), whose mission is to promote sustainable digital preservation and access. Fran spoke of BRTF's investigation of the technical, economic and social problems. "Infrastructure is not free", she said, "and the preservation and access to our data is not free. Because it is not free and because there are so many interesting solutions, you see it as a multivariate problem." She stated that the Task Force wanted to do a deep dive into the economics of the problem of cost for digital preservation.
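As a rough illustration of what "putting time onto the web" can look like, the sketch below selects the archived copy closest to a requested date and finds its earlier and later neighbours for bi-directional navigation. It is not the Memento protocol itself, which works over HTTP; the capture list, the `archive.example` URIs and both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# A capture is (archival datetime, URI of the archived copy).
Capture = Tuple[datetime, str]


def closest_capture(captures: List[Capture], target: datetime) -> Capture:
    """Pick the capture whose datetime is nearest to the requested one."""
    if not captures:
        raise ValueError("no captures available for this resource")
    return min(captures, key=lambda c: abs(c[0] - target))


def neighbours(
    captures: List[Capture], current: Capture
) -> Tuple[Optional[Capture], Optional[Capture]]:
    """Return the previous and next captures in time, or None at either end."""
    ordered = sorted(captures)
    i = ordered.index(current)
    prev_c = ordered[i - 1] if i > 0 else None
    next_c = ordered[i + 1] if i + 1 < len(ordered) else None
    return prev_c, next_c


# Hypothetical captures of one original page held by one archive.
CAPTURES: List[Capture] = [
    (datetime(2005, 3, 1, tzinfo=timezone.utc), "http://archive.example/2005/page"),
    (datetime(2008, 1, 15, tzinfo=timezone.utc), "http://archive.example/2008/page"),
    (datetime(2011, 7, 1, tzinfo=timezone.utc), "http://archive.example/2011/page"),
]

wanted = datetime(2008, 6, 1, tzinfo=timezone.utc)
best = closest_capture(CAPTURES, wanted)
earlier, later = neighbours(CAPTURES, best)
```

Here a request for mid-2008 resolves to the January 2008 capture, with the 2005 and 2011 captures available as the steps backward and forward in time.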
"Our charge was to do roughly three things", Fran enumerated:
  1. Assemble a representative group of experts with broad perspective and influence.
  2. Look at the problem space: how can we structure it and understand it in a way that helps us take action.
  3. Come up with actionable recommendations.
The BRTF created a report with its recommendations.
The final presentation of the conference was by Kari Kraus (@karikraus) of the University of Maryland with "Preserving Virtual Worlds" (Jerry McDonough gave the original presentation in London). Kari spoke of her attempts at preserving virtual worlds, repeatedly referencing examples from Second Life. The project was a multi-institution, multi-disciplinary effort by University of Illinois at Urbana-Champaign, Stanford University, Rochester Institute of Technology and The University of Maryland that investigated preserving virtual worlds for their aesthetic merit as well as their economic significance. "We believe there is tremendous cultural importance to these artifacts.", she said, "We believe video games represent the limit case of what we can do with digital preservation. If we can figure out how to save a classic first-person shooter game like Doom, we'll have a better chance of preserving computational simulations of genetic evolution or climate change or the galactic behavior of star systems."
She said that their mission was very practical: they needed to ingest game bits into institutional repositories and provide packaging standards for doing that. Other virtual worlds she mentioned as having been investigated were Spacewar, Adventure (interactive fiction) and Mystery House (interactive fiction), among others.
In Closing
Neither of the WS-DL student presenters had presented at a meetup/conference of this caliber before, which made the experience more than worthwhile. Much was learned about the various efforts of the archiving community and WS-DL's projects gained exposure. Further, we were made aware of others' efforts and found some resources that we hope to integrate into our research in the near future.

— Mat Kelly
