2015-11-06: iPRES2015 Trip Report
From November 2nd through November 5th, Dr. Nelson, Dr. Weigle, and I attended the iPRES2015 conference at the University of North Carolina Chapel Hill. This served as a return visit for Drs. Nelson and Weigle; Dr. Nelson worked at UNC through a NASA fellowship and Dr. Weigle received her PhD from UNC. We also met with Martin Klein, a WS-DL alumnus now at the UCLA Library. While the last ODU contingent to visit UNC was not so lucky, we returned to Norfolk relatively unscathed.
Cal Lee and Helen Tibbo opened the conference with a welcome on November 3rd, followed by Nancy McGovern's keynote address delivered with Leo Konstantelos and Maureen Pennock. This was not a traditional keynote, but instead an interactive dialogue in which several challenge areas were presented to the audience, and the audience responded -- live and on twitter -- significant achievements or advances in those challenge areas from #lastyear. For example, Dr. Nelson identified the #iCanHazMemento utility. The responses are available on Google Docs.
I attended the Institutional Opportunities and Challenges session to open the conference. Kresimir Duretec presented "Benchmarks for Digital Preservation Tools." His presentation touched on how we can get digital preservation tools that "Just Work", including benchmarks for evaluating tools on test beds and measuring them for quality. Related to this is Mat Kelly's work on the Archival Acid Test.
Alex Thirifays presented "Towards a Common Approach for Access to Digital Archival Records in Europe." This paper touched on user access: user needs, best practices for identifying requirements for access, and a capability gaps analysis of current tools versus user needs.
"Developing a Highly Automated Web Archive System Based
on IIPC Open Source Software" was presented by Zhenxin Wu. Her paper outlined a framework of open source tools to archive the web using Heritrix and a SOLR index of WARCS with an enhanced interface.
Barbara Sierman closed the session with her presentation "Best Until ... A National Infrastructure for Digital Preservation in the Netherlands" focusing on user accessibility and organizational challenges as part of a national strategy for preserving digital and cultural Dutch heritage.
After lunch, I lead off the Infrastructure Opportunities and Challenges session with my paper on Archiving Deferred Representations Using a Two-Tiered Crawling Approach. We defined deferred representations as those that rely on JavaScript to load embedded resources on the client. We show that archives can use PhantomJS to create a 1.5 times larger crawl frontier than Heritrix itself, but PhantomJS crawls 10.5 times slower. We recommend using a classifier to recognize deferred representations and only use it to crawl deferred representations, mitigating the crawl slow-down while still reaping the benefits of the headless crawler.
Cal Lee and Helen Tibbo opened the conference with a welcome on November 3rd, followed by Nancy McGovern's keynote address delivered with Leo Konstantelos and Maureen Pennock. This was not a traditional keynote, but instead an interactive dialogue in which several challenge areas were presented to the audience, and the audience responded -- live and on twitter -- significant achievements or advances in those challenge areas from #lastyear. For example, Dr. Nelson identified the #iCanHazMemento utility. The responses are available on Google Docs.
— Michael L. Nelson (@phonedude_mln) November 3, 2015
I attended the Institutional Opportunities and Challenges session to open the conference. Kresimir Duretec presented "Benchmarks for Digital Preservation Tools." His presentation touched on how we can get digital preservation tools that "Just Work", including benchmarks for evaluating tools on test beds and measuring them for quality. Related to this is Mat Kelly's work on the Archival Acid Test.
Another web archiving tool comparison from @WebSciDL -- archival acid test #ipres2015 https://t.co/oNEJgJHXyy
— Justin F Brunelle (@justinfbrunelle) November 3, 2015
Alex Thirifays presented "Towards a Common Approach for Access to Digital Archival Records in Europe." This paper touched on user access: user needs, best practices for identifying requirements for access, and a capability gaps analysis of current tools versus user needs.
"Developing a Highly Automated Web Archive System Based
on IIPC Open Source Software" was presented by Zhenxin Wu. Her paper outlined a framework of open source tools to archive the web using Heritrix and a SOLR index of WARCS with an enhanced interface.
Barbara Sierman closed the session with her presentation "Best Until ... A National Infrastructure for Digital Preservation in the Netherlands" focusing on user accessibility and organizational challenges as part of a national strategy for preserving digital and cultural Dutch heritage.
After lunch, I lead off the Infrastructure Opportunities and Challenges session with my paper on Archiving Deferred Representations Using a Two-Tiered Crawling Approach. We defined deferred representations as those that rely on JavaScript to load embedded resources on the client. We show that archives can use PhantomJS to create a 1.5 times larger crawl frontier than Heritrix itself, but PhantomJS crawls 10.5 times slower. We recommend using a classifier to recognize deferred representations and only use it to crawl deferred representations, mitigating the crawl slow-down while still reaping the benefits of the headless crawler.
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling Approach from Justin Brunelle
Day 2 concluded with the poster session (including a poster by Martin Klein) and reception.
Pam Samuelson opened Day 3 with her keynote Mass Digitization of Cultural Heritage: Can Copyright Obstacles Be Overcome? Her keynote touched on the challenges with preserving cultural heritage introduced by copyright, along with some of the emerging techniques to overcome the challenges. She identified duration of copyright as a major contributor to the challenges of cultural preservation. She notes that most countries have exceptions for libraries and archives for preservation purposes, and explains recent U.S. evolutions in fair use through the Google Books rulings.
After Samuelson's keynote, I concluded my iPRES2015 visit and explored Chapel Hill, including a visit to the Old Well (at the top of this post) and an impromptu demo of the pit simulation. It was very scary.
Several themes emerged from iPRES2015, including an increased emphasis on web archiving and a need to improved context, provenance, and access for digitally preserved resources. I look forward to monitoring the progress in these areas.
Douglas Thain followed with his presentation on "Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?" Similar to our work with deferred representations, his work focuses on scientific replay of simulations and software experiments. He presents several tools as part of a framework for preserving the context of simulations and simulation software, including dependencies and build information.
Hao Xu presented "A Method for the Systematic Generation of Audit Logs in a Digital Preservation Environment and Its Experimental Implementation In a Production Ready System". His presentation focuses on a construction of a finite state machine to understand whether a repository is following compliance policies for auditing purposes.
Jessica Trelogan and Lauren Jackson presented their paper Preserving an Evolving Collection: "“On-The-Fly” Solutions for the Chora of Metaponto Publication Series." They discussed the storage of complex artifacts of ongoing research projects in archeology with the intent of improving sharability of the collections.
To wrap up Day 1, we attended a panel on Preserving Born-Digital News consisting of Edward McCain, Hannah Sommers, Christie Moffatt, Abigail Potter (moderator), Stéphane Reecht, and Martin Klein. Christie Moffatt identified the challenges with archiving born-digital news material, including the challenges with scoping a corpus. She presented their case study on the Ebola response. Stéphane Reecht presented the work by the BnF regarding their work to perform massive, once-a-year crawls as well as selective, targeted daily crawls. Hannah Sommers provided insight into the culture of a news producer (NPR) on digital preservation. Martin Klein presented SoLoGlo (social, local, and global) news preservation, including citing statistics about the preservation of links shortened by the LA Times. Finally, Edward McCain discussed the ephemeral nature of born-digital news media, and provided examples of the sparse number of mementos in news pages in the Wayback Machine.
Hao Xu presented "A Method for the Systematic Generation of Audit Logs in a Digital Preservation Environment and Its Experimental Implementation In a Production Ready System". His presentation focuses on a construction of a finite state machine to understand whether a repository is following compliance policies for auditing purposes.
Jessica Trelogan and Lauren Jackson presented their paper Preserving an Evolving Collection: "“On-The-Fly” Solutions for the Chora of Metaponto Publication Series." They discussed the storage of complex artifacts of ongoing research projects in archeology with the intent of improving sharability of the collections.
To wrap up Day 1, we attended a panel on Preserving Born-Digital News consisting of Edward McCain, Hannah Sommers, Christie Moffatt, Abigail Potter (moderator), Stéphane Reecht, and Martin Klein. Christie Moffatt identified the challenges with archiving born-digital news material, including the challenges with scoping a corpus. She presented their case study on the Ebola response. Stéphane Reecht presented the work by the BnF regarding their work to perform massive, once-a-year crawls as well as selective, targeted daily crawls. Hannah Sommers provided insight into the culture of a news producer (NPR) on digital preservation. Martin Klein presented SoLoGlo (social, local, and global) news preservation, including citing statistics about the preservation of links shortened by the LA Times. Finally, Edward McCain discussed the ephemeral nature of born-digital news media, and provided examples of the sparse number of mementos in news pages in the Wayback Machine.
Marin Klein from UCLA Libraries speaking about the SoLoGlo social media project #ipres2015 #savenews pic.twitter.com/nWf6sXQX1a
— Edward McCain (@e_mccain) November 3, 2015
To kick off Day 2, Lisa Nakamura gave her opening keynote The Digital Afterlives of This Bridge Called My Back: Public Feminism and Open Access. Her talk focused on the role of Tumblr in curating and sharing a book no longer in print as a way to open the dialogue on the role of piracy and curation in the "wild" to support open access and preservation.
I attended the Dimensions of Digital Preservation session, which began with Liz Lyon's presentation on "Applying Translational Principles to Data Science Curriculum Development." Her paper outlines a study to help revise the University of Pittsburgh's data science curriculum. Nora Mattern took over the presentation to discuss the expectations of the job market to identify the skills required to be a professional data scientist.
Elizabeth Yakel presented "Educational Records of Practice: Preservation and Access Concerns." Her presentation outlined the unique challenges with preserving, curating, and making available educational data. Education researchers or educators can use these resources to further their education, reuse materials, and teach the next generation of teachers.
Emily Maemura presented "A Survey of Organizational Assessment Frameworks in Digital Preservation." She presented the results of a survey focusing on frameworks for assessment models, drawing conclusions like software maturity models do for computer scientists. Further, her paper identifies trends, gaps, and models for assessment.
Matt Schultz, Katherine Skinner, and Aaron Trehub presented "Getting to the Bottom Line: 20 Digital Preservation Cost Questions." Their questions help institutions evaluate cost, including questions about storage fees, support, business plans, etc. to help institutions assess their approach to taking on digital preservation.
After lunch, I attended the panel on Long Term Preservation Strategies & Architecture: Views from Implementers consisting of Mary Molinaro (moderator), Katherine Skinner, Sibyl Schaefer, Dave Pcolar, and Sam Meister. Sibyl Schaefer lead off with a presentation of details on Chronopolis and ACE audit manager. Dave Pcolar followed by presenting the Digital Preservation Network (DPN) and their data replication policies for dark archives. Sam Meister discussed the BitCurator Consortium which helps with the acquisition, appraisal, arrangement and descriptions, and access of archived material. Finally, Katherine Skinner presented the MetaArchive Cooperative and their activities teaching institutions to perform their own archiving, along with other statistics (e.g., the minimum number of copies to keep stuff safe is 5).
I attended the Dimensions of Digital Preservation session, which began with Liz Lyon's presentation on "Applying Translational Principles to Data Science Curriculum Development." Her paper outlines a study to help revise the University of Pittsburgh's data science curriculum. Nora Mattern took over the presentation to discuss the expectations of the job market to identify the skills required to be a professional data scientist.
Elizabeth Yakel presented "Educational Records of Practice: Preservation and Access Concerns." Her presentation outlined the unique challenges with preserving, curating, and making available educational data. Education researchers or educators can use these resources to further their education, reuse materials, and teach the next generation of teachers.
Emily Maemura presented "A Survey of Organizational Assessment Frameworks in Digital Preservation." She presented the results of a survey focusing on frameworks for assessment models, drawing conclusions like software maturity models do for computer scientists. Further, her paper identifies trends, gaps, and models for assessment.
Matt Schultz, Katherine Skinner, and Aaron Trehub presented "Getting to the Bottom Line: 20 Digital Preservation Cost Questions." Their questions help institutions evaluate cost, including questions about storage fees, support, business plans, etc. to help institutions assess their approach to taking on digital preservation.
After lunch, I attended the panel on Long Term Preservation Strategies & Architecture: Views from Implementers consisting of Mary Molinaro (moderator), Katherine Skinner, Sibyl Schaefer, Dave Pcolar, and Sam Meister. Sibyl Schaefer lead off with a presentation of details on Chronopolis and ACE audit manager. Dave Pcolar followed by presenting the Digital Preservation Network (DPN) and their data replication policies for dark archives. Sam Meister discussed the BitCurator Consortium which helps with the acquisition, appraisal, arrangement and descriptions, and access of archived material. Finally, Katherine Skinner presented the MetaArchive Cooperative and their activities teaching institutions to perform their own archiving, along with other statistics (e.g., the minimum number of copies to keep stuff safe is 5).
Day 2 concluded with the poster session (including a poster by Martin Klein) and reception.
— Justin F Brunelle (@justinfbrunelle) November 4, 2015
Pam Samuelson opened Day 3 with her keynote Mass Digitization of Cultural Heritage: Can Copyright Obstacles Be Overcome? Her keynote touched on the challenges with preserving cultural heritage introduced by copyright, along with some of the emerging techniques to overcome the challenges. She identified duration of copyright as a major contributor to the challenges of cultural preservation. She notes that most countries have exceptions for libraries and archives for preservation purposes, and explains recent U.S. evolutions in fair use through the Google Books rulings.
After Samuelson's keynote, I concluded my iPRES2015 visit and explored Chapel Hill, including a visit to the Old Well (at the top of this post) and an impromptu demo of the pit simulation. It was very scary.
Several themes emerged from iPRES2015, including an increased emphasis on web archiving and a need to improved context, provenance, and access for digitally preserved resources. I look forward to monitoring the progress in these areas.
--Justin F. Brunelle
Comments
Post a Comment