Our Contributions to JCDL 2011
Ahmed Alsum presented How Much of the Web is Archived? This paper approximates the amount of the Web that is archived using four URI sources. From this data, we observe significant variation in the archival rate of URIs from different sources. So, how much of the web is archived? It depends on which web you mean (pdf, slides).
Martin Klein presented Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures, which details a method for discovering missing web pages (the dreaded 404). Martin also demonstrated Synchronicity, a Firefox extension that uses lexical signatures (and other methods) for automatically rediscovering missing web pages in real time (pdf, slides).
Abdulla Alasaadi presented Persistent Annotations Deserve New URIs, which describes a method for creating new persistent URIs for annotations and creating persistent, independent archived versions of all resources involved in the annotation (pdf, slides).
After the conference, the members of the Web Science and Digital Libraries team attended the Web Archive Globalization Workshop. This workshop focused on current initiatives and future possibilities. Eric Hetzner provided insight into the California Digital Library's web archiving activities. The Library of Congress's Nicholas Taylor told us about the Library's digital preservation initiatives. Brad Tofel of the Internet Archive gave us the lowdown on the future of web archive formats (ARC, WARC, WAT) and the Wayback Machine. Robert Sanderson of the LANL Research Library provided an overview of the current Memento infrastructure. There was much discussion about current archiving challenges, including the management of huge volumes of information, copyright considerations, and the difficulty of making the archives accessible to researchers and the public (slides).
The workshop was organized by:
- Frank McCown, Harding University
- Hector Garcia-Molina, Stanford University
- Michael L. Nelson, Old Dominion University
- Andreas Paepcke, Stanford University
The opening keynote, "Leaving the Cathedral and Entering the Bazaar: Library and Archives Canada Engages Canada’s Digital Society," was given by Daniel J. Caron, the current Librarian and Archivist of Canada. Mr. Caron discussed the issues and opportunities faced by national libraries as they transition from an analog to a digital environment. He compared the situation to the cultural and process differences put forward by Eric S. Raymond in The Cathedral and the Bazaar. It was an excellent talk, and I really got the impression that Mr. Caron understood the transition required and the chaos inherent in a technological change of this magnitude.
Wednesday's opening talk was given by IBM's Joan Morris DiMicco. "Data Narratives: Telling Stories with Data" (slides) focused on current research at IBM into data visualization as a storytelling medium. She defined a story as concrete, temporal, purposeful, and emotional. She gave brief presentations on visualizing legislative text with Many Bills, social relationship search with SaNDVis, and the impact of visualizations on group behavior with Second Messenger.
Christopher R. Barnes, the director of NEPTUNE Canada, described the NEPTUNE Canada cabled ocean observatory using many wonderful illustrations and photographs. He then went on to describe the digital library problem he and his team face: the 4+ (and growing) gigabytes of data collected daily by the project. This data is used by over 8,000 users. Storage, cataloging, and access are ever-growing challenges that the digital library and preservation communities can help with.
Two or three sessions ran simultaneously during the conference, so I was not able to attend all of the presentations.
Session 1 presented automated methods to assist human understanding of texts. There were full papers on improving understanding of historical word sense variation (Measuring Historical Word Sense Variation) and improving information extraction from PDF books (Structure Extractions from PDF-based Book Documents), and a short paper on using syntactic dependency parse trees to learn expected patterns between lexical arguments (Word Order Matters: Measuring Topic Coherence with Lexical Structure).
Session 5 explored rediscovery of missing web content, a topic near and dear to us. This session included two of our short papers and full papers on using patterns to efficiently implement web archiving (Archiving the Web Using Page Changes Patterns: A Case Study) and identifying academic home pages (On Identifying Academic Homepages for Digital Libraries).
The impact of copyright on access and use was covered in session 7. One paper explored the attitudes of the social-media savvy (The Ownership and Reuse of Visual Media), and another examined the implications of data quality problems in national bibliographies (Using National Bibliographies for Rights Clearance).
Session 8 looked at methods to annotate the Web. Rob Sanderson presented SharedCanvas (preprint, slides). There was also a paper on combining superimposed information with digital libraries (Use of Subimages in Fish Species Identification: A Qualitative Study). Our Persistent Annotations Deserve New URIs short paper was also presented in this session.
Sessions 11 and 12 looked at the needs and abilities of users and at improving the digital library experience. Understanding Digital Library Adoption: A Use Diffusion Approach and In the Bookshop: Examining Popular Search Strategies studied how users interact with digital libraries. Improving recommendations was looked at from several perspectives (A Social Network-Aware Top-N Recommender System using GPU, Serendipitous Recommendation for Scholarly Papers Considering Relations Among Researchers, and Product Review Summarization from a Deeper Perspective).
Other Perspectives on JCDL 2011
- Heather's Darkroom has good descriptions of the first two keynotes and of the paper The Ownership and Reuse of Visual Media.
- Kayleigh Ayn Bohémier has a 4-part post (part 1) on the conference and the conference experience in Ottawa.
- Some of the presentation slides are on the slideshare JCDL 2011 event page.
- The Digital Repositories Workshop slides are also available.