
2009-10-26: Communications of the ACM Article Published

The article "Why Websites Are Lost (and How They're Sometimes Found)" has finally been published in the November 2009 issue of Communications of the ACM. I co-wrote it with Frank McCown and Cathy Marshall, and it was accepted for publication in the fall of 2007. Although we've had a pre-print available since 2008, it just isn't the same until you see it in print. Except we won't be seeing this in print; it is instead published in the "Virtual Extension" part of the CACM. So even though it has page numbers (pp. 141-145), this article won't be among those that arrive in your mailbox in a few weeks. As someone who has spent his entire career trying to transform the scholarly communication process with the web and digital libraries, I completely understand this move by the CACM, but I have to admit I'm disappointed that I won't see a printed, bound copy. Even though in the long term, all discovery will come from the web (e.g., Google Scho

2009-10-15: Seminars at Emory University

I recently traveled to Emory University to visit with Joan Smith (an alumna of our group, PhD 2008) and Rick Luce. While there, I gave two colloquia: one on OAI-ORE at the Woodruff Library on October 1, and one on web preservation (specifically, based on Martin Klein's PhD research) at the Mathematics & Computer Science Department on October 2. I've uploaded both sets of slides. The first, "OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project", is based on slides from Herbert Van de Sompel. The second, "(Re-) Discovering Lost Web Pages", is an extended version of slides presented at the NDIIPP Partners Meeting this summer.

--Michael

2020-01-23 Edit: updated embed code for SlideShare.

2009-10-05: Web Page for the Memento Project Is Available

The Library of Congress-funded research project "Tools for a Preservation Ready Web" is coming to a close. The initial phase (2007-2008) of the project funded Joan Smith's PhD research into two problems: using the web server to tell web crawlers exactly how many valid URIs exist at a web site (the "counting problem"), and generating preservation metadata on the server side at dissemination time (the "representation problem"). Several interesting papers came out of that project (e.g., WIDM 2006, D-Lib 14(1/2)), as well as the mod_oai Apache module. Joan graduated in 2008 and is now the Chief Technology Strategist for the Emory University Libraries and an adjunct faculty member in the CS department at Emory. Since then, Herbert and I (plus our respective teams) have been closing out this project by working on further ideas about preserving web pages and integrating web archives with the "live web".