The Library of Congress funded research project "Tools for a Preservation Ready Web" is coming to a close. The initial phase (2007-2008) of the project funded Joan Smith's PhD research into using the web server to inform web crawlers exactly how many valid URIs there are at a web site (the "counting problem"), as well as to perform server-side generation of preservation metadata at dissemination time (the "representation problem"). Several interesting papers came out of that project (e.g., WIDM 2006, D-Lib 14(1/2)), as well as the mod_oai Apache module. Joan graduated in 2008 and is now the Chief Technology Strategist for the Emory University Libraries and an adjunct faculty member in the CS department at Emory.
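To illustrate the "counting problem": an OAI-PMH server such as mod_oai can tell a harvester the total number of URIs at a site up front, via the optional completeListSize attribute on the resumptionToken of a ListIdentifiers response. The sketch below is a minimal, hypothetical illustration; the sample XML (including the example.org baseURL and the count) is invented for demonstration and is not taken from mod_oai's actual output.

```python
# Sketch: how an OAI-PMH client can learn the total number of URIs at a
# site (the "counting problem"). A server such as mod_oai may answer
# ListIdentifiers with a resumptionToken whose optional completeListSize
# attribute states the full count before harvesting completes.
# The XML below is a hypothetical, abbreviated response.
import xml.etree.ElementTree as ET

SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2009-11-01T12:00:00Z</responseDate>
  <request verb="ListIdentifiers">http://example.org/modoai</request>
  <ListIdentifiers>
    <header>
      <identifier>http://example.org/page1.html</identifier>
      <datestamp>2009-10-31</datestamp>
    </header>
    <resumptionToken completeListSize="1542" cursor="0">token-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

def complete_list_size(xml_text):
    """Return the completeListSize advertised in an OAI-PMH response,
    or None if the server did not include one."""
    ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
    root = ET.fromstring(xml_text)
    token = root.find(".//oai:resumptionToken", ns)
    if token is not None and "completeListSize" in token.attrib:
        return int(token.attrib["completeListSize"])
    return None

if __name__ == "__main__":
    # The crawler now knows the site's claimed URI count in one request.
    print(complete_list_size(SAMPLE_RESPONSE))
```

Because completeListSize is optional in the OAI-PMH specification, a robust harvester should handle its absence, as the function above does.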
Since that time, Herbert and I (plus our respective teams) have been closing out this project by working on some further ideas about preserving web pages and integrating web archives with the "live web". The result is the Memento Project, which has a few test pages that are collecting links from robots and interactive users; we will use those links in a description and analysis to be published shortly. In the meantime, the test pages feature some clever scripting from Rob that shows Herbert and me standing next to BBC and CNN web pages, respectively. Check them out:
And here are their respective bit.ly URIs (just for fun):
I'll post a further update on WS-DL when we publish the description of how Memento works. We'd like to again thank the National Digital Information Infrastructure and Preservation Program for their support of the "Tools for a Preservation Ready Web" project.