Posts

2015-02-17: Reactions To Vint Cerf's "Digital Vellum"

Don't you just love reading BuzzFeed-like articles, constructed solely of content embedded from external sources? Yeah, me neither. But I'm going to pull one together anyway. Vint Cerf generated a lot of buzz last week when he gave a talk titled "Digital Vellum" at an AAAS meeting. The AAAS version, to the best of my knowledge, is not online, but this version of "Digital Vellum" at CMU-SV from earlier the same week is probably the same. The media (e.g., The Guardian, The Atlantic, BBC) picked up on it, because when Vint Cerf speaks people rightly pay attention. However, the reaction from archiving practitioners and researchers was akin to having your favorite uncle forget your birthday, mostly because Cerf's talk seemed to ignore the last 20 or so years of work in preservation. For a thoughtful discussion of Cerf's talk, I recommend David Rosenthal's blog post. But let's get to the BuzzFeed part... In the wake of the med

2015-02-17: Fixing Links on the Live Web, Breaking Them in the Archive

On February 2nd, 2015, Rene Voorburg announced the JavaScript utility robustify.js. The robustify.js code, when embedded in the HTML of a web page, helps address the challenge of link rot by detecting when a clicked link will return an HTTP 404 and using the Memento Time Travel Service to discover mementos of the URI-R. Robustify.js assigns an onclick event to each anchor tag in the HTML. When the event occurs, robustify.js makes an Ajax call to a service to test the HTTP response code of the target URI. When an HTTP 404 response code is detected, robustify.js uses Ajax to make a call to a remote server, uses the Memento Time Travel Service to find mementos of the URI-R, and uses a JavaScript alert to let the user know that JavaScript will redirect the user to the memento. Our recent studies have shown that JavaScript -- particularly Ajax -- normally makes preservation more difficult, but robustify.js is a useful utility that is easily implemented to solve an importan
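The click-handling logic described in this excerpt can be sketched roughly as follows. This is not robustify.js itself (which is JavaScript and performs the status check via Ajax); it is a minimal Python illustration of the decision step only. The function name `resolve_link` is hypothetical, and the TimeGate prefix is an assumption about the Time Travel Service endpoint rather than something stated in the post.

```python
# Assumed TimeGate prefix for the Memento Time Travel Service (not taken from the post).
TIMEGATE_PREFIX = "http://timetravel.mementoweb.org/timegate/"


def resolve_link(uri_r, status_code):
    """Return the URI a click should lead to.

    uri_r       -- the original resource URI (URI-R) in the anchor tag
    status_code -- the HTTP status observed when the target was tested
    """
    if status_code == 404:
        # The link has rotted: hand the URI-R to the TimeGate so that
        # an archived copy (a memento) can be served instead.
        return TIMEGATE_PREFIX + uri_r
    # Any other status: leave the link alone.
    return uri_r


if __name__ == "__main__":
    print(resolve_link("http://example.com/page", 200))
    print(resolve_link("http://example.com/page", 404))
```

Keeping the decision separate from the network check makes it easy to test without issuing any HTTP requests.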

2015-02-05: What Did It Look Like?

Having often wondered why many popular videos on the web are time-lapse videos (that is, videos that capture the change of a subject over time), I came to the conclusion that impermanence gives value to the process of preserving ourselves or other subjects in photography, as though it were a means to defy the compulsory fundamental law of change. Just like our lives, one of the greatest products of human endeavor, the World Wide Web, was once small but has continued to grow. So it is only fitting for us to capture the transitions. What Did It Look Like? is a Tumblr blog which uses the Memento framework to poll various public web archives, take the earliest archived version from each calendar year, and then create an animated image that shows the progression of the site through the years. To seed the service we randomly chose some web sites and processed them (see also the archives). In addition, everyone is free to nominate web sites to What Did It Look Like? by tweeting:
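As a rough illustration of the selection step described in this post (taking the earliest archived version from each calendar year), here is a minimal Python sketch. It assumes the mementos have already been gathered from the archives as (capture datetime, memento URI) pairs; the function name `earliest_per_year` and the sample URIs are hypothetical, not part of the actual service.

```python
from datetime import datetime


def earliest_per_year(mementos):
    """Pick the earliest capture for each calendar year.

    mementos -- iterable of (capture_datetime, memento_uri) pairs
    Returns a dict mapping year -> (capture_datetime, memento_uri),
    ordered by year.
    """
    earliest = {}
    for captured, uri_m in mementos:
        current = earliest.get(captured.year)
        if current is None or captured < current[0]:
            earliest[captured.year] = (captured, uri_m)
    return dict(sorted(earliest.items()))


if __name__ == "__main__":
    sample = [
        (datetime(2012, 6, 1), "http://archive.example/2012-06/site"),
        (datetime(2011, 3, 9), "http://archive.example/2011-03/site"),
        (datetime(2012, 1, 15), "http://archive.example/2012-01/site"),
    ]
    for year, (captured, uri_m) in earliest_per_year(sample).items():
        print(year, captured.date(), uri_m)
```

Each selected memento would then become one frame of the animated image.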

2015-01-15: The Winter 2015 Federal Cloud Computing Summit

On January 14th-15th, I attended the Federal Cloud Computing Summit in Washington, D.C., a recurring event in which I have participated in the past. In my continuing role as the MITRE-ATARC Collaboration Session lead, I assisted the host organization, the Advanced Technology And Research Center (ATARC), in organizing and running the MITRE-ATARC Collaboration Sessions. The summit is designed to allow Government representatives to meet and collaborate with industry, academic, and other Government cloud computing practitioners on the current challenges in cloud computing. The collaboration sessions continue to be highly valued within the government and industry. The Winter 2015 Summit had over 400 government or academic registrants and more than 100 industry registrants. The whitepaper summarizing the Summer 2014 collaboration sessions is now available. A discussion of FedRAMP and the future of the policies was held in a Government-only session at 11:00 before the collabora

2015-01-03: Review of WS-DL's 2014

The Web Science and Digital Libraries Research Group's 2014 was even better than our 2013. First, we graduated two PhD students and had many other students advance their status: Ahmed AlSum defended his Ph.D. on February 26, 2014 and joined the Stanford University Libraries after his defense. It was Ahmed who started the WSDL tradition of the successful candidate providing the celebratory lunch (shown in the above picture, as well as several photos from our new WSDL Flickr Photostream). Chuck Cartledge defended his Ph.D. on May 30, 2014 (he already had a position with Fulcrum). Since Chuck finished after Ahmed, he was responsible for lunch as well. Justin Brunelle passed his candidacy exam. Yasmin AlNoamany passed her candidacy exam. Hany SalahEldeen passed his candidacy exam. Mohamed Aturban passed his breadth exam. Louis Nguyen passed his breadth exam. Corren McCoy passed her breadth exam. Alexander Nwala joined WSDL after completing

2014-12-20: Using Search Engine Queries For Reliable Links

Earlier this week Herbert brought to my attention Jon Udell's blog post about combating link rot by crafting search engine queries to "refind" content that periodically changes URIs as the hosting content management system (CMS) changes. Jon has a series of columns for InfoWorld, and whenever InfoWorld changes their CMS the old links break and Jon has to manually refind all the new links and update his page. For example, the old URI: http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html is currently: http://www.infoworld.com/article/2660595/application-development/xquery-and-the-power-of-learning-by-example.html The same content had at least one other URI as well, from at least 2009--2012: http://www.infoworld.com/d/developer-world/xquery-and-power-learning-example-924 The first reaction is to say InfoWorld should use "Cool URIs", mod_rewrite, or even handles. In fairness, InfoWorld is still redirecting the second URI to the c

2014-11-20: Archive-It Partners Meeting 2014

I attended the 2014 Archive-It Partners Meeting in Montgomery, AL on November 18. The meeting attendees are representatives from Archive-It partners with interests ranging from archiving webpages about art and music to archiving government webpages. (Presentation slides are now available on the Archive-It wiki.) This is ODU's third consecutive Partners Meeting (see trip reports from 2012 and 2013). The morning program was focused on presentations from partners who are building collections. Here's a brief overview of each of those. Penny Baker and Susan Roeper from the Clark Art Institute talked about their experience in archiving the 2013 Venice Biennale international art exhibition (Archive-It collection) and plans for the upcoming exhibition. Their collection includes exhibition catalogs, monographs, and press releases about the event. The material also includes a number of videos (mainly from Vimeo), which Archive-It can now capture. Beth Downs

2014-11-14: Carbon Dating the Web, version 2.0

For over a year, Hany SalahEldeen's Carbon Date service has been out of service, mainly because of API changes in some of the underlying modules on which the service is built. Consequently, I have taken up the responsibility of maintaining the service, beginning with the following changes, now available in Carbon Date v2.0. Carbon Date v2.0: The Carbon Date service currently makes requests to the different modules (Archives, backlinks, etc.) concurrently through threading. The server framework has been changed from the bottle server to the CherryPy server, which is still a minimalist Python WSGI server but a more robust framework that features a threaded server. How to use the Carbon Date service: There are three ways: Through the website, http://cd.cs.odu.edu/ : Given that carbon dating is highly computationally intensive, the site should be used just for small tests as a courtesy to other users. If you have the need to Carbon Date a large number of URLs, y
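To illustrate the concurrent, threaded requests to the different modules mentioned in this excerpt, here is a minimal Python sketch. It is not the Carbon Date source code: the function `estimate_creation_date`, the shape of the `modules` argument, and the rule of taking the earliest date reported are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def estimate_creation_date(uri, modules):
    """Query every module in its own thread and combine the answers.

    uri     -- the URI to date
    modules -- dict mapping a module name to a callable(uri) that
               returns a datetime, or None if it found nothing
    Returns (earliest_date_or_None, per_module_results).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        futures = {name: pool.submit(fn, uri) for name, fn in modules.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing module should not sink the whole estimate.
                results[name] = None
    dates = [d for d in results.values() if d is not None]
    # Assumption: the earliest date any module reports is the estimate.
    return (min(dates) if dates else None), results


if __name__ == "__main__":
    # Stand-in modules; real ones would call archives, backlink services, etc.
    fake_modules = {
        "archives": lambda uri: datetime(2010, 5, 1),
        "backlinks": lambda uri: datetime(2011, 2, 3),
        "unavailable": lambda uri: None,
    }
    print(estimate_creation_date("http://example.com/", fake_modules))
```

Because each module mostly waits on remote services, threads let the slow lookups overlap instead of running one after another.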

2014-11-09: Four WS-DL Classes for Spring 2015

We're excited to announce that four Web Science & Digital Library (WS-DL) courses will be offered in Spring 2015: CS 418 "Web Programming", MW 3-4:15pm (CRN 24656), will be offered by Mat Kelly. This will be an updated version of Dr. Weigle's class from last spring. There will not be a 518 version of this class. CS 495/595 "Big Data", W 4:20-7pm (CRNs 29955 & 29956), will be offered by Dr. Charles "Chuck" Cartledge, a summer 2014 PhD graduate. Chuck will adapt this class from Shahram Mohrehkesh's class from spring 2014. CS 725/825 "Information Visualization", T 9:30am-12:15pm (CRNs 27990 & 27991), will be offered by Dr. Weigle. She has most recently taught this class in fall 2013. CS 751/851 "Introduction to Digital Libraries", R 4:20-7:00pm (CRNs 28839 & 28840), will be offered by Dr. Nelson. This class will undergo many significant updates from its most recent off

2014-10-27: 404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent

Herbert and I attended the "404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent" workshop at the Georgetown Law Library on Friday, October 24, 2014. Although the origins for this workshop are many, catalysts for it probably include the recent Liebler & Liebert study about link rot in Supreme Court opinions, the paper by Zittrain, Albert, and Lessig about Perma.cc and the problem of link rot in the scholarly and legal record, and the resulting popular media coverage (e.g., NPR and the NYT). The speakers were naturally drawn from the legal community at large, but some notable exceptions included David Walls from the GPO, Jefferson Bailey from the Internet Archive, and Herbert Van de Sompel from LANL. The event was streamed and recorded, and videos + slides will be available from the Georgetown site soon, so I will only hit the highlights below. After a welcome from Michelle Wu, the director of the Georgetown Law L