2014-12-20: Using Search Engine Queries For Reliable Links

Earlier this week Herbert brought to my attention Jon Udell's blog post about combating link rot by crafting search engine queries to "refind" content that periodically changes URIs as the hosting content management system (CMS) changes.

Jon has a series of columns for InfoWorld, and whenever InfoWorld changes their CMS the old links break and Jon has to manually refind all the new links and update his page.  For example, the old URI:

is currently:

The same content had at least one other URI as well, from at least 2009--2012:

The first reaction is to say InfoWorld should use "Cool URIs", mod_rewrite, or even handles.  In fairness, InfoWorld is still redirecting the second URI to the current URI:

curl -I http:…

2014-11-20: Archive-It Partners Meeting 2014

I attended the 2014 Archive-It Partners Meeting in Montgomery, AL on November 18.  The meeting attendees are representatives from Archive-It partners with interests ranging from archiving webpages about art and music to archiving government webpages.  (Presentation slides are now available on the Archive-It wiki.)  This is ODU's third consecutive Partners Meeting (see trip reports from 2012 and 2013).

The morning program was focused on presentations from partners who are building collections.  Here's a brief overview of each of those.

Penny Baker and Susan Roeper from the Clark Art Institute talked about their experience in archiving the 2013 Venice Biennale international art exhibition (Archive-It collection) and plans for the upcoming exhibition.  Their collection includes exhibition catalogs, monographs, and press releases about the event.  The material also includes a number of videos (mainly from Vimeo), which Archive-It can now capture.

Beth Downs from the Montana State L…

2014-11-14: Carbon Dating the Web, version 2.0

For over a year, Hany SalahEldeen's Carbon Date service has been out of service, mainly because of API changes in some of the underlying modules on which the service is built. Consequently, I have taken up the responsibility of maintaining the service, beginning with the following changes, now available in Carbon Date v2.0.

Carbon Date v2.0

The Carbon Date service now makes requests to the different modules (archives, backlinks, etc.) concurrently through threading. The server framework has been changed from the bottle server to the CherryPy server, which is still a minimalist Python WSGI server but is a more robust framework that features a threaded server.

How to use the Carbon Date service

There are three ways:

Through the website: given that carbon dating is highly computationally intensive, the site should be used only for small tests as a courtesy to other users. If you need to Carbon Date a large number of URLs, you should install the applic…
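As a rough illustration of the concurrent module requests mentioned in this post, here is a minimal Python sketch. It is not the Carbon Date code: the function name `earliest_date`, the module names, and the idea of taking the earliest date across modules are all assumptions made for this example, and the stub modules return fixed dates instead of contacting any real service.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def earliest_date(url, modules):
    """Run every module on `url` in its own thread and collect the results.

    `modules` maps a (hypothetical) module name to a callable that takes a
    URL and returns a datetime, or None when it finds no evidence.
    Returns (earliest datetime or None, per-module results).
    """
    results = {}
    # One worker per module so that slow modules do not block the others.
    with ThreadPoolExecutor(max_workers=max(len(modules), 1)) as pool:
        futures = {name: pool.submit(fn, url) for name, fn in modules.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing module should not take down the whole estimate.
                results[name] = None
    dated = [d for d in results.values() if d is not None]
    return (min(dated) if dated else None), results


if __name__ == "__main__":
    # Placeholder modules with made-up dates, standing in for real lookups.
    stub_modules = {
        "archives": lambda u: datetime(2010, 1, 1),
        "backlinks": lambda u: datetime(2009, 5, 1),
    }
    estimate, per_module = earliest_date("http://example.com/", stub_modules)
    print(estimate, per_module)
```

Catching exceptions per module is a design choice for the sketch: an API change in one underlying source then degrades the estimate rather than breaking the whole request.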

2014-11-09: Four WS-DL Classes for Spring 2015

We're excited to announce that four Web Science & Digital Library (WS-DL) courses will be offered in Spring 2015:

CS 418 "Web Programming", MW 3-4:15pm (CRN 24656), will be offered by Mat Kelly.  This will be an updated version of Dr. Weigle's class from last spring.  There will not be a 518 version of this class.

CS 495/595 "Big Data", W 4:20-7pm (CRNs 29955 & 29956), will be offered by Dr. Charles "Chuck" Cartledge, a summer 2014 PhD graduate.  Chuck will adapt this class from Shahram Mohrehkesh's class from spring 2014.

CS 725/825 "Information Visualization", T 9:30am-12:15pm (CRNs 27990 & 27991), will be offered by Dr. Weigle.  She most recently taught this class in fall 2013.

CS 751/851 "Introduction to Digital Libraries", R 4:20-7:00pm (CRNs 28839 & 28840), will be offered by Dr. Nelson.  This class will undergo many significant updates from its most recent offering in spring 2011.

Web Programmin…

2014-10-27: 404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent

Herbert and I attended the "404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent" workshop at the Georgetown Law Library on Friday, October 24, 2014.  Although the origins of this workshop are many, catalysts for it probably include the recent Liebler & Liebert study about link rot in Supreme Court opinions, and the paper by Zittrain, Albert, and Lessig about the problem of link rot in the scholarly and legal record, along with the resulting popular media coverage (e.g., NPR and the NYT).

The speakers were naturally drawn from the legal community at large, but some notable exceptions included David Walls from the GPO, Jefferson Bailey from the Internet Archive, and Herbert Van de Sompel from LANL. The event was streamed and recorded, and videos + slides will be available from the Georgetown site soon so I will only hit the highlights below. 

After a welcome from Michelle Wu, the director of the Georgetown Law Library, the w…

2014-10-16: Grace Hopper Celebration of Women in Computing (GHC) 2014

I was thrilled and humbled to attend, for the second time, the Grace Hopper Celebration of Women in Computing (GHC) 2014, the world’s largest gathering of women technologists. GHC is presented by the Anita Borg Institute for Women and Technology, which was founded by Dr. Anita Borg and Dr. Telle Whitney in 1994 to bring together the research and career interests of women in computing and to encourage the participation of women in computing. The twentieth anniversary of GHC was held in Phoenix, Arizona on October 8-10, 2014. This year, attendance almost doubled from last year, to 8,000 women with research and business interests, from about 67 countries and about 900 organizations, who came to get inspired, gain expertise, get connected, and have fun.

Aida Ghazizadeh from the Department of Computer Science at Old Dominion University was also awarded a travel scholarship to attend this year's GHC. I hope ODU will have more participation in the upcoming years.

The conference them…

2014-10-07: FluNet Visualization

(Note: This wraps up the current series of posts about visualizations created either by students in our research group or in our classes. I'll post more after the Spring 2015 offering of the course.)

I've been teaching the graduate Information Visualization course since Fall 2011.  In this series of posts, I'm highlighting a few of the projects from each course offering.  (Previous posts: Fall 2011, Fall 2012, 2013)

The final visualization in this series is an interactive visualization of the World Health Organization's global influenza data, created by Ayush Khandelwal and Reid Rankin in the Fall 2013 InfoVis course. The visualization is currently available at and is best viewed in Chrome.

The Global Influenza Surveillance and Response System (GISRS) has been in operation since 1995 and aggregates data weekly from laboratories and flu centers around the world. The FluNet website was constructed to provide access to this data, …

2014-10-03: Integrating the Live and Archived Web Viewing Experience with Mink

UPDATE: Download the latest version of Mink here.

The goal of the Memento project is to provide a tighter integration between the past and current web.  There are a number of clients now that provide this functionality, but they remain silent about the archived page until the user remembers to invoke them (e.g., by right-clicking on a link).

We have created another approach based on persistently reminding the user how well archived (or not) the pages they visit are.  The Chrome extension Mink (short for Minkowski Space) queries all the public web archives (via the Memento aggregator) in the background and displays the number of mementos (that is, the number of captures of the web page) available at the bottom right of the page.  Selecting the indicator allows quick access to the mementos through a dropdown.  Once in the archives, returning to the live web is as simple as clicking the "Back to Live Web" button.
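To make the memento count concrete, here is a small Python sketch of how a count could be derived from a Memento TimeMap in link format. This is not Mink's code (Mink is a Chrome extension): the function names, the regular expressions, and the `archive.example` URIs in the sample are assumptions for illustration, and the parser is simplified to handle only quoted parameter values.

```python
import re

# One link-value: <URI> followed by zero or more ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_timemap(text):
    """Return a list of (uri, params) tuples from a link-format TimeMap."""
    entries = []
    for uri, raw_params in LINK_RE.findall(text):
        entries.append((uri, dict(PARAM_RE.findall(raw_params))))
    return entries


def count_mementos(text):
    """Count entries whose rel value includes the 'memento' relation type."""
    return sum(
        1
        for _, params in parse_timemap(text)
        if "memento" in params.get("rel", "").split()
    )


# Made-up TimeMap for illustration; a real one would come from an aggregator.
SAMPLE = (
    '<http://example.com/>; rel="original",\n'
    '<http://archive.example/timegate/http://example.com/>; rel="timegate",\n'
    '<http://archive.example/20100101000000/http://example.com/>; '
    'rel="first memento"; datetime="Fri, 01 Jan 2010 00:00:00 GMT",\n'
    '<http://archive.example/20120101000000/http://example.com/>; '
    'rel="memento"; datetime="Sun, 01 Jan 2012 00:00:00 GMT",\n'
    '<http://archive.example/20140101000000/http://example.com/>; '
    'rel="last memento"; datetime="Wed, 01 Jan 2014 00:00:00 GMT"'
)

if __name__ == "__main__":
    print(count_mementos(SAMPLE))
```

Splitting the `rel` value on whitespace matters because the first and last captures carry combined relation types such as "first memento", which should still be counted.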

For the case where there are too many mementos to make…

2014-09-25: Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages

The Internet Archive (IA) and Open Library offer over 6 million fully accessible public domain eBooks. While casually browsing the scanned book collection, I searched for the term "dictionary" to see how many dictionaries they have, and found several in various languages. From the search results I randomly picked A Dictionary of the English Language (1828) - Samuel Johnson, John Walker, Robert S. Jameson. I opened the dictionary in fullscreen mode using IA's open source online BookReader application. This book reader has common tools for browsing an image-based book, such as flipping pages, seeking a page, zooming, and changing the layout. Its toolbar also has some interesting features, like reading aloud and full-text searching. I wondered how it could possibly search the text of, and read aloud, a book made of scanned raster images. I peeked into the page source code, which pointed me to some documentation pages. I realized it is using an Op…