Tuesday, June 18, 2013

2013-06-18: NTRS, Memento, and Handles

In a previous post I covered the shutdown of the NASA Technical Report Server (NTRS), which has since come back online in a reduced capacity. In this post we examine some of the peculiarities of the current state of NTRS, particularly with respect to Handles and Memento.

Earlier this week I needed to access an old NASA report of mine from 1996, ironically enough about NTRS:
Richard C. Tuey, Mary Collins, Pamela Caswell, Bob Haynes, Michael L. Nelson, Jeanne Holm, Lynn Buquo, Annette Tingle, Bill Cooper and Roy Stiltner, NASAwide Electronic Publishing System-Prototype STI Electronic Document Distribution: Stage-4 Evaluation Report, NASA TM-104630 (parts 1 and 2), May 1996.
It is not a particularly enjoyable report; it is the kind of lengthy, multi-authored, sanitized, bureaucratic-engineering report that people write but don't read (a "better" summary can be found in AIAA-95-0964). I probably have a PDF of the report somewhere in my files, but instead I pulled up my publication list and clicked on the linked URI, http://hdl.handle.net/2060/19960028185, which redirected to http://ntrs.nasa.gov/errors/PDF-removed.html and returned an HTTP "403 Forbidden" error:


The raw HTTP:





In short, NTRS is denying me access to an engineering report about NTRS -- as it existed nearly 20 years ago. I created the link to the Handle (i.e., http://hdl.handle.net/2060/19960028185) for the report because that's the right thing to do (tm): Handles are "cool URIs" and hide the "how we do it today". The idea is that the publisher registers with the Handle System a mapping from each Handle to its current URI. When the publisher changes its content management system, gets bought by another publisher, etc., the Handle itself doesn't change even if the URI it maps to does. The Handle System is what underlies the more familiar Digital Object Identifier (DOI) system that most major publishers use; in short, the set of all DOIs is a proper subset of the set of all Handles.
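To make that indirection concrete, here is a minimal standard-library Python sketch that builds the hdl.handle.net proxy URI for a Handle, checks whether a Handle is a DOI (DOI prefixes begin with "10."), and follows the proxy's redirects one hop at a time so that the final status, such as the 403 above, is visible. The function names are hypothetical, and using HEAD rather than GET is an assumption; some servers answer the two differently.

```python
from typing import Callable, List, Optional, Tuple
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import HTTPRedirectHandler, Request, build_opener

HANDLE_PROXY = "http://hdl.handle.net/"  # public HTTP proxy for the Handle System
REDIRECT_CODES = {301, 302, 303, 307, 308}

# A fetcher takes a URI and returns (status, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def handle_proxy_uri(handle: str) -> str:
    """Turn a Handle such as '2060/19960028185' into its proxy URI."""
    return HANDLE_PROXY + handle


def is_doi(handle: str) -> bool:
    """A DOI is a Handle whose prefix (before the first '/') starts with '10.'."""
    return handle.split("/", 1)[0].startswith("10.")


class _NoRedirect(HTTPRedirectHandler):
    """Make urllib report redirects instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


def fetch_head(uri: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request (hypothetical helper); return (status, Location)."""
    opener = build_opener(_NoRedirect)
    try:
        with opener.open(Request(uri, method="HEAD"), timeout=30) as resp:
            return resp.status, resp.headers.get("Location")
    except HTTPError as err:
        return err.code, err.headers.get("Location")


def trace_redirects(uri: str, fetch: Fetch = fetch_head,
                    max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow Location headers hop by hop, recording (uri, status) per hop."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(uri)
        chain.append((uri, status))
        if status not in REDIRECT_CODES or not location:
            break
        uri = urljoin(uri, location)  # Location may be relative
    return chain
```

Calling trace_redirects(handle_proxy_uri("2060/19960028185")) issues live requests. For offline experiments, pass a stub as fetch that returns canned (status, Location) pairs; the post does not say which 3xx code the Handle proxy used, so any code in such a stub is a guess.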

I've always been critical of popular coverage of science stories because it often fails to link to the DOIs (or Handles). For example, in this randomly chosen story the author links to the final target URI:

http://iopscience.iop.org/0004-637X/770/2/148

when he "should" link to the DOI itself:

http://dx.doi.org/10.1088/0004-637X/770/2/148

In this case, you can lexically map between the target URI and its DOI, but that's not always possible. And truthfully, if iopscience.iop.org commits to the stability of the former URI, then regular users won't notice or care about the difference (only digital library wonks like me will).
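For this particular article the mapping is plain string manipulation: the DOI prefix is 10.1088 (IOP Publishing) and the DOI suffix is reused as the URI path. A small Python sketch of that mapping in both directions, assuming that pattern holds (it does for this example, not necessarily for every IOP URI); the function names are hypothetical:

```python
from urllib.parse import urlparse

IOP_DOI_PREFIX = "10.1088"  # IOP Publishing's DOI prefix, as in the DOI above
IOP_HOST = "iopscience.iop.org"


def iop_uri_to_doi(target_uri: str) -> str:
    """Derive a DOI from an iopscience.iop.org article URI.

    Assumes the URI path is exactly the DOI suffix, which holds for the
    example above but is not guaranteed for every IOP URI.
    """
    parsed = urlparse(target_uri)
    if parsed.netloc != IOP_HOST:
        raise ValueError("not an IOP article URI: " + target_uri)
    return IOP_DOI_PREFIX + "/" + parsed.path.strip("/")


def doi_to_iop_uri(doi: str) -> str:
    """Inverse mapping: rebuild the IOP URI from an IOP DOI."""
    prefix, _, suffix = doi.partition("/")
    if prefix != IOP_DOI_PREFIX or not suffix:
        raise ValueError("not an IOP DOI: " + doi)
    return "http://" + IOP_HOST + "/" + suffix


def doi_proxy_uri(doi: str) -> str:
    """The resolver URI a story "should" link to."""
    return "http://dx.doi.org/" + doi
```

Where no such pattern exists, only the resolver knows the mapping, which is why linking to the DOI is the safer choice.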

So you can imagine my disappointment when I clicked on http://hdl.handle.net/2060/19960028185 and discovered that NASA has mapped this -- and all of its Handles -- to a "403 Forbidden" page. I could not access this report. Searching my own personal archives is always the last resort, so I went to Google Scholar and found that it still had a record of the original target URI for the report:


It does not display in the image above, but the URI is:

http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf

That's an ugly URI, and not one that you'll discover using the NASA TM number, the title, or other semantic clues. Unfortunately, clicking on that URI produces another "access denied" page, different from the one you receive when clicking on the Handle:


The raw HTTP:



To add insult to injury, the above page is a "soft 404" -- the WWW equivalent of turning on your porch light for Halloween but not handing out candy. Fortunately, I was using MementoFox, so I simply activated my timeslider and was able to grab a copy of the report from Archive-It at:

http://wayback.archive-it.org/all/20100518033903/http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf
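Because a soft 404 answers "200 OK" for what is really an error page, the status code alone cannot flag it. One rough heuristic, sketched below in Python as an illustration rather than a standard detector: treat a 200 response as a soft 404 if its text contains common error phrases, or if it is nearly identical to what the same server returns for a deliberately bogus "probe" URI. The phrase list, the 0.9 similarity threshold, and the function name are all assumptions.

```python
import difflib
import re
from typing import Optional

# Hypothetical phrase list; tune it for the site being checked.
ERROR_PHRASES = ("access denied", "not found", "no longer available", "has been removed")


def _normalize(html: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore layout."""
    return re.sub(r"\s+", " ", html).strip().lower()


def looks_like_soft_404(status: int, body: str,
                        probe_body: Optional[str] = None,
                        threshold: float = 0.9) -> bool:
    """Heuristically decide whether a response is a soft 404.

    status: HTTP status code of the page being checked.
    body: content of that page.
    probe_body: optional content served for a random, surely nonexistent
        URI on the same host.
    """
    if status != 200:
        return False  # a genuine 4xx/5xx is an honest error, not a soft 404
    text = _normalize(body)
    if any(phrase in text for phrase in ERROR_PHRASES):
        return True
    if probe_body is not None:
        # If the page looks like what the server returns for garbage URIs,
        # it is probably an error page dressed up as a success.
        similarity = difflib.SequenceMatcher(None, text, _normalize(probe_body)).ratio()
        return similarity >= threshold
    return False
```

The phrase check is cheap but site-specific; the probe comparison needs an extra request per host but does not depend on the error page's wording.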

For those who care about the details, here is the TimeMap for the ntrs.nasa.gov URI:



I was able to access the ntrs.nasa.gov URI because Google Scholar had maintained the mapping, but we can also query Memento servers for the TimeMap of the Handle itself and discover five more copies:



Unfortunately, the Internet Archive won't serve its versions because the current NTRS robots.txt file blocks access (see IA's policy on robots.txt).
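Python's standard urllib.robotparser can show how a robots.txt rule turns into a block. This sketch assumes the archive checks the "ia_archiver" user-agent token (the one IA has historically honored); the function name is hypothetical, and the rules in the example are illustrative, not the actual NTRS robots.txt.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, uri: str, user_agent: str = "ia_archiver") -> bool:
    """Return True if robots_txt permits user_agent to fetch uri.

    The default user agent is an assumption: "ia_archiver" is the token
    the Internet Archive has historically honored in robots.txt.
    """
    parser = RobotFileParser()
    # Feed the rules in directly instead of fetching them with read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, uri)


if __name__ == "__main__":
    # Hypothetical robots.txt, not the real NTRS file.
    rules = "User-agent: *\nDisallow: /\n"
    pdf = "http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf"
    print(robots_allows(rules, pdf))  # False: every path is disallowed
```

Under a policy of honoring the current robots.txt at replay time, a blanket Disallow like this one hides past captures too, not just future crawls.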




The fact that the TimeMaps are different for http://hdl.handle.net/2060/19960028185 and http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028185_1996060716.pdf is the subject of Ahmed AlSum's TempWeb 2013 paper; this is a surprisingly tough problem. 
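A Memento TimeMap in application/link-format lists one link per memento, each with a rel value containing "memento" and a datetime attribute. The simplest way to combine what archives hold for the Handle and for the PDF URI is to parse both TimeMaps and take the union of their mementos, keyed by URI-M. The Python sketch below (regex-based, hypothetical function names) does only that naive union; it does not attempt the harder problem of deciding which original URIs should be treated as equivalent.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict, List, Tuple

# One link: <URI> followed by ;name="value" (or ;name=token) parameters.
# Matching whole links avoids splitting on commas, which also occur inside
# datetime values such as "Tue, 18 May 2010 03:39:03 GMT".
LINK_RE = re.compile(
    r'<(?P<uri>[^>]+)>(?P<params>(?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s<]+))*)'
)
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s<]+))')


def parse_timemap(text: str) -> List[Tuple[datetime, str]]:
    """Return (Memento-Datetime, URI-M) pairs from a link-format TimeMap."""
    mementos = []
    for link in LINK_RE.finditer(text):
        params = {}
        for name, quoted, bare in PARAM_RE.findall(link.group("params")):
            params[name.lower()] = quoted or bare
        # rel may hold several values, e.g. "first memento"
        if "memento" in params.get("rel", "").split() and "datetime" in params:
            when = parsedate_to_datetime(params["datetime"])
            mementos.append((when, link.group("uri")))
    return mementos


def merge_timemaps(*timemaps: str) -> List[Tuple[datetime, str]]:
    """Naive union of several TimeMaps, de-duplicated by URI-M, oldest first."""
    seen: Dict[str, datetime] = {}
    for text in timemaps:
        for when, urim in parse_timemap(text):
            seen.setdefault(urim, when)
    return sorted((when, urim) for urim, when in seen.items())
```

Deduplicating by URI-M only removes exact repeats; captures of the Handle and of the PDF URI have different URI-Ms, so deciding that they are "the same" report still requires knowing the redirect relationship.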

In summary, NTRS erased the mapping from its Handles to the target URIs, which creates extra work when trying to find another copy in a public web archive. It's not just my report (which is no big loss to "Science"); other reports are affected too. Randomly replacing some digits in the Handle turns up another report that is also archived in Archive-It:



I'm not sure how many of the unprocessed reports are available via Memento, but until NTRS is fully restored, the suite of Memento tools will help you out.

--Michael