Monday, October 24, 2016

2016-10-23: Institutional Repositories, OAI-PMH, and Anonymous FTP

Richard Poynder's recent blog post "Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository?" has generated a lot of discussion, including a second post from Richard addressing the comments, and the always insightful commentary from David Rosenthal ("Why Did Institutional Repositories Fail?"). There have surely been enough articles about institutional repositories to fill an institutional repository, but I am particularly interested in the discussions of the technical and aspirational goals of OAI-PMH.

A year ago Herbert and I reflected on OAI-PMH and other projects in "Reminiscing About 15 Years of Interoperability Efforts". I wish Richard had referenced that paper, as well as the original SFC and UPS papers, in his discussion (although Cliff does allude to it in his interview). (MLN edit: Richard points out that I missed his quoting of that paper in his second blog post.) In response to Richard, Herbert posted a series of tweets, which I collected:




I also put forward my own perspective in a series of tweets, which I summarize below. To me, OAI-PMH is the logical conclusion of a trajectory that began with the computer science department tradition of publishing technical reports on anonymous FTP servers. These reports included both pre-prints and post-prints, and whereas arXiv.org took a centralized approach (due in part to its SMTP origins), the anonymous FTP approach was inherently distributed: each server was, in effect, a departmental-level institutional repository.

Within the CS community, the CS-TR project (which produced Dienst) and the WATERS project evolved into NCSTRL, arguably one of the first open source institutional repository systems. An unrelated and often overlooked effort was the Unified Computer Science Technical Report Index (UCSTRI), whose real innovation was providing a centralized interface to the distributed anonymous FTP servers without requiring them to do anything. It would cleverly crawl and index known FTP servers, parse their README files, and construct URLs from the semi-structured metadata. The parsing results weren't always perfect, but for 1993 it was highly magical and presaged the idea of building centralized services on top of existing, uncoordinated servers.
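To make the UCSTRI idea concrete, here is a minimal sketch of the "parse the README, construct the URL" step. This is not UCSTRI's actual code: the README line format (report id, file name, and title separated by two or more spaces), the host, and the function and class names are all hypothetical, and the sketch skips the crawling entirely by taking the README text as a string.

```python
import re
from dataclasses import dataclass


@dataclass
class Report:
    host: str
    report_id: str
    title: str
    url: str


# Hypothetical README line shape: "<report-id>  <file name>  <title>",
# with fields separated by two or more spaces. Real READMEs varied widely.
LINE_RE = re.compile(r"^(?P<id>\S+)\s{2,}(?P<file>\S+)\s{2,}(?P<title>.+?)\s*$")


def index_readme(host: str, directory: str, readme: str) -> list[Report]:
    """Parse one server's README and build an ftp:// URL for each recognized entry."""
    reports = []
    for line in readme.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            # Lines in any other style are silently skipped, so a heuristic
            # parser like this one misses or mangles some entries.
            continue
        path = f"{directory.rstrip('/')}/{m['file']}"
        reports.append(Report(host, m["id"], m["title"], f"ftp://{host}{path}"))
    return reports


if __name__ == "__main__":
    readme = (
        "Technical reports available by anonymous FTP\n"
        "TR-93-01  tr93-01.ps.Z  A Hypothetical Report on Distributed Search\n"
    )
    for r in index_readme("ftp.cs.example.edu", "/pub/techreports", readme):
        print(r.report_id, r.url)
```

The point of the design is that the servers themselves change nothing; all of the interpretation, and all of the imperfection, lives in the central indexer.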

At NASA Langley Research Center in 1993, I brought the anonymous FTP culture to NASA technical reports (mostly NASA's own report series, but also some post-prints; see NASA TM-4567), followed by a web interface in 1993 (NASA TM-109162). In 1994, we integrated several of these web interfaces into the NASA Technical Report Server (NTRS, AIAA-95-0964), which continues in name to this day (ntrs.nasa.gov) as an institutional repository that largely goes unrecognized as such (albeit covering a smaller range of subjects than a typical university). NTRS is a centralized operation today, but it began as a distributed search architecture, which was feasible in part because of the limited number of NASA Centers, projects, and affiliated institutes (there were probably never more than a dozen in NTRS).

By 1999 there was a proliferation of both subject-based and institutional repositories, which led to the UPS experiment and ultimately to OAI-PMH itself. The rise of the web made it possible to greatly enhance the functionality of the anonymous FTP server (searching, better browsing, etc.). But the web also killed the CS departmental technical report series and the servers that hosted them. Although some may still exist, off the top of my head I'm not aware of any CS department with an active technical report series, at least not like those of the 80s and 90s.

The web made it possible for individuals to list their pre- and post-prints on their own page (e.g., my publication page, Herbert's publication page), and systems like CiteSeer, Google Scholar, and others -- much like UCSTRI before them -- evolved to discover these e-prints linked from individuals' home pages and centrally index them with no administrative or author effort.
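As a rough illustration of that kind of discovery, the sketch below pulls likely e-print links out of a single home page. It is not how CiteSeer or Google Scholar actually work (both do far more than this); the list of file suffixes, the example URL, and the function and class names are hypothetical, and fetching the page is left out so the HTML is passed in as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical heuristic: treat links to these file types as candidate e-prints.
EPRINT_SUFFIXES = (".pdf", ".ps", ".ps.gz", ".ps.z")


class EprintLinkFinder(HTMLParser):
    """Collects absolute URLs of <a href> links that look like e-prints."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(EPRINT_SUFFIXES):
            # Resolve relative links against the page they appeared on.
            self.found.append(urljoin(self.base_url, href))


def find_eprints(base_url: str, html: str) -> list[str]:
    finder = EprintLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = '<a href="papers/tr1.pdf">TR1</a> <a href="cv.html">CV</a>'
    print(find_eprints("http://www.cs.example.edu/~someone/", page))
```

As with UCSTRI, the author only has to link the file from a page; the central service does the discovery and indexing.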

In summary, I believe any discussion of institutional repositories (and OAI-PMH) has to acknowledge that while the web allowed repository systems to evolve to their current advanced state, it also made obsolete many of the models and assumptions that drove their development in the first place. The web allowed for "fancy" anonymous FTP servers, but it also meant that we no longer needed them. Or perhaps we need them differently and a lot less: institutional repositories still have a functional role, but they need to be operated more like Google Scholar et al.

--Michael
