Richard Poynder's recent blog post "Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository?" has generated a lot of discussion, including a second post from Richard to address the comments and the always insightful commentary from David Rosenthal ("Why Did Institutional Repositories Fail?"). There surely have been enough articles about institutional repositories to fill an institutional repository, but of particular interest to me are discussions about the technical and aspirational goals of OAI-PMH.
A year ago Herbert and I reflected on OAI-PMH and other projects ("Reminiscing About 15 Years of Interoperability Efforts"),
which I wish Richard had referenced in his discussion (although
Cliff does allude to this in his interview; MLN edit: Richard points out that I missed his quoting of that paper in his second blog post), as well as the original SFC and UPS papers. In response to Richard, Herbert posted a series of tweets, which I collected:
Herbert Van de Sompel's Reaction to Richard Poynder's "Q&A with..."
I also put forward my own perspective in a series of tweets,
which I will summarize below. To me, OAI-PMH is the logical conclusion
of a trajectory that began with the computer science department tradition of
publishing technical reports on anonymous FTP servers. These were both
pre- and post-print versions, and whereas arXiv.org was based on a centralized approach (due in part to its SMTP origins), the anonymous FTP approach was inherently distributed, and was a departmental-level institutional repository.
Within the CS community, the CS-TR project (which produced Dienst) and WATERS project evolved into NCSTRL,
which was arguably one of the first open source institutional
repository systems. An unrelated effort that is often overlooked was
the Unified Computer Science Technical Report Index (UCSTRI), whose real innovation was that it provided a centralized interface to the distributed anonymous FTP servers without requiring them to do anything.
It would cleverly crawl and index known FTP servers, parse the README
files, and construct URLs from the semi-structured metadata. The
parsing results weren't always perfect, but for 1993 it was highly
magical and presaged the idea of building centralized services on top of
existing, uncoordinated servers.
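To make the UCSTRI idea concrete, here is a minimal sketch of the "parse the README, construct URLs" step. It is not UCSTRI's actual code: the README layout (a file name, whitespace, then a free-text description), the host name, and the names `ENTRY`, `ReportEntry`, and `parse_readme` are all assumptions made for illustration, and the crawling step is left out so the example runs without a network.

```python
import re
from dataclasses import dataclass


@dataclass
class ReportEntry:
    url: str
    description: str


# Assumed (hypothetical) README layout: "<file name>  <free-text description>".
# Real READMEs varied widely, which is why parsing was never perfect.
ENTRY = re.compile(
    r"^(?P<name>\S+\.(?:ps|dvi|txt|tex)(?:\.(?:Z|gz))?)\s+(?P<desc>.+)$"
)


def parse_readme(host: str, directory: str, readme_text: str) -> list[ReportEntry]:
    """Turn semi-structured README lines into FTP URLs plus descriptions."""
    base = f"ftp://{host}/{directory.strip('/')}"
    entries = []
    for line in readme_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # prose, blank lines, contact info: not a report entry
        entries.append(
            ReportEntry(
                url=f"{base}/{match['name']}",
                description=match["desc"].strip(),
            )
        )
    return entries


if __name__ == "__main__":
    # Made-up README content for a made-up department server.
    sample = (
        "Technical reports of the Example CS Department\n"
        "\n"
        "TR-93-01.ps.Z   A Study of Distributed Indexing, J. Doe\n"
        "TR-93-02.ps     Notes on Caching, A. Smith\n"
    )
    for entry in parse_readme("ftp.cs.example.edu", "/pub/techreports", sample):
        print(entry.url, "|", entry.description)
```

The point of the sketch is that the servers do nothing special: all of the work happens on the indexing side, and lines that fail to parse are silently skipped.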
At NASA Langley
Research Center in 1993, I brought the anonymous FTP culture to NASA
technical reports (mostly their own report series, but some post-prints,
see NASA TM-4567), followed by a web interface in 1993 (NASA TM-109162). In 1994, we integrated several of these web interfaces into the NASA Technical Report Server (NTRS, AIAA-95-0964), which continues in name to this day (ntrs.nasa.gov)
as an institutional repository that largely goes unrecognized as such
(albeit covering a smaller range of subjects than a typical university).
NTRS is a centralized operation today, but it originally used a
distributed search architecture, made feasible in part by the limited
number of NASA Centers, projects, and affiliated institutes (there were
probably never more than a dozen in NTRS).
By 1999 there was a proliferation of both subject-based and institutional repositories, which led to the UPS experiment
and ultimately OAI-PMH itself. The proliferation of the web made it
possible to greatly enhance the functionality of the anonymous FTP
server (searching, better browsing, etc.). But at the same time the web
also killed the CS departmental technical report series and the servers
that hosted them. Although some may exist somewhere, off the top of my
head I'm not aware of any CS departments with an active CS technical
report series, at least not like the 80s and 90s.
The web made it possible for individuals to list their pre- and post-prints on their own page (e.g., my publication page, Herbert's publication page), and systems like CiteSeer, Google Scholar,
and others -- much like UCSTRI before them -- evolved to discover these
e-prints linked from individuals' home pages and centrally index them
with no administrative or author effort.
In summary, I
believe any discussion of institutional repositories (and OAI-PMH) has
to acknowledge that while the web allowed for the evolution of
repository systems to their current advanced state, the web also obsoleted many of the models and assumptions that drove the development of repository systems in the first place.
The web allowed for "fancy" anonymous FTP servers, but it also meant
that we no longer needed them. Or perhaps we need them differently and a
lot less: institutional repositories still have a functional role, but
they need to be operated more like Google Scholar et al.