Thursday, July 7, 2016

2016-07-07: Signposting the Scholarly Web

The web site for "Signposting the Scholarly Web" recently went online.  There is a ton of great content available and since it takes some time to process it all, I'll give some of the highlights here.

First, this is the culmination of ideas that have been brewing for some time (see this early 2015 short video, although some of the ideas can arguably be traced to this 2014 presentation).  Most recently, our presentation at CNI Fall 2015, our 2015 D-Lib Magazine article, and our 2016 tech report advanced the concepts.

Here's the short version: the purpose is to provide a standard, machine-readable method for web robots and other clients to "follow their nose" as they encounter scholarly material on the web.  Think of it as similar (in purpose if not technique) to Facebook's Open Graph or FOAF, but for publications, slides, data sets, etc.

Currently there are three basic functions in Signposting:
  1. Discovering rich, structured, bibliographic metadata from web pages.  For example, if my user agent is at a landing page, publication page, PDF, etc., then Signposting allows me to discover where to find the BibTeX, MARC, DC, or whatever metadata format the publisher makes available.  Lots of DC records "point to" scholarly web pages, but this defines how the pages can "point back" to their metadata.
  2. Providing bi-directional linkage between a web page and its DOI.  OK, technically it doesn't have to be a DOI, but that's the most common case.  One can dereference a DOI (e.g., http://dx.doi.org/10.1371/journal.pone.0115253) and be redirected to the URI at the publisher's site (in this case: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253).  But there isn't a standardized, machine-readable method for discovering the DOI from the landing page, PDF, data set, etc. at the publisher's site (note: rel="canonical" serves a different purpose).  The problem is that few people actually link to DOIs; instead, they link to the final (and not stable) URL.  For example, this popular news story about cholesterol research links to the article at the publisher's site, but not the DOI.  For this purpose, we introduce rel="identifier", which allows items in a scholarly object to point back to their DOI (or PURL, handle, ARK, etc.).
  3. Delineating what's part of the scholarly object and what is not.  Some links are clearly intended to be "part" of the scholarly object: the PDF, the slides, the data set, the code, etc.  Some links are useful, but not part of the scholarly object: navigational links, citation services, bookmarking services, etc.  You can think of this as a greatly simplified version of OAI-ORE (and if you're not familiar with ORE, don't worry about it; it's powerful but complex).  Knowing what is part of the scholarly object will, among other things, allow us to assess how well it has been indexed, archived, etc.
Again, there's a ton of material at the site, both models of common patterns and proposed HTTP responses for different purposes.  But right now it all comes down to providing links for three simple things: 1) the metadata, 2) the DOI (or other favorite identifier), 3) items "in the object".
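
To make those three links a bit more concrete, here is a rough Python sketch of how a client might sort the typed links in an HTTP Link header into the three buckets.  Treat it as an illustration under assumptions, not the Signposting specification: only rel="identifier" is named in this post, so the "describedby" (metadata) and "item" (parts of the object) relation names, the example.org URLs, and the header itself are assumptions made up for the example, and the parser handles only a simplified form of the Link header syntax.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
# This is a simplified reading of the Link header grammar, not a full parser.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s";,]+))*)'
)
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def parse_link_header(value):
    """Return a dict mapping each rel value to a list of target URIs."""
    links = {}
    for target, params in LINK_RE.findall(value):
        for name, quoted, bare in PARAM_RE.findall(params):
            if name.lower() == "rel":
                # A single rel parameter may carry several space-separated types.
                for rel in (quoted or bare).split():
                    links.setdefault(rel, []).append(target)
    return links


# Hypothetical Link header for a publisher landing page.  The DOI is the one
# used as an example in this post; the other two URLs are invented, and the
# "describedby" and "item" relation names are assumptions.
example = (
    '<http://dx.doi.org/10.1371/journal.pone.0115253>; rel="identifier", '
    '<http://example.org/metadata.bib>; rel="describedby"; '
    'type="application/x-bibtex", '
    '<http://example.org/article.pdf>; rel="item"; type="application/pdf"'
)

links = parse_link_header(example)
print("identifier:", links.get("identifier", []))
print("metadata:  ", links.get("describedby", []))
print("items:     ", links.get("item", []))
```

A robot that lands on the publisher's page could then follow links["identifier"] back to the DOI, fetch the metadata from links["describedby"], and treat everything under links["item"] as part of the scholarly object, ignoring the rest of the links on the page.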

Please take a look at the site, join the Signposting list and provide feedback there about the current three patterns, additional patterns, possible use cases, or anything else. 

--Michael
