2016-08-30: Memento at the W3C
We are pleased to report that the W3C has embraced Memento for versioning its specifications and its wiki. Completing this effort required collaboration between the W3C and the Los Alamos National Laboratory (LANL) Research Library Prototyping Team. Here we inform others of the brief history of this effort and provide an overview of the technical aspects of the work done to make Memento at the W3C.
Brief History of Memento Work with the W3C
The W3C uses Memento for two separate systems: its specifications and its wiki.
Memento was implemented on both of these systems in 2016, but there were a lot of discussions and changes in direction along the way.
In 2010, Herbert Van de Sompel presented Memento as part of the Linked Data on the Web Workshop (LDOW) at WWW. The presentation was met with much enthusiasm. In fact, Sir Tim Berners-Lee stated "this is neat and there is a real need for it". Later, he met with Herbert to suggest that Memento could be used on the W3C site itself, specifically for time-based access to W3C specifications.
That same year, Harihar Shankar had finished the first working version of the Memento MediaWiki Extension. Ted Guild of the W3C installed this extension on their wiki for easy access to prior versions of pages.
At the time, the W3C kept their specifications in CVS. LANL and the W3C began discussions about how to use Memento with their CVS system and other associated web server software. This attempt ran into problems due to permissions issues and other concerns.
Fast forward to 2013, when Shawn Jones had joined the ODU Web Science and Digital Libraries Research Group. At this point, attempts to get the Memento MediaWiki Extension installed at Wikipedia had stalled. The extension had also ceased working with the version of MediaWiki then being used at the W3C. Shawn updated the extension, analyzing different design options, and evaluating their performance. He enlisted support from the MediaWiki development team in hopes that it would be acceptable for deployment at Wikipedia. Version 2.0.0 was released in 2014.
By 2014 Yorick Chollet had joined the LANL Prototyping Team. As part of work with the W3C, Yorick produced standalone TimeGate software that could be installed and run by anyone. The W3C had also started work on a web API for their specifications. The decision was made by both groups to develop the TimeGate as a microservice that would provide a Memento interface to the W3C API.
In 2015, Herbert notified the W3C that the latest version of the Memento MediaWiki Extension was available. After some planned updates to the W3C infrastructure, the updated extension was installed in January of 2016, restoring Memento support on their wiki.
The @w3c wiki now supports #memento via MediaWiki extension https://t.co/wuW8JIFQtA Will @Wikipedia be next? /cc @ReaderMeter @g_gerg
— Herbert (@hvdsomp) January 11, 2016
By that time the W3C specifications API was nearing completion. Harish and Herbert collaborated with José Kahan at the W3C to ensure that the W3C TimeGate microservice worked with the API. Once testing was complete, the W3C added the Memento-Datetime header and updated Link headers to their resources in order to reference the new TimeGate. At the same time the W3C moved services to HTTPS, requiring HTTPS to be implemented at the TimeGate as well. Now both the W3C specifications and the W3C wiki use Memento.
Details of Memento Support for W3C Specifications
Work on Memento for the W3C Specifications entailed coordination between three components:
- Software local to the W3C Apache Web Server that serves these specifications - maintained at the W3C
- The Memento TimeGate microservice - maintained at LANL
- The W3C Specification API itself - maintained at the W3C
The diagram below provides an overview of the architecture of the Memento TimeGate microservice. The TimeGate accepts the Accept-Datetime header from Memento clients via HTTP. It then queries the W3C API using an API Handler. The result of that query is then used to discover the best revision of a specification that was active at the datetime expressed in the Accept-Datetime header.
To demonstrate how these components work together, we will walk through Memento datetime negotiation using the specification for HTML 5 at URI-R https://www.w3.org/TR/html5/ and an Accept-Datetime value of Sat, 24 Apr 2010 15:00:00 GMT.
As shown in the curl request below, the W3C Apache Web server produces the appropriate TimeGate Link header for original resources. Memento clients use the timegate relation in this Link header to discover the URI-G of the TimeGate for this resource.
# curl -I "https://www.w3.org/TR/html5/"
HTTP/1.1 200 OK
Date: Fri, 05 Aug 2016 20:41:42 GMT
Last-Modified: Fri, 24 Oct 2014 16:15:24 GMT
ETag: "20acd-5062d7cffff00"
Accept-Ranges: bytes
Content-Length: 133837
Cache-Control: max-age=31536000
Expires: Sat, 05 Aug 2017 20:41:42 GMT
P3P: policyref="http://www.w3.org/2014/08/p3p.xml"
Link: <https://timetravel.mementoweb.org/w3c/timegate/https://www.w3.org/TR/html5/>;rel="timegate"
Access-Control-Allow-Origin: *
Content-Type: text/html; charset=utf-8
Strict-Transport-Security: max-age=15552000; includeSubdomains; preload
Content-Security-Policy: upgrade-insecure-requests
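The TimeGate discovery step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code any Memento client actually uses: the function names parse_link_header and find_relation are hypothetical, and the regular expression only handles quoted attribute values like those in the responses shown in this post.

```python
import re

# Matches one link-value: <URI> followed by its ;name="value" attributes.
# Assumption: attribute values are always quoted, as in the examples here.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_link_header(value):
    """Return a list of (uri, attributes) tuples from a Link header value."""
    links = []
    for uri, raw_params in LINK_VALUE.findall(value):
        links.append((uri, dict(PARAM.findall(raw_params))))
    return links


def find_relation(value, relation):
    """Return the first URI whose rel attribute includes the given relation."""
    for uri, params in parse_link_header(value):
        # rel may hold several space-separated relation types,
        # for example "first memento".
        if relation in params.get("rel", "").split():
            return uri
    return None


header = ('<https://timetravel.mementoweb.org/w3c/timegate/'
          'https://www.w3.org/TR/html5/>;rel="timegate"')
print(find_relation(header, "timegate"))
```

The value returned for the timegate relation is the URI-G that the client uses for the next request.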
To continue datetime negotiation, a Memento client would then issue an HTTP request like the one below to this TimeGate - maintained by LANL.
HEAD /w3c/timegate/http://www.w3.org/TR/html5/ HTTP/1.1
Host: timetravel.mementoweb.org
Accept-Datetime: Sat, 24 Apr 2010 15:00:00 GMT
Connection: close
The Memento TimeGate microservice extracts the shortname from the original URI, html5 in this case. It then queries the W3C API for this shortname directly, receiving a JSON response like the abridged one below. This response contains a version history of the specification.
... ABRIDGED FOR BREVITY - SALIENT PARTS BELOW ...
"_embedded": {
"version-history": [
{
"status": "Recommendation",
"uri": "http:\/\/www.w3.org\/TR\/2014\/REC-html5-20141028\/",
"date": "2014-10-28",
"informative": false,
"title": "HTML5",
"shortlink": "http:\/\/www.w3.org\/TR\/html5\/",
"editor-draft": "http:\/\/www.w3.org\/html\/wg\/drafts\/html\/master\/",
"process-rules": "http:\/\/www.w3.org\/2005\/10\/Process-20051014\/",
"_links": {
"self": {
"href": "https:\/\/api.w3.org\/specifications\/html5\/versions\/20141028"
},
"editors": {
"href": "https:\/\/api.w3.org\/specifications\/html5\/versions\/20141028\/editors"
},
"deliverers": {
"href": "https:\/\/api.w3.org\/specifications\/html5\/versions\/20141028\/deliverers"
},
"specification": {
"href": "https:\/\/api.w3.org\/specifications\/html5"
},
"predecessor-version": {
"href": "https:\/\/api.w3.org\/specifications\/html5\/versions\/20141028\/predecessors"
}
}
},
... MULTIPLE OTHER VERSIONS FOLLOW - ABRIDGED FOR BREVITY ...
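The shortname extraction that precedes this API query might look like the sketch below. It is an assumption-laden illustration, not the TimeGate's actual code: shortname_from_uri and specification_api_uri are hypothetical names, and the API base is taken from the "specification" href in the response above, which may not be the exact endpoint the TimeGate calls.

```python
from urllib.parse import urlparse

# Assumption: base taken from the "specification" href in the JSON response.
API_BASE = "https://api.w3.org/specifications/"


def shortname_from_uri(uri_r):
    """Extract the shortname from a URI-R of the form .../TR/<shortname>/."""
    segments = [s for s in urlparse(uri_r).path.split("/") if s]
    if len(segments) != 2 or segments[0] != "TR":
        raise ValueError("not a /TR/<shortname>/ URI: " + uri_r)
    return segments[1]


def specification_api_uri(uri_r):
    """Build the API URI for the specification identified by uri_r."""
    return API_BASE + shortname_from_uri(uri_r)


print(shortname_from_uri("https://www.w3.org/TR/html5/"))     # html5
print(specification_api_uri("https://www.w3.org/TR/html5/"))
```

Dated URIs such as http://www.w3.org/TR/2010/WD-html5-20100304/ are rejected by this sketch because they identify mementos rather than original resources.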
From this JSON response, the TimeGate looks for the version-history array inside the _embedded object. From each entry in that array, it then extracts the uri and date. It then compares the value of the HTTP request's Accept-Datetime header with the URIs and dates from this version history to find the URI-M of the best memento that was active at the Accept-Datetime value.
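The selection step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the sample response keeps only the uri and date fields of four versions, and falling back to the first memento when the requested datetime precedes every version is our assumption about reasonable TimeGate behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def versions_from_response(api_response):
    """Return (datetime, uri) pairs from _embedded.version-history, oldest first."""
    versions = []
    for entry in api_response["_embedded"]["version-history"]:
        # The API gives a calendar date; treat it as midnight GMT.
        day = datetime.strptime(entry["date"], "%Y-%m-%d")
        versions.append((day.replace(tzinfo=timezone.utc), entry["uri"]))
    return sorted(versions)


def best_memento(versions, accept_datetime):
    """Return the URI-M of the latest version published at or before the request."""
    requested = parsedate_to_datetime(accept_datetime)
    candidates = [v for v in versions if v[0] <= requested]
    if not candidates:
        # Assumption: fall back to the first memento for very early datetimes.
        return versions[0][1]
    return candidates[-1][1]


# Reduced sample: only the fields the TimeGate needs, for four versions.
sample = {"_embedded": {"version-history": [
    {"uri": "http://www.w3.org/TR/2014/REC-html5-20141028/", "date": "2014-10-28"},
    {"uri": "http://www.w3.org/TR/2010/WD-html5-20100624/", "date": "2010-06-24"},
    {"uri": "http://www.w3.org/TR/2010/WD-html5-20100304/", "date": "2010-03-04"},
    {"uri": "http://www.w3.org/TR/2008/WD-html5-20080122/", "date": "2008-01-22"},
]}}

versions = versions_from_response(sample)
print(best_memento(versions, "Sat, 24 Apr 2010 15:00:00 GMT"))
# http://www.w3.org/TR/2010/WD-html5-20100304/
```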
In the case of our example, the datetime requested is Sat, 24 Apr 2010 15:00:00 GMT. Using the version history from the W3C API, the TimeGate discovers that the URI-M of the best memento that was active at the Accept-Datetime value is http://www.w3.org/TR/2010/WD-html5-20100304/. This URI-M is then used as the value of the Location header of the TimeGate's response. Because the TimeGate has access to the entire version history, it easily generates additional Link relations in its response, filling in the first and last relations in addition to the URI of the timemap. The TimeGate's full response is shown below; note the Location and Link headers.
# curl -I -H 'Accept-Datetime: Sat, 24 Apr 2010 15:00:00 GMT' 'https://timetravel.mementoweb.org/w3c/timegate/https://www.w3.org/TR/html5/'
HTTP/1.1 302 Found
Server: nginx/1.8.0
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Connection: keep-alive
Date: Fri, 05 Aug 2016 21:18:29 GMT
Vary: accept-datetime
Location: http://www.w3.org/TR/2010/WD-html5-20100304/
Link: <http://www.w3.org/TR/html5/>; rel="original", <https://timetravel.mementoweb.org/w3c/timemap/link/http://www.w3.org/TR/html5/>; rel="timemap"; type="application/link-format", <https://timetravel.mementoweb.org/w3c/timemap/json/http://www.w3.org/TR/html5/>; rel="timemap"; type="application/json", <http://www.w3.org/TR/2008/WD-html5-20080122/>; rel="first memento"; datetime="Tue, 22 Jan 2008 00:00:00 GMT", <http://www.w3.org/TR/2010/WD-html5-20100304/>; rel="memento"; datetime="Thu, 04 Mar 2010 00:00:00 GMT", <http://www.w3.org/TR/2014/REC-html5-20141028/>; rel="last memento"; datetime="Tue, 28 Oct 2014 00:00:00 GMT"
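To illustrate how a Link header like the one above can be assembled from a version history, here is a minimal sketch. The function names and argument layout are hypothetical, only one timemap link is emitted for brevity, and the actual TimeGate software may build its response differently.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_link(uri, when, rel):
    """Format one memento link-value with an RFC 1123 datetime attribute."""
    stamp = format_datetime(when, usegmt=True)  # requires a UTC datetime
    return '<{}>; rel="{}"; datetime="{}"'.format(uri, rel, stamp)


def timegate_link_header(uri_r, timemap_uri, versions, selected):
    """Build a Link header value; versions is a sorted list of (datetime, uri)."""
    links = [
        '<{}>; rel="original"'.format(uri_r),
        '<{}>; rel="timemap"; type="application/link-format"'.format(timemap_uri),
    ]
    last = len(versions) - 1
    # Emit the first, selected, and last mementos, merging any that coincide.
    for index in sorted({0, selected, last}):
        rels = []
        if index == 0:
            rels.append("first")
        if index == last:
            rels.append("last")
        rels.append("memento")
        when, uri = versions[index]
        links.append(memento_link(uri, when, " ".join(rels)))
    return ", ".join(links)


def utc_day(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


history = [
    (utc_day(2008, 1, 22), "http://www.w3.org/TR/2008/WD-html5-20080122/"),
    (utc_day(2010, 3, 4), "http://www.w3.org/TR/2010/WD-html5-20100304/"),
    (utc_day(2014, 10, 28), "http://www.w3.org/TR/2014/REC-html5-20141028/"),
]
link_value = timegate_link_header(
    "http://www.w3.org/TR/html5/",
    "https://timetravel.mementoweb.org/w3c/timemap/link/http://www.w3.org/TR/html5/",
    history,
    selected=1,
)
print(link_value)
```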
A Memento client would then interpret the HTTP 302 status code as a redirect and make a subsequent request to the URI-M from the Location header. In the response, the W3C Apache Web server provides the Memento-Datetime header, identifying this resource as a memento. Also provided are the timegate and original relations in the Link header, so further datetime negotiation can occur if necessary.
# curl -I "http://www.w3.org/TR/2010/WD-html5-20100304/"
HTTP/1.1 200 OK
Date: Fri, 05 Aug 2016 21:19:07 GMT
Last-Modified: Tue, 08 Feb 2011 20:10:44 GMT
Memento-Datetime: Tue, 08 Feb 2011 20:10:44 GMT
ETag: "1d74a-49bcaf17c5900"
Accept-Ranges: bytes
Content-Length: 120650
Cache-Control: max-age=31536000
Expires: Sat, 12 Aug 2017 14:31:18 GMT
P3P: policyref="http://www.w3.org/2014/08/p3p.xml"
Link: <https://timetravel.mementoweb.org/w3c/timegate/http://www.w3.org/TR/html5/>;rel="timegate", <http://www.w3.org/TR/html5/>;rel="original"
Vary: upgrade-insecure-requests
Access-Control-Allow-Origin: *
Content-Type: text/html; charset=utf-8
From this example, we see that datetime negotiation is now possible for W3C specifications, allowing users to find prior versions of any W3C specification using a given datetime. As seen in the datetime negotiation example above and in the link relations diagram below, the relations in the Link header make this possible, even though LANL maintains the TimeGate and the W3C maintains the original resource (current version of specification) and the mementos (past versions of the specification).
And, of course, TimeMaps work as well, with a TimeMap microservice using the W3C API to find the version history of the page. An example TimeMap is shown below.
# curl 'https://timetravel.mementoweb.org/w3c/timemap/link/https://www.w3.org/TR/html5/'
<https://www.w3.org/TR/html5/>; rel="original",
<https://timetravel.mementoweb.org/w3c/timegate/https://www.w3.org/TR/html5/>; rel="timegate",
<https://timetravel.mementoweb.org/w3c/timemap/link/https://www.w3.org/TR/html5/>; rel="self"; type="application/link-format",
<https://timetravel.mementoweb.org/w3c/timemap/json/https://www.w3.org/TR/html5/>; rel="timemap"; type="application/json",
<http://www.w3.org/TR/2008/WD-html5-20080122/>; rel="first memento"; datetime="Tue, 22 Jan 2008 00:00:00 GMT",
<http://www.w3.org/TR/2008/WD-html5-20080610/>; rel="memento"; datetime="Tue, 10 Jun 2008 00:00:00 GMT",
<http://www.w3.org/TR/2009/WD-html5-20090212/>; rel="memento"; datetime="Thu, 12 Feb 2009 00:00:00 GMT",
<http://www.w3.org/TR/2009/WD-html5-20090423/>; rel="memento"; datetime="Thu, 23 Apr 2009 00:00:00 GMT",
<http://www.w3.org/TR/2009/WD-html5-20090825/>; rel="memento"; datetime="Tue, 25 Aug 2009 00:00:00 GMT",
<http://www.w3.org/TR/2010/WD-html5-20100304/>; rel="memento"; datetime="Thu, 04 Mar 2010 00:00:00 GMT",
<http://www.w3.org/TR/2010/WD-html5-20100624/>; rel="memento"; datetime="Thu, 24 Jun 2010 00:00:00 GMT",
<http://www.w3.org/TR/2010/WD-html5-20101019/>; rel="memento"; datetime="Tue, 19 Oct 2010 00:00:00 GMT",
<http://www.w3.org/TR/2011/WD-html5-20110113/>; rel="memento"; datetime="Thu, 13 Jan 2011 00:00:00 GMT",
<http://www.w3.org/TR/2011/WD-html5-20110405/>; rel="memento"; datetime="Tue, 05 Apr 2011 00:00:00 GMT",
<http://www.w3.org/TR/2011/WD-html5-20110525/>; rel="memento"; datetime="Wed, 25 May 2011 00:00:00 GMT",
<http://www.w3.org/TR/2012/WD-html5-20120329/>; rel="memento"; datetime="Thu, 29 Mar 2012 00:00:00 GMT",
<http://www.w3.org/TR/2012/WD-html5-20121025/>; rel="memento"; datetime="Thu, 25 Oct 2012 00:00:00 GMT",
<http://www.w3.org/TR/2012/CR-html5-20121217/>; rel="memento"; datetime="Mon, 17 Dec 2012 00:00:00 GMT",
<http://www.w3.org/TR/2014/CR-html5-20140429/>; rel="memento"; datetime="Tue, 29 Apr 2014 00:00:00 GMT",
<http://www.w3.org/TR/2014/WD-html5-20140617/>; rel="memento"; datetime="Tue, 17 Jun 2014 00:00:00 GMT",
<http://www.w3.org/TR/2014/CR-html5-20140731/>; rel="memento"; datetime="Thu, 31 Jul 2014 00:00:00 GMT",
<http://www.w3.org/TR/2014/PR-html5-20140916/>; rel="memento"; datetime="Tue, 16 Sep 2014 00:00:00 GMT",
<http://www.w3.org/TR/2014/REC-html5-20141028/>; rel="last memento"; datetime="Tue, 28 Oct 2014 00:00:00 GMT"
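A TimeMap in this link format is straightforward to process programmatically. The sketch below, with the hypothetical function name mementos_from_timemap, collects the memento entries and their datetimes; it assumes quoted attribute values, as in the TimeMap above, and uses a shortened three-entry sample.

```python
import re
from email.utils import parsedate_to_datetime

# One TimeMap entry: <URI> followed by its ;name="value" attributes.
ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def mementos_from_timemap(body):
    """Return (datetime, uri) pairs for every memento entry, oldest first."""
    mementos = []
    for uri, raw_params in ENTRY.findall(body):
        params = dict(PARAM.findall(raw_params))
        # "first memento" and "last memento" also count as mementos.
        if "memento" in params.get("rel", "").split():
            when = parsedate_to_datetime(params["datetime"])
            mementos.append((when, uri))
    return sorted(mementos)


timemap = '''<https://www.w3.org/TR/html5/>; rel="original",
<http://www.w3.org/TR/2008/WD-html5-20080122/>; rel="first memento"; datetime="Tue, 22 Jan 2008 00:00:00 GMT",
<http://www.w3.org/TR/2010/WD-html5-20100304/>; rel="memento"; datetime="Thu, 04 Mar 2010 00:00:00 GMT",
<http://www.w3.org/TR/2014/REC-html5-20141028/>; rel="last memento"; datetime="Tue, 28 Oct 2014 00:00:00 GMT"'''

found = mementos_from_timemap(timemap)
print(len(found))                  # 3
print(found[0][0].year, found[0][1])
```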
Contrast this TimeMap of 19 versions with the 1,243 observations made by the Internet Archive for the same page. If studying the evolution of a standard, 19 explicit versions are easier to work with than more than 1000 observations, many of which are for the same version.
Details of Memento Support on the W3C Wiki
The W3C is also running the full Memento MediaWiki Extension on their wiki. The full Memento MediaWiki Extension provides TimeGates and TimeMaps as well as other additional information in the Link headers of its HTTP responses. Shown below is an example HTTP response for the original resource https://www.w3.org/wiki/HTML/Elements/link.
# curl -I "https://www.w3.org/wiki/HTML/Elements/link"
HTTP/1.1 200 OK
X-Powered-By: PHP/5.4.45-0+deb7u4
X-Content-Type-Options: nosniff
Link: <https://www.w3.org/wiki/HTML/Elements/link>; rel="original latest-version",<https://www.w3.org/wiki/Special:TimeGate/HTML/Elements/link>; rel="timegate",<https://www.w3.org/wiki/Special:TimeMap/HTML/Elements/link>; rel="timemap"; type="application/link-format"; from="Mon, 14 Mar 2011 19:25:12 GMT"; until="Thu, 21 Jul 2011 22:24:53 GMT",<https://www.w3.org/wiki/index.php?title=HTML/Elements/link&oldid=48683>; rel="first memento"; datetime="Mon, 14 Mar 2011 19:25:12 GMT",<https://www.w3.org/wiki/index.php?title=HTML/Elements/link&oldid=52749>; rel="last memento"; datetime="Thu, 21 Jul 2011 22:24:53 GMT"
Content-language: en
Vary: Accept-Encoding,Cookie
Cache-Control: s-maxage=18000, must-revalidate, max-age=0
Last-Modified: Wed, 03 Aug 2016 04:40:32 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 24053
Accept-Ranges: bytes
Date: Wed, 03 Aug 2016 19:27:11 GMT
X-Varnish: 877421307 877181026
Age: 35199
Via: 1.1 varnish
X-Cache: HIT
Strict-Transport-Security: max-age=15552000; includeSubdomains; preload
Content-Security-Policy: upgrade-insecure-requests
Content-Security-Policy-Report-Only: default-src *.w3.org; img-src *.w3.org data:; style-src *.w3.org 'unsafe-inline'; script-src *.w3.org 'unsafe-inline'; frame-ancestors *.w3.org; report-uri https://www.w3.org/csp-report/29ce9kZ/wro
For prior versions of the same resource, we see that the Memento-Datetime and Link headers are returned as well.
# curl -I "https://www.w3.org/wiki/index.php?title=HTML/Elements/link&oldid=52749"
HTTP/1.1 200 OK
X-Powered-By: PHP/5.4.45-0+deb7u4
X-Content-Type-Options: nosniff
Memento-Datetime: Thu, 21 Jul 2011 22:24:53 GMT
Link: <https://www.w3.org/wiki/HTML/Elements/link>; rel="original latest-version",<https://www.w3.org/wiki/Special:TimeGate/HTML/Elements/link>; rel="timegate",<https://www.w3.org/wiki/Special:TimeMap/HTML/Elements/link>; rel="timemap"; type="application/link-format"; from="Mon, 14 Mar 2011 19:25:12 GMT"; until="Thu, 21 Jul 2011 22:24:53 GMT",<https://www.w3.org/wiki/index.php?title=HTML/Elements/link&oldid=48683>; rel="first memento"; datetime="Mon, 14 Mar 2011 19:25:12 GMT",<https://www.w3.org/wiki/index.php?title=HTML/Elements/link&oldid=52749>; rel="last memento"; datetime="Thu, 21 Jul 2011 22:24:53 GMT"
Content-language: en
Vary: Accept-Encoding,Cookie
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, must-revalidate, max-age=0
Content-Type: text/html; charset=UTF-8
Content-Length: 24966
Accept-Ranges: bytes
Date: Sat, 06 Aug 2016 19:12:58 GMT
X-Varnish: 878886405
Age: 0
Via: 1.1 varnish
X-Cache: MISS
Strict-Transport-Security: max-age=15552000; includeSubdomains; preload
Content-Security-Policy: upgrade-insecure-requests
Content-Security-Policy-Report-Only: default-src *.w3.org; img-src *.w3.org data:; style-src *.w3.org 'unsafe-inline'; script-src *.w3.org 'unsafe-inline'; frame-ancestors *.w3.org; report-uri https://www.w3.org/csp-report/29ce9kZ/wro
The Memento MediaWiki Extension has already been mentioned in several posts in this blog: discussing its performance, highlighting MediaWiki's issues with temporal coherence, and describing how many Wikipedia and Wikia users could benefit from its installation. Its design and performance have also been documented in a lengthy technical report. It was demonstrated at WikiConference USA 2014 and as part of the Master's thesis presentation of Shawn M. Jones.
For more information on the extension, we suggest consulting those resources, as well as its GitHub and MediaWiki sites.
Conclusions
Since its inception, we have identified many use cases for Memento, from reconstructing web pages from many existing archives to avoiding spoilers in fiction to managing the temporal nature of semantic web data. We are happy that the W3C has adopted Memento for use in their work as well.
Even though the W3C maintains the Apache server holding mementos and original resources, and LANL maintains the systems running the W3C TimeGate software, it is the relations within the Link headers that tie everything together. It is an excellent example of the harmony possible with meaningful Link headers. Memento allows users to negotiate in time with a single web standard, making web archives, semantic web resources, and now W3C specifications all accessible the same way. Memento provides a standard alternative to a series of implementation-specific approaches.
We have been trying to bring Memento support to Wikipedia for the past few years, demonstrating the technology at conferences, working with their development team, and even getting direct feedback on the software from MediaWiki developers such as LegoTKM, Jeroen De Dauw, and ricordisamoa. Unfortunately, our discussions about deployment to Wikipedia have so far been unsuccessful. Perhaps they can be our next major adopter?
--
Herbert Van de Sompel
- and -
Harihar Shankar
- and -