Monday, November 15, 2010

2010-11-15: Memento Presentation at UNC; Memento ID

I recently had a chance to return to the School of Information and Library Science, UNC Chapel Hill, where I had a most enjoyable post-doc during the academic year 2000-2001. Jane Greenberg was nice enough to invite me to speak about Memento in her INLS 520 "Organization of Information" class on Tuesday, November 9th as well as give an invited lecture about Memento to the UNC Scholarly Communications Working Group on Wednesday, November 10th.

When I first went to UNC I had the office next to Jane and she was just an assistant professor; now she's a full professor and director of the Metadata Research Center. I enjoyed catching up with her and my many other friends and colleagues at SILS.

My slides are available on; they are mostly a combination of slides I've posted before, but with some updates in the HTTP headers. Although the changes are very slight, the recently submitted (11/12/10) Memento Internet Draft takes precedence over all of our prior published papers and slides. For those who don't know, IETF Internet Drafts are the first step in the process of issuing an RFC (cf. "I'm Just a Bill...").


Friday, November 5, 2010

2010-11-05: Memento-Datetime is not Last-Modified

One of the key contributions of the Memento Framework is the HTTP response header "Memento-Datetime" (previously called "Content-Datetime" in our earlier publications & slides). Memento-Datetime is the sticky, intended datetime* for the representation returned when a URI is dereferenced. The presence of the Memento-Datetime HTTP response header is how the client realizes it has reached a Memento.
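As a rough illustration of that last point, here is a minimal Python sketch of how a client might check for a Memento. The function names and the plain dict of headers are assumptions made for this example, not part of the Memento framework.

```python
from email.utils import parsedate_to_datetime


def memento_datetime(headers):
    """Return Memento-Datetime as a datetime, or None if the header is absent."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("memento-datetime")
    return parsedate_to_datetime(value) if value is not None else None


def is_memento(headers):
    """Treat a response as a Memento only if it carries Memento-Datetime."""
    return memento_datetime(headers) is not None


response_headers = {
    "Memento-Datetime": "Wed, 05 Mar 2008 20:16:49 GMT",
    "Last-Modified": "Fri, 05 Nov 2010 23:25:19 GMT",
}
print(is_memento(response_headers))
```

Note that Last-Modified alone never makes a response a Memento in this sketch; only the presence of Memento-Datetime does.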

Rather than formally explain what we mean by "sticky, intended datetime", it is easier to explain how it differs from both the value of the HTTP response header Last-Modified and the creation date of the resource (which has no corresponding HTTP header, for reasons that will become clear). For the examples below, we'll define the following abbreviations:
  • CD (Creation-Datetime) = the datetime the resource was created
  • MD (Memento-Datetime) = the datetime the representation was observed on the web
  • LM (Last-Modified) = the datetime the resource last changed state
Case 1: CD == MD == LM

We'll begin with a case in which all three datetime values could be the same. Consider the case of this index page at*/

The index page has a link to a single Memento. For simplicity, we'll assume the archive created this index page and the Memento it references at the moment of the crawl; thus the various datetimes of the Memento would all be equal:

Creation-Datetime: Wed, 05 Mar 2008 20:16:49 GMT
Memento-Datetime: Wed, 05 Mar 2008 20:16:49 GMT
Last-Modified: Wed, 05 Mar 2008 20:16:49 GMT

Case 2: CD == MD < LM

If we click on the Memento (http://wayback.archive-it.or/927/20080305201649/), we see that it has a disclaimer banner ("You are viewing an archived web page...") that many archives employ to inform the reader that they are looking at a Memento and not the original resource. Although there are many techniques for inserting such a banner, the Archive-It example directly modifies the original HTML to insert this banner (as well as handle URI rewriting, etc.).

Now pretend the wording of the banner needs to be changed (for example, to address a new legal requirement). The CD and MD of the Memento are unchanged, but the LM must reflect when the wording of the banner changed:

Creation-Datetime: Wed, 05 Mar 2008 20:16:49 GMT
Memento-Datetime: Wed, 05 Mar 2008 20:16:49 GMT
Last-Modified: Fri, 05 Nov 2010 23:25:19 GMT

Both your lawyer and your HTTP cache consider this an important change, so you have to update LM. But it is also clear that the essence of the March 2008 observation of the Memento by the archive is unchanged by the wording change of the archive banner, so MD is not updated. And certainly the CD is unchanged by this modification.
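The banner scenario can be sketched in a few lines of Python. The MementoTimes class and rewrite_banner function are hypothetical names invented for this illustration; the point is only that the banner edit touches LM and nothing else.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class MementoTimes:
    creation: datetime          # CD (hypothetical field; HTTP has no header for it)
    memento_datetime: datetime  # MD
    last_modified: datetime     # LM


def rewrite_banner(times, when):
    """Changing the archive banner updates Last-Modified and nothing else."""
    return replace(times, last_modified=when)


crawl = datetime(2008, 3, 5, 20, 16, 49, tzinfo=timezone.utc)
banner_change = datetime(2010, 11, 5, 23, 25, 19, tzinfo=timezone.utc)

before = MementoTimes(crawl, crawl, crawl)     # Case 1: CD == MD == LM
after = rewrite_banner(before, banner_change)  # Case 2: CD == MD < LM
```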

Case 3: MD < CD <= LM

Now pretend you are making a new web archive, and you are populating it by crawling other web archives such as (simulated with the king of browsers in the image to the left). You are effectively copying:


The presence of the Memento-Datetime header in the source archive's response indicates that the resource is an encapsulation of the state of another resource, at the MD datetime value. The link between the Memento and the original resource is indicated with an HTTP Link response header:

Link: <>; rel="original"
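For readers who want to pull the original resource's URI out of such a header, here is a small Python sketch. It assumes the usual Web Linking layout, in which each link is written as a URI in angle brackets followed by its parameters; the function name and the example.org / archive.example URIs are made up for the illustration, and the regular expressions handle only the common cases.

```python
import re

# One link-value: <URI> followed by ;name=value parameters (value quoted or bare).
_LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^",;\s]+))*)'
)
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^",;\s]+))', re.IGNORECASE)


def original_uri(link_header):
    """Return the URI of the first link whose rel includes "original", else None."""
    for uri, params in _LINK_VALUE.findall(link_header):
        match = _REL_PARAM.search(params)
        if match is None:
            continue
        # rel may hold several space-separated relation types.
        rels = (match.group(1) or match.group(2) or "").lower().split()
        if "original" in rels:
            return uri
    return None


header = (
    '<http://archive.example/m/1>; rel="memento"; '
    'datetime="Wed, 05 Mar 2008 20:16:49 GMT", '
    '<http://example.org/>; rel="original"'
)
print(original_uri(header))
```

Quoted parameter values are matched as a unit so that the comma inside an HTTP date does not split a link in two.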

Thus, MD is sticky in that the new Memento at the new archive retains the MD value it observed from the source archive. However, the CD and LM values reflect the datetime relative to the new archive:

Creation-Datetime: Fri, 05 Nov 2010 23:25:19 GMT
Memento-Datetime: Wed, 05 Mar 2008 20:16:49 GMT
Last-Modified: Fri, 05 Nov 2010 23:25:19 GMT

The MD and LM datetimes can also vary for the Memento as described in Case 2. (In the unlikely case that the intent of the new archive was to create an archive of how the source archive's resources were archived, the MD could be reset to 05 Nov 2010 and the Link header would point to the source archive's resource as the original resource instead of the live web resource; however, this is not the point of this discussion.)
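To make the stickiness concrete, here is a hedged Python sketch of how a copying archive might build its own response headers. The function headers_for_copy and the example.org URI are assumptions for this illustration; Creation-Datetime is left out because it is not a real HTTP header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def headers_for_copy(source_headers, original_uri, copied_at):
    """Build response headers for a Memento copied from another archive.

    MD is carried over unchanged; LM reflects the moment of the copy.
    """
    md = parsedate_to_datetime(source_headers["Memento-Datetime"])
    return {
        "Memento-Datetime": format_datetime(md, usegmt=True),
        "Last-Modified": format_datetime(copied_at, usegmt=True),
        "Link": f'<{original_uri}>; rel="original"',
    }


source = {
    "Memento-Datetime": "Wed, 05 Mar 2008 20:16:49 GMT",
    "Last-Modified": "Wed, 05 Mar 2008 20:16:49 GMT",
}
copied_at = datetime(2010, 11, 5, 23, 25, 19, tzinfo=timezone.utc)
copy_headers = headers_for_copy(source, "http://example.org/", copied_at)
```

The copy's Link header still names the original resource, not the archive it was copied from, which matches the default intent described above.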

Case 4: CD < MD <= LM

This scenario is probably less common, but you could imagine situations in which CD is the earliest datetime value. This might happen in situations in which the resource was created with something akin to fork() & exec() semantics: the resource was technically created at a certain datetime, but it did not acquire its own state until a later datetime, reflected in the MD & LM values.

For example, a transactional archive might record as CD the first datetime at which a resource returns a 200 response, but might choose to delay archiving Mementos until the resource's state is something other than "Welcome to Apache". In this scenario, you could have:

Creation-Datetime: Wed, 05 Mar 2008 20:16:49 GMT
Memento-Datetime: Fri, 05 Nov 2010 23:25:19 GMT
Last-Modified: Fri, 05 Nov 2010 23:25:19 GMT

The MD and LM datetimes could also vary as described in Case 2.
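The four cases can be summarized as a small Python function that takes the three datetimes and reports which ordering applies. The function name classify is my own for this sketch, and it assumes a Creation-Datetime value is available, which (as discussed below) is often not true in practice.

```python
from datetime import datetime, timezone


def classify(cd, md, lm):
    """Map a (CD, MD, LM) triple onto the four cases discussed above."""
    if cd == md == lm:
        return "Case 1: CD == MD == LM"
    if cd == md < lm:
        return "Case 2: CD == MD < LM"
    if md < cd <= lm:
        return "Case 3: MD < CD <= LM"
    if cd < md <= lm:
        return "Case 4: CD < MD <= LM"
    return "not covered by the four cases"


t2008 = datetime(2008, 3, 5, 20, 16, 49, tzinfo=timezone.utc)
t2010 = datetime(2010, 11, 5, 23, 25, 19, tzinfo=timezone.utc)

print(classify(t2008, t2008, t2008))
print(classify(t2008, t2008, t2010))
print(classify(t2010, t2008, t2010))
print(classify(t2008, t2010, t2010))
```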

Creation Datetime Is Often Unavailable

To illustrate the differences between the various datetime concepts, the above examples have discussed Creation Datetime as if it is a commonly available value. However, this is most often not the case -- in fact, there is no defined HTTP response header that corresponds to Creation Datetime. This is due to the historical limitation of Unix inodes (i.e., metadata for files), which track three notions of time: atime (access time of the file), mtime (modification time of the file), and ctime (modification time of the inode). Modern content management systems might keep track of Creation Datetime, but it is not formally defined at the HTTP level.


The above examples should provide illustrations of how the three notions of datetime, although obviously related, have slightly different semantics. It should be clear that a Memento's Memento-Datetime is also not just Creation-Datetime or Last-Modified inherited from the original resource for which it is a Memento. Rather than overload an existing HTTP response header (such as Last-Modified), we have introduced the Memento-Datetime (nee Content-Datetime) response header. Additional information about Memento headers, Link rel types, and HTTP interactions can be found at

-- Michael

* Datetime = neologism of "date" & "time": the former is often understood to have a granularity of days, and the latter a granularity of seconds.