2013-07-09: Archive.is Supports Memento

(2014-04-16 edit: Two days ago, archive.is started 301 redirecting to archive.today. Otherwise, all the existing links should look and function as they did before.)

There's a lot to like about Archive.is, a recent entry in the page-at-a-time personal web archiving space: the simple search/upload interface, the bookmarklet for easily pushing pages into the archive while reading, the thumbnails (and full-sized images) of captured pages, how it handles JavaScript, etc. But now there is an additional reason: Archive.is natively supports Memento and is now included in the Memento aggregators at LANL and ODU.

Archive.is is similar to WebCite in that it archives a single page when a user requests that it be archived. This is different from crawlers at, for example, the Internet Archive and Archive-It, which crawl the web all the time, archiving pages as they go along. These archives represent different, complementary strategies for archiving the web:

Archive.is, WebCite: archive a single page on demand; they won't crawl the entire site, but you control what gets archived and when
Internet Archive, Archive-It: repeatedly crawl entire sites, but it is harder to influence what is archived and when

The API and the interfaces of Archive.is are simple and attractive. For example, a search that returns an HTML response looks like:

http://archive.is/{URI-R}

Where URI-R is an "original resource" in Memento terms. For example:

http://archive.is/http://www.cs.odu.edu/~mln/

Produces a listing of thumbnails for each memento (archived web page):

Clicking on the 2nd thumbnail produces the memento with a well-instrumented archival banner with the Memento-Datetime (i.e., the archival capture time), links to sharing utilities, backlinks, pre-built searches for related pages in Archive.is, etc.:

http://archive.is/20130621194047/http://www.cs.odu.edu/~mln/
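The 14-digit path segment in that URI is the capture time in UTC, written as YYYYMMDDhhmmss. Here is a minimal sketch of turning it into the HTTP date format that Memento-Datetime uses. The function names are made up for this illustration, and the code assumes the URI has the http://archive.is/{14 digits}/{URI-R} shape shown in this post:

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed URI-M shape, taken from the examples in this post:
#   http://archive.is/{YYYYMMDDhhmmss}/{URI-R}
URI_M = re.compile(r"^http://archive\.is/(\d{14})/(.+)$")


def split_memento_uri(uri_m):
    """Return (capture time as an aware UTC datetime, URI-R)."""
    match = URI_M.match(uri_m)
    if match is None:
        raise ValueError("not a timestamped archive.is URI: " + uri_m)
    stamp, uri_r = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, uri_r


def memento_datetime(uri_m):
    """Format the capture time the way a Memento-Datetime header does."""
    captured, _ = split_memento_uri(uri_m)
    return format_datetime(captured, usegmt=True)
```

Calling memento_datetime() on the URI above should give "Fri, 21 Jun 2013 19:40:47 GMT".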

Scrolling down in the same page shows another feature that Archive.is does well: capturing the JavaScript and keeping it from reaching out to the live web (see Justin's post "Zombies in the Archives" for a good discussion of this problem). In my web page I have the Twitter widget that shows my last 20 tweets; the picture below shows that the widget does not reach out to the live web and grab current tweets -- the last tweet in this memento will forever be June 19, 2013.

And just in case there is any question about rendering old pages in new browsers, Archive.is renders a 1024x768 PNG of the page at the time of capture (where http://archive.is/DpiQe/ is a shortened form of http://archive.is/20130621194047/http://www.cs.odu.edu/~mln/):

http://archive.is/DpiQe/image

The search interface works well for showing pages from the same site too, using "*" as a wild card. Here's a screenshot for:

http://archive.is/http://www.cs.odu.edu/*

Notice how it suppresses some of the thumbnails for the scalability of the UI.

I mentioned above that Archive.is natively supports Memento. The service points for the TimeGate and TimeMap functionality are:

http://archive.is/timegate/{URI-R}
http://archive.is/timemap/{URI-R}
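All of the URI patterns above are built by appending the URI-R to a fixed prefix. A small sketch of that follows. The helper names are hypothetical, and the URI-R is appended verbatim (no percent-encoding) because that is how the examples in this post are written:

```python
ARCHIVE = "http://archive.is"  # base URI as used throughout this post


def search_uri(uri_r):
    """HTML listing of mementos; uri_r may end in '*' as a wild card."""
    return ARCHIVE + "/" + uri_r


def timegate_uri(uri_r):
    """TimeGate service point for datetime negotiation."""
    return ARCHIVE + "/timegate/" + uri_r


def timemap_uri(uri_r):
    """TimeMap service point listing the known mementos."""
    return ARCHIVE + "/timemap/" + uri_r
```

For example, timegate_uri("http://www.bbc.co.uk/") gives http://archive.is/timegate/http://www.bbc.co.uk/.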

And since there can never be enough raw HTTP in this blog, here's a curl request to the TimeGate for http://www.bbc.co.uk/:

% curl -I -H "Accept-Datetime: Sun, 24 Mar 2013 20:52:23 GMT" http://archive.is/timegate/http://www.bbc.co.uk/

HTTP/1.1 302 Found
Vary: Accept-Datetime
Link: <http://www.bbc.co.uk/>; rel="original", <http://archive.is/timegate/http://www.bbc.co.uk/>; rel="timegate", <http://archive.is/timemap/http://www.bbc.co.uk/>; rel="timemap"; type="application/link-format"; from="Sat, 21 Dec 1996 10:29:38 GMT"; until="Mon, 1 Jul 2013 18:11:45 GMT", <http://archive.is/20130114004817/http://www.bbc.co.uk/>; rel="prev memento"; datetime="Mon, 14 Jan 2013 00:48:17 GMT", <http://archive.is/20130415031007/http://www.bbc.co.uk/>; rel="next memento"; datetime="Mon, 15 Apr 2013 03:10:07 GMT", <http://archive.is/19961221102938/http://www.bbc.co.uk/>; rel="first memento"; datetime="Sat, 21 Dec 1996 10:29:38 GMT", <http://archive.is/20130701181145/http://www.bbc.co.uk/>; rel="last memento"; datetime="Mon, 1 Jul 2013 18:11:45 GMT"
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: /20130324205223/http://www.bbc.co.uk/
Content-Length: 0
Accept-Ranges: bytes
Date: Tue, 09 Jul 2013 15:02:35 GMT
Connection: keep-alive
Server: nginx/1.2.4
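For readers who would rather script this than use curl, here is a rough Python equivalent using only the standard library. It builds the same HEAD request with an Accept-Datetime header but does not send it, and it shows how the relative Location header in the response above resolves to an absolute memento URI. The function names are invented for this sketch:

```python
from urllib.parse import urljoin
from urllib.request import Request


def timegate_request(uri_r, accept_datetime):
    """Build (but do not send) a HEAD request like the curl -I call above."""
    return Request(
        "http://archive.is/timegate/" + uri_r,
        headers={"Accept-Datetime": accept_datetime},
        method="HEAD",
    )


def resolve_location(request_uri, location):
    """Resolve a relative Location header against the URI that was requested."""
    return urljoin(request_uri, location)
```

Note that urllib.request.urlopen follows a 302 on its own, so sending this request that way would return the final response rather than the TimeGate's redirect.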

And here's a curl request for a TimeMap for my home page:

And the Memento headers returned from a single memento:

% curl -I http://archive.is/20130621194047/http://www.cs.odu.edu/~mln/

HTTP/1.1 200 OK
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Link: <http://www.cs.odu.edu/~mln/>; rel="original", <http://archive.is/timegate/http://www.cs.odu.edu/~mln/>; rel="timegate", <http://archive.is/timemap/http://www.cs.odu.edu/~mln/>; rel="timemap"; type="application/link-format"; from="Tue, 18 Jun 2013 18:57:42 GMT"; until="Fri, 21 Jun 2013 19:40:47 GMT", <http://archive.is/20130618185742/http://www.cs.odu.edu/~mln/>; rel="prev memento"; datetime="Tue, 18 Jun 2013 18:57:42 GMT", <http://archive.is/20130618185742/http://www.cs.odu.edu/~mln/>; rel="first memento"; datetime="Tue, 18 Jun 2013 18:57:42 GMT", <http://archive.is/20130621194047/http://www.cs.odu.edu/~mln/>; rel="last memento"; datetime="Fri, 21 Jun 2013 19:40:47 GMT"
Memento-Datetime: Fri, 21 Jun 2013 19:40:47 GMT
Content-Type: text/html;charset=UTF-8
Expires: Tue, 09 Jul 2013 14:59:37 GMT
Date: Tue, 09 Jul 2013 14:59:27 GMT
Connection: keep-alive
Server: nginx/1.2.4
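The Link header is where most of the Memento information lives, and splitting it on commas does not work because the datetime values contain commas themselves. The following is a minimal sketch of a parser for headers shaped like the ones in this post. It assumes every parameter value is double-quoted, so it is not a general RFC 5988 parser, and the function names are made up here:

```python
import re

# One link-value: <URI> followed by zero or more ;name="value" parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Return a list of (uri, params) pairs, where params is a dict."""
    links = []
    for uri, raw_params in LINK_VALUE.findall(value):
        links.append((uri, dict(PARAM.findall(raw_params))))
    return links


def find_rel(links, rel):
    """Return the URIs whose rel attribute contains the given relation type."""
    return [uri for uri, params in links if rel in params.get("rel", "").split()]
```

Because rel values such as "prev memento" hold several space-separated relation types, find_rel(links, "memento") returns every memento link, while find_rel(links, "prev") returns only the previous one.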

Also as mentioned above, Archive.is has been included in the Memento aggregators. Here's an aggregate TimeMap for Hany's home page, showing results from the Internet Archive and Archive.is:

% curl http://mementoproxy.lanl.gov/aggr/timemap/link/1/http://www.cs.odu.edu/~hany/

<http://www.cs.odu.edu/~hany/>;rel="original"
,<http://api.wayback.archive.org/web/20120222232858/http://www.cs.odu.edu/~hany/>;rel="memento first"; datetime="Wed, 22 Feb 2012 23:28:58 UTC"
,<http://api.wayback.archive.org/web/20120224022557/http://www.cs.odu.edu/~hany/>;rel="memento"; datetime="Fri, 24 Feb 2012 02:25:57 UTC"
,<http://api.wayback.archive.org/web/20120325130430/http://www.cs.odu.edu/~hany/>;rel="memento"; datetime="Sun, 25 Mar 2012 13:04:30 UTC"
,<http://api.wayback.archive.org/web/20120425103357/http://www.cs.odu.edu/%7Ehany/>;rel="memento"; datetime="Wed, 25 Apr 2012 10:33:57 UTC"
,<http://api.wayback.archive.org/web/20120425204316/http://www.cs.odu.edu/~hany/>;rel="memento"; datetime="Wed, 25 Apr 2012 20:43:16 UTC"
,<http://api.wayback.archive.org/web/20120717212810/http://www.cs.odu.edu/~hany/>;rel="memento"; datetime="Tue, 17 Jul 2012 21:28:10 UTC"
,<http://api.wayback.archive.org/web/20130117093627/http://www.cs.odu.edu/%7Ehany/>;rel="memento"; datetime="Thu, 17 Jan 2013 09:36:27 UTC"
,<http://api.wayback.archive.org/web/20130510145752/http://www.cs.odu.edu/~hany/>;rel="memento"; datetime="Fri, 10 May 2013 14:57:52 UTC"
,<http://archive.is/20130624174400/http://www.cs.odu.edu/~hany/>;rel="memento"; datetime="Mon, 24 Jun 2013 17:44:00 UTC"
,<http://archive.is/20130626172346/http://www.cs.odu.edu/~hany/>;rel="memento last"; datetime="Wed, 26 Jun 2013 17:23:46 UTC"
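To see which archives contributed to an aggregate TimeMap like this one, the entries can be grouped by the host of each memento URI. This is a sketch only: it assumes the link-format layout shown above (a URI in angle brackets, then rel, then an optional datetime), and the function names are hypothetical:

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# One TimeMap entry: <URI>;rel="..." with an optional ;datetime="..."
ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]*)"(?:\s*;\s*datetime="([^"]*)")?')


def parse_timemap(text):
    """Yield (uri, list of rel tokens, datetime string or None) per entry."""
    for uri, rel, when in ENTRY.findall(text):
        yield uri, rel.split(), when or None


def mementos_per_archive(text):
    """Count entries whose rel includes "memento", keyed by archive host."""
    counts = Counter()
    for uri, rels, _ in parse_timemap(text):
        if "memento" in rels:
            counts[urlsplit(uri).netloc] += 1
    return counts
```

Run over the listing above, this would count eight mementos from api.wayback.archive.org and two from archive.is; the rel="original" entry is skipped.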

One restriction Archive.is has is that Memento functionality is only available for top-level URIs and not embedded URIs. This means that although Archive.is has a memento for:

http://www.cs.odu.edu/~mln/images/mln-ad-100x130.jpg

stored at:

http://img.archive.is/DpiQe/7e9dcf3bab7c72c6516ef26d431a7b48d562599a.jpg

It does not store a mapping from the former to the latter, so these URIs will not produce the expected results:

http://archive.is/http://www.cs.odu.edu/~mln/images/mln-ad-100x130.jpg
http://archive.is/timegate/http://www.cs.odu.edu/~mln/images/mln-ad-100x130.jpg
http://archive.is/timemap/http://www.cs.odu.edu/~mln/images/mln-ad-100x130.jpg

This means that Archive.is can't be used to supplement top-level mementos from other archives that are missing embedded images, stylesheets, etc.

Regardless, there is a tremendous amount to like about Archive.is, especially its ease of use. We also appreciate how quickly the staff there implemented Memento for their archive when asked. Use their bookmarklet, download MementoFox (or the Android & iOS client, or mcurl, or use the BL interface, or...), and you'll have easy access to over a dozen public web archives, now including Archive.is.

--Michael


Web Science and Digital Libraries Research Group
