Thursday, July 23, 2015

2015-07-22: I Can Haz Memento

Inspired by the "#icanhazpdf" movement and built upon the Memento service, I Can Haz Memento attempts to expand the awareness of Web archiving through Twitter. Given a URL (for a page) in a tweet with the hashtag "#icanhazmemento," the I Can Haz Memento service replies to the tweet with a link pointing to an archived version of the page closest to the time of the tweet. The rationale is that the archived version closest to the time of the tweet most likely reflects what the user intended to share when the link was posted.
Consider a scenario where Jane shares a link in a tweet to the front page of CNN about a story on healthcare. Given the fluid nature of the news cycle, at some point the healthcare story will be replaced by a fresh story; Jane's link then leads to the new story and no longer represents her original intent (the healthcare story). This is where I Can Haz Memento comes into the picture. If Jane included "#icanhazmemento" in her tweet, the service would have replied to Jane's tweet with a link representing:
  • An archived version (closest to her tweet time) of the CNN front page carrying the healthcare story, if the page had already been archived within a given temporal threshold (e.g., 24 hours), or
  • A newly archived version of the same page, if the page had not already been archived. In other words, the service does the archiving itself and returns the link to the newly archived page.
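The choice between these two outcomes can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the service's actual code: the function name choose_memento, the (datetime, URL) tuple layout, and the 24-hour default are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Memento = Tuple[datetime, str]  # hypothetical layout: (archive time, archive URL)


def choose_memento(tweet_time: datetime,
                   mementos: List[Memento],
                   threshold: timedelta = timedelta(hours=24)) -> Optional[Memento]:
    """Return the memento closest to tweet_time, or None if none is close enough.

    A None result means the page should be archived now (the second bullet).
    """
    if not mementos:
        return None
    # Closest in either direction: a capture just after the tweet also counts.
    closest = min(mementos, key=lambda m: abs(m[0] - tweet_time))
    if abs(closest[0] - tweet_time) <= threshold:
        return closest
    return None
```

A None return value is the signal to archive the page and reply with the fresh copy instead.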
How to use I Can Haz Memento
Method 1: To use the service, include the hashtag "#icanhazmemento" in the tweet along with the link to the page you want archived, or for which you want an archived version. For example, consider Shawn Jones' tweet below for http://www.cs.odu.edu:
Which prompted the following reply from the service:
Method 2: In Method 1, the hashtag "#icanhazmemento" and the URL, http://www.cs.odu.edu, reside in the same tweet, but Method 2 does not impose this restriction. If someone (@acnwala) tweeted a link (e.g., arsenal.com), and you (@wsdlodu) wanted that link to be treated in the same manner as in Method 1 (as though "#icanhazmemento" and arsenal.com were in the same tweet), all that is required is a reply to the original tweet (the one without "#icanhazmemento") with a tweet that includes "#icanhazmemento." Consider an example of Method 2 usage:
  1. @acnwala tweets arsenal.com without "#icanhazmemento"
  2. @wsdlodu replies to @acnwala's tweet with "#icanhazmemento"
  3. @icanhazmemento replies to @wsdlodu with the archived versions of arsenal.com
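One way to picture how both methods could be handled is the sketch below. It is a guess at the logic, not the service's code: tweets are modeled as plain dictionaries, the "urls" key and the get_tweet_by_id lookup are hypothetical, and only in_reply_to_status_id mirrors a real field name from Twitter's tweet objects.

```python
HASHTAG = "#icanhazmemento"


def urls_for_request(tweet, get_tweet_by_id):
    """Return the URLs a hashtagged tweet is asking about.

    tweet: dict with "text", "urls" and "in_reply_to_status_id" (hypothetical layout).
    get_tweet_by_id: hypothetical callable that returns the parent tweet or None.
    """
    if HASHTAG not in tweet["text"].lower():
        return []  # not a request for the service
    if tweet["urls"]:
        return list(tweet["urls"])  # Method 1: hashtag and URL in the same tweet
    parent_id = tweet.get("in_reply_to_status_id")
    if parent_id is None:
        return []  # hashtag only, and nothing to look back at
    parent = get_tweet_by_id(parent_id)
    # Method 2: the URL lives in the tweet being replied to
    return list(parent["urls"]) if parent else []
```

In the scenario above, the reply from @wsdlodu carries no URL of its own, so the lookup falls through to @acnwala's tweet and returns arsenal.com.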
The scenario (1, 2 and 3) is outlined by the following tweet threads:
I Can Haz Memento - Implementation

I Can Haz Memento is implemented in Python and uses Tweepy, a Python library for the Twitter API. The implementation is captured by the following subroutines:
  1. Retrieve links from tweets with "#icanhazmemento": This is done with Tweepy's api.search method. The sinceIDValue is used to keep track of already visited tweets. The application also sleeps between requests in order to comply with Twitter's API rate limits, but only after the URLs have been retrieved from each tweet.
  2. After the URLs in 1. have been retrieved, the following subroutine
    • Makes an HTTP request to a Memento TimeGate in order to get the Memento (an archived instance of the resource) closest to the time of the tweet; the tweet time is passed as the parameter for datetime content negotiation.
    • If the page is not found in any archive, it is pushed to archive.org and archive.is for archiving.
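The two steps above can be sketched roughly as follows, using only the standard library. This is not the application's source. The search callable is a hypothetical stand-in for the Tweepy search call, the tweet dictionaries are an invented layout, and the TimeGate address (the public Memento aggregator) is an assumption, since the post does not name the endpoint. The requests are built but never sent, and the archive.is submission is left out.

```python
import time
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: the public Memento aggregator TimeGate.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"
# Internet Archive's Save Page Now endpoint, used here as the archiving fallback.
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def poll_once(search, since_id, pause_seconds=0):
    """Step 1: collect (tweet_time, url) pairs from tweets newer than since_id.

    search is a hypothetical callable standing in for the Twitter search call;
    it returns dicts with "id", "created_at" (aware datetime) and "urls".
    """
    found = []
    for tweet in search(since_id):
        found.extend((tweet["created_at"], url) for url in tweet["urls"])
        since_id = max(since_id, tweet["id"])  # never revisit this tweet
    time.sleep(pause_seconds)  # URLs are extracted first, then the rate-limit pause
    return found, since_id


def timegate_request(url: str, tweet_time: datetime) -> Request:
    """Step 2: build a datetime-negotiation request for url at tweet_time.

    tweet_time must be timezone-aware; Accept-Datetime carries it in GMT.
    """
    accept_datetime = format_datetime(tweet_time.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + url, headers={"Accept-Datetime": accept_datetime})


def save_request(url: str) -> Request:
    """Fallback: build a request asking archive.org to capture url."""
    return Request(SAVE_PAGE_NOW + url)
```

Passing the returned since_id back into the next poll_once call is what keeps already answered tweets from being processed twice. Sending the requests (for example with urllib.request.urlopen) and reading the TimeGate's redirect to the selected Memento is left out of the sketch.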
The source code for the application is available on GitHub. We acknowledge the effort of Mat Kelly, who wrote the first draft of the application. We hope you use #icanhazmemento.
--Nwala
