Wednesday, December 18, 2013

2013-12-18: Avoiding Spoilers with the Memento Mediawiki Extension

From Modern Family to The Girl with the Dragon Tattoo, fans have created a flood of wikis devoted to their favorite television, book, and movie series. These wikis let fans settle disputes and encourage discussion.
These resources, coupled with the growing habit of experiencing fiction long after it is initially released, have given rise to another cultural phenomenon: spoilers. Using a fan-based resource is wonderful for those who are current with their reading/watching, but is fraught with disaster for those who want to experience the great reveals and have not caught up yet.
Memento can help here.
Above is a video showing how the Memento Chrome Extension from Los Alamos National Laboratory (LANL) can be used to avoid spoilers while browsing for information on Downton Abbey. This wiki is of particular interest because the TV show is released in the United Kingdom long before it is released in other countries. The wiki has a nice sign warning all visitors about impending spoilers should they read the pages within, but the warning is redundant, seeing as fans who have not caught up will know that spoilers are implied.
A screenshot of the page with this notice is shown below.
We can use Memento to view pre-spoiler versions.
To avoid spoilers for Downton Abbey Series 4, we choose a date prior to its release: August 30, 2012. Then we use LANL's Memento Chrome Extension to browse to that date. The HTTP conversation for this exchange is captured using Google Chrome's Live HTTP Headers Extension and detailed in the steps below.
1. The Chrome Memento Extension sends a HEAD request to the site using Memento's Accept-Datetime header*.
2. Because there are no Memento headers in the response, it connects to LANL's Memento aggregator using a GET request with the same Accept-Datetime header and gets back a 302 redirection response.
3. Then it follows the URI from the Location response header to a TimeGate specifically set up for Wikia, making another GET request using the Accept-Datetime request header on that URI. The TimeGate uses the date given by Accept-Datetime to determine which revision of a page to retrieve. The URI for this revision is sent back in the Location response header as part of the 302 redirection response.
4. From here it performs a final GET request on the URI specified in the Location response header, which is the revision of the article closest to the date requested. A screenshot of that revision is shown below, without the spoiler warning.
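The four steps above can be sketched in a few lines of Python. This is only an illustration of the request flow, not the Chrome extension's actual code: the `fetch` callable, the `AGGREGATOR` URI, and the header check in `has_memento_headers` are assumptions made for the example, and no real network traffic is involved.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical aggregator prefix, used only for illustration.
AGGREGATOR = "http://aggregator.example.org/timegate/"


def accept_datetime(when):
    """Format an aware datetime as the HTTP date carried by Accept-Datetime."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def has_memento_headers(headers):
    """Rough check (an assumption) for signs of Memento support in a response."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return (
        "memento-datetime" in lowered
        or "accept-datetime" in lowered.get("vary", "")
        or "timegate" in lowered.get("link", "")
    )


def resolve(uri, when, fetch, max_hops=5):
    """Return the URIs visited while looking for a past version of `uri`.

    `fetch(method, uri, headers)` stands in for an HTTP client and must
    return a (status, response_headers) tuple.
    """
    request_headers = {"Accept-Datetime": accept_datetime(when)}
    visited = [uri]

    # Step 1: ask the site itself, hoping for Memento headers.
    _, response_headers = fetch("HEAD", uri, request_headers)
    if not has_memento_headers(response_headers):
        # Step 2: no Memento support found, so turn to the aggregator.
        uri = AGGREGATOR + uri
        visited.append(uri)

    # Steps 2-4: follow each 302, resending Accept-Datetime on every hop.
    for _ in range(max_hops):
        status, response_headers = fetch("GET", uri, request_headers)
        if status != 302:
            break
        uri = response_headers["Location"]
        visited.append(uri)
    return visited
```

Passing in a `fetch` built on a real HTTP library would turn this into a working client; for the date used above, `accept_datetime` produces `Thu, 30 Aug 2012 00:00:00 GMT`.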
Even though this method works, it is not optimal.
The external Memento aggregator must know about the site and provide a site-specific TimeGate. In this case, the aggregator is merely looking for the presence of "wikia.com" in the URI and redirecting to the appropriate TimeGate in step 3. Behind the scenes, the Mediawiki API is used to acquire the list of past revisions and the TimeGate selects the best one in step 4. This requires LANL, or another Memento participant like the UK National Archives, to provide a TimeGate for every wiki site on the Internet, which is not feasible.
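To give a feel for what such a TimeGate has to do behind the scenes, here is a minimal sketch that builds one possible Mediawiki API query for the newest revision at or before a requested datetime. We are not claiming this is the exact query the Wikia TimeGate issues; the function name, the `api.php` endpoint location, and the page title are assumptions for illustration, while the query parameters themselves are standard Mediawiki API revision parameters.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def latest_revision_query(api_endpoint, title, requested):
    """Build a Mediawiki API URL for the newest revision at or before `requested`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",
        "rvlimit": 1,
        # With rvdir=older the listing starts at rvstart and walks back in time,
        # so the single result is the last revision not after the requested date.
        "rvstart": requested.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rvdir": "older",
        "format": "json",
    }
    return api_endpoint + "?" + urlencode(params)


# Hypothetical wiki endpoint and page title.
url = latest_revision_query(
    "http://wiki.example.org/api.php",
    "Example Page",
    datetime(2012, 8, 30, tzinfo=timezone.utc),
)
```

Every wiki that changes its API location or behavior would require a matching change in the external TimeGate, which is part of why this approach does not scale.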
To see where this is relevant, let's look at the fan site A Wiki of Ice and Fire, detailing information on the series A Song of Ice and Fire (aka Game of Thrones). LANL has no Memento TimeGate specifically for this fan wiki, unlike what we saw with the Downton Abbey site.
Here's a screenshot of the starting page for this demonstration. Let's assume we want to avoid spoilers from the book A Dance With Dragons, which was released in July 2011, so we choose the date of June 30, 2011.
1. The Chrome Memento Extension connects with an Accept-Datetime request header, hoping for a response with Memento headers.
2. Because there were no Memento headers in the response, it turns to the Memento Aggregator at LANL, which serves as the TimeGate, using the datetime given by the Accept-Datetime request header to find the closest version of the page to the requested date. The TimeGate then provides a Location response header containing the archived version of the page at the Internet Archive.
3. Using the URI from that Location response header, the page is then retrieved directly from the Internet Archive.

But this page is dated 27 Apr 2011 and is missing information we want, such as who played this character in the TV series, which was added in the 7 June 2011 revision of the page. This is because the Internet Archive only contains two captures around our requested datetime: 27 Apr 2011 and 1 Aug 2011. Even though the fan wiki contains the 7 June 2011 revision, the Internet Archive does not.

Fortunately, there is the native Memento Mediawiki Extension, supported by the Andrew W. Mellon Foundation, which addresses these issues. It has been developed jointly by Old Dominion University and LANL. Mediawiki was chosen because it is the most widely used wiki software, powering sites such as Wikipedia and Wikia.

This native extension allows direct access to all revisions of a given page, avoiding spoilers. It can also return the data directly, requiring no Memento aggregators or other additional external infrastructure.
We set up a demonstration wiki using data from the same Game of Thrones fan wiki above. The video above shows this extension in action. Because our demonstration wiki has the native extension installed, it allows for access to all revisions of each article.
We will try the same scenario using this Memento-enabled wiki.
Here is a screenshot of the starting page for this demonstration.
In this case, because the Memento Mediawiki Extension has full Memento support, the HTTP messages sent are different. We again use the date June 30, 2011 to show that we can acquire information about a given article without revealing any spoilers from the book A Dance With Dragons, which was released in July 2011.
1. The Memento Chrome Extension sends an Accept-Datetime request header, but this time Mediawiki itself is serving as the TimeGate, deciding on the page closest to, but not over, the date requested. Mediawiki then issues its own 302 redirection response.
2. That response gives a Location response header pointing to the correct revision of the page, which was published on June 7, 2011, prior to the release of A Dance With Dragons. From here the Memento Chrome Extension can issue a GET request on that URI to retrieve the correct representation of the page.
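As a rough picture of the redirection described in these two steps, the sketch below assembles the kind of response a TimeGate sends: a 302 status, a Location header pointing at the selected revision, a Vary header naming accept-datetime, and a Link header identifying the original page. It reflects our reading of the Memento protocol rather than the extension's actual source, and the function name and URIs are hypothetical.

```python
def timegate_response(original_uri, revision_uri):
    """Return (status, headers) for a TimeGate redirect to the chosen revision."""
    headers = {
        # Where the client should go to fetch the selected revision.
        "Location": revision_uri,
        # Signals that the response depends on the Accept-Datetime request header.
        "Vary": "accept-datetime",
        # Points back at the page whose past version was requested.
        "Link": f'<{original_uri}>; rel="original"',
    }
    return 302, headers


# Hypothetical URIs for illustration only.
status, headers = timegate_response(
    "http://wiki.example.org/index.php/Example_Page",
    "http://wiki.example.org/index.php?title=Example_Page&oldid=12345",
)
```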
As this demonstrates, running the Memento Mediawiki Extension on a fan wiki ensures that site visitors can not only browse the site spoiler free, but also get the revision closest to, but not over, their requested date. This way they avoid spoilers and don't miss any information.
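The "closest, but not over" rule is easy to express in code. The following is a minimal sketch, not the extension's implementation: `select_revision` and `utc` are names made up for this example, and the times of day are assumed to be midnight UTC since only the dates are given above.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def select_revision(revision_times, requested):
    """Return the latest revision time at or before `requested`, or None."""
    ordered = sorted(revision_times)
    # bisect_right counts how many revisions are not after the requested time.
    index = bisect_right(ordered, requested)
    return ordered[index - 1] if index else None


def utc(year, month, day):
    """Midnight UTC on the given date (an assumed time of day)."""
    return datetime(year, month, day, tzinfo=timezone.utc)


requested = utc(2011, 6, 30)

# The two Internet Archive captures mentioned earlier.
archive_captures = [utc(2011, 4, 27), utc(2011, 8, 1)]

# The same dates plus the 7 June 2011 revision that only the wiki holds.
wiki_revisions = [utc(2011, 4, 27), utc(2011, 6, 7), utc(2011, 8, 1)]

from_archive = select_revision(archive_captures, requested)
from_wiki = select_revision(wiki_revisions, requested)
```

With only the archive captures to choose from, the rule lands on 27 Apr 2011; with the wiki's own revision history it lands on 7 June 2011, which is the version we actually wanted.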

To recap, the native extension addresses the following problems:
  1. The Memento infrastructure cannot know about all possible wikis and provide TimeGates for each one, so the chances of a given wiki having one are low.
  2. The Internet Archive does not have all revisions of each fan wiki page, meaning that visitors to a fan wiki may miss out on information.
  3. An external wiki TimeGate depends on each wiki's API, and APIs change frequently, so a single change can break the whole process. With the native extension, visitors trying to avoid spoilers don't need to worry about this, because Memento is defined by a more stable RFC.
If you are running a fan wiki and want to help your visitors avoid spoilers, the Memento Mediawiki Extension is what you need. Please contact us and we'll help you customize it to your needs, if necessary.

--Shawn

* = Memento for Chrome version 0.1.11 actually performs two HEAD requests on the resource, but this will be fixed in the next release.
