2013-05-29 mcurl - Command Line Memento Client
The Memento protocol works in two directions:
- Server implementation: the server complies with the Memento protocol, so it can read the "Accept-Datetime" request header, perform content negotiation in the datetime dimension, and return to the user the memento nearest to the requested datetime. Successful examples include the Internet Archive Wayback Machine, the British Library Wayback Machine, and DBpedia.
- Client implementation: the user needs a tool that requests a URI as it existed at a preferred datetime in the past. Current tools include the Firefox add-on MementoFox, the British Library Memento Service, and the Memento Browser for Android and iPhone.
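On the server side, the heart of datetime negotiation is choosing which memento to return. The Python sketch below shows one simple policy, assuming a TimeGate that already knows the (URI, Memento-Datetime) pairs for a resource: return the memento closest to the requested datetime, or the most recent one when no datetime is given. The `select_memento` function and the archive.example URIs are hypothetical; real archives may apply different selection rules.

```python
from datetime import datetime, timezone


def select_memento(mementos, requested=None):
    """Return the (uri, datetime) pair closest to `requested`.

    `mementos` holds (uri, memento_datetime) pairs with timezone-aware
    datetimes. With no requested datetime, the most recent memento is
    returned. An empty list yields None.
    """
    if not mementos:
        return None
    if requested is None:
        return max(mementos, key=lambda m: m[1])
    # Sort first so that, on a tie, min() keeps the earlier memento.
    ordered = sorted(mementos, key=lambda m: m[1])
    return min(ordered, key=lambda m: abs(m[1] - requested))


# Hypothetical mementos of http://example.org/ in a hypothetical archive.
MEMENTOS = [
    ("http://archive.example/20090101000000/http://example.org/",
     datetime(2009, 1, 1, tzinfo=timezone.utc)),
    ("http://archive.example/20100110000000/http://example.org/",
     datetime(2010, 1, 10, tzinfo=timezone.utc)),
    ("http://archive.example/20100301000000/http://example.org/",
     datetime(2010, 3, 1, tzinfo=timezone.utc)),
    ("http://archive.example/20110601000000/http://example.org/",
     datetime(2011, 6, 1, tzinfo=timezone.utc)),
]

requested = datetime(2010, 2, 5, 14, 28, tzinfo=timezone.utc)
print(select_memento(MEMENTOS, requested)[0])
```

Here the 1 March 2010 memento is about 22 days after the requested datetime, while the 10 January memento is about 26 days before it, so the later one wins.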
Users can already use curl to perform content negotiation in the datetime dimension by passing the "Accept-Datetime" header with the -H argument and connecting directly to a TimeGate. mcurl, however, offers more than that:
- TimeGate identification: with mcurl, the user only needs to specify the datetime and the URI. mcurl has its own default TimeGate, which the user can override. mcurl can also read the TimeGate from the Link response header returned by the URI.
- Handling redirection: mcurl implements the HTTP redirection retrieval policy described in section 4 of the Memento Internet Draft, version 7.
- Embedded resources rewriting: mcurl provides two modes for embedded resources. In Strict mode, mcurl accepts the embedded resource URIs supplied by the web archive; in Thorough mode, mcurl repeats the content negotiation for each embedded resource URI to get the best (nearest) resource. Thus, using the Memento Aggregator, mcurl can construct the page from multiple archives.
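TimeGate identification can be sketched as follows. This Python sketch is not mcurl's Perl code: it parses a Link response header, looks for a link whose rel includes "timegate", and otherwise falls back to a user-supplied or default aggregator TimeGate. Two assumptions are labeled in the code: the precedence order (an explicit override first, then a discovered TimeGate, then the user's TimeGate, then the default), and the aggregator URI pattern of appending the original URI to the TimeGate prefix, as in the mementoproxy.cs.odu.edu URI used in this post. The placeholder default prefix is not necessarily mcurl's real default.

```python
import re

# One link: <uri> followed by its parameters, up to the next '<'.
# Simplification: assumes no '<' appears inside quoted parameter values.
LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
# One parameter: ;name="quoted value" or ;name=token
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return a list of (uri, {param: value}) pairs from a Link header."""
    links = []
    for uri, params in LINK_RE.findall(value or ""):
        attrs = {name.lower(): quoted or bare
                 for name, quoted, bare in PARAM_RE.findall(params)}
        links.append((uri, attrs))
    return links


def aggregator_timegate(prefix, original_uri):
    # Assumed pattern: <TimeGate prefix>/<original URI>.
    return prefix.rstrip("/") + "/" + original_uri


def choose_timegate(original_uri, link_header, default_prefix,
                    user_prefix=None, override=False):
    """Pick the TimeGate URI to query (assumed precedence, see lead-in)."""
    if override and user_prefix:
        return aggregator_timegate(user_prefix, original_uri)
    for uri, attrs in parse_link_header(link_header):
        if "timegate" in attrs.get("rel", "").split():
            return uri  # TimeGate advertised by the original resource
    if user_prefix:
        return aggregator_timegate(user_prefix, original_uri)
    return aggregator_timegate(default_prefix, original_uri)


# Placeholder default prefix, not necessarily mcurl's built-in default.
DEFAULT = "http://mementoproxy.lanl.gov/aggr/timegate/"
header = ('<http://example.org/>; rel="original", '
          '<http://example.org/tg/http://example.org/>; rel="timegate"')
print(choose_timegate("http://example.org/", header, DEFAULT))
```

Splitting the header with a regular expression rather than on commas matters because datetime parameters such as datetime="Fri, 05 Feb 2010 14:28:00 GMT" contain commas themselves.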
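To see why these features matter, consider the following curl command, which asks the aggregator TimeGate at mementoproxy.cs.odu.edu for http://ipl.org as it was on 5 February 2010:
curl -L -H "Accept-Datetime: Fri, 05 Feb 2010 14:28:00 GMT" \
  http://mementoproxy.cs.odu.edu/aggr/timegate/http://ipl.org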
If you inspect the returned page closely, you will find that its embedded resources come from the live web instead of the web archive. This happens because the current Wayback Machine Memento implementation does not rewrite the embedded resources. mcurl solves this problem easily:
perl ./mcurl.pl -L --mode thorough \
  --datetime 'Fri, 05 Feb 2010 14:28:00 GMT' \
  --replacedump dump.txt http://ipl.org
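The two embedded-resource modes can be illustrated in Python. The sketch below is an illustration under assumptions, not mcurl's implementation (mcurl itself uses Perl's HTML::Parser): it collects embedded resource URIs from a memento's HTML with the standard html.parser module, flags those whose host is not the archive (the live-web leakage described above), and builds Thorough-mode requests, one TimeGate request per URI, each carrying the same Accept-Datetime header. The set of tag/attribute pairs and the TimeGate URI pattern are hypothetical choices. The example re-negotiates only the leaked URIs, because those are already original URIs; recovering the original URI from an archive-rewritten one is archive-specific and left out. In Strict mode, the archive's embedded URIs would simply be kept.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical choice of (tag, attribute) pairs treated as embedded resources.
EMBED_ATTRS = {("img", "src"), ("script", "src"), ("iframe", "src"),
               ("embed", "src"), ("link", "href")}


class EmbeddedResourceParser(HTMLParser):
    """Collects absolute URIs of embedded resources from an HTML page."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.uris = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in EMBED_ATTRS and value:
                self.uris.append(urljoin(self.base_uri, value))


def embedded_uris(html, base_uri):
    parser = EmbeddedResourceParser(base_uri)
    parser.feed(html)
    parser.close()
    return parser.uris


def live_web_uris(uris, archive_host):
    """Embedded URIs whose host is not the archive, i.e. leaked to the live web."""
    return [u for u in uris if urlsplit(u).hostname != archive_host]


def thorough_requests(uris, timegate_prefix, when):
    """Thorough mode: one (TimeGate URI, headers) pair per embedded URI."""
    headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}
    return [(timegate_prefix.rstrip("/") + "/" + u, dict(headers)) for u in uris]


page = ('<img src="http://ipl.org/logo.png">'
        '<script src="/web/20100205/http://ipl.org/a.js"></script>')
found = embedded_uris(page, "http://web.archive.org/web/20100205142800/http://ipl.org/")
leaked = live_web_uris(found, "web.archive.org")
when = datetime(2010, 2, 5, 14, 28, tzinfo=timezone.utc)
for uri, headers in thorough_requests(leaked, "http://mementoproxy.cs.odu.edu/aggr/timegate/", when):
    print(uri, headers)
```

`format_datetime(..., usegmt=True)` requires a UTC datetime and produces the same "Fri, 05 Feb 2010 14:28:00 GMT" form used in the curl example.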
Environment setup
mcurl is written in Perl; Perl 5 or later is required. curl version 7.15.5 and the HTML::Parser Perl module are also required.
Memento-related Parameters
mcurl supports several Memento-related options that let the user set the preferred datetime, TimeGate, and embedded resource mode.
- -tm, --timemap <link|rdf>: selects the TimeMap format, either link or rdf.
- -tg, --timegate <uri[,uri]>: selects one or more preferred TimeGates.
- -dt, --datetime <date in RFC 822 format>: sets the requested datetime in the past (for example, Thu, 31 May 2007 20:35:00 GMT).
- --mode <thorough|fast>: sets mcurl's embedded resource policy; the default is thorough.
- --debug: enables debug mode, which displays more detailed output.
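The --datetime value must be an RFC 822 style date, and the invalid-header example later in this post expects a 400 response, so a client could catch such mistakes before sending a request. The Python sketch below is not part of mcurl; it relies on the standard library's email.utils.parsedate_to_datetime, parses only the text before the first ';' (ignoring the acceptable-period suffix shown in the examples), and rejects values without a usable time zone such as GMT. Note that this parser also accepts full month names and does not check the weekday.

```python
from email.utils import parsedate_to_datetime


def parse_accept_datetime(value):
    """Return an aware datetime for an RFC 822/1123 date, or None if invalid.

    Only the text before the first ';' is parsed, so an acceptable-period
    suffix such as '; -P5MT5H;+P5MT6H' is ignored, not validated.
    """
    date_part = value.split(";", 1)[0].strip()
    try:
        parsed = parsedate_to_datetime(date_part)
    except (TypeError, ValueError):  # Python < 3.10 raises TypeError here
        return None
    if parsed.tzinfo is None:  # no usable zone, e.g. "-0000" or a missing "GMT"
        return None
    return parsed


for candidate in ("Thu, 31 May 2007 20:35:00 GMT",
                  "Thu, 23 Jul 2009 12:00:00 GMT; -P5MT5H;+P5MT6H",
                  "Sun, 23 July xxxxxxxxxxxxxxxx"):
    print(repr(candidate), "->", parse_accept_datetime(candidate))
```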
mcurl is available in a GitHub repository. Three files are required: mcurl.pl, MementoThread.pm, and MementoParser.pm.
Usage Examples
In this section, we list some usage examples that explain the behavior of mcurl.
- Calling an original resource with the default TimeGate:
  mcurl.pl -I -L --debug --datetime 'Sun, 23 Jul 2006 12:00:00 GMT' http://www.cnn.com
  Expected result: mcurl performs content negotiation in the datetime dimension, using the default TimeGate when required.
- Requesting the TimeMap in link format with the default TimeGate:
  mcurl.pl -I -L --debug --timemap link http://www.cnn.com
  Expected result: mcurl downloads the TimeMap in application/link-format, using the default TimeGate.
- Calling an original resource with a specific TimeGate and no datetime:
  mcurl.pl -I -L --debug --timegate 'http://mementoproxy.lanl.gov/aggr/timegate/' http://www.cnn.com
  Expected result: mcurl performs content negotiation in the datetime dimension and gets the last memento, using the specified TimeGate when required.
- Calling an original resource with a specific datetime and TimeGate:
  mcurl.pl -I -L --debug --datetime 'Sun, 23 Jul 2006 12:00:00 GMT' --timegate 'http://mementoproxy.lanl.gov/aggr/timegate/' http://www.cnn.com
  Expected result: mcurl performs content negotiation in the datetime dimension, using the specified TimeGate when required.
- Requesting the TimeMap in link format with a specific TimeGate:
  mcurl.pl -I -L --debug --timemap link --timegate 'http://mementoproxy.lanl.gov/aggr/timegate/' http://www.cnn.com
  Expected result: mcurl downloads the TimeMap in application/link-format, using the specified TimeGate when required.
- Calling an original resource that returns a TimeGate in its response headers:
  mcurl.pl -I -L --debug --datetime "Thu, 23 Jul 2009 12:00:00 GMT" http://lanlsource.lanl.gov/hello
  Expected result: mcurl performs content negotiation in the datetime dimension; the site provides a TimeGate, which overrides the default TimeGate.
- Calling an original resource (R1) that redirects to R2, where R1 has valid mementos:
  mcurl.pl -I -L --debug --datetime "Thu, 23 Jul 2009 12:00:00 GMT" http://www.zeit.de/
  Expected result: mcurl performs content negotiation in the datetime dimension for R2.
- Calling an original resource (R1) that redirects to R2, where R1 does NOT have valid mementos:
  mcurl.pl -I -L --debug --datetime "Thu, 23 Jul 2009 12:00:00 GMT" http://lanlsource.lanl.gov
  Expected result: mcurl performs content negotiation in the datetime dimension using R2.
- Calling an original resource whose TimeGate redirects:
  mcurl.pl -I -L --debug --datetime "Mon, 23 Jul 2007 12:00:00 GMT" http://lanlsource.lanl.gov/hello
  Expected result: mcurl performs content negotiation in the datetime dimension; the site provides a TimeGate, which overrides the default TimeGate. The TimeGate /tg/ redirects to /ta/.
- Calling an original resource whose TimeGate redirects:
  mcurl.pl -I -L --debug --datetime "Sat, 23 Jul 2011 12:00:00 GMT" http://lanlsource.lanl.gov/hello
  Expected result: mcurl performs content negotiation in the datetime dimension; the site provides a TimeGate, which overrides the default TimeGate. The TimeGate /tg/ redirects to /ts/.
- Calling an original resource with an acceptable time period:
  mcurl.pl -I -L --debug --datetime 'Thu, 23 Jul 2009 12:00:00 GMT; -P5MT5H;+P5MT6H' http://www.cs.odu.edu
  Expected result: mcurl performs content negotiation in the datetime dimension with the specified time period, which contains valid mementos, using the default TimeGate when required.
- Calling an original resource with a NOT acceptable time period:
  mcurl.pl -I -L --debug --datetime 'Thu, 23 Jul 2009 12:00:00 GMT; -P5MT5H;+P5MT6H' http://www.cs.odu.edu
  Expected result: mcurl performs content negotiation in the datetime dimension with the specified time period, which does not contain any valid mementos, using the default TimeGate when required.
- Calling an original resource with an invalid Accept-Datetime header:
  mcurl.pl -I --debug --datetime 'Sun, 23 July xxxxxxxxxxxxxxxx' http://www.cnn.com
  Expected result: response code 400 (Bad Request).
- Overriding the discovered TimeGate with a specific one:
  mcurl.pl -I -L --debug --datetime "Sat, 23 Jul 2011 12:00:00 GMT" --timegate 'http://mementoproxy.cs.odu.edu/aggr/timegate' --override http://lanlsource.lanl.gov/hello
- Using the --replacedump switch to dump the replacements for the embedded resources to an external file for further analysis:
  mcurl.pl -L --mode thorough --datetime "Fri, 03 Dec 2010 12:00:00 GMT" --replacedump cnnreplace.txt http://www.cnn.com
- Accessing the DBpedia archive:
  mcurl.pl -L --mode thorough --datetime "Fri, 03 Dec 2010 12:00:00 GMT" http://dbpedia.org/page/Brisbane
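Several of these examples request a TimeMap in link format or restrict the answer to an acceptable time period. The Python sketch below illustrates both ideas under stated assumptions: it reads mementos (entries whose rel includes "memento" and that carry a datetime) from a link-format TimeMap, then returns the memento nearest to the requested datetime inside a window, or None when the window holds no memento, which mirrors the "not acceptable" case. The window is given as two Python timedelta values; the duration strings used in the examples (such as -P5MT5H) come from an earlier draft and are not parsed here. The TimeMap content and archive.example URIs are made up for illustration, and this is not how mcurl or any particular TimeGate is implemented.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Simplified link-format patterns: <uri> plus parameters up to the next '<'.
LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def timemap_mementos(timemap_text):
    """Return (uri, datetime) pairs for entries whose rel includes 'memento'."""
    mementos = []
    for uri, params in LINK_RE.findall(timemap_text):
        attrs = {n.lower(): q or b for n, q, b in PARAM_RE.findall(params)}
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return mementos


def nearest_in_window(mementos, requested, before, after):
    """Nearest memento in [requested - before, requested + after], else None."""
    low, high = requested - before, requested + after
    candidates = [m for m in mementos if low <= m[1] <= high]
    if not candidates:
        return None  # no acceptable memento in the requested period
    return min(candidates, key=lambda m: abs(m[1] - requested))


# Made-up TimeMap for http://example.org/ in a hypothetical archive.
TIMEMAP = """<http://example.org/>; rel="original",
<http://archive.example/timemap/link/http://example.org/>; rel="self"; type="application/link-format",
<http://archive.example/20090101000000/http://example.org/>; rel="first memento"; datetime="Thu, 01 Jan 2009 00:00:00 GMT",
<http://archive.example/20090725000000/http://example.org/>; rel="last memento"; datetime="Sat, 25 Jul 2009 00:00:00 GMT"
"""

requested = parsedate_to_datetime("Thu, 23 Jul 2009 12:00:00 GMT")
mementos = timemap_mementos(TIMEMAP)
print(nearest_in_window(mementos, requested, timedelta(days=5), timedelta(days=5)))
print(nearest_in_window(mementos, requested, timedelta(hours=5), timedelta(hours=6)))
```

Splitting each rel value on whitespace lets entries such as rel="first memento" or rel="last memento" count as mementos.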
Ahmed AlSum