Posts

2013-07-09: Archive.is Supports Memento

(2014-04-16 edit: Two days ago, archive.is started 301 redirecting to archive.today. Otherwise, all the existing links should look and function as they did before.) There's a lot to like about Archive.is, a recent entry in the page-at-a-time personal web archiving space: the simple search/upload interface, the bookmarklet for easily pushing pages into the archive while reading, the thumbnails (and full-sized images) of captured pages, how it handles JavaScript, etc. But now there is an additional reason: Archive.is natively supports Memento and is now included in the Memento aggregators at LANL and ODU. Archive.is is similar to WebCite in that it archives a single page when a user requests that it be archived. This is different from crawlers at, for example, the Internet Archive and Archive-It, which crawl the web all the time, archiving pages as they go along. These archives represent different, complementary strategies for crawling the web: Archive.is, WebCite: s

2013-06-18: NTRS, Memento, and Handles

In a previous post I covered the shutdown of the NASA Technical Report Server, which has since come back online in a reduced capacity. In this post we examine some of the peculiarities of the current state of NTRS, particularly with respect to Handles and Memento. Earlier this week I needed to access an old NASA report of mine, ironically enough about NTRS, from 1996: Richard C. Tuey, Mary Collins, Pamela Caswell, Bob Haynes, Michael L. Nelson, Jeanne Holm, Lynn Buquo, Annette Tingle, Bill Cooper and Roy Stiltner, NASAwide Electronic Publishing System-Prototype STI Electronic Document Distribution: Stage-4 Evaluation Report, NASA TM-104630 (parts 1 and 2), May 1996. It is not a particularly enjoyable report; it is the kind of lengthy, multi-authored, sanitized, bureaucratic-engineering report that people write but don't read (a "better" summary can be found in AIAA-95-0964). I probably have a PDF of the report somewhere in my files, but instead I pulle

2013-05-30: World Wide Web Conference WWW2013 in Rio de Janeiro, Brazil, Trip Report

After a long overnight flight, I landed in sunny and beautiful Rio de Janeiro. A couple of months earlier, my paper entitled “Carbon Dating the Web: Estimating the Age of Web Resources” was accepted at the third annual Temporal Web Analytics Workshop TempWeb03, which is associated with the 22nd World Wide Web conference WWW2013. My colleague Ahmed Al Sum’s paper, entitled “Archival HTTP Redirection Retrieval Policies”, was accepted as well. Ahmed wrote a beautiful, detailed post about the workshop, which I encourage everyone to read. I arrived on the morning of Monday the 13th at 6 AM and immediately took a taxi to the Windsor Barra hotel, where the conference was held and where I would be staying for the next 5 days. My colleague Ahmed arrived a day earlier, so he was bragging that he got the chance to relax and see the sunset on the beautiful beach. After a quick shower I went downstairs to the registration area to receive my ID tag and the conference kit. Everything went c

2013-05-29: mcurl - Command Line Memento Client

The Memento protocol works in two directions: Server implementation: the server complies with the Memento protocol, so it can read the "Accept-Datetime" header, perform content negotiation in the datetime dimension, and return the memento nearest the requested datetime to the user. Successful examples include: Internet Archive Wayback Machine, British Library Wayback Machine, and DBpedia. Client implementation: the user needs a tool to set the requested URI with the preferred datetime in the past. Current tools include: the Firefox add-on MementoFox, the British Library Memento Service, and Memento Browser for Android and iPhone. Today, we are pleased to announce mcurl, a command-line Memento client. mcurl is a wrapper for the Unix curl command that is capable of doing content negotiation in the datetime dimension with Memento TimeGates. mcurl supports all curl parameters in addition to the new parameters that are Memento related. Users may use the curl command to do content-n
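To make the client side of datetime negotiation concrete, here is a minimal Python sketch of how a client could build a request carrying an "Accept-Datetime" header. It is not how mcurl itself is implemented. The function names and the TimeGate URL in the usage note are hypothetical, and the sketch assumes the header value is an HTTP-date expressed in GMT. The request is only constructed, never sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def accept_datetime_value(when: datetime) -> str:
    """Format a datetime as an HTTP-date in GMT for an Accept-Datetime header."""
    if when.tzinfo is None:
        # Assumption: a naive datetime is treated as already being in UTC.
        when = when.replace(tzinfo=timezone.utc)
    else:
        # Convert any other timezone to UTC before formatting.
        when = when.astimezone(timezone.utc)
    return format_datetime(when, usegmt=True)


def build_timegate_request(timegate_url: str, when: datetime) -> Request:
    """Build (but do not send) a request asking a TimeGate for a past datetime.

    timegate_url is a hypothetical placeholder supplied by the caller.
    """
    headers = {"Accept-Datetime": accept_datetime_value(when)}
    return Request(timegate_url, headers=headers)
```

For example, `accept_datetime_value(datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))` returns `"Thu, 31 May 2007 20:35:00 GMT"`. Passing the object returned by `build_timegate_request("http://timegate.example.org/http://example.com/", ...)` to `urllib.request.urlopen` would perform the actual negotiation against a real TimeGate.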

2013-05-25: Game Walkthroughs As A Metaphor for Web Preservation

Do you remember playing the Atari 400/800 game "Star Raiders"? Probably not, but for me it pretty much defined my existence in middle school: the obvious Star Wars inspiration, the stereo sound, the (for the time) complex game play, the 3D(-ish) first-person orientation -- this was all ground-breaking stuff for 1979. It, along with games like "Eastern Front (1941)", inspired me at a young age to become a video game developer; an inspiration which did not survive my undergraduate graphics course. I could encourage you to (re)experience the game by pointing you to the ROM image for the game, as well as an appropriate emulator (I used "Atari800MacX"), but without the venerable Atari joystick (the same one used in the more famous 2600 system), it just doesn't feel the same to me. And although the original instructions have been scanned, the game play is complex enough that unlike most games of the era, you can't immediately understa

2013-05-21: An Update About Archiving Tweets

Today I encountered this article about a UK driver bragging on Twitter about hitting a cyclist. Rather than extend an already lengthy post about archiving tweets from two weeks ago, this example will be its own post. Summary: a woman hit a bicyclist participating in a race (the cyclist apparently was not seriously injured) and then bragged about it on Twitter. The cyclist was apparently not going to report the event, but her bragging changed his mind and he contacted the police: @emmaway20 we have had tweets ref an RTC with a bike. We suggest you report it at a police station ASAP if not done already & then dm us -- Norwich Police (@NorwichPoliceUK) May 19, 2013 The driver deleted her Twitter account, but the offending evidence has already been archived -- not just by concerned citizens making copies (check the thread in the Tweet above), but Topsy has archived the evidence as well. Interestingly, unlike the Twitpic examples in the previous post,

2013-05-13: Temporal Web Workshop 2013 Trip Report

On May 13, Hany SalahEldeen and I attended the third Temporal Web Analytics Workshop, collocated with WWW 2013 in Rio de Janeiro, Brazil. Marc Spaniol, from the Max Planck Institute for Informatics, Germany, welcomed the audience in the opening note of the workshop. He emphasized the workshop's goal of building a community of interest in the temporal web. Omar Alonso, from Microsoft Silicon Valley, was the keynote speaker, with a presentation entitled “Stuff happens continuously: exploring Web contents with temporal information”. Omar divided his presentation into three parts: Time in document collection, Social data, and Exploring the web using time. In the Time in document collection part, Omar gave an introduction to the temporal dimension of documents. He characterized temporal information by first asking “What is Time?”. Time may be expressed in a normalized format or a hierarchical format. Time has 4 types: times; duration; sets, which may be explicit (i.e., May 2,