Posts

Showing posts from May, 2013

2013-05-30: World Wide Web Conference WWW2013 in Rio de Janeiro, Brazil, Trip Report

After a long overnight flight, I landed in sunny and beautiful Rio de Janeiro. A couple of months earlier, my paper entitled “Carbon Dating the Web: Estimating the Age of Web Resources” was accepted at the third annual Temporal Web Analytics Workshop TempWeb03, which is associated with the 22nd World Wide Web conference WWW2013. My colleague Ahmed Al Sum’s paper, entitled “Archival HTTP Redirection Retrieval Policies”, was accepted as well. Ahmed wrote a beautiful, detailed post about the workshop which I encourage everyone to read.
I arrived on the morning of Monday the 13th at 6 AM and immediately took a taxi to the Windsor Barra hotel, where the conference was held and where I would be staying for the next 5 days. My colleague Ahmed had arrived a day earlier, so he was bragging that he got the chance to relax and see the sunset on the beautiful beach. After a quick shower I went downstairs to the registration area to receive my ID tag and the conference kit. Everything went completely smooth an…

2013-05-29: mcurl - Command Line Memento Client

The Memento protocol works in two directions:

Server implementation: the server complies with the Memento protocol, so it can read the "Accept-Datetime" header, perform content negotiation in the datetime dimension, and return the memento nearest the requested datetime to the user. Successful examples include the Internet Archive Wayback Machine, the British Library Wayback Machine, and DBpedia.

Client implementation: the user needs a tool that sets the requested URI with the preferred datetime in the past. Current tools include the Firefox add-on MementoFox, the British Library Memento Service, and Memento Browser for Android and iPhone.

Today, we are pleased to announce mcurl, a command-line Memento client. mcurl is a wrapper for the Unix curl command that is capable of doing content negotiation in the datetime dimension with Memento TimeGates. mcurl supports all curl parameters in addition to the new Memento-related parameters.
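To make the client side of datetime negotiation concrete, here is a minimal Python sketch of what a client has to prepare before contacting a TimeGate: the target URI and an "Accept-Datetime" header carrying the preferred datetime as an HTTP date in GMT. This is not how mcurl itself is implemented; the TimeGate prefix `TIMEGATE_PREFIX` and the function name `build_memento_request` are hypothetical and exist only for illustration, and the sketch builds the request without sending it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical TimeGate prefix, used only for illustration.
TIMEGATE_PREFIX = "http://timegate.example.org/timegate/"


def build_memento_request(target_uri, when):
    """Return (url, headers) for a datetime-negotiation request.

    `when` is the preferred datetime in the past. Naive datetimes are
    assumed to already be in UTC (an assumption of this sketch).
    """
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    when_utc = when.astimezone(timezone.utc)
    # Accept-Datetime carries an HTTP date expressed in GMT.
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE_PREFIX + target_uri, headers


if __name__ == "__main__":
    url, headers = build_memento_request(
        "http://www.example.com/",
        datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc),
    )
    print(url)
    print(headers)
```

The returned URL and header dictionary could then be handed to any HTTP client; for the datetime above, the header value is `Thu, 31 May 2007 20:35:00 GMT`.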

Users may use the curl command to do content-negotiation in the…

2013-05-25: Game Walkthroughs As A Metaphor for Web Preservation

Do you remember playing the Atari 400/800 game "Star Raiders"?  Probably not, but for me it pretty much defined my existence in middle school: the obvious Star Wars inspiration, the stereo sound, the (for the time) complex game play, the 3D(-ish) first-person orientation -- this was all ground-breaking stuff for 1979.  It, along with games like "Eastern Front (1941)", inspired me at a young age to become a video game developer; an inspiration which did not survive my undergraduate graphics course.

I could encourage you to (re)experience the game by pointing you to the ROM image for the game, as well as an appropriate emulator (I used "Atari800MacX"), but without the venerable Atari joystick (the same one used in the more famous 2600 system), it just doesn't feel the same to me.  And although the original instructions have been scanned, the game play is complex enough that, unlike most games of the era, you can't immediately understand what to do.  …

2013-05-21: An Update About Archiving Tweets

Today I encountered this article about a UK driver bragging on Twitter about hitting a cyclist.  Rather than extend an already lengthy post about archiving tweets from two weeks ago, this example will be its own post. 

Summary: a woman hit a bicyclist participating in a race (the cyclist apparently was not seriously injured) and then bragged about it on Twitter.  The cyclist was apparently not going to report the event, but her bragging changed his mind and he contacted the police:

@emmaway20 we have had tweets ref an RTC with a bike. We suggest you report it at a police station ASAP if not done already & then dm us
— Norwich Police (@NorwichPoliceUK) May 19, 2013


The driver deleted her Twitter account, but the offending evidence has already been archived -- not just by concerned citizens making copies (check the thread in the Tweet above), but also by Topsy.



Interestingly, unlike the Twitpic examples in the previous post, the Instagram images do no…

2013-05-13: Temporal Web Workshop 2013 Trip Report

On May 13, Hany SalahEldeen and I attended the third Temporal Web Analytics Workshop, collocated with WWW 2013 in Rio de Janeiro, Brazil.


Marc Spaniol, from the Max Planck Institute for Informatics, Germany, welcomed the audience in the opening remarks of the workshop. He emphasized the workshop's goal of building a community of interest in the temporal web.

Omar Alonso, from Microsoft Silicon Valley, was the keynote speaker, with a presentation entitled “Stuff happens continuously: exploring Web contents with temporal information”. Omar divided his presentation into three parts: Time in document collection, Social data, and Exploring the web using time.

In the “Time in document collection” part, Omar gave an introduction to the temporal dimension of documents. He described the characteristics of temporal information by first asking “What is Time?”. Time may be expressed in a normalized format or a hierarchical format. The time has 4 types: times; duration; sets, which may be explicit (i.e., May 2, 2012) or implic…

2013-05-09: HTTP Mailbox - Asynchronous RESTful Communication

We often encounter web services that take a very long time to respond to our HTTP requests. In the case of an eventual network failure, we are forced to issue the same HTTP request again. We frequently consume web services that do not support REST. If they did, we could utilize the full range of HTTP methods while retaining the functionality of our application, even when the external API we utilize in our application changes. We sometimes wish to set up a web service that takes job requests, processes long-running job queues, and notifies the clients individually or in groups. HTTP does not allow multicast or broadcast messaging. HTTP also requires the client to stay connected to the server while the request is being processed.

Introducing HTTP Mailbox - An Asynchronous RESTful HTTP Communication System. In a nutshell, HTTP Mailbox is a mailbox for HTTP messages. Using its RESTful API, anyone can send an HTTP message (request or response) to anyone else independent of the availability,…
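The store-and-forward idea behind "a mailbox for HTTP messages" can be illustrated with a small Python sketch. This is a toy, in-memory stand-in and not the HTTP Mailbox API: the class `ToyMailbox`, the helper `serialize_request`, and the recipient name are all hypothetical. It only shows that an HTTP request can be serialized to text, left in a recipient's mailbox, and picked up later without sender and recipient being connected at the same time.

```python
from collections import defaultdict, deque


class ToyMailbox:
    """In-memory message store (hypothetical; not the HTTP Mailbox API)."""

    def __init__(self):
        # One FIFO queue of serialized HTTP messages per recipient.
        self._boxes = defaultdict(deque)

    def send(self, recipient, http_message):
        """Leave a serialized HTTP message for a recipient."""
        self._boxes[recipient].append(http_message)

    def retrieve(self, recipient):
        """Return the oldest pending message, or None if there is none."""
        box = self._boxes[recipient]
        return box.popleft() if box else None


def serialize_request(method, target, headers, body=""):
    """Serialize an HTTP/1.1 request into its textual wire form."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    mailbox = ToyMailbox()
    message = serialize_request("GET", "/jobs/42", {"Host": "example.org"})
    mailbox.send("worker", message)     # the sender can disconnect now
    print(mailbox.retrieve("worker"))   # the recipient picks it up later
```

A real service would persist messages and expose these operations over HTTP; the sketch keeps everything in memory so it can run as-is.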

2013-05-07: Who Is Archiving Your Tweets?

Who is archiving your tweets?

You're probably thinking "the Library of Congress".  And you're right: since 2010 they have been (see the announcements from Twitter and LC).  But LC is currently providing access only to researchers, and the scale of the archive makes access challenging (see LC's January 2013 white paper that provides a status update on the project).

To say I think this joint project between LC and Twitter is exciting and important is an understatement; I could go on about the scholarly importance, the cultural and technological record, the phenomena of social media, etc.  So I was surprised (but in retrospect, should not have been) when almost immediately afterwards projects like noloc.org surfaced so you could opt out of the archiving of your public tweets.

However, while you might be able to prevent LC from archiving your tweets, companies like Topsy are archiving them, or at least some of them.  Topsy is one of my new, favorite sites in part be…