Posts

Showing posts with the label WWW 2013

2017-09-19: Carbon Dating the Web, version 4.0

With this release of Carbon Date, we are introducing new features to track testing and to enforce Python standard formatting conventions. This version is dubbed Carbon Date v4.0. We've also decided to switch from MementoProxy to the Memgator Aggregator tool built by Sawood Alam. Of course, new APIs bring new bugs that need to be addressed, such as this exception handling issue. Fortunately, the new tools being integrated into the project will allow our team to catch and address these issues more quickly than before, as explained below. The previous version of this project, Carbon Date 3.0, added Pubdate extraction, Twitter searching, and Bing search. We found that Bing has changed its API to allow only a 30-day trial with 1,000 requests per month unless you pay. We also discovered a few more use cases for Pubdate extraction by applying Pubdate to the mementos retrieved from Memgator. By default, Memgator provides t
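Memgator aggregates TimeMaps from many archives, and one of its output serializations is the link format defined for Memento (RFC 7089). As a minimal sketch, assuming a link-format TimeMap has already been fetched (the HTTP request, Memgator's endpoint paths, and the helper names `mementos` and `earliest_memento` are illustrative assumptions, not Carbon Date's actual code), the earliest memento could be picked out like this:

```python
import re
from email.utils import parsedate_to_datetime

# Each link-format entry is "<URI>" followed by ";"-separated attributes.
# Attribute text may contain commas (e.g. "Mon, 01 Jan 2001"), so we take
# everything up to the next "<" instead of splitting on commas.
ENTRY_RE = re.compile(r'<([^>]*)>([^<]*)')
REL_RE = re.compile(r'rel="([^"]*)"')
DT_RE = re.compile(r'datetime="([^"]*)"')


def mementos(timemap_text):
    """Yield (uri, datetime) for every entry whose rel includes 'memento'."""
    for uri, attrs in ENTRY_RE.findall(timemap_text):
        rel = REL_RE.search(attrs)
        dt = DT_RE.search(attrs)
        # rel values such as "first memento" or "last memento" still count.
        if rel and dt and "memento" in rel.group(1).split():
            yield uri, parsedate_to_datetime(dt.group(1))


def earliest_memento(timemap_text):
    """Return (uri, datetime) of the oldest memento, or None if there is none."""
    found = list(mementos(timemap_text))
    return min(found, key=lambda pair: pair[1]) if found else None
```

Matching on the `datetime` attribute rather than on `rel="first memento"` alone is a deliberate choice here: an aggregated TimeMap may not be sorted, so taking the minimum is the safer assumption.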

2016-09-20: Carbon Dating the Web, version 3.0

Due to API changes, the old Carbon Date tool is out of date and some modules, such as Topsy, no longer work. I have taken up the responsibility of maintaining and extending the service, beginning with the following, now available in Carbon Date v3.0. Carbon Date 3.0: What's new. New services have been added, such as Bing searching, Twitter searching, and Pubdate parsing. The new software architecture enables us to load given scripts or disable given services at runtime. The server framework has been changed from CherryPy to Tornado, a lightweight Python web framework and server with better performance. How to use the Carbon Date service. Through the website, http://carbondate.cs.odu.edu: Given that carbon dating is computationally intensive, the site can handle only 50 concurrent requests, so the web service should be used only for small tests as a courtesy to other users. If you have the need to Carbon Date a large number of URLs,
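The idea of enabling or disabling services at runtime can be illustrated with a small registry. This is a minimal sketch under stated assumptions: `ServiceRegistry` and its methods are hypothetical names, each service is assumed to be a function from a URL to a date string (or `None`), and the dynamic loading of scripts mentioned above is not shown.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical service signature: URL in, date string (or None) out.
Service = Callable[[str], Optional[str]]


class ServiceRegistry:
    """Keeps named services and lets callers switch them on or off at runtime."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}
        self._disabled: Set[str] = set()

    def register(self, name: str, func: Service) -> None:
        self._services[name] = func

    def disable(self, name: str) -> None:
        self._disabled.add(name)

    def enable(self, name: str) -> None:
        self._disabled.discard(name)

    def active(self) -> List[str]:
        # Sorted so the order of results is predictable.
        return sorted(n for n in self._services if n not in self._disabled)

    def run(self, url: str) -> Dict[str, Optional[str]]:
        # Only enabled services are queried.
        return {name: self._services[name](url) for name in self.active()}
```

For example, a service whose upstream API has changed could be disabled with `registry.disable("bing")` without restarting the server, and re-enabled once it is fixed.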

2014-11-14: Carbon Dating the Web, version 2.0

For over a year, Hany SalahEldeen's Carbon Date service has been out of service, mainly because of API changes in some of the underlying modules on which it is built. Consequently, I have taken up the responsibility of maintaining the service, beginning with the following, now available in Carbon Date v2.0. Carbon Date v2.0. The Carbon Date service currently makes requests to the different modules (archives, backlinks, etc.) concurrently through threading. The server framework has been changed from Bottle to CherryPy, which is still a minimalist Python WSGI server but a more robust framework featuring a threaded server. How to use the Carbon Date service. There are three ways: Through the website, http://cd.cs.odu.edu/: Given that carbon dating is highly computationally intensive, the site should be used only for small tests as a courtesy to other users. If you have the need to Carbon Date a large number of URLs, y
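Querying the modules concurrently through threading might look roughly like the following. This is a sketch, not the service's code: the function name `query_all`, the `max_workers` default, and the convention that a failing module yields `None` are assumptions chosen for illustration. It uses the standard library's `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Hypothetical module signature: URL in, earliest date string (or None) out.
Module = Callable[[str], Optional[str]]


def query_all(url: str, modules: Dict[str, Module],
              max_workers: int = 8) -> Dict[str, Optional[str]]:
    """Run every module on url in its own thread and collect the results."""

    def safe(func: Module) -> Optional[str]:
        # One broken module (e.g. after an upstream API change) should not
        # take down the whole request, so its failure becomes None.
        try:
            return func(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(safe, func) for name, func in modules.items()}
        # result() blocks until each module has finished.
        return {name: fut.result() for name, fut in futures.items()}
```

Because the modules are network-bound, threads overlap their waiting time, so the total latency is roughly that of the slowest module rather than the sum of all of them.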

2013-05-30: World Wide Web Conference WWW2013 in Rio de Janeiro, Brazil, Trip Report

After a long overnight flight, I landed in sunny and beautiful Rio de Janeiro. A couple of months earlier, my paper entitled “Carbon Dating the Web: Estimating the Age of Web Resources” was accepted at the third annual Temporal Web Analytics Workshop, TempWeb03, which is associated with the 22nd World Wide Web conference, WWW2013. My colleague Ahmed Al Sum's paper, entitled “Archival HTTP Redirection Retrieval Policies”, was accepted as well. Ahmed wrote a beautiful, detailed post about the workshop, which I encourage everyone to read. I arrived at 6 AM on the morning of Monday the 13th and immediately took a taxi to the Windsor Barra hotel, where the conference was held and where I would be staying for the next 5 days. My colleague Ahmed had arrived a day earlier, so he was bragging that he got the chance to relax and see the sunset on the beautiful beach. After a quick shower, I went downstairs to the registration area to receive my ID tag and the conference kit. Everything went c

2013-05-13: Temporal Web Workshop 2013 Trip Report

On May 13, Hany SalahEldeen and I attended the third Temporal Web Analytics Workshop, co-located with WWW 2013 in Rio de Janeiro, Brazil. Marc Spaniol, from the Max Planck Institute for Informatics, Germany, welcomed the audience in the workshop's opening note. He emphasized the workshop's goal of building a community of interest in the temporal web. Omar Alonso, from Microsoft Silicon Valley, was the keynote speaker, with a presentation entitled: “Stuff happens continuously: exploring Web contents with temporal information”. Omar divided his presentation into three parts: Time in document collection, Social data, and Exploring the web using time. In the first part, Time in document collection, Omar gave an introduction to the temporal dimension of documents. He described the characteristics of temporal information by first defining “What is Time?”. Time may be expressed in a normalized format or a hierarchical format. Time has 4 types: times; durations; sets, which may be explicit (i.e., May 2,

2013-04-19: Carbon Dating the Web

(note: Carbon Date 2.0 was released on 2014-11-14) In the course of our research, we often needed to determine when a certain web resource was created. In numerous cases, this question is fairly straightforward to answer by examining the resource itself. Articles often have publishing datetime stamps, social media contributions have posting times, and for other resources you can estimate the creation date by reading the resource itself. This is simple when examining a resource manually, but it is hard to automate for a large dataset of resources. To solve this problem, we conducted several experiments to automatically determine when a resource was created. When a resource is created, it often gets indexed by search engines, archived in public archives, and shared on social media, thus leaving trails of existence. We trace those trails of existence and use the earliest appearance across all of them as a close estimate of the creation date. The timeline below illustra
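The core estimate, taking the earliest appearance across all trails of existence, reduces to a minimum over whatever dates the sources return. Here is a minimal sketch, assuming each trail has already been resolved to a `datetime.date` or `None`; the function name and the trail labels are hypothetical, and how each source's date is obtained is outside the scope of this example.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def estimate_creation_date(
        trails: Dict[str, Optional[date]]) -> Optional[Tuple[str, date]]:
    """Return (source, date) for the earliest trail, or None if no trail was found."""
    # Sources that left no trail (None) are simply ignored.
    found = [(d, source) for source, d in trails.items() if d is not None]
    if not found:
        return None
    # Tuples compare by date first; ties fall back to the source name.
    d, source = min(found)
    return source, d
```

For instance, `estimate_creation_date({"archives": date(2009, 6, 1), "search_engine": date(2008, 11, 20), "social_media": None})` returns `("search_engine", date(2008, 11, 20))`. Keeping the source alongside the date makes it easy to see which trail drove the estimate.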