Showing posts from May, 2014

2014-05-28: The road to the most precious three letters, PHD

The commencement on May 10th, 2014, with hundreds of students in caps and gowns ready for the moment of graduation, is one I will not forget. For me, it was the culmination of a long journey toward my Ph.D. degree in computer science. A few days before that, on May 3rd, 2014, I submitted my dissertation, entitled “Web Archive Services Framework For Tighter Integration Between The Past And Present Web”, to the ODU registrar's office as a declaration that I had completed the requirements for the degree. On Feb 26th, 2014, I defended my dissertation; the defense was presented with these slides and is available to watch on video streaming. In my research, I explored a proposed service framework that provides APIs for the web archive corpus so that users and third-party developers can access the web archive on four levels. The first level is the content level, which gives access to the actual content of web archive corpora with various filters. The second

2014-05-25: IIPC GA 2014

I attended the International Internet Preservation Consortium (IIPC) General Assembly 2014 (#iipcGA14) hosted by the Bibliothèque nationale de France (BnF) in Paris. Although the GA ran the entire week (May 19 to May 23), I was only able to attend May 20 & 21. It looks like I missed some good material on the first day, including keynotes from Wendy Hall and Wolfgang Nejdl, and a presentation from Common Crawl. Martin Klein also presented an overview of the Hiberlink project, as well as the "mset attribute" that we are working on with the people from Harvard. I arrived after lunch on May 20, in time for a really strong session on "Harvesting and access: technical updates", featuring talks about Solr indexing (Andy Jackson et al.) (Andy's slides), deduplicating content in WARCs (Kristinn Sigurðsson), Heritrix updates (Kris Carpenter), and Open Wayback (Helen Hockx). Within WS-DL, we haven't really done much with Solr in our p

2014-05-08: Support for Various HTTP Methods on the Web

While clearly not all URIs will support all HTTP methods, we wanted to know which methods are widely supported and how well that support is advertised in HTTP responses. Support for the full range of HTTP methods is crucial for RESTful Web services. Please read our previous blog post for definitions and pointers about REST and HATEOAS. Earlier, we did a brief analysis of HTTP method support in the HTTP Mailbox paper. We have extended that study to carry out a deeper analysis and look at various aspects of it. We initially sampled 100,000 URIs from DMOZ and found that only 40,870 URIs were live. Our further analysis was based on the response code, "Allow" header, and "Server" header of the responses to OPTIONS requests sent to those live URIs. We found that out of those 40,870 URIs: 55.31% do not advertise which methods they support; 4.38% refuse the OPTIONS method, with either a 405 or 501 response code; 15.33% support only HEAD, GET, and OPTIONS; 38.53% support
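To make the categories in this excerpt concrete, here is a minimal Python sketch of how a single OPTIONS response might be labeled from its status code and "Allow" header. It is an illustration under assumptions, not the code used in the study: the function names and label strings are hypothetical, and the labels are allowed to overlap because the excerpt does not say whether the reported categories are mutually exclusive.

```python
from typing import Mapping, Optional, Set


def parse_allow(value: Optional[str]) -> Set[str]:
    """Split an Allow header value such as 'GET, HEAD, OPTIONS' into a set of method names."""
    if not value:
        return set()
    return {m.strip().upper() for m in value.split(",") if m.strip()}


def classify_options_response(status: int, headers: Mapping[str, str]) -> Set[str]:
    """Return the (hypothetical) labels that apply to one OPTIONS response."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    methods = parse_allow(lowered.get("allow"))

    labels: Set[str] = set()
    if status in (405, 501):
        # The server refused the OPTIONS method itself.
        labels.add("refuses_options")
    if not methods:
        # No Allow header (or an empty one): supported methods are not advertised.
        labels.add("no_allow_header")
    if methods == {"HEAD", "GET", "OPTIONS"}:
        labels.add("only_head_get_options")
    return labels
```

The status code and headers can come from any HTTP client; with the `requests` library, for example, `requests.options(url)` returns a response exposing `status_code` and `headers`.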

2014-04-14: ECIR 2014 Trip report

(Photo from the official ECIR 2014 Flickr account.) From Apr. 14 to Apr. 16, 2014, in the beautiful city of Amsterdam in the Netherlands, I attended the 36th European Conference on Information Retrieval (ECIR 2014). The conference started with a workshops and tutorials day on Apr. 13, which I didn't attend. ECIR 2014 had a wide range of workshops and tutorials that covered various aspects of IR, such as: Text Quantification: A Tutorial, the GamifIR'14 workshop, the Context Aware Retrieval and Recommendation workshop (CaRR 2014), the Information Access in smart cities workshop (i-ASC 2014), and the Bibliometric-enhanced Information Retrieval workshop (BIR 2014). The main conference started on April 14 with a welcome note from the conference chair, Maarten de Rijke. After that, Ayse Goker, from Robert Gordon University, presented the winner of the Karen Spärck Jones award and the keynote speaker, Eugene Agichtein, a professor at Emory University. His