
Showing posts from August, 2015

2015-08-28: Original Header Replay Considered Coherent

Introduction
As web archives have advanced over time, their ability to capture and play back web content has grown. The Memento Protocol, defined in RFC 7089, is an HTTP protocol extension that bridges the present and past web by allowing time-based content negotiation. Now that Memento is operational at many web archives, analyzing archive content has become simpler. Over the past several years, I have analyzed the temporal coherence of web archives, and some of the results will be published at Hypertext'15. This blog post discusses one implication of that research: the benefits achieved when web archives play back original headers.
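To make the time-based content negotiation concrete, here is a minimal sketch of the request a Memento client sends to a TimeGate: the desired datetime goes in the Accept-Datetime header, formatted as an HTTP-date in GMT. The TimeGate base URL `http://archive.example.org/timegate/` and the helper name `build_timegate_request` are hypothetical, and the request is only built, not sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate base URL; each real archive publishes its own endpoint.
TIMEGATE_BASE = "http://archive.example.org/timegate/"


def build_timegate_request(original_uri: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request.

    The client names the original resource and expresses the desired
    time in the Accept-Datetime header as an HTTP-date in GMT.
    `when` must be timezone-aware; it is converted to UTC first.
    """
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(
        TIMEGATE_BASE + original_uri,
        headers={"Accept-Datetime": accept_datetime},
        method="HEAD",
    )


req = build_timegate_request(
    "http://www.cs.odu.edu/",  # example original URI
    datetime(2015, 4, 29, 15, 15, 23, tzinfo=timezone.utc),
)
# urllib normalizes stored header names to "Accept-datetime";
# HTTP header names are case-insensitive, so this is harmless on the wire.
print(req.get_header("Accept-datetime"))  # Wed, 29 Apr 2015 15:15:23 GMT
```

Under RFC 7089 the TimeGate would answer such a request by redirecting to the Memento whose capture time best matches Accept-Datetime; sending the request and following that redirect is left out of this sketch.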

Archive Headers and Original Headers
Consider the headers (Figure 1) returned for a logo from the ODU Computer Science Home Page as archived on Wed, 29 Apr 2015 15:15:23 GMT.

HTTP/1.1 200 OK
Content-Type: image/gif
Last-Modified: Wed, 29 Apr 2015 15:15:23 GMT
Try to answer the question "Was the representation provided by the web arc…
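Reasoning about headers like those in Figure 1 is easier once they are machine-readable. The sketch below is an illustration only, not the method used in the research: it splits a raw response head into its status line and fields, and converts Last-Modified into a timezone-aware datetime that can be compared with an archival datetime. The helper names `parse_header_block` and `last_modified` are hypothetical.

```python
from __future__ import annotations

from datetime import datetime
from email.utils import parsedate_to_datetime

# Raw header block from Figure 1, one "Name: value" field per line.
FIGURE_1 = """HTTP/1.1 200 OK
Content-Type: image/gif
Last-Modified: Wed, 29 Apr 2015 15:15:23 GMT"""


def parse_header_block(raw: str) -> tuple[str, dict[str, str]]:
    """Split a raw HTTP response head into its status line and a field map.

    Field names are lower-cased because HTTP header names are case-insensitive.
    """
    status_line, *field_lines = raw.strip().splitlines()
    fields: dict[str, str] = {}
    for line in field_lines:
        # Split at the first colon only; dates contain colons too.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip malformed lines without a colon
        fields[name.strip().lower()] = value.strip()
    return status_line, fields


def last_modified(fields: dict[str, str]) -> datetime | None:
    """Return Last-Modified as a timezone-aware datetime, or None if absent."""
    value = fields.get("last-modified")
    return parsedate_to_datetime(value) if value else None


status, fields = parse_header_block(FIGURE_1)
print(status)                   # HTTP/1.1 200 OK
print(fields["content-type"])   # image/gif
print(last_modified(fields))    # 2015-04-29 15:15:23+00:00
```

Lower-casing field names keeps lookups independent of how a given archive capitalizes the headers it returns.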

2015-08-20: ODU, L3S, Stanford, and Internet Archive Web Archiving Meeting

Two weeks ago (on Aug 3, 2015), I was glad to be invited to the Internet Archive in San Francisco to share our latest work with a group of Web archiving pioneers from around the world.

The attendees were Jefferson Bailey and Vinay Goel from IA, Nicholas Taylor and Ahmed AlSum from Stanford, and Wolfgang Nejdl, Ivana Marenzi and Helge Holzmann from L3S.

First, we briefly introduced ourselves to one another, describing the purpose and nature of our work to IA.

Then, Nejdl introduced the Alexandria project and demoed the ArchiveWeb project, which aims to develop tools and techniques to explore and analyze Web archives in a meaningful way. Through this project, they are developing tools that will allow users to visualize Archive-It collections and interact with them collaboratively by adding new resources in the form of tags and comments. ArchiveWeb also includes a collaborative search and sharing platform.

I presented our off-topic detection work with a live demo of the tool, which can be …

2015-08-18: Three WS-DL Classes Offered for Fall 2015

The Web Science and Digital Libraries Group is offering three classes this fall. Unfortunately, there are no undergraduate offerings this semester, but there are three graduate classes covering the full WS-DL spectrum:

CS 695 - NoSQL Databases (CRN 21159) will be offered by Dr. Cartledge. While we've used NoSQL databases in a variety of classes in the past, this is the first time we've offered a class entirely on this topic. This is a good complement to the CS 495/595 Big Data class he offered last spring.

CS 734/834 - Introduction to Information Retrieval (CRNs 19986 & 20004) will be offered by Dr. Nelson. Although the number and name have slightly changed, this will be similar to previous offerings of this class (e.g., see CS 895 spring 2014). This class will broadly cover the foundations of information retrieval.

CS 791/891 - Visualization Seminar (CRNs 12619 & 12620) will be taught by Dr. Weigle. This P/F course will cover the fundamentals of how to apply …