Posts

2013-10-14: Right-Click to the Past -- Memento for Chrome

Last week LANL released Memento for Chrome, an extension that adds Memento capability to Chrome browsers.  It represents such a leap in capability and speed that the prior MementoFox (Memento for Firefox) add-on should be considered deprecated.  It's not just a Firefox vs. Chrome thing either; Memento for Chrome features a subtle change in how it interacts with the past and present.  MementoFox had a toggle switch for present vs. Time Travel mode that, until turned off, would trap and modify all outbound requests from the current page and all subsequent pages, rewriting http://example.com/index.html as http://mementoproxy.lanl.gov/aggr/timegate/http://example.com/index.html (for example).  This involved some complicated logic to determine when you were getting a memento (i.e., archived web entity) vs. something from the live web.  When you factored in native Memento archives vs. proxied Memento archives, things could get hairy (see the 2011 Code4Lib paper for a (dat
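As a rough illustration of the request rewriting MementoFox performed in Time Travel mode, here is a minimal Python sketch.  Only the two URI forms come from the post; the function names are hypothetical, and this is not the add-on's actual code.

```python
# Prefix taken from the example URI in the post above.
TIMEGATE_PREFIX = "http://mementoproxy.lanl.gov/aggr/timegate/"


def to_timegate_uri(uri, prefix=TIMEGATE_PREFIX):
    """Rewrite a live-web URI into its TimeGate form (hypothetical helper)."""
    # Avoid prefixing twice if the request was already rewritten.
    if uri.startswith(prefix):
        return uri
    return prefix + uri


def to_original_uri(uri, prefix=TIMEGATE_PREFIX):
    """Recover the live-web URI from a rewritten one (hypothetical helper)."""
    if uri.startswith(prefix):
        return uri[len(prefix):]
    return uri


if __name__ == "__main__":
    print(to_timegate_uri("http://example.com/index.html"))
```

The simple prefixing is the easy part; deciding whether a given response is a memento or a live-web resource is where the complicated logic mentioned above comes in, and this sketch does not attempt it.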

2013-10-11: Archive What I See Now

Earlier this year, we were awarded an NEH Digital Humanities Start-Up Grant for our project "Archive What I See Now": Bringing Institutional Web Archiving Tools to the Individual Researcher. We were invited to attend the NEH Office of Digital Humanities Project Directors' Meeting in early October, but due to the government shutdown, the meeting was cancelled.  Here I'll give the quick overview of the project that I'd planned for that meeting.  (Mat Kelly has already posted a nice description of the tools we've been developing, WARCreate and WAIL, at http://bit.ly/wc-wail.) The slides I'd prepared are below: "Archive What I See Now" - NEH ODH overview from Michele Weigle.  Our project is focused on helping people archive web pages. Since much of our cultural heritage is now published on the web, we want to make sure that important pages are archived for the future. Since 1996, the Internet Archive and other archiving services ha

2013-10-04: TPDL 2013 Trip Report

I attended the 2013 Theory and Practice of Digital Libraries (TPDL) Conference on September 22-26 in Valletta, Malta.  Although I've had papers at several of the prior TPDL (known as ECDL prior to 2011) conferences, I think this is the first one I've personally attended since ECDL 2005 in Austria.  Normally I prefer to send students to present their papers, but this year we had five full papers accepted, so I could not afford to send all the students and I went in their stead.  An unfortunate side effect of having so many papers is that between preparation and my own presentations I was unable to see as much of the conference as I would have liked. The conference began with Herbert Van de Sompel and me giving a tutorial about ResourceSync.  Attendees registered for all tutorials and were free to attend whichever one they preferred.  We had as many as ten people in ours at one point, but more importantly we had some key people present who will be implementing Resource

2013-09-09: MS Thesis: HTTP Mailbox - Asynchronous RESTful Communication

It is my pleasure to report the successful completion of my Master's degree thesis entitled "HTTP Mailbox - Asynchronous RESTful Communication". I defended my thesis on July 11th, and my written thesis was accepted on August 23rd, 2013. In this blog post I will briefly describe the problem that the thesis targets, followed by the proposed and implemented solution. I will walk through an example that illustrates the usage of the HTTP Mailbox, then provide various links and resources for exploring the HTTP Mailbox further. Traditionally, general web services used only the GET and POST methods of HTTP, while several other HTTP methods like PUT, PATCH, and DELETE were rarely utilized. Additionally, the Web was mainly navigated by humans using web browsers and clicking on hyperlinks or submitting HTML forms. Clicking on a link is always a GET request, while HTML forms only allow the GET and POST methods. Recently, several web frameworks/libraries hav

2013-09-06: Wolfram Data Summit 2013 Trip Report

I was fortunate enough to be invited to present at the 2013 Wolfram Data Summit in Washington, DC, September 5-6, 2013.  My talk was about the future of web archiving, but the focus of the data summit was "big data".  As such, there was a variety of disciplines represented at the summit since the unifying factor was the scale of the data.  Logistics dictated that I missed several of the presentations, but many of the ones I did attend were very engaging.  The slides will be posted at the Wolfram site later, but I'll provide some short summaries below (2013-11-26 edit: the presentations are now available). First was Greg Newby presenting about Project Gutenberg, the long-running collection of free ebooks.  His focus was on PG as a portable collection, which is subtly different from universal access from different interfaces (even if the interface is just Google).  The focus was more on PG as a collection to be explored and on which personalized services can be built.  Du

2013-08-24: Two WS-DL Classes Offered for Fall 2013

Two WS-DL classes are offered for Fall 2013: CS 725/825 - Information Visualization (Dr. Weigle) and CS 495/595 - Introduction to Web Science (Dr. Nelson).  Information Visualization has been taught twice before, but with a 795/895 course number.  This semester will be the first time that Web Science has been taught at ODU, although the course is based on Dr. McCown's Spring 2013 class at Harding University. --Michael

2013-08-23: Archive-It Supports Memento

Earlier this week, Archive-It (the subscription-based collection development service from the Internet Archive) implemented Memento support for their collections, including the newly established "all" collection.  This is a follow-on from the recent Internet Archive upgrade of their Memento support in the Wayback Machine.  Prior to Archive-It's support of Memento, their collections were included in the Memento aggregator by proxy.  While dozens of archives are included in the aggregator via proxies, native Memento support is faster and more functional. Here is an HTTP snippet using an archived PDF of a NASA report from an earlier post about NTRS.
% curl -I -L http://wayback.archive-it.org/all/http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028190_1996060846.pdf
HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Vary: accept-datetime
Link: <http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028190_1996060846.pdf>; rel="origin

2013-07-26: Web Archiving and Digital Libraries workshop - WADL 2013 Trip Report

On July 25th and 26th, 2013, the WS-DL group attended the Web Archiving and Digital Libraries Workshop that was collocated with JCDL 2013 in Indianapolis, IN. Ed Fox, from Virginia Tech, opened the workshop by greeting the attendees. Then, Andreas Paepcke gave two presentations. The first presentation was entitled "ArcSpread: Enabling Web Archive Analysis for non-CS experts". In this presentation, Andreas showed how to make web archives useful to people other than computer scientists. ArcSpread uses a spreadsheet interface to help the user gain information from the web archive. ArcSpread started with analysis activities such as filtering, aggregating, classifying, and manual coding. The output product is a spreadsheet that can answer some questions related to specific queries (e.g., Hurricane Katrina) such as: pages with words, images with the term, place/people name, and most frequent names. ArcSpread depends on a sheet engine with a Hadoop cluster of 60 nodes. The second pres

2013-07-26: Digital Preservation 2013 Trip Report

The time of year has again arrived for conferences related to our research area of web science and digital libraries. While much of our group will be representing the university at the Joint Conference on Digital Libraries (JCDL) in Indianapolis (trip report), I was given the opportunity to attend Digital Preservation 2013 in Alexandria, Virginia. With the conference being much closer to home in Hampton Roads, this is the third year running that I have attended (2012 Trip Report, 2011 Trip Report), having presented digital preservation tools at each: Archive Facebook in 2011 and WARCreate in 2012. Following up from the recent public release of WARCreate (see the announcement), I gave a presentation on WARCreate, on another package I had created, Web Archiving Integration Layer (WAIL), originally unveiled at Personal Digital Archiving 2013 in February (Trip Report), and on how all of the pieces fit together, titled: WARCreate and WAIL: WARC, Wayback and Heritrix Made E