Posts

Showing posts from September, 2012

2012-09-29: Data Curation, Data Citation, ResourceSync

During September 10-11, 2012, I attended the UNC/NSF Workshop Curating for Quality: Ensuring Data Quality to Enable New Science in Arlington. The workshop invited about 20 researchers involved in all aspects of data curation and solicited position papers on one of four broad topics:

- data quality criteria and contexts
- human and institutional factors
- tools for effective and painless curation
- metrics

Although most of the discussion was about science data, my position paper was about the importance of archiving the web. In short, it argued for treating the web as a corpus that should be retained for future research. The forthcoming workshop report will include a full list of participants and their papers. In the meantime, I have uploaded my paper, "A Plan for Curating `Obsolete Data or Resources'", to arXiv; it is a summary version of the slides I presented at the Web Archiving Cooperative meeting this summer.

To be included in the workshop report are the…

2012-09-27: NFL Referee Kerfuffle

For the first three weeks of the 2012 NFL season, replacement officials have refereed the games due to an ongoing labor dispute between the referees and the NFL. Every fan of a team that has been on the losing side of a call has voiced their opinion on the abilities of the replacement referees. Even Jon Stewart had something to say about the labor dispute.

This past Monday night, during the Seahawks-Packers game, a controversial call essentially determined the winner. That call was the spark that unleashed a flood of angry recriminations and complaints directed at the replacement referees and the NFL. I found this somewhat amusing, since the people complaining seem to have forgotten all of the mistakes the regular referees appeared to make in previous years. In 2008, Ed Hochuli, one of the best referees in the NFL, made a rather horrendous call. I have to give him respect for owning up to it and apologizing. NFL fans have always complained about the officiating,…

2012-08-31: Benchmarking LANL's SiteStory

On August 17, 2012, Herbert Van de Sompel of Los Alamos National Laboratory announced the release of the much-anticipated transactional web archiver SiteStory.
Very excited to announce the release of our SiteStory transactional archive solution #memento mementoweb.github.com/SiteStory/
— Herbert (@hvdsomp) August 17, 2012

The ODU WS-DL research group (in conjunction with The MITRE Corporation) performed a series of studies to measure the effect of SiteStory on web server performance. We found that SiteStory does not significantly affect content server performance while it performs transactional archiving. The time per Web page access increases from 0.076 seconds to 0.086 seconds when the content server is under load, and from 0.15 seconds to 0.21 seconds when the resource has many embedded and changing resources.
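To put those figures in perspective, the short Python sketch that follows computes the absolute and relative slowdown from the four per-page timings quoted in the previous paragraph. The function name `overhead` and the scenario labels are illustrative only; they are not part of SiteStory or of the technical report, and the sketch simply redoes the arithmetic on the reported numbers.

```python
def overhead(baseline_s: float, with_archiver_s: float) -> tuple[float, float]:
    """Return (absolute slowdown in seconds, relative slowdown in percent)."""
    delta = with_archiver_s - baseline_s
    return delta, 100.0 * delta / baseline_s


# Per-page access times reported above, in seconds: (without SiteStory, with SiteStory).
# The dictionary keys are hypothetical labels for the two reported scenarios.
scenarios = {
    "server under load": (0.076, 0.086),
    "many embedded, changing resources": (0.15, 0.21),
}

for name, (before, after) in scenarios.items():
    delta, pct = overhead(before, after)
    print(f"{name}: +{delta:.3f} s per page ({pct:.1f}% slower)")
```

Running it prints a slowdown of about 0.010 s (13.2%) for the loaded server and about 0.060 s (40.0%) for pages with many embedded, changing resources, which shows that the relative cost is largest for resource-heavy pages.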
A sneak peek at how SiteStory affects server performance is provided below. Please see the technical report for a full description of these resul…