Posts

Showing posts with the label Web Archiving

2013-04-22: IIPC GA 2013

From April 22 to 26, Michael Nelson and I attended the International Internet Preservation Consortium (IIPC) General Assembly 2013, hosted by the National and University Library of Slovenia in Ljubljana, Slovenia. This year is the ten-year anniversary of the IIPC. The theme of this year's GA is "What were the past challenges? and how can we plan the future of IIPC?". Also this year, Old Dominion University becomes an official member of the IIPC. The GA was organized into five days. Day 1: Monday, April 22, 2013, IIPC General Assembly. Mateja Komel Snoj, the director of the National and University Library of Slovenia, and Alenka Kavčič-Čolić, the Head of the Library Research Center at the National and University Library of Slovenia, opened the day, welcomed the attendees, and expressed their pleasure at hosting the IIPC GA in Slovenia. Mateja emphasized the importance of digital preservation and the role of the National and University Library of Slovenia in the preservation o

2013-03-22: NTRS, Web Archives, and Why We Should Build Collections

At the ResourceSync meeting this week, Simeon Warner brought my attention to the fact that the NASA Technical Report Server (NTRS) digital library had gone offline on March 19. Although I have not been involved with it since about 2004, I was the creator of NTRS and it was a central part of my early career. If you click on http://ntrs.nasa.gov/ now, you get a message saying the service is down. Technically, you get an "HTTP/1.1 503 Service Temporarily Unavailable" message:

$ curl -I http://ntrs.nasa.gov/
HTTP/1.1 503 Service Temporarily Unavailable
Date: Sat, 23 Mar 2013 04:00:14 GMT
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Fri, 22 Mar 2013 12:50:02 GMT
ETag: "720003-300-4d882e4c05280"
Accept-Ranges: bytes
Content-Length: 768
Connection: close
Content-Type: text/html; charset=UTF-8

And the body of the page says: The NASA technical reports server will be unavailable for public access while the agency conducts a review of the site's conten

2012-10-10: Zombies in the Archives

Image provided from http://www.taxhelpattorney.com/
In our current research, the WS-DL group has observed leakage in archived sites. Leakage occurs when archived resources include current content. I enjoy referring to such occurrences as "zombie" resources (which is appropriate given the upcoming Halloween holiday). That is to say, these resources are expected to be archived ("dead") but still reach into the current Web. In the examples below, this reach into the live Web is caused by URIs contained in JavaScript not being rewritten to be relative to the Web archive; the page in the archive is not pulling from the past archived content but is "reaching out" (zombie-style) from the archive to the live Web. We provide two examples with humorous juxtaposition of past and present content. Because of JavaScript, rendering a page from the past will include advertisements from the present Web. 2008 memento of cnn.com f

2012-08-20: MS Thesis: An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication

I am pleased to report on the successful completion of my Master's Degree thesis entitled "An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication". The problem that I hoped to resolve with the study is one that plagues software like Archive Facebook, even to this day: when the hierarchy of a social media website changes, tools created to preserve content on that site tend to break. By conforming these tools to a specification that is set up to represent the hierarchy of the target social media websites, these tools become adaptive without the need for continuous maintenance on the part of the developer. The study also explored and enumerated various aspects of personal web archiving that prevent the field from taking advantage of the tools, procedures and media that are widely used in conventional web archiving. In addition to simply identifying the problem, I also created a Google Chrome extension, W

2012-06-12: JCDL 2012 Doctoral Consortium

The ODU WS-DL research group kicked off JCDL 2012 at The George Washington University by presenting the first two Doctoral Consortium papers on June 10th, 2012. The Doctoral Consortium is a workshop for PhD students who are in the early stages of defining their research. It is a venue for presenting a potential path through the PhD, as well as a way to receive feedback from peers and other researchers. Past WS-DL students have benefited from the workshop, including Joan Smith, Frank McCown, Martin Klein, Chuck Cartledge, and Ahmed Alsum. Hany SalahEldeen and I (Justin F. Brunelle) were honored and excited to be the next class of WS-DL students to participate. The first session was the Data Preservation and Curation section, chaired by Maristella Agosti. I presented the first paper, entitled "Filling in the Blanks: Capturing Dynamically Generated Content". My work will study capturing, sharing, and archiving Web 2.0 resources that traditional crawlers canno

2012-04-30: IIPC 2012 GA, A Week with Archivists!

The International Internet Preservation Consortium (IIPC) held its annual general assembly meeting for 2012 from April 30 to May 4, 2012, at the Library of Congress in Washington, D.C. I compiled this report from tweets using #IIPC12 from the meeting and from my personal notes. In this report, I tried to assess how complete a view of an event its tweets can give. More details about the new approach will be in the blog comment. The first day, April 30, 2012, was open to the public. It was entitled "The Broad Value of Web Archives: Demonstrated Use". @hhockx: #iipc12 opened just now by Martha Anderson. Laura Campbell welcomes the participants. @netpreserve: IIPC starts with 11 members, now has 42. @gregorylisa: Apropos quote at #IIPC12 "The great use of life is to spend it for something that will outlast it." William James. @netpreserve: Gildas Illien from BnF sets the stage for researcher use case panel. @cleymour: Gildas Il

2012-02-11: Losing My Revolution: A year after the Egyptian Revolution, 10% of the social media documentation is gone.

The Egyptian revolution on the 25th of January 2011 was unlike any other revolution in history because of the role of social media. Several blogs, Storify entries, web pages, and channels on YouTube were created to document the revolution. Several books were even published documenting the 18 days. All of these contributions were made by the public, not historians, utilizing the tools of Web 2.0. As a result of all these contributions we have an enormous body of digital content, including thousands of posts, tweets, images, videos and sound files narrating and documenting the revolution. Unfortunately, on the first anniversary of this revolution, over 10% of this digital content is already gone. Websites like Twitter, YouTube, Facebook, Storify, 1000Memories, Blogger and IAmJan25 have allowed the public to document the events of the revolution in real time. Storify, for example, allows the user to create a time-ordered collection of tweets, links, images, posts, map locations or vid