2012-08-20: MS Thesis: An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication

I am pleased to report the successful completion of my Master's thesis, entitled "An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication". The problem I set out to address is one that still plagues software like Archive Facebook: when the hierarchy of a social media website changes, tools created to preserve content on that site tend to break. By conforming these tools to a specification that represents the hierarchy of the target social media website, the tools become adaptive without requiring continuous maintenance from the developer.

The study also explored and enumerated various aspects of personal web archiving that prevent the field from taking advantage of the tools, procedures, and mediums widely used in conventional web archiving. Beyond identifying the problem, I also created a Google Chrome extension, WARCreate, that allows a user to preserve any viewable webpage in the Web ARChive (WARC) format.

Because the Internet Archive's Heritrix web crawler writes preserved webpages in this format, and its replay system, the Wayback Machine, is set up to consume it, allowing users to preserve webpages as WARC files is a step toward bridging the gap between conventional and personal web archiving.

WARCreate's functionality was first presented at JCDL 2012 and further demonstrated at Digital Preservation 2012. Also at Digital Preservation 2012, I received a National Digital Stewardship Alliance (NDSA) Innovation Award as a Future Steward and was subsequently interviewed on The Signal, the Library of Congress / NDSA blog.

After a lengthy review process, I defended my thesis on August 3, 2012, and submitted the finalized document to the registrar soon after.

I am extremely grateful to my advisor, Dr. Michele C. Weigle, for her patience in helping me bring my writing up to par; to Dr. Michael L. Nelson for ensuring that my ideas were sound enough for public presentation; and to Dr. Yaohang Li for his ideas on making this research more theoretical in future work.

Starting in Fall 2012, I will continue my research at Old Dominion University as a PhD student.

— Mat Kelly

