Posts

Showing posts from July, 2014

2014-07-25: Digital Preservation 2014 Trip Report

Mat Kelly and Dr. Michael L. Nelson travel to Washington, DC to report on their current research and to learn about others' work in the field.
On July 22 and 23, 2014, Dr. Michael Nelson (@phonedude_mln) and I (@machawk1) attended Digital Preservation 2014 in Washington, DC. This was my fourth consecutive NDIIPP (@ndiipp) / NDSA (@ndsa2) meeting (see trip reports from Digital Preservation 2011, 2012, 2013). With the largest attendance yet (300+) and compressed into two days, the schedule was jam-packed with interesting talks. Per usual, videos for most of the presentations are included inline below.
Day One
Micah Altman (@drmaltman) led the presentations with information about the NDSA and asked, regarding Amazon claiming reliability of 99.999999999% for uptime, "What do the eleven nines mean?". "There are a number of risks that we know about [as archivists] that Amazon doesn't", he said, continuing, "No single…

2014-07-22: "Archive What I See Now" Project Funded by NEH Office of Digital Humanities

We are grateful for the continued support of the National Endowment for the Humanities and their Office of Digital Humanities for our "Archive What I See Now" project.
In 2013, we received support for 1 year through a Digital Humanities Start-Up Grant.  This week, along with our collaborator Dr. Liza Potts from Michigan State, we were awarded a 3-year Digital Humanities Implementation Grant. We are excited to be one of the seven projects selected this year.

Our project goals are two-fold:
to enable users to generate files suitable for use by large-scale archives (i.e., WARC files) with tools as simple as the "bookmarking" or "save page as" approaches that they already know
to enable users to access the archived resources in their browser through one of the available add-ons or through a local version of the Wayback Machine (wayback).

Our innovation is in allowing individuals to "archive what I see now". The user can create a standard web archive f…

2014-07-14: "Refresh" For Zombies, Time Jumps

We've blogged before about "zombies", or archived pages that reach out to the live web for images, ads, movies, etc. You can also describe it as the live web "leaking" into the archive, but we prefer the more colorful metaphor of a mixture of undead and living pages. Most of the time JavaScript is to blame (for example, see our TPDL 2013 paper "On the Change in Archivability of Websites Over Time"), but in this example the blame rests with the HTML <meta http-equiv="refresh" content="..."> tag, whose behavior in the archives I discovered quite by accident.

First, the meta refresh tag is a nasty bit of business that allows HTML to specify the HTTP headers you should have received. This is occasionally useful (like loading a file from local disk), but more often than not seems to create situations in which the HTML and the HTTP disagree about header values, leading to surprisingly complicated things like MIME type sniffing
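As a rough illustration of what the tag carries, here is a minimal sketch that pulls the delay and target URL out of any meta refresh tags in a page, using only Python's standard library. The class and function names (MetaRefreshFinder, find_meta_refresh) and the example.com URL are hypothetical, chosen for this example; this is not how any particular archive or replay tool handles the tag.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect (delay_seconds, target_url) pairs from meta refresh tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta foo>), so normalize them.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").strip().lower() != "refresh":
            return
        # content looks like "5" or "5; url=http://example.com/".
        delay, _, rest = attr.get("content", "").partition(";")
        try:
            seconds = float(delay.strip())
        except ValueError:
            return  # no usable delay; skip this tag
        rest = rest.strip()
        url = None
        if rest.lower().startswith("url"):
            # Split on the first "=" only, so query strings survive intact.
            url = rest.partition("=")[2].strip().strip("'\"") or None
        self.refreshes.append((seconds, url))


def find_meta_refresh(html):
    """Return a list of (delay_seconds, target_url_or_None) tuples."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes
```

A tag with only a delay (content="30") yields a target of None, meaning the page reloads itself; a tag with a url= part yields the address the browser would jump to, which is the part that matters when checking whether an archived page points back at the live web.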

2014-07-14: The Archival Acid Test: Evaluating Archive Performance on Advanced HTML and JavaScript

One very large part of digital preservation is the act of crawling and saving pages on the live Web into a format for future generations to view. To accomplish this, web archivists use various crawlers, tools, and bits of software, often built to purpose. Because of these tools' ad hoc functionality, users expect them to function much better than a general purpose tool. As anyone who has looked up a complex web page in The Archive can tell you, the more complex the page, the less likely that all resources will be captured to replay the page. Even when these pages are preserved, the replay experience is frequently inconsistent with the page on the live web. We have started building a preliminary corpus of tests to evaluate a handful of tools and web sites that were created specifically to save web pages from being lost in time. In homage to the web browser evaluation websites by the Web Standards Project, we have created The Archival Acid Test as a first step in ensuring that these t…

2014-07-10: Federal Cloud Computing Summit

As mentioned in my previous post, I attended the Federal Cloud Computing Summit on July 8th and 9th at the Ronald Reagan Building in Washington, D.C. I helped the host organization, the Advanced Technology And Research Center (ATARC), organize and run the MITRE-ATARC Collaboration Sessions that kicked off the event on July 8th. The summit is designed to allow Government representatives to meet and collaborate with industry, academic, and other Government cloud computing practitioners on the current challenges in cloud computing.

A FedRAMP primer was held at 10:00 AM on July 8th in a Government-only session. At its conclusion, we began the MITRE-ATARC Collaboration Sessions that focused on Cloud Computing in Austere Environments, Cloud Computing for the Mobile Worker, Security as a Service, and the Impact of Cloud Computing on the Enterprise. Because participants are protected by the Chatham House Rule, I cannot elaborate on the Government representation or discussions in the collaboration …

2014-07-08: Presenting WS-DL Research to PES University Undergrads

On July 7th and 8th, 2014, Hany SalahEldeen and I (Mat Kelly) were given the opportunity to present our PhD research to visiting undergraduate seniors from a leading university in Bangalore, India (PES University). About thirty students were in attendance at each session and indicated their interest in the topics through a large quantity of relevant questions.
Prior to the ODU CS students' presentations, Dr. Michele C. Weigle (@weiglemc) gave the students an overview of some of WS-DL's research topics with her presentation Bits of Research. She covered our lab's foundational work, recent work, and some outstanding research questions, as well as some potential projects to entice interested students to work with our research group.
Between Hany and me, I (Mat Kelly, @machawk1) presented a fairly high level yet technical overview titled Browser-Based Digital Preservation, whi…