Wednesday, July 10, 2013

2013-07-10: WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy

As the Web Science and Digital Libraries Research Group, we regularly interact with end users as well as developers who are interested in digital preservation. One of our goals is to help make web preservation accessible to regular users instead of just power users. As computer scientists, this frequently means creating software. A few digital preservation software packages created by WS-DLers include:


  • Warrick - a utility for reconstructing (or recovering) a website using various archives and caches
  • Synchronicity - a Firefox extension that supports the user in rediscovering missing web pages
  • mcurl - a command-line Memento client
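and two that are dear to my heart:
  • WARCreate - a Google Chrome extension that saves the currently viewed webpage as a WARC file
  • WAIL - a packaging of Wayback and Heritrix that runs as a double-click-and-go application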

I developed these two packages for JCDL2012 and PDA2013, respectively, with the former receiving the Future Steward award from the National Digital Stewardship Alliance (NDSA) at Digital Preservation 2012. While this was all swell for our group, one problem remained, and it surfaced again in February at PDA2013, where WAIL (a spin-off of the WARCreate server decoupling) was unveiled. Well before PDA, I made sure that WAIL was available to the public in a double-click-and-go binary form (a pre-compiled executable, i.e., an App). While we keep all of the software we develop free and open source, WARCreate remained experimental and thus was never "released", per se, though anyone could download the source and try it if they were really eager.

As noted above, we are technical and, as we learned with WAIL, users are more willing to try your software when the barriers (e.g., compiling from source) are minimized. With WARCreate getting its first reference citation, it was time to formally release the tool, in binary form, for public consumption, ready or not.

WARCreate is now available for download in the Chrome Web Store.
To use it:
  1. Enable WARCreate in Chrome
  2. Navigate to a webpage
  3. Click the WARCreate logo on the right of the address bar
  4. Hit the "Generate WARC" button
Within seconds, a Web Archive (WARC) file of the currently viewed webpage will be created and saved to your downloads folder. Alternatively, WARCreate may crash or not behave 100 percent as expected, but I will gladly address any bugs you report by e-mail or through GitHub issues, or you can confront me personally at Digital Preservation 2013 on July 24, 2013 in Alexandria, Virginia, where I will be giving a presentation on WAIL and WARCreate. There are sure to be bugs; however, pre-release software is better than no-release software.
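
For readers curious about what ends up inside a WARC file, here is a minimal Python sketch of a single WARC/1.0 "response" record. It is not WARCreate's code (WARCreate is a Chrome extension); the function name build_warc_record and the example response are hypothetical and only illustrate the general layout of a record: a version line, named header fields, a blank line, the captured HTTP response, and two trailing CRLF pairs.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_response: bytes) -> bytes:
    """Wrap raw HTTP response bytes in one WARC/1.0 'response' record.

    Hypothetical helper for illustration only; not taken from WARCreate.
    """
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts only the record block (the HTTP response bytes).
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines are CRLF-terminated and followed by one empty line.
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record ends with two CRLF pairs after the block.
    return header_block + http_response + b"\r\n\r\n"


# Made-up captured response for a made-up page.
example_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello, archive!</body></html>"
)
record = build_warc_record("http://example.com/", example_response)
```

A real WARC file typically holds many such records (a warcinfo record, requests, responses for embedded images and stylesheets, and so on), which is the bookkeeping a tool like WARCreate takes off your hands.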

— Mat (@machawk1)
