Posts

Showing posts with the label software

2025-02-11: Getting to the Source of the (Memento) Damage

    I've previously written about the Memento Damage project, originally started by Dr. Justin Brunelle, a Web service designed to estimate the amount of damage to a web archive by assessing its missing resources. Previously, I had been specializing parts of the project while working on the Memento Tracer project, funded by the Alfred P. Sloan Foundation, to take special considerations regarding the damage weighting for Web-hosted repository pages. I have been making further updates to the Memento Damage project over the course of this year that help improve this analysis and damage estimation. The most prominent is the implementation of a secondary crawler component for analyzing an archived repository and its source tree. Web-hosted Git repositories live on centralized Web platforms, the largest being GitHub, along with other major platforms such as GitLab, Bitbucket, and SourceForge. The source files for a Git project are hosted "behind the scen...
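The damage estimation described above weights each missing embedded resource by how much it contributes to the page. A minimal sketch of that idea is below; the per-type weights and the formula are purely illustrative assumptions, not Memento Damage's actual model:

```python
# Illustrative sketch of a weighted damage estimate for a memento:
# damage = (total weight of missing resources) / (total weight of all resources).
# These per-type weights are hypothetical, not the project's real weighting.
TYPE_WEIGHTS = {"image": 0.4, "css": 0.3, "js": 0.2, "other": 0.1}

def damage_score(resources):
    """resources: list of (resource_type, is_missing) pairs for one memento."""
    total = sum(TYPE_WEIGHTS.get(t, TYPE_WEIGHTS["other"]) for t, _ in resources)
    missing = sum(TYPE_WEIGHTS.get(t, TYPE_WEIGHTS["other"])
                  for t, is_missing in resources if is_missing)
    return missing / total if total else 0.0

# A page whose only image failed to be captured, but whose CSS and JS resolved:
print(damage_score([("image", True), ("css", False), ("js", False)]))  # → 0.444...
```

The point of the weighting is that a missing hero image damages the reader's experience more than a missing analytics script, so a raw missing/total ratio would under- or over-state the harm.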

2018-01-08: Introducing Reconstructive - An Archival Replay ServiceWorker Module

Web pages are generally composed of many resources such as images, style sheets, JavaScript, fonts, iframe widgets, and other embedded media. These embedded resources can be referenced in many ways (such as a relative path, an absolute path, or a full URL). When the same page is archived and replayed from a different domain under a different base path, these references may not resolve as intended and hence may result in a damaged memento. For example, a memento (an archived copy) of the web page https://www.odu.edu/ can be seen at https://web.archive.org/web/20180107155037/https://www.odu.edu/. Note that the domain name has changed from www.odu.edu to web.archive.org and some extra path segments have been added to it. In order for this page to render properly, various resource references in it are rewritten; for example, images/logo-university.png in a CSS file is replaced with /web/20171225230642im_/http://www.odu.edu/etc/designs/odu/images/logo-university.png. Traditiona...
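The rewriting step described above boils down to resolving each reference against the original page's URL and prefixing the archive's replay path. A simplified sketch of that transformation follows; real replay systems such as Reconstructive handle many more cases (protocol-relative URLs, srcset, dynamic requests), and this helper is only an illustration:

```python
from urllib.parse import urljoin

def rewrite(reference, page_url, archive_prefix="/web/20171225230642im_/"):
    """Resolve a (possibly relative) reference against the live page's URL,
    then prefix it with the archive's replay path."""
    absolute = urljoin(page_url, reference)
    return archive_prefix + absolute

# Reproducing the example from the post: a relative image path in a CSS file
# served from http://www.odu.edu/etc/designs/odu/ (a hypothetical file name).
print(rewrite("images/logo-university.png",
              "http://www.odu.edu/etc/designs/odu/styles.css"))
# → /web/20171225230642im_/http://www.odu.edu/etc/designs/odu/images/logo-university.png
```

`urljoin` applies the same resolution rules the browser would have used on the live page, which is why the rewritten URL lands on the resource the original page intended.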

2017-02-13: Electric WAILs and Ham

Mat Kelly recently posted Lipstick or Ham: Next Steps For WAIL, in which he spoke about the past, present, and potential future of WAIL. Web Archiving Integration Layer (WAIL) is a tool that seeks to address the disparity between institutional and individual archiving tools by providing one-click configuration and use of both Heritrix and Wayback from a user's personal computer. I am here to speak on the realization of WAIL's future by introducing WAIL-Electron. WAIL has been completely rewritten from a Python application into an Electron application built with modern Web technologies. Electron combines a Chromium (Chrome) browser with Node.js, allowing native desktop applications to be created using only HTML, CSS, and JavaScript. The move to Electron has brought with it many improvements, the most important of which is the ability to update and package WAIL for the three major operating systems: Linux, macOS, and Windows. Support for these...

2016-06-03: Lipstick or Ham: Next Steps for WAIL

The development, state, and future of 🐳 Web Archiving Integration Layer. 💄∨🐷? Some time ago I created and deployed Web Archiving Integration Layer (frequently abbreviated as WAIL), an application that provides users pre-configured local instances of Heritrix and OpenWayback. This tool was originally created for the Personal Digital Archiving 2013 conference and has gone through a metamorphosis. The original impetus for creating the application was that the browser-based WARCreate extension required some sort of server-like software to save files locally because of the limitations of the Google Chrome API and JavaScript at the time (2012). WARCreate would perform an HTTP POST to thi...

2015-09-08: Releasing an Open Source Python Project, the Services That Brought py-memento-client to Life

The LANL Library Prototyping Team recently received correspondence from a member of the Wikipedia team requesting Python code that could find the best URI-M for an archived web page based on the date of the page revision. Collaborating with Wikipedia, Harihar Shankar, Herbert Van de Sompel, Michael Nelson, and I were able to create the py-memento-client Python library to suit the needs of pywikibot. Over the course of library development, Wikipedia suggested the use of two services, Travis CI and PyPI, that we had not used before. We were very pleased with the results of those services and learned quite a bit from the experience. We have been using GitHub for years, and also include it here as part of the development toolchain for this Python project. We present three online services that solved the following problems for our Python library: Where do we store source code and documentation for the long term? - GitHub How do we ensure the project is...
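At its core, the task py-memento-client was built for, finding the best URI-M for a revision date, means choosing the memento whose capture time is closest to the target datetime. A minimal sketch of that selection logic is below; the data layout is illustrative and not the library's actual API:

```python
from datetime import datetime

def closest_memento(mementos, target):
    """mementos: list of (memento_datetime, uri_m) pairs.
    Returns the URI-M whose capture time is nearest the target datetime."""
    return min(mementos, key=lambda m: abs(m[0] - target))[1]

# Hypothetical captures of a page, and a Wikipedia revision dated 2015-09-08:
mementos = [
    (datetime(2015, 9, 1), "https://web.archive.org/web/20150901/http://example.com/"),
    (datetime(2015, 9, 10), "https://web.archive.org/web/20150910/http://example.com/"),
]
print(closest_memento(mementos, datetime(2015, 9, 8)))
# → https://web.archive.org/web/20150910/http://example.com/
```

In practice the library also has to negotiate with archives via the Memento protocol's datetime negotiation (Accept-Datetime) rather than work from an in-memory list, but the nearest-capture comparison is the same.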

2014-10-03: Integrating the Live and Archived Web Viewing Experience with Mink

UPDATE: Download the latest version of Mink here. The goal of the Memento project is to provide a tighter integration between the past and current web. There are a number of clients now that provide this functionality, but they remain silent about the archived page until the user remembers to invoke them (e.g., by right-clicking on a link). We have created another approach based on persistently reminding the user just how well archived (or not) the pages they visit are. The Chrome extension Mink (short for Minkowski Space) queries all the public web archives (via the Memento aggregator) in the background and displays the number of mementos (that is, the number of captures of the web page) available at the bottom right of the page. Selecting the indicator allows quick access to the mementos through a dropdown. Once in the archives, returning to the live web is as simple as clicking the "Back to Live Web" button. For the case where there are...
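The memento count Mink displays comes from the aggregator's TimeMap, which lists captures in the link format of RFC 7089. A naive sketch of counting the memento entries in such a TimeMap is below (a full implementation needs a proper link-format parser; this regex approach is only illustrative):

```python
import re

def count_mementos(timemap_text):
    """Count link entries whose rel contains the "memento" relation in a
    link-format (RFC 7089) TimeMap. Naive sketch, not a full parser."""
    rels = re.findall(r'rel="([^"]*)"', timemap_text)
    return sum(1 for rel in rels if "memento" in rel.split())

# A tiny hypothetical TimeMap with an original link and two captures:
timemap = (
    '<http://example.com/>; rel="original",\n'
    '<https://web.archive.org/web/20140101/http://example.com/>;'
    ' rel="first memento"; datetime="Wed, 01 Jan 2014 00:00:00 GMT",\n'
    '<https://web.archive.org/web/20141003/http://example.com/>;'
    ' rel="last memento"; datetime="Fri, 03 Oct 2014 00:00:00 GMT"'
)
print(count_mementos(timemap))  # → 2
```

Note that rel values such as "first memento" carry multiple relations in one attribute, which is why the sketch splits the rel string on whitespace instead of testing equality.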