Posts

Showing posts from September, 2017

2017-09-19: Carbon Dating the Web, version 4.0

This release of Carbon Date, dubbed Carbon Date v4.0, introduces new features to track testing and enforce standard Python formatting conventions.



We've also decided to switch from MementoProxy to the Memgator aggregator tool built by Sawood Alam.

Of course, with new APIs come new bugs that need to be addressed, such as this exception handling issue. Fortunately, the new tools being integrated into the project will allow our team to catch and address these issues more quickly than before, as explained below.

The previous version of this project, Carbon Date 3.0, added Pubdate extraction, Twitter searching, and Bing search. We found that Bing has changed its API so that it now offers only a 30-day trial, limited to 1,000 requests per month, unless someone wants to pay. We also discovered a few more use cases for Pubdate extraction by applying it to the mementos retrieved from Memgator. By default, Memgator provides the Memento-Datetim…

2017-09-13: Pagination Considered Harmful to Archiving

While gathering data for our work on measuring the correlation between university rankings by reputation and by Twitter followers (McCoy et al., 2017), we discovered that many of the web pages that made up the complete ranking list for U.S. News in a given year were not available in the Internet Archive. In fact, 21 of 75 pages (or 28%) had never been archived at all. "... what is part of and what is not part of an Internet resource remains an open question" according to research concerning Web archiving mechanisms conducted by Poursadar and Shipman (2017). Over 2,000 participants in their study were presented with various types of web content (e.g., multi-page stories, reviews, single-page writings) and surveyed regarding their expectations for later access to additional content that was linked from or appeared on the main page. Specifically, they investigated (1) how relationships between page content affect expectations and (2) how perceptions of content value relate to …