Posts

2017-12-31: ACM Workshop on Reproducibility in Publication

On December 7 and 8 I attended the ACM Workshop on Reproducibility in Publication in NYC as part of my role as a member of the ACM Publications Board and co-chair (with Alex Wade) of the Digital Library Committee. The purpose of this workshop was to gather input from the various ACM SIGs about the approach to reproducibility and "artifacts", objects supplementary to the conventional publication process. The workshop was attended by 50+ people, mostly from the ACM SIGs, but also included representatives from other professional societies, repositories, and hosting services. A collection of the slides presented at the workshop and a summary report are being worked on now, so this trip report is mostly my personal perspective on the workshop; I'll update it with slides, summary, and other materials as they become available. This was the third such workshop to be held, but it was the first for me since I joined the Publications Board in September of 201

2017-12-19: CNI Fall 2017 Trip Report

The Coalition for Networked Information (CNI) Fall 2017 Membership Meeting was held in Washington, DC on December 11-12, 2017. University Librarian George Fowler and I represented ODU, which was recognized as a new member this year. CNI runs several parallel sessions of project briefings, so I will focus on the sessions that I was able to attend. The attendees were active on Twitter, using the hashtag #cni17f, and I'll embed some of the tweets below. CNI has the full schedule (pdf) available and will have some of the talks on the CNI YouTube channel. (I'll note if any sessions I attended were scheduled to be recorded and add the link when published.) The project briefings page has additional information on each briefing and links to presentations that have been submitted. Dale Askey (McMaster University) has published his CNI Fall 2017 Membership Meeting notes, which cover several of the sessions that I was unable to attend. DAY 1 - December 11 Plenary

2017-12-14: Storify Will Be Gone Soon, So How Do We Preserve The Stories?

The popular storytelling service Storify will be shut down on May 16, 2018. Storify has been used by journalists and researchers to create stories about events and topics of interest. It has a wonderful interface, shown below, that allows one to insert text and also add social cards and other content from a variety of services, including Twitter, Instagram, Facebook, YouTube, Getty Images, and of course regular HTTP URIs. This screenshot displays the Storify editing interface. Storify is used by news sources to build and publish stories about unfolding events, as seen below for the Boston NPR station WBUR. Storify is used by WBUR in Boston to convey news stories. It is also the visualization platform used for summarizing Archive-It collections in the Dark and Stormy Archives (DSA) Framework, developed by WS-DL members Yasmin AlNoamany, Michele Weigle, and Michael Nelson. In a previous blog post, I covered why this visualization technique works and why m

2017-12-11: Difficulties in timestamping archived web pages

Figure 1: A web page from nasa.gov is archived by Michael's Evil Wayback in July 2017. Figure 2: When visiting the same archived page in October 2017, we found that the content of the page had been tampered with. The 2016 Survey of Web Archiving in the United States shows an increasing trend of using public and private web archives in addition to the Internet Archive (IA). Because of this trend, we should consider the validity of the archived web pages delivered by these archives. Let us look at an example where the important web page https://climate.nasa.gov/vital-signs/carbon-dioxide/, which keeps a record of the carbon dioxide (CO2) level in the Earth’s atmosphere, is captured by a private web archive, “Michael’s Evil Wayback”, on July 17, 2017 at 18:51 GMT. At this time, as Figure 1 shows, the CO2 level was 406.31 ppm. When revisiting the same archived page in October 2017, we should be presented with the same content. Surprisingly, CO2 changed and bec

2017-12-03: Introducing Docker - Application Containerization & Service Orchestration

For the last few years, Docker, the application containerization technology, has been gaining a lot of traction in the DevOps community, and lately it has made its way into the academic and research community as well. I have been following it since its inception in 2013. For the last couple of years, it has been a daily driver for me. At the same time, I have been encouraging my colleagues to use Docker in their research projects. As a result, we are gradually moving away from one virtual machine (VM) per project to a swarm of nodes running containers of various projects and services. If you have accessed MemGator, CarbonDate, Memento Damage, Story Graph, or some other WS-DL services lately, you have been served from our Docker deployment. We even have an on-demand PHP/MySQL application deployment system using Docker for the CS418 - Web Programming course. I (@ibnesayeed) have been selected as the @Docker Campus Ambassador for Old Dominion University! /cc @ODU @oducs

2017-11-22: Deploying the Memento-Damage Service

Many web archives, such as archive.is, Archive-It, Internet Archive, and UK Web Archive, provide archived web pages, or mementos, for us to use. Nowadays, web archivists have shifted their focus from how to make a good archive to measuring how well the archive preserved the page. This raises the question of how to objectively measure the damage of a memento in a way that correctly emulates user (human) perception. Related to this, Justin Brunelle devised a prototype for measuring the impact of missing embedded resources (the damage) on a web page. Brunelle, in his IJDL paper (and the earlier JCDL version), explains that the quality of a memento depends on the availability of its resources. The straight percentage of missing resources in a memento is not always a good indicator of how "damaged" it is. For example, one page could be missing several small icons whose absence users never even notice, and a second page could be missing a single embedd