Showing posts from August, 2019

2019-08-30: Where did the archive go? Part 1: Library and Archives Canada

Web archives are established with the objective of providing permanent access to archived web pages, or mementos. However, in our 14-month study of 16,627 mementos from 17 public web archives, we found that three web archives changed their base URLs and did not leave a machine-readable method of locating their new URLs. We were able to manually discover the three new URLs for the archives. A fourth archive has partially ceased operations.

(1) Library and Archives Canada
Around May 2018, mementos in this archive were moved to a new archive, which has a different domain name. We noticed that 49 mementos (out of 351) cannot be found in the new archive.

(2) The National Library of Ireland (NLI)
Around May 2018, the European Archive was shut down and the domain name was purchased by another entity. The National Library of Ireland (NLI) collection preserved by this archive was moved to another archive ( …

2019-08-24: Six WS-DL Classes Offered for Fall 2019

A record six WS-DL courses are offered for Fall 2019:
CS 418/518 Web Programming, Dr. Jian Wu, Tuesdays & Thursdays, 1:30-2:45pm
Topics: LAMP (Linux, Apache, MySQL, PHP), jQuery, git/GitHub
CS 431/531 Web Server Design, Sawood Alam, Thursdays, 4:20-7:00pm
Topics: HTTP, REST (Representational State Transfer), HATEOAS
CS 620 Introduction to Data Science, Dr. Sampath Jayarathna, Tuesdays, 4:20-7:00pm
Topics: NumPy, NoSQL, ML, Recommenders
CS 625 Data Visualization, Dr. Michele C. Weigle, Tuesdays, 9:30am-12:15pm
Topics: Tableau, R, Vega-Lite, Data Types, Maps, Colors, Tables
CS 891 Emerging Technologies, Dr. Justin F. Brunelle, Fridays, 3:00-5:30pm
Topics: Cloud, Edge, IoT, Blockchain, Security, Agile
CS 795/895 Intelligent User Interfaces, Dr. Vikas Ashok, Thursdays, 9:30am-12:15pm
Topics: HCI, Assistive Technologies, Speech Interfaces, Wearable Computing, Crowd-Powered Interfaces

I am on research leave for Fall 2019 and will not be teaching.

Dr. Brunelle's CS 891 is especially s…

2019-08-14: Building the Better Crowdsourced Study - Literature on Mechanical Turk

As part of "Social Cards Probably Provide For Better Understanding Of Web Archive Collections" (recently accepted for publication by CIKM 2019), I had to learn how to conduct user studies. One of the most challenging problems in conducting user studies is recruiting participants. Amazon's Mechanical Turk (MT) solves this problem by providing a marketplace where participants can earn money by completing studies for researchers. This blog post summarizes the lessons I have learned from other studies that have successfully employed MT. I have found parts of this information scattered throughout different bodies of knowledge, but not gathered in one place; thus, I hope it is a useful starting place for future researchers.

MT is by far the largest source of study participants, with over 100,000 available participants. MT is an automated system that facilitates the interaction of two actors: the requester and the worker. A worker signs up for an Amazon account and …

2019-08-03: Searching Web Archives for Unattributed Deleted Tweets From Politwoops

On May 11, 2019, Derek Willis, who works at ProPublica and also maintains the Politwoops project, tweeted a list of deleted tweet IDs found by Politwoops that could not be attributed to any Twitter handle being tracked by Politwoops. This was an opportunity for us to revisit our interest in using web archives to uncover deleted tweets. Although we were unsuccessful in finding any of the deleted tweet IDs provided by Politwoops in web archives, we are documenting our process for coming to this conclusion.

Politwoops

Politwoops is a web service which tracks deleted tweets of elected public officials and candidates running for office in the USA and 55 other countries. Politwoops USA is supported by ProPublica.

Creating Twitter handles list for the 116th Congress

In a previous post, we discussed the challenges involved in creating a data set of Twitter handles for the members of Congress and provided a data set of Twitter handles for the 116th Congress. A member of Congress can hav…