Posts

Showing posts from August, 2019

2019-08-30: Where did the archive go? Part 1: Library and Archives Canada

Web archives are established with the objective of providing permanent access to archived web pages, or mementos. However, in our 14-month study of 16,627 mementos from 17 public web archives, we found that three web archives changed their base URLs and did not leave a machine-readable method of locating their new URLs. We were able to manually discover the new URLs for all three archives. A fourth archive has partially ceased operations.

(1) Library and Archives Canada (collectionscanada.gc.ca): Around May 2018, mementos in this archive were moved to a new archive (webarchive.bac-lac.gc.ca), which has a different domain name. We found that 49 of the 351 mementos cannot be found in the new archive.

(2) The National Library of Ireland (NLI): Around May 2018, the European Archive (europarchive.org) was shut down and its domain name was purchased by another entity. The National Library of Ireland (NLI) collection preserved by this archive was moved to anothe

2019-08-24: Six WS-DL Classes Offered for Fall 2019

https://xkcd.com/2180/

A record six WS-DL courses are offered for Fall 2019:

CS 418/518 Web Programming, Dr. Jian Wu, Tuesdays & Thursdays, 1:30-2:45pm
Topics: LAMP (Linux, Apache, MySQL, PHP), jQuery, git/GitHub

CS 431/531 Web Server Design, Sawood Alam, Thursdays, 4:20-7:00pm
Topics: HTTP, REST (Representational State Transfer), HATEOAS

CS 620 Introduction to Data Science, Dr. Sampath Jayarathna, Tuesdays, 4:20-7:00pm
Topics: NumPy, NoSQL, ML, Recommenders

CS 625 Data Visualization, Dr. Michele C. Weigle, Tuesdays, 9:30am-12:15pm
Topics: Tableau, R, Vega-Lite, Data Types, Maps, Colors, Tables

CS 891 Emerging Technologies, Dr. Justin F. Brunelle, Fridays, 3:00-5:30pm
Topics: Cloud, Edge, IoT, Blockchain, Security, Agile

CS 795/895 Intelligent User Interfaces, Dr. Vikas Ashok, Thursdays, 9:30am-12:15pm
Topics: HCI, Assistive Technologies, Speech Interfaces, Wearable Computing, Crowd-Powered Interfaces

I am on research leave for Fall 2019 and will n

2019-08-14: Building the Better Crowdsourced Study - Literature on Mechanical Turk

The XKCD comic "Study" parodies the challenges of recruiting study participants. As part of "Social Cards Probably Provide For Better Understanding Of Web Archive Collections" (recently accepted for publication at CIKM 2019), I had to learn how to conduct user studies. One of the most challenging problems in conducting user studies is recruiting participants. Amazon's Mechanical Turk (MT) addresses this problem by providing a marketplace where participants can earn money by completing studies for researchers. This blog post summarizes the lessons I have learned from other studies that have successfully employed MT. I found parts of this information scattered across different bodies of knowledge but never gathered in one place, so I hope this post is a useful starting point for future researchers. MT is by far the largest source of study participants, with over 100,000 available participants. MT is an automated system that facilitates

2019-08-03: Searching Web Archives for Unattributed Deleted Tweets From Politwoops

Tweet URL: https://twitter.com/derekwillis/status/1127234631865118731

On May 11, 2019, Derek Willis, who works at ProPublica and also maintains the Politwoops project, tweeted a list of deleted tweet IDs found by Politwoops that could not be attributed to any Twitter handle tracked by Politwoops. This was an opportunity for us to revisit our interest in using web archives to uncover deleted tweets. Although we did not find any of the deleted tweet IDs provided by Politwoops in web archives, we are documenting the process by which we reached this conclusion.

Politwoops

Politwoops is a web service that tracks deleted tweets of elected public officials and candidates running for office in the USA and 55 other countries. Politwoops USA is supported by ProPublica.

Creating a Twitter handles list for the 116th Congress

In a previous post, we discussed the challenges involved in creating a data set of Twitter handles for
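A deleted tweet ID with no known handle still carries some information. Tweet IDs issued since late 2010 are Snowflake IDs, and their upper bits encode the creation time in milliseconds since Twitter's epoch. The following is a minimal sketch, assuming every ID in the list is a Snowflake ID, of recovering an approximate posting time from an ID alone. That time could help narrow which archived captures are worth checking. This is not presented as the procedure used in the post; the function name `tweet_id_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Twitter's Snowflake epoch, 2010-11-04T01:42:54.657Z, as Unix milliseconds.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Return the UTC creation time encoded in a Snowflake tweet ID.

    Assumes the ID is a Snowflake ID (issued from late 2010 onward): the bits
    above the lowest 22 hold milliseconds elapsed since TWITTER_EPOCH_MS.
    """
    if tweet_id < 0:
        raise ValueError("tweet IDs are non-negative integers")
    timestamp_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    # Integer timedelta arithmetic keeps millisecond precision exactly (no float rounding).
    return UNIX_EPOCH + timedelta(milliseconds=timestamp_ms)


if __name__ == "__main__":
    # ID taken from the tweet URL quoted above; a deleted ID from the
    # Politwoops list could be passed in the same way.
    print(tweet_id_to_datetime(1127234631865118731).isoformat())
```

For the ID in the tweet URL above, the decoded timestamp falls on May 11, 2019 (UTC), which matches the date of that tweet. Shifting by 22 bits discards the worker, datacenter, and sequence fields, so the result is accurate only to the millisecond at which the ID was generated.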