Showing posts from November, 2020

2020-11-18: Creating Collection Growth Curves With Archives Unleashed Toolkit And Hypercane

Figure 1: Creating collection growth curves with a web page text derivative

Recently, I have been learning about Archives Unleashed Toolkit (AUT), Hypercane, and how these tools can be used together. AUT is one of the tools from the Archives Unleashed Project, and it can be used to analyze web archive collections. When given the WARC or ARC files for a web archive collection, AUT can create network derivatives and text derivatives. In a network derivative, the nodes are the domains in the collection, and an edge connects two nodes when one or more web pages in one domain contain a link to a web page in the other domain. AUT's text derivatives include information about the web pages, images, PDFs, or other documents in the collection. Hypercane, a tool developed by WS-DL's Shawn Jones, can be used to create WARC files that are associated with a public Archive-It collection. The WARC files created by Hypercane can
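The network-derivative construction described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not AUT's implementation: it assumes the page-level links are already available as (source URL, target URL) pairs, treats a domain as the URL's hostname with any leading "www." removed, and skips links between pages in the same domain. The names `domain_of` and `domain_graph` and the example URLs are made up for this sketch.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercase hostname of a URL, dropping a leading 'www.' (an assumption)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def domain_graph(links):
    """Aggregate page-level (source_url, target_url) links into a domain-level graph.

    Nodes are all domains seen in the links. An edge (a, b) exists when at least
    one page in domain a links to a page in a different domain b; the Counter
    value records how many page-level links support that edge.
    """
    nodes = set()
    edges = Counter()
    for source_url, target_url in links:
        src, dst = domain_of(source_url), domain_of(target_url)
        nodes.update(d for d in (src, dst) if d)
        if src and dst and src != dst:
            edges[(src, dst)] += 1
    return nodes, edges


# Hypothetical page-level links, as might be extracted from a set of WARC files.
links = [
    ("http://www.example.com/a", "https://news.example.org/story"),
    ("http://example.com/b", "https://news.example.org/other"),
    ("https://news.example.org/story", "http://example.com/"),
    ("http://example.com/a", "http://example.com/b"),  # same domain: no edge
]
nodes, edges = domain_graph(links)
```

Here `edges` holds two entries: ("example.com", "news.example.org") supported by two page-level links, and the reverse edge supported by one. Dropping the counts leaves the plain domain graph described in the post.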

2020-11-15: Sapien Labs Virtual Symposium on Mental Health Trip Report

The Sapien Labs Virtual Symposium on The Future of Mental Health: Measurement, Treatment and Therapies was held virtually via Adobe Connect on 2-3 November 2020. The symposium consisted of two sessions each day. Each session began with multiple presentations, and the presenters then joined a moderated panel discussion in the second half of that session. Dr. Tara Thiagarajan, Founder and CEO of Sapien Labs, hosted the virtual symposium, introduced many of the speakers, and led multiple panel discussions.

Symposium Day 1

Session 1: Dr. Eiko Fried from Leiden University started the first session's presentations with his talk "Measure Matters: Challenges to Assessing Mental Health Problems Pose a Substantial Barrier to Clinical Progress." Dr. Fried stated that while proper measurement is critical, it is difficult, especially when attempting to measure individuals' internal states, including personality, cognition, and mental health

2020-11-04: New Twitter UI: Replaying Archived Twitter Pages That Never Existed

Figure 1: Multiple Temporal Violations in an archived page with the new Twitter interface.

When you visit web archives to go back in time and look at a web page, you naturally expect it to display the content exactly as it appeared on the live web at that particular datetime. That expectation assumes that all of the resources on the page were captured at or near the datetime displayed in the banner for the root HTML page. However, we noticed that this is not always the case: problems with archiving Twitter's new UI can result in replaying Twitter profile pages that never existed on the live web. In our previous blog post, we talked about how difficult it is to archive Twitter's new UI; in this blog post, we uncover how mementos of the new Twitter UI in the Internet Archive are vulnerable to temporal violations. On Aug 18, 2020, we stumbled upon a recently archived memento (Figure 1) of Donald Trump’s Twitter profile page in the Internet Archive
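A minimal sketch of the expectation described above: every embedded resource should have been captured at or near the root page's capture datetime. The function name `find_temporal_outliers`, the one-hour tolerance, and the example URIs and datetimes are all hypothetical. This simple distance check is not the formal definition of a temporal violation used in the post or in the web archiving literature; it only flags resources whose capture time is far from the root's.

```python
from datetime import datetime, timedelta


def find_temporal_outliers(root_datetime, resources, max_delta=timedelta(hours=1)):
    """Return URIs of embedded resources captured more than max_delta from the root.

    root_datetime: capture datetime of the root HTML page (the one shown in the banner).
    resources: mapping of embedded-resource URI -> that resource's capture datetime.
    max_delta: hypothetical tolerance for "at or near" the root's datetime.
    """
    return sorted(
        uri
        for uri, captured in resources.items()
        if abs(captured - root_datetime) > max_delta
    )


# Made-up capture datetimes for a root page and three embedded resources.
root = datetime(2020, 8, 18, 12, 0, 0)
resources = {
    "https://example.org/app.js": datetime(2020, 8, 18, 12, 0, 5),
    "https://example.org/profile.json": datetime(2020, 8, 10, 9, 30, 0),
    "https://example.org/avatar.jpg": datetime(2020, 8, 18, 11, 45, 0),
}
outliers = find_temporal_outliers(root, resources)
```

With these inputs only the JSON resource, captured about eight days before the root page, is flagged; a replayed page built from it would mix content from two different moments.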