Showing posts from October, 2019

2019-10-31: Continuing Education to Advance Web Archiving (CEDWARC)

Note: This blog post may be updated with additional links to slides and other resources as they become publicly available.

On October 28, 2019, web archiving experts met with librarians and archivists at the George Washington University in Washington, DC. As part of the Continuing Education to Advance Web Archiving (CEDWARC) effort, we covered several modules related to tools and technologies for web archives. The event consisted of morning overview presentations and afternoon lab sessions. Here I provide an overview of the topics we covered.

Web Archiving Fundamentals

Prior to the event, Edward A. Fox, Martin Klein, Anna Perricci, and Zhiwu Xie created a brief tutorial covering the fundamentals of web archiving. This tutorial, shown below, was distributed as a video to attendees in advance so they could familiarize themselves with the concepts we would discuss at CEDWARC. Zhiwu Xie kicked off the event with a refresher o

2019-10-28: The interaction between search engine caches and web archives

News articles from Indian newspapers about a corruption case involving an Indian doctor. The left images show screenshots of the article from the print newspaper. The right images show the URLs for the articles returning 404 pages.

My brother, a lawyer in India, recently sent me two screenshots, shown in Figures 1 and 2, of a news article about a corruption case involving a renowned doctor from India. To pursue legal proceedings against the newspapers for publishing the article, my brother needed evidence that the articles had been published. He therefore sought my help in finding the URLs of the articles shown in the screenshots. The news articles were published in an English-language newspaper, The Asian Age, and a Hindi-language newspaper, Punjab Kesari.

Figure 1: Screenshot of the news article from the English-language newspaper, The Asian Age, shared with me by my brother

Figure 2: Screenshot of the news article from the Hindi languag

2019-10-25: Summary of "Proactive Identification of Exploits in the Wild Through Vulnerability Mentions Online"

Figure 1: Disclosed Vulnerabilities by Year (Source: CVE Details)

The number of software vulnerabilities discovered and disclosed to the public is steadily increasing every year. As shown in Figure 1, in 2018 alone, more than 16,000 Common Vulnerabilities and Exposures (CVE) identifiers were assigned by various CVE Numbering Authorities (CNAs). CNAs are organizations from around the world that are authorized to assign CVE IDs to vulnerabilities affecting products within their distinct, agreed-upon scope. Faced with large volumes of data and limited skilled cybersecurity resources, organizations are challenged to identify the vulnerabilities that pose the greatest risk to their technology resources. One of the key reasons current approaches to cyber vulnerability remediation are ineffective is that organizations cannot effectively determine whether a given vulnerability poses a meaningful threat. In their paper, "Proactive Identification of Exploi

2019-10-21: Where did the archive go? Part 4: WebCite

We previously described changes in the following web archives:
In Where did the archive go? Part 1, we provided some details about changes in the Library and Archives Canada archive. After they upgraded their replay system, we were no longer able to find 49 out of 351 mementos (archived web pages).
In Part 2, we focused on the movement of the National Library of Ireland (NLI) collection. Mementos from the NLI collection were moved from the European Archive to the Internet Memory Foundation (IMF) archive. Then, they were moved to Archive-It. We found that 192 mementos, out of 979, cannot be found in Archive-It.
In Part 3, we described changes in the Public Record Office of Northern Ireland (PRONI) Web Archive. Mementos in the PRONI archive were moved to Archive-It. We discovered that 114 mementos, out of 469, can no longer be found in Archive-It (i.e., missing mementos).
In the last part of this four-part series, we focus on changes in webcitation