Web Science and Digital Libraries Research Group

Showing posts with the label content drift

2023-09-05: Paper Summary: "Gone, Gone, but Not Really, and Gone, But Not Forgotten: A Typology of Website Recoverability" (Reyes Ayala TempWeb '23)

By Tarannum Zaki - September 05, 2023

Brenda Reyes Ayala, "Gone, Gone, but Not Really, and Gone, But Not Forgotten: A Typology of Website Recoverability", 13th Temporal Web Analytics Workshop (TempWeb '23) in Companion Proceedings of the Web Conference 2023 (WWW '23), Apr. 2023 (Texas, USA), pp. 1208-1213, doi: 10.1145/3543873.3587671. We often come across web pages that show 'Error 404', which means the server cannot find the requested page. We also encounter web pages whose content changes significantly over time, moving away from the originally referenced content. Such disappearance of web resources is a common phenomenon on the web. Web resources can disappear or change for a variety of reasons, such as server crashes, expired domains, hacking, creators abandoning websites, and web resources being moved to a different location. Disappearance of resources from the web is broadly termed reference rot, which has two components: link rot and content drift. Link r...

2022-08-04: Web Archiving in Popular Media II: User Tasks of Journalists

By Lesley Frew - August 04, 2022

Figure 1: The two most common goals for journalists who use web archives as evidence in their articles are to view unavailable pages and to view page content change over time. Different groups of users collectively have different levels of understanding about web archives. In a previous post from 2016, Web Archiving in Popular Media, Scott Ainsworth demonstrated the emergence of web archives as evidence in journalism. The list of news articles that he presented as examples was a novel contribution at that time. Users with a strong mental model for the past web could benefit from advanced web archive features such as full-text search. What is the current mental model for web archives held by journalists, and how do web archives help journalists? Collecting Articles that Reference Web Archives In "Where'd it Go?" (2007), Teevan analyzed a set of web pages that were collected from a phrase search with the goal of understanding users' re-finding behavior [1]. By searc...

2017-01-15: Summary of "Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data"

By Anonymous - January 15, 2017

Example: original URI vs. trusty URI Based on the paper: Kuhn, T., Dumontier, M.: Trusty URIs: Verifiable, immutable, and permanent digital artifacts for linked data. Proceedings of the European Semantic Web Conference (ESWC) pp. 395–410 (2014). A trusty URI is a URI that contains a cryptographic hash value of the content it identifies. The authors introduced trusty URIs to make digital artifacts, especially those related to scholarly publications, immutable, verifiable, and permanent. Assuming that a trusty URI, once created, is linked from other resources or stored by a third party, it becomes possible to detect whether the content that the trusty URI identifies has been tampered with or manipulated in transit (e.g., to prevent man-in-the-middle attacks). In addition, trusty URIs can verify the content even if it is no longer found at the original URI but still can be retrieved from other locations, such as Google's cache, ...
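
The core idea in the excerpt above, a URI that carries a hash of the content it identifies, can be illustrated with a small Python sketch. This is a simplified illustration and not the encoding defined in the Kuhn and Dumontier paper: the function names `make_hash_uri` and `verify_hash_uri`, the `.` separator, the choice of SHA-256 with unpadded URL-safe Base64, and the `example.org` URI are all assumptions made for the example.

```python
import base64
import hashlib


def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Append a URL-safe SHA-256 digest of `content` to `base_uri`.

    Hypothetical helper: the separator and encoding are assumptions,
    not the format specified for real trusty URIs.
    """
    digest = hashlib.sha256(content).digest()
    # URL-safe Base64 uses only A-Z, a-z, 0-9, '-' and '_', so the token
    # never contains the '.' separator used below.
    token = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{base_uri}.{token}"


def verify_hash_uri(uri: str, content: bytes) -> bool:
    """Return True if the hash embedded in `uri` matches `content`."""
    base_uri, sep, _token = uri.rpartition(".")
    if not sep or not base_uri:
        return False  # no embedded hash to check
    # Recompute the URI from the content and compare with the one given.
    return make_hash_uri(base_uri, content) == uri


if __name__ == "__main__":
    uri = make_hash_uri("http://example.org/r1", b"original content")
    print(verify_hash_uri(uri, b"original content"))  # True
    print(verify_hash_uri(uri, b"tampered content"))  # False
```

Because the check depends only on the URI and the bytes in hand, it works the same way whether the content was fetched from the original location or from a copy held elsewhere.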