Posts

Showing posts from 2020

2020-04-01: SHARI: StoryGraph Hypercane ArchiveNow Raintale Integration -- Combining WS-DL Tools For Current Events Storytelling

My research focuses on summarizing existing web archive collections through social media storytelling. For this effort, we developed Raintale to tell the stories produced by a selection of mementos. Collections exist at various web archives, like Archive-It and the UK Web Archive. As shown by Klein et al., we can build collections of mementos by conducting focused crawling of web archives. Raintale works well for these cases involving existing mementos, but what if we want to make a story about live web resources, like current events from the news?


WS-DL members have addressed other parts of the problem. Alexander Nwala’s research has centered on finding seeds within search engine result pages (SERPs), social media stories, and news feeds. As part of his news research, Nwala developed StoryGraph, a tool that analyzes multiple news sources every hour and automatically determines the news story or stories that dominate the media landscape at that time. Mohamed Aturban developed Archive…

2020-03-26: Memento Compliance Audit of PyWB

This document is an audit report of the latest development version of PyWB, a Web archive replay system, for its Memento (RFC 7089) compliance. As a growing number of public Web archives move towards deploying PyWB, it becomes critical for it to comply with standards so that tools in the archiving ecosystem continue to function as expected. To audit the Memento compliance of PyWB, I established the following setup:
- Captured example.com five times in separate WARC files, with a gap of a few minutes between captures, using warcio
- Created various test instances of PyWB's develop branch, which is one commit ahead of the v-2.4.0-rc6-test version (commit hash: 92e459bda52a2b03f33a4b0b8094ed424248d2a5)
- Initialized a collection named example and loaded the freshly captured WARC files into it for replay
- Placed multiple custom configuration files that are loaded by setting the PYWB_CONFIG_FILE environment variable for each test instance
- Preserved the state of the relevant folder tree in pywbtest.tar.gz for replica…
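
To give a sense of what a Memento compliance check looks at, here is a minimal Python sketch. It is not the audit procedure used in the post. It only tests two requirements that RFC 7089 places on a memento response: a parseable Memento-Datetime header and a Link header containing rel="original". The function names and the sample archive URLs are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime


def parse_link_rels(link_header):
    """Collect every relation type named in a Link header value."""
    rels = set()
    for match in re.finditer(r'rel="([^"]*)"', link_header):
        # A single rel attribute may carry several space-separated types.
        rels.update(match.group(1).split())
    return rels


def check_memento_headers(headers):
    """Return a list of problems found in a memento response's headers.

    `headers` is a plain dict of header name to value (hypothetical input
    shape); header names are compared case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []

    memento_datetime = lowered.get("memento-datetime")
    if memento_datetime is None:
        problems.append("missing Memento-Datetime")
    else:
        try:
            parsedate_to_datetime(memento_datetime)
        except (TypeError, ValueError):
            problems.append("unparseable Memento-Datetime")

    if "original" not in parse_link_rels(lowered.get("link", "")):
        problems.append('Link header lacks rel="original"')

    return problems


# Sample headers with made-up archive URLs, for illustration only.
sample = {
    "Memento-Datetime": "Thu, 26 Mar 2020 12:00:00 GMT",
    "Link": '<http://example.com/>; rel="original", '
            '<http://archive.example/timemap/link/http://example.com/>; rel="timemap"',
}
print(check_memento_headers(sample))
```

A real audit would also cover TimeGate negotiation, TimeMap content, and redirect behavior, which this sketch leaves out.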