Posts

Showing posts with the label off-topic

2018-07-02: The Off-Topic Memento Toolkit

Inspired by AlNoamany's work from "Detecting off-topic pages within TimeMaps in Web archives", I am pleased to announce an alpha release of the Off-Topic Memento Toolkit (OTMT). The results of testing with this software will be presented at iPres 2018, and those results are now available as a preprint. Web archive collections are created with a specific purpose in mind. A curator will supply seeds for the collection and create multiple versions of these seeds in order to study the evolution of a web page over time. This is valuable for following the changes in an organization or the events in a news story. Unfortunately, depending on the curator's intent, sometimes these seeds go off-topic. Because web archive crawling software has no way to know that a page is off-topic, these mementos are added to the collection. Below I list a few examples of off-topic pages within Archive-It collections. This memento from the Human Rights collection at Archive-It create

2015-08-20: ODU, L3S, Stanford, and Internet Archive Web Archiving Meeting

Two weeks ago (on Aug 3, 2015), I was glad to be invited to visit the Internet Archive in San Francisco to share our latest work with a group of web archiving pioneers from around the world. The attendees were Jefferson Bailey and Vinay Goel from IA, Nicholas Taylor and Ahmed AlSum from Stanford, and Wolfgang Nejdl, Ivana Marenzi, and Helge Holzmann from L3S. First, we briefly introduced ourselves to each other, describing the purpose and nature of our work to IA. Then, Nejdl introduced the Alexandria project and demoed the ArchiveWeb project, which aims to develop tools and techniques to explore and analyze Web archives in a meaningful way. In the project, they develop tools that will allow users to visualize and collaboratively interact with Archive-It collections by adding new resources in the form of tags and comments. Furthermore, it contains a collaborative search and sharing platform. I presented the off-topic detection work with a live demo for the