2025-02-10: Creating a Dataset of Archived Web Ads

Figure 1: Themes view for the web page we created to display ads from our dataset

One of the goals for the Saving Ads project was to create a dataset of advertisements from the live web. To construct our dataset, we randomly selected websites from SimilarWeb’s top websites worldwide (including all categories except “Adult”), rendered a web page from each website, and if the page loaded ads, archived it. We repeated this process until we had collected at least 250 ads. Ultimately, we selected 17 web pages to archive, resulting in the collection of 279 advertisements (Table 1).

To archive these web ads we used four web archiving services and three browser-based tools:

Web archiving services
  - Internet Archive's Save Page Now
  - Arquivo.pt
  - archive.today
  - Conifer

Browser-based tools
  - ArchiveWeb.page
  - Browsertrix Crawler
  - Brozzler

The four web archiving services (Save Page Now, Arquivo.pt, archive.today, and Conifer) archived two web pages each, ArchiveWeb.page and Browsertrix Crawler archi...
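
The selection procedure described above (pick sites at random, keep only pages that load ads, stop once at least 250 ads are collected) can be sketched roughly as follows. This is only an illustration, not the project's actual code: `collect_ad_pages`, `count_ads`, and `archive` are hypothetical names, with `count_ads` standing in for rendering a page and counting its ads and `archive` standing in for handing the page to an archiving service or tool.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple


def collect_ad_pages(
    candidate_sites: Iterable[str],
    count_ads: Callable[[str], int],   # hypothetical: render the page, return number of ads seen
    archive: Callable[[str], None],    # hypothetical: send the page to an archiving tool
    target_ads: int = 250,
    seed: Optional[int] = None,
) -> Tuple[List[Tuple[str, int]], int]:
    """Visit sites in random order until at least `target_ads` ads are archived."""
    rng = random.Random(seed)
    pool = list(candidate_sites)
    rng.shuffle(pool)  # random selection without replacement

    archived: List[Tuple[str, int]] = []
    total_ads = 0
    for site in pool:
        if total_ads >= target_ads:
            break  # stopping rule: enough ads collected
        ads_on_page = count_ads(site)
        if ads_on_page > 0:  # only pages that actually loaded ads are archived
            archive(site)
            archived.append((site, ads_on_page))
            total_ads += ads_on_page
    return archived, total_ads
```

Because the loop stops only after the page that crosses the threshold has been archived, the final total can exceed the target, which is consistent with the post's 279 ads against a target of 250.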