2022-03-02: Web Archiving Speedruns


The Game Walkthroughs and Web Archiving project started this year and is funded by the IIPC. The project focuses on the synergy between web archiving, gaming, and video game live streaming platforms. One of its goals is to apply gaming concepts to web archiving. In this blog post, I will focus on integrating web archiving with the gaming concept of speedruns.

For my initial example of an archiving speedrun, I gave Brozzler and Browsertrix Crawler the same set of URLs to archive, and I recorded the archiving session to see which crawler would finish the set first. The set consisted of 20 URLs selected from a dataset that I will use for future automated livestreams. I also created a livestream script that runs two crawlers side by side while they archive a set of URLs. The first speedrun round is included below.
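To make the side-by-side race idea concrete, here is a minimal Python sketch of starting two commands at the same time and timing each one. It is not the livestream script used for this project. The names `run_and_time`, `race`, and `winner` are hypothetical, and the commands in the demo are placeholder sleeps standing in for real crawler invocations.

```python
import subprocess
import sys
import threading
import time


def run_and_time(name, command, results):
    """Run one command to completion and record its wall-clock duration."""
    start = time.monotonic()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    results[name] = {
        "seconds": time.monotonic() - start,
        "returncode": completed.returncode,
    }


def race(commands):
    """Start every command at (roughly) the same time; return their timings."""
    results = {}
    threads = [
        threading.Thread(target=run_and_time, args=(name, command, results))
        for name, command in commands.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def winner(results):
    """Return the name of the command with the shortest duration."""
    return min(results, key=lambda name: results[name]["seconds"])


if __name__ == "__main__":
    # Placeholder commands (assumption): replace these with the actual
    # crawler command lines for a real speedrun.
    demo = {
        "crawler_a": [sys.executable, "-c", "import time; time.sleep(0.2)"],
        "crawler_b": [sys.executable, "-c", "import time; time.sleep(0.6)"],
    }
    timings = race(demo)
    for name, info in timings.items():
        print(f"{name}: {info['seconds']:.2f}s (exit code {info['returncode']})")
    print("winner:", winner(timings))
```

A sketch like this only measures total wall-clock time per crawler. Showing progress during a livestream would also require reading each crawler's output as it runs.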



During this speedrun, Browsertrix reached the halfway point (10 URLs archived) faster than Brozzler. Brozzler caught up during the 13th URL because it was fast at archiving Epic Games Store’s webpage and Amazon’s PS5 webpage. Brozzler then passed Browsertrix during the 14th URL, when it finished Amazon’s Xbox Series S webpage. After that, Brozzler maintained the lead and finished the set of 20 URLs before Browsertrix.
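The kind of lead tracking described above can be sketched in Python, assuming each crawler archives the URLs in the same order and we have the elapsed time at which each crawler finished its k-th URL. The function names are hypothetical, and the timestamps in the demo are made-up illustrative values, not measurements from this speedrun.

```python
def leader_by_url(times_a, times_b, name_a="A", name_b="B"):
    """For each URL count k, report which crawler finished its k-th URL first."""
    leaders = []
    for t_a, t_b in zip(times_a, times_b):
        if t_a < t_b:
            leaders.append(name_a)
        elif t_b < t_a:
            leaders.append(name_b)
        else:
            leaders.append("tie")
    return leaders


def lead_changes(leaders):
    """Return (url_count, new_leader) pairs where the lead switches hands."""
    changes = []
    previous = None
    for count, leader in enumerate(leaders, start=1):
        if leader == "tie":
            continue  # a tie is a catch-up, not yet a change of leader
        if previous is not None and leader != previous:
            changes.append((count, leader))
        previous = leader
    return changes


if __name__ == "__main__":
    # Made-up cumulative finish times in seconds (assumption, not real data).
    crawler_a = [10, 20, 35, 50]
    crawler_b = [12, 24, 35, 45]
    leaders = leader_by_url(crawler_a, crawler_b)
    print(leaders)
    print(lead_changes(leaders))
```

With the made-up numbers above, crawler B catches up at the third URL (a tie) and takes the lead at the fourth, which mirrors the catch-up-then-pass pattern seen in the recording.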



Tables 1 and 2: Results from running 10 archiving speedruns with Brozzler and Browsertrix on the same set of 20 URLs

I ran ten archiving speedruns on the same set of 20 URLs, and Brozzler won every round (Tables 1 and 2). The commands used for Brozzler and Browsertrix during these speedruns are listed below.



In the next blog post, I plan to show either another example of integrating a gaming concept with web archiving or an example of integrating a video game with a web archiving livestream. If you know of any difficult-to-archive webpages or webpages that would be interesting to see during an automated livestream, you can use this Google Doc to suggest webpages to add to the dataset. If a suggested webpage seems safe for streaming on Twitch, YouTube Gaming, and Facebook Gaming, it will be added to the dataset. You can also suggest that a webpage be removed if there are any issues with one of the webpages in the dataset.


Thanks to the IIPC for funding the Game Walkthroughs and Web Archiving project.

-- Travis Reid (@TReid803)
