2020-11-18: Creating Collection Growth Curves With Archives Unleashed Toolkit And Hypercane

Figure 1: Creating collection growth curves with a web page text derivative

Recently, I have been learning about Archives Unleashed Toolkit (AUT), Hypercane, and how these tools can be used together. AUT is one of the tools from the Archives Unleashed Project and can be used to analyze web archive collections. Given the WARC or ARC files for a web archive collection, AUT can create network derivatives and text derivatives. In a network derivative, the nodes are the domains in the collection, and a link connects two nodes when at least one web page in one domain contains a link to a web page in the other domain. The text derivatives include information about the web pages, images, PDFs, or other documents in the collection. Hypercane, a tool developed by WS-DL's Shawn Jones, can be used to create WARC files that are associated with a public Archive-It collection. The WARC files created by Hypercane can