2020-06-17: Hypercane Part 3: Building Your Own Algorithms
This image by NASA is licensed under NASA's Media Usage Guidelines.

In Part 1, we introduced Hypercane, a tool for automatically sampling mementos from web archive collections. Web archive collections consist of thousands of documents, and humans need tools to intelligently select mementos for a given purpose. Hypercane's goal is to supply us with a list of memento URI-Ms derived from the input we provide. In Part 2, I highlighted how Hypercane's synthesize action converts its input into other formats like JSON for Raintale stories, WARCs for Archives Unleashed Toolkit, or boilerplate-free files for Gensim. This post focuses on the primitive advanced actions that make up Hypercane's sampling algorithms. We can mix and match different primitives to arrive at the sample that best meets our needs.

The DSA project's goal is to summarize a web archive collection by selecting a small number of exemplars and then visualize them with social media stor