Posts

Showing posts with the label raw mementos

2020-06-17: Hypercane Part 3: Building Your Own Algorithms

This image by NASA is licensed under NASA's Media Usage Guidelines. In Part 1, we introduced Hypercane, a tool for automatically sampling mementos from web archive collections. Web archive collections consist of thousands of documents, and humans need tools to intelligently select mementos for a given purpose. Hypercane's goal is to supply us with a list of memento URI-Ms derived from the input we provide. In Part 2, I highlighted how Hypercane's synthesize action converts its input into other formats like JSON for Raintale stories, WARCs for the Archives Unleashed Toolkit, or boilerplate-free files for Gensim. This post focuses on the primitive advanced actions that make up Hypercane's sampling algorithms. We can mix and match different primitives to arrive at the sample that best meets our needs. The DSA project's goal is to summarize a web archive collection by selecting a small number of exemplars and then visualize them with social media stor

2020-06-10: Hypercane Part 2: Synthesizing Output For Other Tools

This image by NOAA is licensed under NOAA's Image Licensing & Usage Info. In Part 1 of this series of blog posts, I introduced Hypercane, a tool for automatically sampling mementos from web archive collections. A human who wishes to create a sample of documents from a web archive collection is confronted with thousands of documents from which to choose. Most collections contain insufficient metadata for making decisions. Hypercane's focus is to supply us with a list of memento URI-Ms derived from the input we provide. One of the uses for this sampling is summarization. The previous blog post in this series focused on its high-level sample and report actions and how they can be used for storytelling. This post focuses on how to generate output for other tools via Hypercane's synthesize action. The goal of the DSA project: to summarize a web archive collection by selecting a small number of exemplars and then visualize them with social media

2020-06-03: Hypercane Part 1: Intelligent Sampling of Web Archive Collections

This image by NASA is licensed under NASA's Media Usage Guidelines. Yasmin AlNoamany experimented with summarizing a web collection by choosing a small number of exemplars and then visualizing them with social media storytelling. This is in contrast to approaches that try to account for all members of the collection. When I took over the Dark and Stormy Archives project from her in 2017, the goal was to improve upon her excellent work. Her existing code relied heavily upon the Storify platform to render its stories. Storify was discontinued in May 2018. We discovered that other platforms rendered mementos poorly, so we developed MementoEmbed to render individual surrogates and later Raintale to render whole stories. We discovered that cards are probably the best surrogate for stories. We now publish stories to the DSA-Puddles web site on a regular basis. Up to this point, we have relied upon sources such as Nwala's StoryGraph or human selection