Posts

2020-06-10: Hypercane Part 2: Synthesizing Output For Other Tools

This image by NOAA is licensed under NOAA's Image Licensing & Usage Info.

In Part 1 of this series of blog posts, I introduced Hypercane, a tool for automatically sampling mementos from web archive collections. A human who wishes to create a sample of documents from a web archive collection is confronted with thousands of documents from which to choose, and most collections contain insufficient metadata for making those decisions. Hypercane's focus is to supply us with a list of memento URI-Ms derived from the input we provide. One use for this sampling is summarization. The previous blog post in this series focused on Hypercane's high-level sample and report actions and how they can be used for storytelling. This post focuses on how to generate output for other tools via Hypercane's synthesize action. The goal of the DSA project is to summarize a web archive collection by selecting a small number of exemplars and then visualize them with social media

2020-06-08: Who is that person in the picture? Or, how Python, and Haar can add value to an image.

(Sung to the tune of "How Much Is That Doggie in the Window") Who is that person in the picture? The one with the light brown hair. Who is that person in the picture? I do hope that someone would share.

Introduction: Oftentimes when a group gets together, for whatever reason, there will be a group picture at the end to commemorate the good times had by all. If this "sea of faces" is published, in hard or soft copy, there may be a one- or two-line caption giving the name of the group and perhaps where and when the image was created. Six months or a year later, the image has only marginal value to the people who were there, and almost no value to those who were not, because it is just a sea of faces. We are interested in a low-cost (requiring very little human time) way to add value to the soft copy of the image, so that the image will have greater value later. We have developed a Python script that uses a Haar facial detection cascade to create

2020-06-07: Regular Expression — A Powerful Tool to Parse Text with Visually Identifiable Patterns

In the previous blog post, I discussed how tesseract-OCR performed on scanned Electronic Theses and Dissertations (ETDs). As we saw there, the process started with converting the cover pages of scanned ETDs into images; tesseract-OCR was then applied, and the extracted results were saved into text files. We also saw that OpenCV OCR failed on scanned ETDs. We could try a widely used open-source tool such as GROBID, designed for scholarly papers. However, this article shows that GROBID is intended for extracting bibliographic metadata from born-digital academic papers. We therefore decided to apply tesseract-OCR to extract the text from the cover pages of scanned ETDs. Afterward, a series of regular expressions (RegEx) was applied to extract seven metadata fields, including titles, authors, academic programs, institutions, advisors, and years. In this blog post, I will introduce how RegEx can be a powerful tool for quickly parsing text with visually identifiable patterns.
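The kind of pattern-based extraction described above can be sketched with Python's standard re module. The sample cover-page text and the two patterns below are illustrative assumptions, not the project's actual expressions; real OCR output is noisier and would need more forgiving patterns.

```python
# Hedged sketch: pulling metadata fields out of OCR'd cover-page text with
# regular expressions. Sample text and patterns are illustrative only.
import re

ocr_text = """A Study of Web Archives

by

Jane Q. Student

Old Dominion University

2019
"""

# Year: four digits beginning with 19 or 20.
year = re.search(r"\b(19|20)\d{2}\b", ocr_text)

# Author: the first non-blank line following a standalone "by".
author = re.search(r"^by\s*\n\s*(.+)$", ocr_text, re.MULTILINE)

print(year.group(0))    # 2019
print(author.group(1))  # Jane Q. Student
```

The strength of this approach is that visually identifiable layout cues (a lone "by", a four-digit year) become anchors the expressions can key on.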

2020-06-05: Augmented Human Online Trip Report

The 11th Augmented Human International Conference was held online on May 27th and 28th. The Augmented Human conference series focuses on scientific contributions toward technology for well-being and experience through augmenting human capabilities, and has served as a forum for presenting and exchanging such ideas for 10 years. The conference included keynote speeches, demo presentations, poster presentations, and research presentations, organized under four main research tracks: Neurosciences; Biomechanics; Technology for Healthcare; and Smartphones and Applications.

11th Augmented Human International Conference: "Happy to announce that @Huawei is the main sponsor of the 11th Augmented Human International Conference, which has just started online with Prof. Amine Choukou from the University of Manitoba. Joining via https://t.co/LNCEzs65MO @umanitoba #augmented #human #augmentation pic.twitter.com/LfNpilasrC" — Augment