Posts

2020-06-03: Hypercane Part 1: Intelligent Sampling of Web Archive Collections

This image by NASA is licensed under NASA's Media Usage Guidelines. Yasmin AlNoamany experimented with summarizing a web collection by choosing a small number of exemplars and then visualizing them with social media storytelling. This is in contrast to approaches that try to account for all members of the collection. When I took over the Dark and Stormy Archives project from her in 2017, the goal was to improve upon her excellent work. Her existing code relied heavily upon the Storify platform to render its stories. Storify was discontinued in May 2018. We discovered that other platforms rendered mementos poorly, so we developed MementoEmbed to render individual surrogates and later Raintale to render whole stories. We discovered that cards are probably the best surrogate for stories. We now publish stories to the DSA-Puddles web site on a regular basis. Up to this point, we have relied upon sources such as Nwala's StoryGraph or human selection

2020-05-28: Richard Pates (Computer Science PhD Student)

Welcome to my profile on Blogger! My name is Richard Pates, and I joined the Web Science and Digital Libraries (WS-DL) research group in the Department of Computer Science (CS) at Old Dominion University (ODU) during the Summer of 2020 as a PhD student in CS. I am advised by Dr. Jian Wu as a member of the research team in the Lab for Applied Machine Learning and Natural Language Processing Systems (LAMP-SYS) Group, working on the Mining Electronic Theses and Dissertations (METD) Project. After earning the Master of Science in Computer Science (MSCS) from ODU during the Fall of 2018, I was approved to join the PhD program in CS during the Spring of 2019, jointly advised by Dr. Ravi Mukkamala and Dr. Cong Wong, with an interest in Artificial Intelligence (AI), Cybersecurity, and Systems. This year my main goal in the PhD program will be to advance to PhD Candidate during the Fall of 2020 (Current Academic Calendar), having made the Doctoral Dissert

2020-05-22: YouTube's recommended videos get longer as more of them are watched; Most are conspiracy videos.

The video "The NZ Mosque Attack Doesn't Add Up" was recommended from 51 channels. In this post, I examine the results of YouTube's recommendation algorithm through an example series of videos recommended by YouTube. From this example, I found that:

- The recommended videos are generated to maximize watch time
- There is significant correlation between videos' metadata and their recommendation order
- YouTube's recommended videos promote conspiracy theories (in this example)

Maximizing watch time is YouTube's ultimate goal. YouTube's recommendation algorithm, among other discovery features, focuses on watch time to keep viewers glued to the site. In theory, maximizing engagement benefits YouTube, content creators, and advertisers. It encourages YouTubers to create content that people actually want to watch because it makes them more money from displaying more ads. On the other hand, YouTube makes money from advertisers because they find thei
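One way to test a claim like "videos' metadata correlates with their recommendation order" is a rank correlation such as Spearman's rho. The sketch below is a minimal, self-contained illustration with hypothetical durations (not data from the post's actual crawl) and a no-ties ranking for simplicity:

```python
# Sketch: Spearman rank correlation between recommendation-chain position
# and a metadata attribute (here, hypothetical video durations).

def ranks(values):
    """Assign ranks 1..n by sorted order (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank_pos, i in enumerate(order):
        r[i] = rank_pos + 1
    return r

def spearman(xs, ys):
    """Spearman's rho via the classic 1 - 6*sum(d^2)/(n*(n^2-1)) formula."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

recommendation_order = [1, 2, 3, 4, 5]       # position in the watched chain
duration_minutes = [12, 18, 25, 31, 44]      # hypothetical metadata values

print(spearman(recommendation_order, duration_minutes))  # 1.0: perfectly monotone
```

A rho near +1 or -1 here would mean the attribute rises or falls steadily along the chain; for real data with ties, a library routine such as SciPy's `spearmanr` handles tied ranks properly.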

2020-05-21: Visualizing Webpage Changes Over Time With TMVis

Home page of tmvis.cs.odu.edu. This work has been supported by a NEH/IMLS Digital Humanities Advancement Grant (HAA-256368-17). The web is dynamic, meaning webpages that exist today may not exist tomorrow. Even if a webpage continues to exist, it could display completely different content than it used to. Web archives, such as the Internet Archive (IA), Archive-It (AIT), and many others, preserve past versions of webpages for use by scholars, researchers, and the general public. In Memento terminology, an archived version of a webpage at a particular time is called a memento, identified by a URI-M, and the list of all mementos for a particular webpage is called a TimeMap. Different webpages have differently sized TimeMaps. For example, the TimeMap for odu.edu contains over 2,000 mementos, while the TimeMap for cnn.com contains around 300,000. Analyzing such large TimeMaps manually is nearly impossible. Based on previous work (Alsum and Nelson, ECIR 2014), TimeMap Visu
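The memento/TimeMap terminology above comes from the Memento framework (RFC 7089), where a TimeMap is served in link-format: one `<URI>; rel=...; datetime=...` entry per line. As a minimal sketch, the parser below extracts (datetime, URI-M) pairs from a small inline sample TimeMap; the sample entries are illustrative, not fetched live from the Internet Archive:

```python
# Sketch: parse an RFC 7089 link-format TimeMap into (datetime, URI-M) pairs.
import re
from datetime import datetime

SAMPLE_TIMEMAP = (
    '<http://odu.edu/>; rel="original",\n'
    '<http://web.archive.org/web/19961223193251/http://www.odu.edu/>; '
    'rel="first memento"; datetime="Mon, 23 Dec 1996 19:32:51 GMT",\n'
    '<http://web.archive.org/web/19970102112113/http://www.odu.edu/>; '
    'rel="memento"; datetime="Thu, 02 Jan 1997 11:21:13 GMT"'
)

def parse_timemap(text):
    """Return [(datetime, URI-M), ...] for entries whose rel includes 'memento'."""
    mementos = []
    for entry in text.split(",\n"):
        uri = re.search(r"<([^>]+)>", entry)
        rel = re.search(r'rel="([^"]*)"', entry)
        dt = re.search(r'datetime="([^"]*)"', entry)
        # rel may carry several tokens, e.g. "first memento"
        if uri and rel and dt and "memento" in rel.group(1).split():
            when = datetime.strptime(dt.group(1), "%a, %d %b %Y %H:%M:%S GMT")
            mementos.append((when, uri.group(1)))
    return mementos

print(len(parse_timemap(SAMPLE_TIMEMAP)))  # 2 mementos in this sample
```

A real TimeMap for a URI-R can be requested from an archive's TimeMap endpoint and fed through the same parsing step; the entry count is what gives figures like the 2,000-plus mementos for odu.edu mentioned above.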