Posts

2020-05-28: Richard Pates (Computer Science PhD Student)

Welcome to my profile on Blogger! My name is Richard Pates. I joined the Web Sciences and Digital Libraries (WS-DL) research group in the Department of Computer Science (CS) at Old Dominion University (ODU) in Summer 2020 as a PhD student in CS advised by Dr. Jian Wu, and I am a member of the research team in the Lab for Applied Machine Learning and Natural Language Processing Systems (LAMP-SYS) working on the Mining Electronic Theses and Dissertations (METD) project. After earning a Master of Science in Computer Science (MSCS) from ODU in Fall 2018, I was admitted to the PhD program in CS in Spring 2019, jointly advised by Dr. Ravi Mukkamala and Dr. Cong Wong, with interests in Artificial Intelligence (AI), Cybersecurity, and Systems. This year, my main goal in the PhD program is to advance to PhD Candidate in Fall 2020 (current academic calendar), having made the Doctoral Dissertation Committee selection …

2020-05-22: YouTube's recommended videos get longer as more of them are watched; Most are conspiracy videos.

In this post, I examine the output of YouTube's recommendation algorithm through an example series of videos recommended by YouTube. From this example, I found that:

- The recommended videos are generated to maximize watch time.
- There is significant correlation between the videos' metadata and their recommendation order.
- YouTube's recommended videos promote conspiracy theories (in this example).

Maximizing watch time is YouTube's ultimate goal

YouTube's recommendation algorithm, among other discovery features, focuses on watch time to keep viewers glued to the site. In theory, maximizing engagement benefits YouTube, content creators, and advertisers. It encourages YouTubers to create content that people actually want to watch, because longer viewing means more ads are displayed and more money is earned. YouTube, in turn, makes money from advertisers, who find their YouTube advertising campaigns effective and therefore advertise more. In order to sustain this win-win situatio…
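The excerpt does not say which statistic was used to measure the correlation between metadata and recommendation order. One reasonable choice, assumed here, is Spearman's rank correlation between a video's position in the recommendation chain and a metadata field such as its duration. The durations below are made up for illustration, and the helper names `rank` and `spearman` are hypothetical.

```python
# Hypothetical sketch: Spearman rank correlation between a video's position in
# a chain of YouTube recommendations and one metadata field (duration, in
# seconds). The sample data is invented for illustration only.

def rank(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Position 1 is the first recommended video, position 6 the last one watched.
    positions = [1, 2, 3, 4, 5, 6]
    durations = [312, 540, 498, 1260, 2100, 3605]  # hypothetical durations
    print(f"Spearman rho = {spearman(positions, durations):.3f}")  # → 0.943
```

A rho close to +1 would mean that videos later in the chain tend to be longer; rank correlation is used here because it only assumes a monotonic relationship, not a linear one.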

2020-05-21: Visualizing Webpage Changes Over Time With TMVis

This work has been supported by an NEH/IMLS Digital Humanities Advancement Grant (HAA-256368-17).

The web is dynamic, meaning webpages that exist today may not exist tomorrow. Even if a webpage continues to exist, it could display completely different content than it used to. Web archives, such as the Internet Archive (IA), Archive-It (AIT), and many others, preserve past versions of webpages for use by scholars, researchers, and the general public. In Memento terminology, an archived version of a webpage at a particular time is called a memento, identified by its URI-M, and the list of all mementos for a particular webpage is called a TimeMap. TimeMaps vary greatly in size. For example, the TimeMap for odu.edu contains over 2,000 mementos, while the TimeMap for cnn.com contains around 300,000. Analyzing TimeMaps this large manually is nearly impossible.
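To make the TimeMap idea concrete, here is a minimal sketch that parses a TimeMap in the Memento link format (RFC 7089) and lists its mementos. The sample TimeMap, the `example-archive.org` host, and the `parse_mementos` helper are hypothetical, and the regex-based parsing is a simplification rather than a complete link-format parser.

```python
import re
from email.utils import parsedate_to_datetime

# A tiny, hand-written TimeMap in the Memento link format (RFC 7089).
# The archive host is hypothetical; real TimeMaps can list thousands of entries.
SAMPLE_TIMEMAP = """<http://odu.edu/>; rel="original",
<http://example-archive.org/timemap/link/http://odu.edu/>; rel="self"; type="application/link-format",
<http://example-archive.org/web/19970101000000/http://odu.edu/>; rel="first memento"; datetime="Wed, 01 Jan 1997 00:00:00 GMT",
<http://example-archive.org/web/20100615120000/http://odu.edu/>; rel="memento"; datetime="Tue, 15 Jun 2010 12:00:00 GMT",
<http://example-archive.org/web/20200501080000/http://odu.edu/>; rel="last memento"; datetime="Fri, 01 May 2020 08:00:00 GMT"
"""

# One link entry: <URI> followed by one or more ; name="value" parameters.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[a-z]+="[^"]*")+)')
PARAM_RE = re.compile(r'([a-z]+)="([^"]*)"')


def parse_mementos(timemap_text):
    """Return (URI-M, datetime) pairs for every entry whose rel includes 'memento'."""
    mementos = []
    for uri, params in LINK_RE.findall(timemap_text):
        attrs = dict(PARAM_RE.findall(params))
        # rel may be "memento", "first memento", "last memento", etc.
        if "memento" in attrs.get("rel", "").split():
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return mementos


if __name__ == "__main__":
    mementos = parse_mementos(SAMPLE_TIMEMAP)
    print(len(mementos), "mementos")
    for urim, dt in mementos:
        print(dt.date(), urim)
```

The `original` and `self` entries are skipped because their `rel` values do not contain `memento`; only the three archived copies are returned.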

Based on previous work (Alsum and Nelson, ECIR 2014), TimeMap Visualization (TMVis) determines which mementos show sign…
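The excerpt is cut off before it explains how significant change is detected. As an illustration only, the sketch below assumes a SimHash-based approach: fingerprint each memento's text and keep a memento only when its fingerprint differs from the last kept one by at least a threshold number of bits. The token hashing (first 8 bytes of MD5), the 64-bit fingerprint, the default threshold of 4, and the function names are all assumptions, not TMVis's actual implementation or parameters.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Illustrative 64-bit SimHash over word tokens (HTML tag names included)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to a 64-bit integer using the first 8 bytes of MD5.
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if weights[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def select_changed(mementos, threshold=4):
    """Keep the first memento, then each one whose SimHash differs from the
    last kept memento by at least `threshold` bits (hypothetical default)."""
    selected = []
    last = None
    for urim, html in mementos:
        fp = simhash(html)
        if last is None or hamming(fp, last) >= threshold:
            selected.append(urim)
            last = fp
    return selected


if __name__ == "__main__":
    # Hypothetical (URI-M, HTML text) pairs, oldest first.
    timemap = [
        ("urim-1", "<h1>Welcome to ODU</h1> <p>Fall admissions are open</p>"),
        ("urim-2", "<h1>Welcome to ODU</h1> <p>Fall admissions are open</p>"),
        ("urim-3", "<h1>Campus news</h1> <p>Commencement moves online this spring</p>"),
    ]
    print(select_changed(timemap))
```

Identical mementos (like `urim-2` above) always produce the same fingerprint and are skipped. Lowering the threshold keeps more mementos; raising it keeps only those with larger changes.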

2020-05-19: OCR Tools Experiment on Scanned Electronic Theses and Dissertations (ETDs)

A thesis or dissertation is a type of scholarly work showing that a student pursuing higher education has successfully met part of the requirements for a degree. An electronic thesis or dissertation can be found either in a university's electronic theses and dissertations (ETDs) digital library or in ProQuest (a third-party ETD repository). ETDs come with rich metadata that can be used to search for them in a repository. However, metadata is not available for every ETD, so it is necessary to extract metadata from the ETDs themselves. Extracting metadata can be challenging, especially from scanned ETDs. Although many open-source tools perform well on certain types of documents, experiments indicate that they tend to produce unacceptable errors or fail on scanned ETDs. In this blog post, I introduce one of the widely used optical character recognition (OCR) tools, tesseract-OCR, and show how tesseract-OCR performs on scanne…
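The excerpt does not say how OCR quality is measured. A common metric, assumed here, is character error rate (CER): the edit distance between the OCR output and a hand-transcribed ground truth, divided by the length of the ground truth. The sketch below assumes the tesseract-OCR output has already been obtained separately; the sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "Old Dominion University"      # hand-transcribed ground truth
    ocr_out = "0ld Dominion Univcrsity"    # hypothetical OCR output, two errors
    print(f"CER = {cer(truth, ocr_out):.3f}")  # → 0.087
```

A CER of 0 means the OCR text matches the ground truth exactly; scanned pages with noise or unusual fonts typically push it higher.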

2020-05-06: PTSD Assessments in COVID-19 Health Care Workers

Health care workers have recently been working in unfamiliar territory. Hospitals in major cities are overwhelmed by the number of patients they are handling as a result of the coronavirus disease 2019 (COVID-19) pandemic. There are accounts of people dying in hospital hallways before help can arrive because there is not enough space, equipment, and staff to handle the influx of patients. Hospital morgues are overflowing. To make matters worse, doctors and nurses have to worry about exposure to COVID-19 and/or possibly exposing their families, largely due to a lack of personal protective equipment (PPE).

The current environment is putting health care workers at greater risk of developing Post-Traumatic Stress Disorder (PTSD). In fact, hospital personnel have started to report symptoms consistent with PTSD, ranging from sleep disturbances to constant worry and paranoia. There have even been reports of suicide among first responders and emergency r…

2020-05-06: Teaching a Flipped Hybrid (In-Class/Online) Course

I’ve been meaning to write this for a couple of years. Now seems an especially appropriate time for it. In particular, a hybrid course may be an option if staggered in-class attendance is implemented in the Fall.

My first hybrid class began as an in-class "flipped" model. So first, I'll talk about how I implemented the flipped model, and then I'll discuss how I handled the hybrid (in-class and online) aspects the following year.

My definition of a "flipped" class (see https://en.wikipedia.org/wiki/Flipped_classroom, http://flippedclass.com/whyteachersmattermoreinflippedclassroom/, http://facultyinnovate.utexas.edu/teaching/flipping-a-class) is one in which students actually do the reading before the class meeting, and the class meeting time is spent discussing the material with students (not lecturing) and doing in-class activities. There can be several benefits to this, including that class time is changed from content delivery to active l…

2020-04-30: Archives Unleashed: New York Datathon Report (From Home Edition)

The Archives Unleashed Datathon is a two-day event hosted by the Archives Unleashed team where participants from different research backgrounds collaborate to explore web archive collections. The fourth Archives Unleashed datathon, a partnership with Columbia University Libraries, was supposed to take place in New York City. However, as COVID-19 cases began to increase, the organizers had to make the tough decision to cancel the New York datathon.
Due to the rapidly-evolving COVID-19 situation, we have canceled the datathon which was to be held at Columbia University, March 26-27, 2020.
This decision was not taken lightly and was made with the best interests of our attendees. We have been in touch with all attendees. — The Archives Unleashed Project (@unleasharchives) March 3, 2020

In the same email that announced the event's cancellation, Ian Milligan also mentioned the possibility of organizing the event online through Zoom and Slack. Within a few weeks, the Arc…