2020-04-30: Archives Unleashed: New York Datathon Report (From Home Edition)

The Archives Unleashed Datathon is a two-day event hosted by the Archives Unleashed team where participants from different research backgrounds collaborate to explore web archive collections. The fourth Archives Unleashed datathon, organized in partnership with Columbia University Libraries, was supposed to take place in New York City. However, as COVID-19 cases began to spread, the organizers had to make the tough decision to cancel the New York datathon. "Due to the rapidly-evolving COVID-19 situation, we have canceled the datathon which was to be held at Columbia University, March 26-27, 2020. This decision was not taken lightly and was made with the best interests of our attendees. We have been in touch with all attendees." - The Archives Unleashed Project (@unleasharchives), March 3, 2020. In the same email that brought the news of the event's cancellation, Ian Milligan also mentioned the possibility of organizing the event online through Zoom and Slac

2020-04-26: Large Scale Networking (LSN) Workshop on Huge Data

On April 13 and 14, 2020, I attended the Large Scale Networking (LSN) workshop on Huge Data. The workshop is supported by NSF and organized by Clemson University (Dr. Kuang-Ching Wang), the University of Virginia (Dr. Ronald Hutchins), and the University of Kentucky (Dr. James Griffioen and Dr. Zongming Fei). It was supposed to be held in Chicago, IL, but due to the coronavirus pandemic, the whole workshop was moved online. The workshop consists of 4 topic sessions: Data generation (6 presentations); Data storage (7 presentations); Data movement (14 presentations); and Data processing and security (14 presentations). Each speaker is given only 5 minutes for a flash presentation to highlight their work. The workshop also has 4 breakout sessions: New Areas of Research Beyond Big Data; New Types of Data & Ways to Get Them; Collaboration across Disciplines; and Critical Research Infrastructure Needed Beyond Big Data. Dr. C. Lee Giles and I contributed a white paper tit

2020-04-25: Effect of Reading Patterns of Novice Researchers using Eye Tracking

Figure 1: A participant reading a research paper while wearing the PupilLabs Core eye tracker. Scientific literature offers novel research ideas as well as solutions to various problems. When it comes to scientific literature, reading patterns vary from one person to another. Common reading patterns may exist among researchers with similar expertise in a particular area, whereas novice researchers may have different reading patterns compared to more experienced researchers. We can expect a difference in reading patterns in terms of scan paths and pupillary activity. The ability to seek information from different sections of a research paper determines the reading process of a researcher. Some researchers read a paper from beginning to end, whereas others read it in a different order than presented. One way to read a research paper is the three-pass approach. Researchers also tend to change their reading patterns over time as they f

2020-04-16: Visual Data Analysis with Streaming-hub

Streaming-hub [Link] In my previous post, I elaborated on how dataset metadata could be standardized in a manner that enables researchers to efficiently discover and reuse data already collected for past studies. Adopting such a standard brings a host of benefits to research communities, such as simplified data sharing, massively collaborative research, and automated data pre-processing. However, formulating and adopting such a standard would take years, if not decades, unless 1) the public realizes its practical benefits over the initial hassle of transition, and 2) tools and libraries are built that ease workflows after the transition. My previous post tries to address the first concern by introducing DFS and DDU. In this post, I describe our work towards addressing the second concern.

2020-04-09: After Using Eclipse for 10 Years, I Switched to IDEA [Translated]

Original post: published in December 2018. Preamble: The original text was in Chinese. I first got a "raw translation" from Google Translate. Here is my impression of Google Translate: 75% or more of the text made sense, but only about 25% of it read naturally. As a result, I had to do a lot of manual editing to make the post readable. The original post was about 50% longer than what I posted here. After using Eclipse for 10 years, I finally switched to IDEA. I did not start with Eclipse when I became a Java programmer, but with a tool called JBuilder. Even then, I found it very easy to use, because previously I had used only a simple text editor. It didn't take long for me to find a tool called Eclipse, which had a growing number of users. At the end of the "test drive", I found it to be very user friendly. Its functions seemed tailored for programmers. One exciting feature

2020-04-01: SHARI: StoryGraph Hypercane ArchiveNow Raintale Integration -- Combining WS-DL Tools For Current Events Storytelling

This screenshot from DSA Puddles demonstrates the story produced by the SHARI process for the largest StoryGraph component on March 23, 2020. Here we see news stories discussing the COVID-19 pandemic on that day. My research focuses on summarizing existing web archive collections through social media storytelling. For this effort, we developed Raintale to tell the stories produced by a selection of mementos. Collections exist at various web archives, like Archive-It and the UK Web Archive. As shown by Klein et al., we can build collections of mementos by conducting focused crawling of web archives. Raintale works well for these cases involving existing mementos, but what if we want to make a story about live web resources, like current events from the news? Nwala's StoryGraph for March 23, 2020. Here we see edges connecting the largest connected component - the biggest story of the day. Nwala's StoryGraph for March 23, 2020, showing how one can hi