Posts

Showing posts from April, 2020

2020-04-30: Archives Unleashed: New York Datathon Report (From Home Edition)

The Archives Unleashed Datathon is a two-day event hosted by the Archives Unleashed team where participants from different research backgrounds collaborate to explore web archive collections. The fourth Archives Unleashed datathon, organized in partnership with Columbia University Libraries, was supposed to take place in New York City. However, as COVID-19 cases began to increase, the organizers had to make the tough decision to cancel the New York datathon.
Due to the rapidly-evolving COVID-19 situation, we have canceled the datathon which was to be held at Columbia University, March 26-27, 2020.
This decision was not taken lightly and was made with the best interests of our attendees. We have been in touch with all attendees. — The Archives Unleashed Project (@unleasharchives) March 3, 2020

In the same email that brought the news of event cancellation, Ian Milligan also mentioned the possibility of organizing the event online through Zoom and Slack. Within a few weeks, the Arc…

2020-04-26: Large Scale Networking (LSN) Workshop on Huge Data

On April 13 and 14, 2020, I attended the Large Scale Networking (LSN) workshop on Huge Data. The workshop was supported by NSF and organized by Clemson University (Dr. Kuang-Ching Wang), University of Virginia (Dr. Ronald Hutchins), and the University of Kentucky (Dr. James Griffioen and Dr. Zongming Fei). It was supposed to be held in Chicago, IL, but due to the coronavirus pandemic, the whole workshop was moved online. The workshop consists of 4 topic sessions:

Data generation (6 presentations)
Data storage (7 presentations)
Data movement (14 presentations)
Data processing and security (14 presentations)

Each speaker is given only 5 minutes to do a flash presentation to highlight their work. The workshop also has 4 breakout sessions:

New Areas of Research Beyond Big Data
New Types of Data & Ways to Get Them
Collaboration across Disciplines
Critical Research Infrastructure Needed Beyond Big Data

Dr. C. Lee Giles and I contributed a white paper titled "Scholarly Very Large …

2020-04-25: Effect of Reading Patterns of Novice Researchers using Eye Tracking

Scientific literature gives novel research ideas as well as solutions to various problems. When it comes to scientific literature, reading patterns vary from one person to another. Common reading patterns may exist among researchers with similar expertise in a particular area, while novice researchers may have different reading patterns compared to more experienced researchers. We can expect a difference in reading patterns in terms of scan paths and pupillary activity.

The ability to seek information from different sections of research papers determines a researcher's reading process. Some researchers read a paper from beginning to end, whereas others read it in a different order than presented. One way to read a research paper is the three-pass approach. Researchers also tend to change their reading patterns over time as they become familiar with the content and structure of research papers.

To explore the eye movements of novice …

2020-04-16: Visual Data Analysis with Streaming-hub

In my previous post, I elaborated on how dataset metadata could be standardized in a manner that enables researchers to efficiently discover and reuse data already collected for past studies. Adopting such a standard brings a host of benefits to research communities – such as simplified data sharing, massively collaborative research, and automated data pre-processing.

However, formulating and adopting such a standard would take years, if not decades, unless 1) the public realizes its practical benefits over the initial hassle of transition, and 2) tools and libraries are built that would ease workflows after transition. My previous post tries to address the first concern by introducing DFS and DDU. In this post, I describe our work towards addressing the second concern.

2020-04-09: After Using Eclipse for 10 Years, I Switched to IDEA [Translated]

Original post: https://www.cnblogs.com/ouyida3/p/9901312.html published in December 2018.
Preamble: The original text was in Chinese. I first got a "raw translation" from Google Translate. My impression of Google Translate: 75% or more of the text made sense, but only about 25% of it read naturally. As a result, I had to manually edit A LOT to make the post readable. The original post was about 50% longer than what I posted here.

After using Eclipse for 10 years, I finally switched to IDEA.

I did not start with Eclipse when I became a Java programmer, but with a tool called jBuilder. When I started using it, I found it very easy to use because, until then, I had only used a simple text editor.

It didn't take long for me to find a tool called Eclipse, which had an increasing number of users. At the end of the "test drive", I found it to be very user friendly. Its functions seemed tailored for programmers. One exciting feature was that it…

2020-04-01: SHARI: StoryGraph Hypercane ArchiveNow Raintale Integration -- Combining WS-DL Tools For Current Events Storytelling

My research focuses on summarizing existing web archive collections through social media storytelling. For this effort, we developed Raintale to tell the stories produced by a selection of mementos. Collections exist at various web archives, like Archive-It and the UK Web Archive. As shown by Klein et al., we can build collections of mementos by conducting focused crawling of web archives. Raintale works well for these cases involving existing mementos, but what if we want to make a story about live web resources, like current events from the news?

WS-DL members have addressed other parts of the problem. Alexander Nwala’s research has centered on finding seeds within search engine result pages (SERPs), social media stories, and news feeds. As part of his news research, Nwala developed StoryGraph, a tool that analyzes multiple news sources every hour and automatically determines the news story or stories that dominate the media landscape at that time. Mohamed Aturban developed Archive…