Posts

Showing posts with the label Machine Learning

2020-05-22: YouTube's recommended videos get longer as more of them are watched; Most are conspiracy videos.

Image caption: The video "The NZ Mosque Attack Doesn't Add Up" was recommended from 51 channels

In this post, I examine the results of YouTube's recommendation algorithm through an example series of videos recommended by YouTube. From this example, I found that:
- The recommended videos are generated to maximize watch time
- There is significant correlation between videos' metadata and their recommendation order
- YouTube's recommended videos promote conspiracy theories (in this example)

Maximizing watch time is YouTube's ultimate goal

YouTube's recommendation algorithm, among other discovery features, focuses on watch time to keep viewers glued to the site. In theory, maximizing engagement benefits YouTube, content creators, and advertisers. It encourages YouTubers to create content that people actually want to watch because it makes them more money from displaying more ads. On the other hand, YouTube makes money from advertisers because they find thei

2019-11-20: PURS 2020 Proposal Awarded to Support Undergraduate Research in Computer Science

I am delighted that my proposal entitled "Toward Knowledge Extraction: Finding Datasets, Methods, and Tools Adopted in Research Papers" has been awarded under the Program for Undergraduate Research and Scholarship (PURS) by the Old Dominion University Perry Honors College Undergraduate Research Program, in cooperation with the ODU Office of Research. With the increasing volume of publications in all academic fields, researchers are under great pressure to read and digest research papers that deliver existing and new discoveries, even in niche domains. With the advancement of natural language processing (NLP) techniques in the last decade, it is possible to build frameworks that process free text to extract key facts (datasets, methods, and tools) from research papers. The goal of this project is to develop a machine learning framework to automatically extract datasets, methods, and tools from research papers in Computer and Information Science and Engineering (CIS

2019-11-18: The 28th ACM International Conference on Information and Knowledge Management (CIKM)

Students, professors, industry experts, and others came to Beijing to attend the 28th ACM International Conference on Information and Knowledge Management (CIKM). This was the first time CIKM had accepted a long paper from the Old Dominion University Web Science and Digital Libraries Research Group (WS-DL), and I was happy to represent us at this prestigious conference. CIKM is different from some of our other conference destinations. CIKM's focus spans all forms of information and knowledge, leading to a high diversity in submission topics. The conference organizers classified CIKM papers into topics including advertising, user modeling, urban systems, knowledge graphs, information retrieval, data mining, natural language processing, machine learning, social media, health care, privacy, and security. Multiple tracks ran in five different rooms across three days. There were 202 long papers, 107 short papers, 38 applied research papers, and 26 demos.

2019-10-01: Attending the Machine Learning + Libraries Summit at the Library of Congress

On September 20, 2019, I attended the Machine Learning + Libraries Summit at the Library of Congress. The aim of the meeting was to gather computer and information scientists, engineers, data scientists, and librarians from reputable universities, government agencies, and industrial companies to generate ideas on questions such as how to expand the services of digital libraries, how to establish good collaboration with other groups on machine learning projects, and what factors to consider when designing a sustainable machine learning project, especially in the digital library domain. In the initial solicitation, the focus was cultural heritage, but the discussion went far beyond that. The meeting featured many interesting lightning talks. Unfortunately, due to the relatively short time allocated, many questions and discussions had to go offline. The organizer also arranged several activities, stimulating brainstorm discussion and teamwork between people from different pl

2018-12-14: New Insight to Big Data: Trip to IEEE Big Data 2018

IEEE Big Data 2018 was held at the Westin Seattle Hotel from December 10 to December 13, 2018. More than 1,100 people registered. Acceptance rates varied from 13% to 24%, with an average rate of 19%. I had a poster accepted, titled “CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset”, co-authored with C. Lee Giles, two of his graduate students (Bharath and Shaurya), and an undergraduate student who produced preliminary results (Jianyu Mao). I attended the conference on Day 2 and Day 3 and left the conference hotel after the keynote on Day 3.

Insights from Personal Meetings

The most important reason to attend conferences is to meet old friends and make new ones. Old friends I met include Kyle Williams (Microsoft Bing), Mu Qiao (IBM, chair of I&G track), Yang Song (Google AI, co-chair of I&G track), Manlin Li (Google Cloud), and Madian Khabsa (Apple Siri). Kyle introduced the recent project on recomme