2018-07-18: HyperText and Social Media (HT) Trip Report

Leaping Tiger statue next to the College of Arts at Towson University
From July 9 - 12, the 2018 ACM Conference on Hypertext and Social Media (HT) took place at the College of Arts at Towson University in Baltimore, Maryland. Researchers from around the world presented the results of complete or ongoing work in tutorial, poster, and paper sessions. Also, during the conference I had the opportunity to present a full paper: "Bootstrapping Web Archive Collections from Social Media" on behalf of co-authors Dr. Michele Weigle and Dr. Michael Nelson.

Day 1 (July 9, 2018)

The first day of the conference was dedicated to a tutorial (Efficient Auto-generation of Taxonomies for Structured Knowledge Discovery and Organization) and three workshops:
  1. Human Factors in Hypertext (HUMAN)
  2. Opinion Mining, Summarization and Diversification
  3. Narrative and Hypertext
I attended the Opinion Mining, Summarization and Diversification workshop. The workshop started with a talk titled: "On Reviews, Ratings and Collaborative Filtering," presented by Dr. Oren Sar Shalom, principal data scientist at Intuit, Israel. Next, Ophélie Fraisier, a PhD student studying stance analysis on social media at Paul Sabatier University, France, presented: "Politics on Twitter : A Panorama," in which she surveyed methods of analyzing tweets to study and detect polarization and stances, as well as election prediction and political engagement.
Next, Jaishree Ranganathan, a PhD student at the University of North Carolina, Charlotte, presented: "Automatic Detection of Emotions in Twitter Data - A Scalable Decision Tree Classification Method."
Finally, Amin Salehi, a PhD student at Arizona State University, presented: "From Individual Opinion Mining to Collective Opinion Mining." He showed how collective opinion mining can help capture the drivers behind opinions as opposed to individual opinion mining (or sentiment) which identifies single individual attitudes toward an item.

Day 2 (July 10, 2018)

The conference officially began on day 2 with a keynote: "Lessons in Search Data" by Dr. Seth Stephens-Davidowitz, a data scientist and NYT bestselling author of: "Everybody Lies."
In his keynote, Dr. Stephens-Davidowitz revealed insights gained from search data ranging from racism to child abuse. He also discussed a phenomenon in which people are likely to lie to pollsters (social desirability bias) but are honest to Google ("Digital Truth Serum") because Google incentivizes telling the truth. The paper sessions followed the keynote with two full papers and a short paper presentation.

The first (full) paper of day 2 in the Computational Social Science session: "Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora," was presented by Shubhanshu Mishra, a PhD student at the iSchool of the University of Illinois at Urbana-Champaign. He showed correlations between user-level and tweet-level metadata by addressing two questions: "Do tweets from users with similar Twitter characteristics have similar sentiments?" and "What meta-data features of tweets and users correlate with tweet sentiment?" 
Next, Dr. Fred Morstatter presented a full paper: "Mining and Forecasting Career Trajectories of Music Artists," in which he showed that their dataset generated from concert discovery platforms can be used to predict important career milestones (e.g., signing by a major music label) of musicians.
Next, Dr. Nikolaos Aletras, a research associate at the University College London, Media Futures Group, presented a short paper: "Predicting Twitter User Socioeconomic Attributes with Network and Language Information." He described a method of predicting the occupational class and income of Twitter users by using information extracted from their extended networks.
After a break, the Machine Learning session began with a full paper (Best Paper Runner-Up): "Joint Distributed Representation of Text and Structure of Semi-Structured Documents," presented by Samiulla Shaikh, a software engineer and researcher at IBM India Research Labs.
Next, Dr. Oren Sar Shalom presented a short paper titled: "As Stable As You Are: Re-ranking Search Results using Query-Drift Analysis," in which he presented the merits of using query-drift analysis for search re-ranking. This was followed by a short paper presentation titled: "Embedding Networks with Edge Attributes," by Palash Goyal, a PhD student at University of Southern California. In his presentation, he showed a new approach to learn node embeddings that uses the edges and associated labels.
Another short paper presentation (Recommendation System session) by Dr. Oren Sar Shalom followed. It was titled: "A Collaborative Filtering Method for Handling Diverse and Repetitive User-Item Interactions." He presented a collaborative filtering model that captures multiple complex user-item interactions without any prior domain knowledge.
Next, Ashwini Tonge, a PhD student at Kansas State University presented a short paper titled: "Privacy-Aware Tag Recommendation for Image Sharing," in which she presented a means of tagging images on social media in order to improve the quality of user annotations while preserving user privacy sharing patterns.
Finally, Palash Goyal presented another short paper titled: "Recommending Teammates with Deep Neural Networks."

The day 2 closing keynote by Leslie Sage, director of data science at DevResults followed after a break that featured a brief screening of the 2018 World Cup semi-final game between France and Belgium. In her keynote, she presented the challenges experienced in the application of big data toward international development.

Day 3 (July 11, 2018)

Day 3 of the conference began with a keynote: "Insecure Machine Learning Systems and Their Impact on the Web" by Dr. Ben Zhao, Neubauer Professor of Computer Science at University of Chicago. He highlighted many milestones of machine learning by showing problems they have solved in natural language processing and computer vision. But showed that opaque machine learning systems are vulnerable to attack by agents with malicious intents, and he expressed the idea that these critical issues must be addressed especially given the rush to deploy machine learning systems. 
Following the keynote, I present our full paper: "Bootstrapping Web Archive Collections from Social Media" in the Temporal session. I highlighted the importance of web archive collections as a means of preserving the historical record of important events, and the seeds (URLs) from which they are formed. The seeds are collected by experts curators, but we do not have enough experts to collect seeds in a world of rapidly unfolding events. Consequently, I proposed exploiting the collective domain expertise of web users by generating seeds from social media collections and showed through a novel suite of measures, that seeds generated from social media are similar to those generated by experts.

Next, Paul Mousset, a PhD student at Paul Sabatier University, presented a full paper: "Studying the Spatio-Temporal Dynamics of Small-Scale Events in Twitter," in which he presented his work into the granular identification and characterization of event types on Twitter.
Next, Dr. Nuno Moniz, invited Professor at the Sciences College of the University of Porto, presented a short paper: "The Utility Problem of Web Content Popularity Prediction." He demonstrated that state-of-the-art approaches for predicting web content popularity have been optimized for improving the predictability of average behavior of data: items with low levels of popularity.
Next, Samiulla Shaikh (again), presented the first full paper (Nelson Newcomer Award winner) of the Semantic session: "Know Thy Neighbors, and More! Studying the Role of Context in Entity Recommendation," in which he showed how to efficiently explore a knowledge graph for the purpose of entity recommendation by utilizing contextual information to help in the selection of a subset of a entities in a knowledge graph.
Samiulla Shaikh (again), presented a short paper: "Content Driven Enrichment of Formal Text using Concept Definitions and Applications," in which he showed a method of making formal text more readable to non-expert users by text enrichment e.g., highlighting definitions and fetching of definitions from external data sources.
Next, Yihan Lu, a PhD student at Arizona State University, presented a short paper: "Modeling Semantics Between Programming Codes and Annotations." He presented the results from investigating a systematic method to examine annotation semantics and its relationship with source codes. He also showed their model which predict concepts in programming code annotation. Such annotations could be useful to new programmers.
Following a break, the User Behavior session began. Dr. Tarmo Robal, a research scientist at the Tallinn University of Technology, Estonia, presented a full paper: "IntelliEye: Enhancing MOOC Learners' Video Watching Experience with Real-Time Attention Tracking." He introduced IntelliEye, a system that monitors students watching video lessons and detects when they are distracted and intervenes in an attempt to refocus their attention.
Next, Dr. Ujwal Gadiraju, a postdoctoral researcher at L3S Research Center, Germany, presented a full paper: "SimilarHITs: Revealing the Role of Task Similarity in Microtask Crowdsourcing." He presented his findings from investigating the role of task similarity in microtask crowdsourcing on platforms such as Amazon Mechanical Turk and its effect on market dynamics.
Next, Xinyi Zhang, a computer science PhD candidate at UC Santa Barbara, presented a short paper: "Penny Auctions are Predictable: Predicting and profiling user behavior on DealDash." She showed that penny auction sites such as DealDash are vulnerable to modeling and adversarial attacks by showing that both the timing and source of bids are highly predictable and users can be easily classified into groups based on their bidding behaviors.
Shortly after another break, the Hypertext paper sessions began. Dr. Charlie Hargood, senior lecturer at Bournemouth University, UK and Dr. David Millard, associate Professor  at the University of Southampton, UK, presented a full paper: "The StoryPlaces Platform: Building a Web-Based Locative Hypertext System." They presented StoryPlaces, an open source authoring tool designed for the creation of locative hypertext systems.
Next, Sharath Srivatsa, a Masters student at International Institute of Information Technology, India, presented a full paper: "Narrative Plot Comparison Based on a Bag-of-actors Document Model." He presented an abstract "bag-of-actors" document model for indexing, retrieving, and comparing documents based on their narrative structures. The model resolves the actors in the plot and their corresponding actions.
Next, Dr. Claus Atzenbeck, professor at Hof University, Germany, presented a short paper: "Mother: An Integrated Approach to Hypertext Domains." He stated that the Dexter Hypertext Reference Model which was developed to provide a generic model for node-link hypertext systems does not match the need of Component-Based Open Hypermedia Systems (CB-OHS), and proposed how this can be remedied by introducing Mother, a system that implements link support.
The final (short) paper of the day, "VAnnotatoR: A Framework for Generating Multimodal Hypertexts," was presented by Giuseppe Abrami. He introduced a virtual reality and augmented reality framework for generating multimodal hypertexts called VAnnotatoR. The framework enables the annotation and linkage of texts, images and their segments with walk-on-able animations of places and buildings.
The conference banquet at Rusty Scupper followed the last paper presentation. The next HyperText conference was announced at the banquet.

Day 4 (July 12, 2018)

The final day of the conference featured multiple papers presentations such as:
The day began with a keynote "The US National Library of Medicine: A Platform for Biomedical Discovery and Data-Powered Health," presented by Elizabeth Kittrie, strategic advisor for data and open science at the National Library of Medicine (NLM). She discussed the role the NLM serves such as provider of health data for biomedical research and discovery. She also discussed the challenges that arise from the rapid growth of biomedical data, shifting paradigms of data sharing, as well as the role of libraries in providing access to digital health information.
The Privacy session of exclusively full papers followed the keynote. Ghazaleh Beigi, a PhD student at Arizona State University presented: "Securing Social Media User Data - An Adversarial Approach." She showed a privacy vulnerability that arises from the anonymization of social media data by demonstrating an adversarial attack specialized for social media data.
Next, Mizanur Rahman, a PhD student at Florida International University, presented: "Search Rank Fraud De-Anonymization in Online Systems." The bots and automatic methods session with two full paper presentations followed.
Diego Perna, a researcher at the University of Calabria, Italy, presented: "Learning to Rank Social Bots." Given recent reports about the use of bots to spread misinformation/disinformation on the web in order to sway public opinion, Diego Perna proposed a machine-learning framework for identifying and ranking online social network accounts based on their degree similarity to bots.
Next, David Smith, a researcher at University of Florida,  presented: "An Approximately Optimal Bot for Non-Submodular Social Reconnaissance." He showed that studies that show how social bots befriend real users as part of an effort to collect sensitive information operate with the premise that the likelihood of users accepting bot friend requests is fixed, a constraint contradicted by empirical evidence. Subsequently, he presented his work which addressed this limitation.
The News session began shortly after a break with a full paper (Best Paper Award) presentation from Lemei Zhang, a PhD candidate from Norwegian University of Science and Technology: "A Deep Joint Network for Session-based News Recommendations with Contextual Augmentation." She highlighted some of the issues news recommendation system suffer such as fast updating rate of news articles and lack of user profiles. Next, she proposed a news recommendation system that combines user click events within sessions and news contextual features to predict the next click behavior of a user.
Next, Lucy Wang, senior data scientist at Buzzfeed, presented a short paper: "Dynamics and Prediction of Clicks on News from Twitter."
Next, Sofiane Abbar, senior software/research engineer at Qatar Computing Research Institute, presented via a YouTube video: "To Post or Not to Post: Using Online Trends to Predict Popularity of Offline Content." He proposed a new approach for predicting the popularity of news articles before  they are published. The approach is based on observations regarding the article similarity and topicality and complements existing content-based methods.

Next, two full papers (Community Detection session) where presented by Ophélie Fraisier and Amin Salehi. Ophélie Fraisier presented: "Stance Classification through Proximity-based Community Detection."  She proposed the Sequential Community-based Stance Detection (SCSD) model for stance (online viewpoints) detection. It is a semi-supervised ensemble algorithm which considers multiple signals that inform stance detection. Next, Amin Salehi presented: "Sentiment-driven Community Profiling and Detection on Social Media." He presented a method of profiling social media communities based on their sentiment toward topics and proposed a method of detecting such communities and identifying motives behind their formation.
For pictures and notes complementary to this blogpost see Shubhanshu Mishra's notes.

I would like to thank the organizers of the conference, the hosts, Towson University College of Arts, as well as IMLS for funding our research.
-- Nwala (@acnwala)