Friday, May 2, 2014

2014-04-14: ECIR 2014 Trip report

From the ECIR 2014 official Flickr account
From Apr. 14 to Apr. 16, 2014, I attended the 36th European Conference on Information Retrieval (ECIR 2014) in the beautiful city of Amsterdam, the Netherlands. The conference started with a workshops/tutorials day on Apr. 13, which I didn't attend.

The workshops and tutorials day covered a wide range of IR topics, including: the Text Quantification tutorial, the GamifIR'14 workshop, the Context Aware Retrieval and Recommendation workshop (CaRR 2014), the Information Access in Smart Cities workshop (i-ASC 2014), and the Bibliometric-enhanced Information Retrieval workshop (BIR 2014).

The main conference started on April 14 with a welcome note from the conference chair, Maarten de Rijke. After that, Ayse Goker from Robert Gordon University introduced the winner of the Karen Spärck Jones Award and keynote speaker, Eugene Agichtein, a professor at Emory University. His presentation, entitled "Inferring Searcher Attention and Intention by Mining Behavior Data", covered the challenges and opportunities in the IR field and future research areas.

First, he compared the challenges of “Search” in 2002, when it aimed to support global information access and contextual retrieval, with those of “Search” in 2012 (SWIRL 2012), when the focus was on what lies beyond the ranked list and on evaluation. Eugene then moved to the concept of inferring search intent. In this area, he pointed to using interaction data, such as asking questions based on an understanding of the search terms in social CQA; some unsuccessful queries may be converted automatically into questions that are forwarded to people (CQA) to answer. He also considered mining query logs and click logs as sources of data that may enhance the search experience.

Then, Eugene discussed the challenge of obtaining realistic search behavior data outside the major search engines. He presented UFindIt, a game for collecting search behavior data at scale in a controlled way. He also showed some alternatives to big, expensive eye-tracking equipment, such as ViewSer, which enables remote eye tracking.

Finally, Eugene listed some future trends in the IR field, such as: behavior models for ubiquitous search; a future vision of search interfaces built around intelligent assistants and augmented reality; new tools for analyzing cognitive processing; using camera-equipped mobile devices as eye-tracking tools; optimizing the power consumption of search tasks on mobile devices; and privacy concerns in search.

After the break, there were two parallel sessions (Recommendation and Evaluation). I attended the recommendation session, where Chenyi Zhang from Zhejiang University presented his paper entitled "Content + Attributes: a Latent Factor Model for Recommending Scientific Papers in Heterogeneous Academic Networks". In this paper, they proposed an enhanced latent factor model for recommending academic papers. The system incorporates the paper content (e.g., title and abstract in plain text) and additional attributes (e.g., author, venue, publication year). The system addresses the cold-start problem for new users by incorporating social media. In the evaluation session, Colin Wilkie from the University of Glasgow presented "Best and Fairest: An Empirical Analysis of Retrieval System Bias". After lunch, we had the first poster/demo session. There was a set of interesting demos: DAIKnow, Khresmoi Professional, and ORMA.

The second day, April 15, started with the "Panel on the Information Retrieval Research Ecosystem", but due to jet lag, I couldn't attend the morning session. After lunch, we started the next poster/demo session. I enjoyed the discussion around "GTE-Cluster: A Temporal Search Interface for Implicit Temporal Queries" and TripBuilder, which won the best demo award.

On the third and last day, April 16, the keynote speaker was Gilad Mishne, Director of Search at Twitter. Gilad described building Twitter search as laying the train track while the train is running at hundreds of miles an hour. He discussed the challenges of the search task at Twitter, which he defined as: mainstream input of tweets, on-time indexing, ranking tweets, and aggregating results across tweets and people, which requires multiple indexes and multiple ranking techniques. He also distinguished search behavior on Twitter from that on search engines: queries are not repeated, as 29% of the top queries on Twitter change hourly and 44% change daily. Gilad explained that there is a human in the loop for tweet annotation: Twitter hires "on-call" crowdsourced workers to categorize queries, for example to determine whether a query is news-related or not. A set of IR techniques does not work with Twitter search, such as anchor text, term frequency, click data, and relevance judgments. Twitter's result optimization targets decreasing the bad results, which improves the search experience, using an evaluation metric called cr@p3 (the fraction of crap in the top 3 docs).
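To make the cr@p3 idea concrete, here is a minimal sketch in Python. It only follows the one-line definition above (fraction of bad results in the top 3); the function name `crap_at_k`, the boolean judgment format, and the choice to divide by the number of results actually returned are my own assumptions, not Twitter's implementation.

```python
def crap_at_k(is_bad, k=3):
    """Fraction of the top-k results judged bad.

    `is_bad` is a ranked list of booleans (True = the result was judged bad).
    Assumption: if fewer than k results are returned, we divide by the
    number of results actually present; an empty list scores 0.0.
    """
    top = is_bad[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


# Example ranking: the first result is bad, the next two are fine.
score = crap_at_k([True, False, False, True])
print(score)
```

Lower is better here, which matches the goal of reducing bad results rather than maximizing good ones.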

The next session was the "Digital Library" session, where I presented my paper "Thumbnail Summarization for Web Archives". In this paper, we proposed various techniques to predict changes in a web page's visual appearance based on changes in its HTML text, in order to select a subset of the TimeMap that represents the major changes of the website through time. We suggested using the SimHash fingerprint to estimate the changes between pages. We proposed three algorithms that may reduce the size of the TimeMap to 25%.
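As a rough illustration of the idea (not one of the three algorithms from the paper), the sketch below computes a 64-bit SimHash per page and keeps a memento only when its fingerprint differs from the last kept one by at least a threshold number of bits. The tokenization, the use of MD5 as the token hash, the threshold value, and the function names are all assumptions made for this example.

```python
import hashlib
import re

BITS = 64  # fingerprint width; we take the first 8 bytes of each MD5 digest


def simhash(text):
    """Return a 64-bit SimHash fingerprint of the text."""
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def select_mementos(pages, threshold=8):
    """Keep the first memento, then each one whose fingerprint differs from
    the last kept memento by at least `threshold` bits.

    `pages` is a chronologically ordered list of (timestamp, html_text).
    """
    selected = []
    last_fp = None
    for timestamp, text in pages:
        fp = simhash(text)
        if last_fp is None or hamming(fp, last_fp) >= threshold:
            selected.append(timestamp)
            last_fp = fp
    return selected
```

With this scheme, consecutive captures with identical text collapse into a single thumbnail, and the threshold controls how aggressive the reduction is.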

The next presentation was "CiteSeerX: A Scholarly Big Dataset" by Cornelia Caragea. She spoke about some use cases for scholarly article databases. Caragea used DBLP content to clean the CiteSeerX database. She assumed that if two articles have a similar title, author, and number of pages, then they are duplicates. However, one audience member raised a special use case in medical publications where this assumption does not hold.
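The duplicate heuristic can be sketched in a few lines of Python. This is only my illustration of the rule as described in the talk, using exact matching after light normalization; the record fields (`id`, `title`, `authors`, `pages`) and the function names are hypothetical and do not reflect the CiteSeerX or DBLP schemas.

```python
from collections import defaultdict


def normalize(s):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(s.lower().split())


def find_duplicates(records):
    """Group record ids that share the same title, author set, and page count."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            normalize(rec["title"]),
            tuple(sorted(normalize(a) for a in rec["authors"])),
            rec["pages"],
        )
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"id": 1, "title": "A Study of X", "authors": ["A. Smith", "B. Lee"], "pages": 10},
    {"id": 2, "title": "a study  of x", "authors": ["B. Lee", "A. Smith"], "pages": 10},
    {"id": 3, "title": "A Study of X", "authors": ["A. Smith", "B. Lee"], "pages": 12},
]
print(find_duplicates(records))
```

As the audience comment suggests, a rule like this can produce false positives when distinct articles legitimately share all three attributes.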

Then, Marijn Koolen from the University of Amsterdam presented "User Reviews in the Search Index? That'll Never Work!". Marijn studied user reviews of books on the web, e.g., on Amazon, to enhance the search experience for books. He showed different examples of useful and unhelpful comments. He used a big dataset of 2.8 million book descriptions collected from Amazon and LT, augmented by 1.8 M entries from LoC and BL. The industry track ran in parallel with my session; here is an interesting set of slides from Alessandro Benedetti of Zaizi UK.

After lunch, I attended the industry track session, which featured presentations about the global search engines. Pavel Serdyukov from Yandex presented "Analyzing Behavioral Data for Improving Search Experience at Yandex". Pavel spoke about Yandex's efforts to share user data. Yandex has run click data challenges for three years now. He showed how they anonymized the click logs by converting them into numbers.
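To give a feel for what "converting it into numbers" can look like, here is a small Python sketch that replaces query terms and URLs with integer IDs. This is purely my assumption of the general approach, not Yandex's actual procedure; the log format and the function name `anonymize_log` are hypothetical.

```python
def anonymize_log(log):
    """Replace query terms and clicked URLs with integer ids.

    `log` is a list of (query, clicked_url) pairs. Ids are assigned in
    order of first appearance, which is an assumption of this sketch;
    a real release would likely need randomized ids and further checks.
    """
    term_ids = {}
    url_ids = {}
    anonymized = []
    for query, url in log:
        terms = [term_ids.setdefault(t, len(term_ids)) for t in query.lower().split()]
        anonymized.append((terms, url_ids.setdefault(url, len(url_ids))))
    return anonymized


log = [
    ("cheap flights", "a.com"),
    ("cheap hotels", "b.com"),
    ("flights", "a.com"),
]
print(anonymize_log(log))
```

The structure of the log (which queries share terms, which clicks land on the same URL) is preserved, while the original strings are not.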

The next presenter was Peter Mika from Yahoo Labs. His presentation was entitled "Semantic Search at Yahoo". In it, Peter gave an overview of the status of the semantic web and how search engines use it.

At the end of the day, there was the closing session, where the conference chair thanked the organizers for their efforts. The ECIR 2015 committee also promoted the next ECIR event in Vienna, Austria. Finally, the ECIR 2014 media committee made this wonderful video that incorporated various moments from ECIR 2014.

Ahmed AlSum
