2024-08-30: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024 Trip Report
ACM SIGIR 2024 was held in Washington, D.C., from July 14-18. There were over 900 attendees! The conference also featured two themed days: LLM Day and Government Day. Parallel topic sessions included presentations of full papers, resource papers, and invited Transactions on Information Systems (TOIS) journal papers. Short papers were all presented as posters in a single session. Dr. Nelson (@phonedude_mln) and I (@lesley_elis) both attended SIGIR 2024.
Opening and Keynote: Gerald Salton Award
The conference opened on Monday, with the conference chairs reporting on registration and the program chairs explaining how papers were selected. The review process used an automated system to match reviewers to papers based on information from DBLP. The system was also used to avoid conflicts of interest, even ensuring that reviewers and authors came from different countries. Next, the opening keynote was reserved for the Gerald Salton Award winner. Congratulations to Ellen Voorhees for being named the 2024 award recipient.
Ellen’s keynote speech traced how the evaluation of information retrieval systems has evolved over time. She explained the motivations for developing TREC and showed multiple evaluation metrics for many of its datasets. A theme throughout Ellen’s talk was that she was never predisposed toward particular results, but she always pursued the reasons behind them.
Session 1: Evaluation
The first parallel session I attended was, fittingly, on evaluation, and included seven papers: three full papers, one perspective paper, and three resource papers. Matteo Corsi (@corsi_mat) presented his paper “The Treatment of Ties in Rank-Biased Overlap,” which used Kendall’s Tau as inspiration to extend Rank-Biased Overlap (RBO) to ties that represent uncertainty. They developed two extensions to RBO: one breaks ties randomly rather than deterministically, and the other corrects RBO for the presence of ties. Next, Mark Sanderson (@IR_oldie) presented his paper “Uncontextualized significance considered dangerous.” The authors grouped TREC runs by participant group and showed that one possible reason significance results for groups disagree with those for individual runs could be Type I errors; publication bias is one explanation for the presence of these errors. The next paper presented was “Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance” by Theresia Rampisela. Theresia explained that fairness and relevance are judged separately in recommender systems, and that several recently proposed measures combine the two into a single joint measure. Her work evaluates these joint measures and finds that they are neither expressive nor granular, nor do they correlate with each other, leading her to advise caution in their use.
.@IR_oldie now presenting the work he did together with @frrncl when visiting @RMITComputing and @AdmsCentre on the need of contextualising significance tests in offline IR evaluation #sigir2024 pic.twitter.com/cSZl0d6J7Q
— Damiano Spina (@damiano10) July 15, 2024
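Returning to Matteo’s RBO paper: to make the random tie-breaking extension concrete, here is a minimal sketch of that idea (my own illustration, not the authors’ implementation), which averages a truncated prefix RBO over many random orderings of each tie group:

```python
import random

def rbo_prefix(S, T, p=0.9):
    """Truncated (prefix-only) rank-biased overlap between two rankings,
    given as lists of item ids; agreement at deeper ranks is discounted by p."""
    depth = max(len(S), len(T))
    seen_s, seen_t, score = set(), set(), 0.0
    for d in range(1, depth + 1):
        if d <= len(S):
            seen_s.add(S[d - 1])
        if d <= len(T):
            seen_t.add(T[d - 1])
        score += (p ** (d - 1)) * len(seen_s & seen_t) / d
    return (1 - p) * score  # lower bound: ignores the residual beyond the prefix

def rbo_random_tiebreak(S_groups, T_groups, p=0.9, samples=1000, seed=42):
    """Estimate RBO for rankings given as tie groups (lists of lists) by
    breaking ties uniformly at random and averaging over many samples."""
    rng = random.Random(seed)

    def flatten(groups):
        ranking = []
        for group in groups:
            group = list(group)
            rng.shuffle(group)  # random order within each tie group
            ranking.extend(group)
        return ranking

    total = sum(rbo_prefix(flatten(S_groups), flatten(T_groups), p)
                for _ in range(samples))
    return total / samples

# Example: items "b" and "c" are tied in the first ranking
print(rbo_random_tiebreak([["a"], ["b", "c"], ["d"]],
                          [["a"], ["c"], ["b"], ["d"]]))
```

The paper’s second extension instead corrects the RBO formulation itself for ties, which this sampling shortcut does not capture.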
Nick Craswell (@nick_craswell) was next to present, with his perspective paper, “What Matters in a Measure? A Perspective from Large-Scale Search Evaluation.” The paper detailed evaluation concerns from an industry perspective. One example concern is metrics mutating into goals, inadvertently turning a secondary measure of a goal into the priority itself. Presentations of three resource papers followed. “CIRAL: A Test Collection for CLIR Evaluations in African Languages” was presented by Crystina Xinyu Zhang (@crystina_z). This paper was nominated for a best paper award. The test collection covers four languages: Hausa, Somali, Swahili, and Yoruba, and includes human-annotated queries and relevance judgments. The next resource paper, “ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries,” was presented by Qiaosheng Chen. ACORDAR 2.0 extends ACORDAR 1.0 with densely pooled datasets and question-style queries. The final resource paper, “Browsing and Searching Metadata of TREC,” was presented by Timo Breuer. The implementation includes a publicly available online metadata browser and an API that returns JSON results.
Event: DEI Lunch
The lunch session on Monday focused on diversity, equity, and inclusion, and included a panel featuring Bhaskar Mitra (@UnderdogGeek), Clemencia Siro, Vanessa Murdock (@vanessa_murdock), Doug Oard, and Nicola Ferro (@frrncl). After a land acknowledgement, the panelists brought forth ideas to improve the diversity of attendees at the conference, including adding a findings track similar to EMNLP’s, creating affinity groups similar to NeurIPS’s, and training future leaders in their home environments. The lunch itself was a hot buffet with lots of options.
@suzan kicking off the DEI lunch at #sigir2024. The food is excellent, but the cause is even more important! @SIGIRConf pic.twitter.com/MBa1wklK2k
— Ingo Frommholz @ingo@idf.social (@iFromm) July 15, 2024
Keynote 2: Representation Learning and Information Retrieval
Yiming Yang gave the afternoon keynote. She explained how representation learning has changed information retrieval in recent years, discussing improvements in document training, retrieval-augmented generation (RAG) in IR, and ideas to push the limits of what is possible in representation learning for IR.
Session 2: SIRIP Domain Specific
For the second parallel session, I attended the SIGIR Symposium on IR in Practice (SIRIP) domain-specific session, since I had eaten lunch with one of the presenters, Johny Moreira. First up was Shiri Dori-Hacohen (@shirki) presenting her paper “Misinformation Mitigation Praxis: Lessons Learned and Future Directions from Co·Insights,” about misinformation specifically targeting Asian American and Pacific Islander (AAPI) users. Shiri’s work covers three years of an NSF-funded effort and includes many outcomes, one of which is an explorer for fact-checking misinformation in user-donated WhatsApp data. Next, Sebastian Björkqvist presented his work, “Relevance Feedback Method For Patent Searching Using Vector Subspaces.” He demonstrated his method to improve recall for finding novelty-destroying documents before submitting a new patent. Finally, Johny Moreira presented his work, “A Study on Unsupervised Question and Answer Generation for Legal Information Retrieval and Precedents Understanding.” He showed how clustering combined with LLM summarization can group differing legal opinions, and applied the technique to Brazilian case decisions.
Now @ShirKi is presenting her paper “Misinformation Mitigation Praxis: Lessons Learned and Future Directions from Co·Insights” #sigir2024 pic.twitter.com/4gSXa7PjFf
— Lesley Frew (@lesley_elis) July 15, 2024
Session 3: Users and Simulations
For the third parallel session on Monday, I attended the users and simulations session. Three full papers were presented in person, along with one TOIS paper. Kaixin Ji presented her paper, “Characterizing Information Seeking Processes with Multiple Physiological Signals.” This paper was a very interesting mix of information seeking and human-computer interaction, using physiological evidence to support segmentation of distinct phases in the information seeking process. Her department (RMIT) was extremely supportive and proud of her strong work, and I hope to read more from her in the future.
10 years ago I was giving my first SIGIR full paper presentation at #sigir2014
— Damiano Spina (@damiano10) July 15, 2024
.@kaixin_ji just gave an amazing talk (running out of the long 9’ she allocated for questions) presenting her first #sigir2024 full paper! ✨
Lots of thought provoking questions!#proudPhDadvisors pic.twitter.com/LkAISFCbz9
Next, Zhongxiang Sun presented his paper, “To Search or to Recommend: Predicting Open-App Motivation with Neural Hawkes Process.” Kuaishou, a short-video social media platform in China, was a sponsor of SIGIR this year and an industry partner on this paper. Kuaishou supports both search and recommendation, and this paper developed a process to predict which activity users would engage in based on their past browsing history. The next paper, “UniSAR: Modeling User Transition Behaviors between Search and Recommendation,” presented by Teng Shi, was also written with Kuaishou as an industry partner. It proposes a framework for modeling transitions in joint search and recommendation systems, which improves the performance of both the search and the recommendation components. Finally, Ahmed Abbasi presented his TOIS paper, “Examining User Heterogeneity in Digital Experiments.” The paper presents a framework to identify subgroups whose outcomes differ significantly from those of the group as a whole. They showed that such heterogeneity is common when users are grouped not only by demographics but also by traits like satisfaction.
Event: Women in IR
The Women in IR event followed the third parallel session. The room was full of women and allies, and there was a delicious charcuterie spread. After a brief history of the event, some of the women gave lightning introductions. SIGIR chair Vanessa Murdock (@vanessa_murdock) then led a Q&A. She addressed questions on how to cope with the lack of female role models and how to identify gender bias in the context of negative feedback.
Now is a good time to follow- and refresh- my ancient list #WomenInIR (&femmes). Please reply to be added (and/or tag others who should be added)!https://t.co/lnIrZKgxm5#SIGIR2024 @SIGIRConf #BIAS2024
— Shiri Dori-Hacohen♿️🧠✡️ (📈🥄s). 🟣🎗️🏴 (@ShirKi) July 18, 2024
LLM Day
SIGIR LLM Day took place on Tuesday. It included a variety of talks on how LLMs are impacting IR in terms of recommendation systems, industry, domain-specific applications, and search trends.
Keynote: Towards Steerable AI Systems
Thorsten Joachims gave the LLM day keynote on Steerable AI Systems. He talked about how coactive learning can transfer personalization to recommendations generated by LLMs and how LLM policies can be used to bridge the gap between macro and micro system goals.
Session 1: CTR, Ads & Click Models
Event: Student Lunch
Students and members of SIGIR leadership sat together at tables at the student lunch to get to know each other and ask questions about careers in IR. My table was geographically diverse and included me, two students from South Korea, one student from Germany, and two international students studying in the US. The students were joined by Behrooz Mansouri, Guido Zuccon, and Ryen White. It turned out to be a small world:
- Behrooz Mansouri (@behrouzmansoury) has worked with WSDL’s own Dr. Wu.
- One US-based student, Chiman Salavati, is supervised by Shiri Dori-Hacohen, whom I had met the previous day.
- The other US-based student, Dan Luo, recently graduated and was supervised by Brian Davison (@BrianDavison). Dr. Davison has conducted research with the Internet Archive. We did not make this connection until Brian, Johny, Behrooz, and I met up to walk to the banquet and were waiting for our advisors; when the two advisors joined us, they clearly already knew each other.
At our table, we mostly talked about academic versus industry careers. The lunch was another high quality hot buffet with many options.
Session 3: Fairness
I had already planned to attend the fairness session, and solidified my plans after having lunch with one of the presenters, Dan Luo. Fairness has an interesting definition in IR: it might mean fair exposure of companies when presenting products to consumers, or it might mean a lack of discrimination when presenting applicants to HR. One thing that surprised me during this session was that two of the papers used the MovieLens dataset to evaluate their models. I was first introduced to this dataset in CS 532 Web Science through the Programming Collective Intelligence book, and one of the things I focused on in my report that week was how biased the 100k dataset was for the recommendation system we built. I am glad to learn that this dataset has become a standard for evaluating bias in models.
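As an aside, if you want to see that skew for yourself, a quick sketch (my own, not from any of the papers) that measures how concentrated the MovieLens 100k ratings are on the most popular movies, assuming the standard tab-separated u.data file from the GroupLens download, might look like this:

```python
import pandas as pd

# MovieLens 100k ratings file: tab-separated user id, item id, rating, timestamp
ratings = pd.read_csv("ml-100k/u.data", sep="\t",
                      names=["user_id", "item_id", "rating", "timestamp"])

# How concentrated are the ratings on a small set of popular movies?
counts = ratings["item_id"].value_counts()  # ratings per movie, descending
top_10_pct_share = counts.head(len(counts) // 10).sum() / len(ratings)
print(f"The top 10% most-rated movies account for {top_10_pct_share:.0%} of all ratings")
```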
This session included five full papers, a resource paper, and a journal paper. First up to present was Chen Xu with “A Taxation Perspective for Fair Re-ranking.” This paper integrated economic taxation principles with IR, aligning different user environments with the ideas of prosperity and downturn. They showed that their model had more consistent exposure than other models when subjected to different fairness parameters. Next, Thomas Jaenich (@tjaenich) presented his paper, “Fairness-Aware Exposure Allocation via Adaptive Reranking.” In this paper, the authors address bias introduced before re-ranking by identifying additional results to increase fairness of exposure. The next presentation was “The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking,” given by Maarten de Rijke. Group membership bias occurs when users click on one group’s results more often than other groups’; when these clicks are used to train models, relevant results from the other groups end up ranked lower. The paper shows that an effective correction for group membership bias is amortization, i.e., considering queries with similar group underestimation factors in aggregate when measuring exposure.
It’s now the turn of @tjaenich to enter into action at #sigir2024. Thomas is presenting work entitled “Fairness-Aware Exposure Allocation via Adaptive Ranking” w/ @graham_mcdonald and @iadh pic.twitter.com/DnQ4HVYwXF
— Glasgow IR Group (@ir_glasgow) July 16, 2024
The fourth paper presented was “Optimizing Learning-to-Rank Models for Ex-Post Fair Relevance” by Sruthi Gorantla (@sruthigorantla). She considered the same group membership bias problem as the previous paper, but in the context of learning-to-rank (LTR) models, where, unlike in the second paper of this session, post-processing is not an option. She proposed a new objective that jointly optimizes relevance and fairness, and showed that models trained with this objective outperform other LTR models. The last full paper presentation was “Unbiased Learning-to-Rank Needs Unconfounded Propensity Estimation” by Dan Luo. Dan first showed that a document’s feature representation is a confounding variable: it influences both the document’s rank under the logging policy and, through the document’s true relevance, whether the user clicks on it. This confounding had not previously been accounted for in unbiased learning-to-rank methods. He used a causal technique called backdoor adjustment to account for the confounder, proposed a multi-step learning model that incorporates it, and demonstrated its effectiveness.
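For readers unfamiliar with backdoor adjustment, the general form of the correction (written in generic notation of my own choosing, not necessarily the paper’s exact estimator) averages the click probability over the confounder rather than conditioning only on the observed rank:

```latex
% C = click, K = displayed rank, X = the document's feature representation
% (the confounder). Generic backdoor adjustment, not the paper's estimator.
P\bigl(C = 1 \mid \mathrm{do}(K = k)\bigr)
  = \sum_{x} P\bigl(C = 1 \mid K = k,\; X = x\bigr)\, P(X = x)
```

Intuitively, summing over X blocks the path through the feature representation, so the estimated click propensity reflects the effect of rank alone.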
Philipp Hager then presented his resource paper, “Unbiased Learning to Rank Meets Reality: Lessons from Baidu’s Large-Scale Search Dataset.” The paper revisits results of unbiased learning-to-rank models from the 2023 WSDM Cup on a Baidu search dataset, validating some unexpected results reported by the dataset’s authors. Finally, John Lalor presented his TOIS journal paper, “Should Fairness be a Metric or a Model? A Model-based Framework for Assessing Bias in Machine Learning Pipelines.” This paper proposes a new framework that takes into account multiple key performance indicators for fairness in machine learning.
Event: Banquet
The conference banquet was held at the Atrium of Old Ebbitt Grill. In addition to some hot buffet offerings, the best paper awards were presented. Congratulations to:
- Best paper: A Workbench for Autograding Retrieve/Generate Systems by Laura Dietz
- Best (student) paper: Scaling Laws For Dense Retrieval by Yan Fang et al.
- Best short paper: Evaluating Retrieval Quality in Retrieval-Augmented Generation by Alireza Salemi and Hamed Zamani
- Test of time award: Explicit factor models for explainable recommendation based on phrase-level sentiment analysis by Yongfeng Zhang et al.
Keynote 1: The Trajectory of Information Retrieval
Keynote 2: Petabyte-scale Information Retrieval
Keynote 3: AI Risk Management Framework
Event: Business Lunch
Congratulations to the 2024 inductees of SIGIR Academy! pic.twitter.com/LP8wSrMrMu
— SIGIR 2024 (@SIGIRConf) July 17, 2024
Panel: Access to Public Records
Session: Posters
The rest of Government Day ran at the same time as the poster session. I attended the poster session. The 87 accepted short papers were presented as posters on Wednesday afternoon. Attendees browsed the posters while snacking on cookies and ice cream bars.
(Left) On Backbones and Training Regimes for Dense Retrieval in African Languages, (Right) Best Short Paper, Evaluating Retrieval Quality in Retrieval-Augmented Generation
Information Retrieval for Climate Impact Workshop
The workshops were held on the last day of SIGIR. I attended the Information Retrieval for Climate Impact Workshop, which was organized by Bart van den Hurk (@Bart_vd_hurk), Maarten de Rijke (@mdr), and Flora Salim (@flosalim).
-Lesley Frew