2024-08-30: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024 Trip Report

SIGIR 2024 Opening Session

ACM SIGIR 2024 was held in Washington, D.C. from July 14–18. There were over 900 attendees! The conference also featured two themed days: LLM Day and Government Day. Parallel topic sessions included presentations for full papers, resource papers, and invited Transactions on Information Systems (TOIS) journal papers. Short papers were all presented as posters in a single session. Dr. Nelson (@phonedude_mln) and I (@lesley_elis) both attended SIGIR 2024.


Monday

Opening and Keynote: Gerard Salton Award

The conference opened on Monday, with the conference chairs presenting on registration and the program chairs presenting on how papers were selected. The review process used an automated system to match reviewers to papers based on information from DBLP. The system was also used to avoid conflicts of interest, going so far as to ensure that reviewers and authors were from different countries. Next, the opening keynote was reserved for the Gerard Salton Award winner. Congratulations to Ellen Voorhees for being named the 2024 award recipient!


Ellen’s keynote speech traced how the evaluation of information retrieval systems has evolved over time. She explained the motivations for developing TREC, and showed multiple evaluation metrics for many of the datasets. A theme throughout Ellen’s talk was that she was never predisposed toward specific results, but she always pursued the reasons behind the results.




Session 1: Evaluation


The first parallel session I attended was fittingly on evaluation, and included the presentation of seven papers: three full papers, one perspective paper, and three resource papers. Matteo Corsi (@corsi_mat) presented his paper “The Treatment of Ties in Rank-Biased Overlap,” which used Kendall’s Tau as inspiration to extend Rank-Biased Overlap (RBO) to ties that represent uncertainty. They developed two extensions to RBO: one to break ties randomly rather than deterministically, and the other to correct RBO for the presence of ties. Next, Mark Sanderson (@IR_oldie) presented his paper “Uncontextualized significance considered dangerous.” The authors grouped TREC runs by participant group and showed that one possible reason for significance disagreements between groups and individual runs on topics could be Type I errors. One explanation for the presence of these errors is publication bias. The next paper presented was “Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance” by Theresia Rampisela. Theresia showed that fairness judgments and relevance judgments are separate measures in recommendation systems. New measures have combined these two topics into one, and Theresia’s work evaluates these new joint measures. She found that the measures are neither expressive nor granular, and that they do not correlate with each other, leading her to advise caution in their use.
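Corsi's exact tie-aware estimators are more involved than I can summarize here, but as a rough illustration of the "break ties randomly" idea, here is a minimal prefix-truncated RBO together with a Monte-Carlo tie-breaking wrapper. The function names and the averaging scheme are my own sketch, not the paper's formulation:

```python
import random

def rbo(s, t, p=0.9):
    """Truncated Rank-Biased Overlap between two rankings (lists of items).
    At each depth d, A_d is the fraction of items the two prefixes share;
    depths are weighted geometrically by p."""
    depth = min(len(s), len(t))
    seen_s, seen_t = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_s.add(s[d - 1])
        seen_t.add(t[d - 1])
        a_d = len(seen_s & seen_t) / d
        score += (p ** (d - 1)) * a_d
    return (1 - p) * score

def rbo_random_ties(groups_s, groups_t, p=0.9, samples=100, seed=0):
    """Average RBO over random linearizations of tied groups.
    Each ranking is a list of tie groups, e.g. [['a', 'b'], ['c']] means
    'a' and 'b' are tied ahead of 'c'."""
    rng = random.Random(seed)
    def flatten(groups):
        out = []
        for g in groups:
            g = list(g)
            rng.shuffle(g)       # break this tie group randomly
            out.extend(g)
        return out
    return sum(rbo(flatten(groups_s), flatten(groups_t), p)
               for _ in range(samples)) / samples
```

With no ties (all groups singletons), the wrapper reduces to plain RBO; with ties, it averages over random tie-break orders instead of committing to one arbitrary deterministic order.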



Nick Craswell (@nick_craswell) was next, presenting his perspective paper, “What Matters in a Measure? A Perspective from Large-Scale Search Evaluation.” The paper detailed concerns about evaluation from the industry perspective. One example concern was metrics being mutated into goals, inadvertently turning a secondary measure of a goal into the priority. Following were presentations of three resource papers. “CIRAL: A Test Collection for CLIR Evaluations in African Languages” was presented by Crystina Xinyu Zhang (@crystina_z). This paper was nominated for a best paper award. The test collection covers four languages: Hausa, Somali, Swahili, and Yoruba, and includes human-annotated queries and relevance judgments. The next resource paper, “ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries,” was presented by Qiaosheng Chen. ACORDAR 2.0 extends ACORDAR 1.0 with densely pooled datasets and question-style queries. The final resource paper, “Browsing and Searching Metadata of TREC,” was presented by Timo Breuer. The implementation includes a publicly available online metadata browser and an API that returns JSON results.


Event: DEI Lunch


The lunch session on Monday focused on diversity, equity, and inclusion, and included a panel featuring Bhaskar Mitra (@UnderdogGeek), Clemencia Siro, Vanessa Murdock (@vanessa_murdock), Doug Oard, and Nicola Ferro (@frrncl). After a land acknowledgement, the panelists brought forth ideas to improve the diversity of attendees at the conference. Some of these ideas included adding a findings track similar to EMNLP, creation of affinity groups similar to NeurIPS, and training future leaders in their home environments. The lunch itself was a hot buffet with lots of options.



Keynote 2: Representation Learning and Information Retrieval


Yiming Yang gave the afternoon keynote. She explained how representation learning has changed information retrieval in recent years, discussing improvements in training document representations, RAG in IR, and ideas to push the limits of what is possible in representation learning for IR.



Session 2: SIRIP Domain Specific


For the second parallel session, I attended the SIGIR Symposium on IR in Practice (SIRIP) domain-specific session, as I had eaten lunch with one of the presenters, Johny Moreira. First up was Shiri Dori-Hacohen (@shirki) presenting her paper, “Misinformation Mitigation Praxis: Lessons Learned and Future Directions from Co·Insights,” about misinformation specifically targeting Asian American and Pacific Islander (AAPI) users. Shiri’s work covers the three years following an NSF grant and includes many outcomes, one of which is an explorer for fact-checking misinformation from user-donated WhatsApp data. Next, Sebastian Björkqvist presented his work, “Relevance Feedback Method For Patent Searching Using Vector Subspaces.” He demonstrated his method to improve recall in finding novelty-destroying documents before submitting a new patent. Finally, Johny Moreira presented his work, “A Study on Unsupervised Question and Answer Generation for Legal Information Retrieval and Precedents Understanding.” He showed how clustering along with summarization by an LLM could be used to group differing legal opinions, and applied the technique to Brazilian case decisions.



Session 3: Users and Simulations


For the third parallel session on Monday, I attended the users and simulations session. Three full papers were presented in person, and there was also one TOIS paper. Kaixin Ji presented her paper, “Characterizing Information Seeking Processes with Multiple Physiological Signals.” This paper was a very interesting mix of information seeking and human-computer interaction, using physiological evidence to support segmentation of distinct phases in the information seeking process. Her department (RMIT) was extremely supportive and proud of her strong work, and I hope to read more from her in the future.



Next, Zhongxiang Sun presented his paper, “To Search or to Recommend: Predicting Open-App Motivation with Neural Hawkes Process.” Kuaishou, a short-video social media platform in China, was a sponsor of SIGIR this year and an industry partner on this paper. Kuaishou supports both search and recommendation, and this paper developed a process to predict which activity users would engage in based on their past browsing history. The next paper, “UniSAR: Modeling User Transition Behaviors between Search and Recommendation,” presented by Teng Shi, was also written with Kuaishou as an industry partner. This paper proposed a framework for modeling transitions in joint search and recommendation systems, resulting in improved performance for both the search and recommendation components. Finally, Ahmed Abbasi presented his TOIS paper, “Examining User Heterogeneity in Digital Experiments.” The paper presents a framework to identify subgroups whose outcomes differ significantly from those of the group as a whole. They showed that this phenomenon is actually common when grouping users by demographics as well as by traits like satisfaction.


Event: Women in IR


The Women in IR event followed the third parallel session. The room was full of women and allies, and there was a delicious charcuterie spread. After a brief history of the event, some of the women gave lightning introductions. SIGIR chair Vanessa Murdock (@vanessa_murdock) then led a Q&A. She addressed questions on how to cope with the lack of female role models and how to identify gender bias in the context of negative feedback.



Tuesday

SIGIR LLM Day took place on Tuesday. It included a variety of talks on how LLMs are impacting IR in recommendation systems, industry, domain-specific applications, and search trends.


Keynote: Towards Steerable AI Systems


Thorsten Joachims gave the LLM day keynote on Steerable AI Systems. He talked about how coactive learning can transfer personalization to recommendations generated by LLMs and how LLM policies can be used to bridge the gap between macro and micro system goals.




Session 1: CTR, Ads & Click Models


(Left) Ben London and Jan Malte Lichtenberg presenting “Counterfactual Ranking Evaluation with Flexible Click Models,” (Middle) Maarten de Rijke presenting “Evaluating the Robustness of Click Models to Policy Distributional Shift,” (Right) Josef Vonášek presenting “CWRCzech: 100M Query-Document Czech Click Dataset and Its Application to Web Relevance Ranking.”

The first session I attended on Tuesday was about click-through rate (CTR), ads, and click models. I chose to attend this session because of my work with the ORCAS query-click dataset. There were three live non-proxy presentations. First, Ben London and Jan Malte Lichtenberg (@JanMalteL) presented their paper, “Counterfactual Ranking Evaluation with Flexible Click Models.” The paper presents a new ranking evaluation estimator that balances bias and variance, achieving lower mean squared error than either component model individually and allowing arbitrary definition of the windows used to build it. Next, Maarten de Rijke (@mdr) presented his TOIS journal paper, “Evaluating the Robustness of Click Models to Policy Distributional Shift.” The paper shows that when a click model is trained on data from one logging policy but deployed under another, the trained model will not perform as well under the new policy. They showed that certain click models, such as simple ones like the position-based model, are more robust. They used their findings about different click models to propose a new evaluation protocol for click models under policy shifts. Finally, Josef Vonášek presented his resource paper, “CWRCzech: 100M Query-Document Czech Click Dataset and Its Application to Web Relevance Ranking.” The dataset includes 50,000 expert-annotated query-document pairs for testing.
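To make the position-based click model mentioned above concrete, here is a small self-contained simulation (my own sketch, not code from any of these papers): clicks are generated under a position-based model, and an inverse-propensity estimate recovers document attractiveness even when the logs mix two different logging policies.

```python
import random

def simulate_pbm_clicks(ranking, attract, exam, rng):
    """Position-based model (PBM): a click at rank k occurs with
    probability exam[k] (examination) times attract[doc] (attractiveness)."""
    return [int(rng.random() < exam[k] * attract[doc])
            for k, doc in enumerate(ranking)]

def estimate_attractiveness(logs, exam):
    """Inverse-propensity estimate: weight each click by 1/exam[rank],
    so the estimate does not depend on which policy ranked the documents."""
    total, count = {}, {}
    for ranking, clicks in logs:
        for k, (doc, c) in enumerate(zip(ranking, clicks)):
            total[doc] = total.get(doc, 0.0) + c / exam[k]
            count[doc] = count.get(doc, 0) + 1
    return {doc: total[doc] / count[doc] for doc in total}

if __name__ == "__main__":
    rng = random.Random(42)
    attract = {"a": 0.8, "b": 0.4}
    exam = [1.0, 0.5]  # rank 2 is examined only half the time
    # Two logging policies: one shows "a" first, the other shows "b" first.
    logs = []
    for i in range(20000):
        ranking = ["a", "b"] if i % 2 == 0 else ["b", "a"]
        logs.append((ranking, simulate_pbm_clicks(ranking, attract, exam, rng)))
    print(estimate_attractiveness(logs, exam))  # close to {'a': 0.8, 'b': 0.4}
```

A naive click rate would under-count documents shown at rank 2; dividing by the examination propensity removes that position effect, which is the intuition behind why simple models like the PBM can stay robust across policy shifts.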

Event: Student Lunch


Students and members of SIGIR leadership sat together at tables at the student lunch to get to know each other and ask questions about careers in IR. My table was very geographically diverse, and included me, two students from South Korea, one student from Germany, and two international students studying in the US. The students were joined by Behrooz Mansouri, Guido Zuccon, and Ryen White. It turned out to be a small world:

  • Behrooz Mansouri (@behrouzmansoury) has worked with WSDL’s own Dr. Wu.

  • One US-based student, Chiman Salavati, is supervised by Shiri Dori-Hacohen who I had met the previous day.

  • The other US-based (and recently graduated) student, Dan Luo, was supervised by Brian Davison (@BrianDavison). Dr. Davison has conducted research with the Internet Archive. We did not make this connection until Brian, Johny, Behrooz, and I met up to walk to the banquet and were waiting for our advisors; when the two advisors joined us, they clearly already knew each other.

At our table, we mostly talked about academic versus industry careers. The lunch was another high quality hot buffet with many options.


Session 3: Fairness


I had already planned to attend the fairness session, and solidified my plans after having lunch with one of the presenters, Dan Luo. Fairness has an interesting definition in IR: it might mean fair exposure of companies when presenting products to consumers, or it might mean a lack of discrimination when presenting applicants to HR. One thing that surprised me during this session was that two of the papers used the MovieLens dataset to evaluate their models. I was first introduced to this dataset in CS 532 Web Science through the Programming Collective Intelligence book, and one of the things I focused on in my report that week was how biased the 100k dataset was for the recommendation system built on it. I was glad to learn that this dataset is a standard for evaluating bias in models.


This session included five full papers, a resource paper, and a journal paper. First up to present was Chen Xu with “A Taxation Perspective for Fair Re-ranking.” This paper integrated economic taxation principles with IR, aligning different user environments with the ideas of prosperity and downturn. They showed that their model had more consistent exposure than other models when subjected to different fairness parameters. Next, Thomas Jaenich (@tjaenich) presented his paper, “Fairness-Aware Exposure Allocation via Adaptive Reranking.” In this paper, the authors address bias introduced before re-ranking by identifying additional results to increase fairness and exposure. The next presentation was “The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking,” given by Maarten de Rijke. Group membership bias occurs when users click on one group’s results more often than other groups’; when these clicks are used to train models, relevant results from the other groups end up ranked lower. The paper shows that an effective correction method for group membership bias is amortization, i.e., considering queries with similar group underestimation factors in aggregate for measurement.



The fourth paper presented was “Optimizing Learning-to-Rank Models for Ex-Post Fair Relevance” by Sruthi Gorantla (@sruthigorantla). She considered the same group membership bias problem as the previous paper, but in the context of learning-to-rank (LTR) models. The problem with learning to rank is that, unlike in the second paper of this session, post-processing is not an option. She proposed a new objective that combines relevance and fairness, and showed that models trained with this objective outperform other LTR models. The last full paper presentation was “Unbiased Learning-to-Rank Needs Unconfounded Propensity Estimation” by Dan Luo. Dan first showed that because a document’s feature representation influences both its rank under the logging policy and whether or not the user clicks on the document (via its true relevance), the feature representation is a confounding variable. This issue had not previously been accounted for in unbiased learning-to-rank approaches. He used a technique from causal analysis called backdoor adjustment to account for the confounder, proposed a multi-step learning model that does so, and demonstrated its effectiveness.


Philipp Hager then presented his resource paper, “Unbiased Learning to Rank Meets Reality: Lessons from Baidu’s Large-Scale Search Dataset.” This paper revisits the results of the 2023 WSDM Cup, which evaluated unbiased learning-to-rank models on a Baidu dataset, and validates unexpected results reported by the dataset’s authors. Finally, John Lalor presented his TOIS journal paper, “Should Fairness be a Metric or a Model? A Model-Based Framework for Assessing Bias in Machine Learning Pipelines.” This paper proposes a new framework that takes into account multiple key performance indicators for fairness in machine learning.


Event: Banquet


The conference banquet was held at the Atrium of Old Ebbitt Grill. In addition to some hot buffet offerings, the best paper awards were presented. Congratulations to all of the winners!

You can play Where’s Waldo with Dr. Nelson

Wednesday

Government Day took place on Wednesday at SIGIR 2024. The morning and early afternoon included keynotes and a panel featuring leaders from multiple government agencies.

Keynote 1: The Trajectory of Information Retrieval


Michael Littman (@mlittmancs), the Director of the Division of Information and Intelligent Systems at the NSF (National Science Foundation), gave the keynote address to the entire conference. The keynote traced the history of IR through NSF grants from the 1970s to the present. The presentation was interesting, but the questions were even more interesting. One question asked how NSF grants achieve the goal of developing students. Another asked how to get on a reviewer panel and what that experience is like. Whether or not attendees were planning to attend the Government Day sessions, the keynote was engaging and relevant.




Keynote 2: Petabyte-scale Information Retrieval


After the main conference keynote, there were two additional keynotes for Government Day attendees. First, Kim Pruitt (@kdpru), the Director of the National Center for Biotechnology Information at NLM/NIH (National Library of Medicine, National Institutes of Health), gave a keynote about IR at scale. First, she talked about the balance between using humans and using machine learning to generate metadata for indexing. They compared the output of both techniques and found pros and cons for each: humans made mistakes, but algorithms also made mistakes or missed important indexing terms. Next, she talked about a ranking system using a model based on clicks. They reached out to NIST (National Institute of Standards and Technology) to determine whether the model had biases, and found that the clicks did introduce bias, which led them to modify the model to only use clicks older than a certain amount of time. Finally, she talked about how NIH might use generative AI. She brought up concerns that were echoed in the Public Access panel later in the day: in the context of a government information system, generative AI would need to maintain the public trust, and incorrect information generated by a chatbot would be harmful to that trust.

Keynote 3: AI Risk Management Framework


The next Government Day keynote was given by Elham Tabassi, a Senior Scientist at NIST. She talked about the creation of the AI Risk Management Framework and the roadmap to its implementation. The framework was developed to fulfill the 2023 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. The framework is going to be evaluated based on NIST’s overall goals, such as advancing science.

Event: Business Lunch


The business lunch featured awards and plans for future conferences. There were three awards presented: Early Career Research, Community, and DEI. They also presented the SIGIR Academy Inductees. After the awards, they announced that the conference location for 2026 is Melbourne, Australia. They also presented three 2027 proposals for North America. The lunch was a boxed lunch.

Panel: Access to Public Records

After lunch, Government Day resumed with a panel about access to public records. The panel included Gulam Shakir, CTO of NARA (National Archives and Records Administration), as well as Jill Reilly, the Digital Engagement Director, also at NARA. The panel was moderated by Jason Baron. One way NARA is providing access to public records is by using AI to transcribe the 1950 census; previous census records were transcribed manually via crowdsourcing. On NARA’s 1950 census user interface, the fact that the transcripts were generated using AI is prominently displayed, which builds trust with the public. Another discussion point was that for government agencies to build an LLM-based system that users would trust, the LLM must return sources with its answers.

Session: Posters


The rest of Government Day ran at the same time as the poster session. I attended the poster session. The 87 accepted short papers were presented as posters on Wednesday afternoon. Attendees browsed the posters while snacking on cookies and ice cream bars.


(Left) On Backbones and Training Regimes for Dense Retrieval in African Languages, (Right) Best Short Paper, Evaluating Retrieval Quality in Retrieval-Augmented Generation



Thursday

Information Retrieval for Climate Impact Workshop


The workshops were held on the last day of SIGIR. I attended the Information Retrieval for Climate Impact Workshop, which was organized by Bart van den Hurk (@Bart_vd_hurk), Maarten de Rijke (@mdr), and Flora Salim (@flosalim).


First, there were four keynotes, each on a specific theme. Ramamurthy Valavanda presented his keynote on the Intergovernmental Panel on Climate Change (IPCC)’s information needs. Next, Harry Scells (@Hscells) presented a keynote on methodologies, specifically systematic reviews. The third keynote was presented by Tanwi Mallick on resources, such as databases, repositories, and technological tools available for climate impact research. Finally, Alaa Al Khourdajie (@DrAlaaClimate) & David Huard presented a remote keynote on IPCC integration.

Next, there were seven lightning talks. Three presenters were virtual and four were in person. I gave my talk on Information Retrieval Needs in Climate Impact from Federal Environmental Webpages (slides). A version of this paper was accepted as a short paper at CIKM 2024 and will shortly be available open access in the ACM Digital Library through ACM Open.


Finally, we broke into four groups to write about the four themes. I wrote with Harry Scells (@Hscells) and Damiano Spina (@damiano10) on the methodology theme. The full agenda resulting from the writing sessions is expected to appear in the SIGIR Forum later this year.

Wrap-Up

This was my first time attending SIGIR, and my first time presenting at a workshop. I attended JCDL last year, which is a smaller conference, and it was nice to see some familiar faces at SIGIR from JCDL, such as Christin Kreutz (@kreutzch). Besides the size of the conference, another way that SIGIR differed from JCDL was its large industry presence. I liked attending the women's event, because it was empowering to have so many of us in the room together, even though we were underrepresented in the conference overall. My two new friends, Johny and Dan, had never attended a conference in person due to the pandemic, even though they both had recently finished their PhDs. Dan also had not seen his family in his home country since starting his studies, so learning about their experiences made me thankful for the opportunity to attend multiple conferences and study domestically. I also appreciated how certain SIGIR regulars, like Shiri Dori-Hacohen, went out of their way to make sure I felt welcome. Overall, it's harder to feel connected to people at a large conference like SIGIR compared to JCDL, but I was impressed by the quality of research at the conference and am glad I was given the opportunity to attend, especially since it was so close. I would also like to thank ODU for the travel grant covering my registration for the conference.

 -Lesley Frew
