2022-07-25: ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2022 Trip Report

This year, the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2022) was held at Art'otel in Cologne, Germany, from June 20-24, 2022. It was a hybrid event, with participants attending either in person at Art'otel or virtually via Zoom. Members of our Web Science and Digital Libraries (WSDL) research group (current and former) presented five papers at JCDL 2022.

Members of WSDL also presented papers and posters at the Web Archiving and Digital Libraries (WADL) 2022 workshop, which was held in conjunction with JCDL 2022.

Day 1: 2022-06-20

The first day of the conference was dedicated to two tutorial sessions and the Doctoral Consortium. The two tutorial sessions ran in parallel.

Tutorial Sessions

The tutorial "OpenRefine to Wikibase" was conducted by Lucia Sohmen and Lozana Rossenova from TIB Hannover, Germany. The tutorial "Building Digital Library Collections with Greenstone 3" was conducted by David Bainbridge from University of Waikato, New Zealand.

Doctoral Consortium

The Doctoral Consortium took place in person and via Zoom, with five PhD students presenting their research ideas.

The first presentation was by Sameh Frihat from University of Duisburg-Essen, Germany, titled “Context-Sensitive, Personalized Search at the Point of Care”. He talked about their work to develop a context-sensitive and personalized search engine for medical practitioners, focusing on medical doctors and researchers, that integrates the users' interests and knowledge levels into the retrieval process. The search engine currently indexes medical research articles and clinical trials. They hope to add electronic health records to the document corpus in the future, keeping ethics and privacy concerns in mind.

The second presentation was titled “Integration of models for linked data in cultural heritage and contributions to the FAIR principles” by Inês Koch from University of Porto, Portugal. The main objective of her work is to promote the access and reuse of structured data originating from heritage institutions. She proposes to carry out a study that includes both existing data models for cultural heritage and the models that emerged with the web. The research builds upon the EPISA project.

The third presenter at the Doctoral Consortium was Bipasha Banerjee from Virginia Tech, USA, presenting through Zoom on the topic “Opening Scholarly Documents Through Text Analysis”. She uses a collection of around 300,000 born-digital electronic theses and dissertations (ETDs) for her research. The aim of her research is to provide comprehensive metadata in two forms: chapter labels, to help readers understand the topic being discussed in a chapter, and chapter-level summaries, to help readers find specific sections in an ETD without having to read the entire document. For this purpose, she uses a custom ETD-oriented language model to better understand the vocabulary in the corpus.

After a short coffee break, Luiz Barboza from CESAR School, Brazil presented his research on “The Effect of Data Science Teaching for Non-STEM Students”. The technological world has changed, bringing with it extensive processing power and modern programming languages such as Python and R. Because data science is a multidisciplinary field, students from non-STEM backgrounds face technical barriers when learning it. As a solution, they propose a Data Science program for such students. Through this, they aim to support students' development and prepare them for the future job market.

The final presentation was by Yuerong Hu from University of Illinois, USA on “Synthesizing Digital Libraries and Digital Humanities Perspectives for Illuminating Under-investigated Complexities associated with User-generated Book Reviews”. This research examines how to combine Digital Humanities and Digital Libraries perspectives to illuminate the under-examined intricacies of user-generated book reviews. It also explores improving the usability and interpretability of user-generated book reviews. To empirically study the complexity of user-generated book reviews, they conducted case studies using data from two social reading and networking platforms, Goodreads and Douban.

Each presentation was followed by a Q&A session in which participants and speakers discussed the research that had been outlined. Students also had a chance to ask the experts in the audience for advice and feedback on any challenges they were experiencing in their research.

Day 2: 2022-06-21

The second day marked the beginning of the main conference. The day consisted of four paper sessions, one keynote, and an invited talk.

Minute Madness

The day started off with minute madness videos of the conference presentations.

Paper Session 1

The minute madness video session was followed by the first paper session of the day, "Natural Language Processing." This session was chaired by Martin Klein (WSDL alumnus) from Los Alamos National Laboratory, USA.

Paper Session 2

Next, the second paper session of the day "Information Retrieval and Access" started after a brief coffee break. This session was chaired by Norbert Fuhr from University of Duisburg-Essen, Germany.

Keynote 1

The first keynote of the conference, "Human-information Behavior and Interaction: Envisioning a New Paradigm Shift", was delivered by Dr. Dania Bilal from University of Tennessee, USA. Dr. Bilal discussed the current status of human-information behavior and interaction and presented her thoughts on how these interactions are changing. She drew attention to how web search engines have become central to the process of obtaining information. As a result of recent advancements in artificial intelligence (AI), many innovative system interfaces, such as voice assistants and different kinds of robots, have been introduced in library and information contexts to improve user experience (UX). She posed two questions: What part do information specialists, system designers, governments, and industry play in promoting and assisting this change? What kinds of procedures and guidelines are necessary to introduce novel and successful user-AI system interactions? She also discussed why and how libraries must remain vital institutions during the paradigm shift.

Paper Session 3

The keynote was followed by the third paper session of the day, "Search and Recommendation". This session was chaired by Wolf-Tilo Balke from TU Braunschweig, Germany.

The session began with Yunqi Li from Rutgers University, USA presenting their paper "Causal Factorization Machine for Robust Recommendation". She discussed how causal feature selection in Factorization Machines (FMs) can be used to enhance the robustness of recommendations. An FM predicts users' preferences on items based on their feature vectors. They created a personalized causal feature selection method for FMs and emphasized that the causal features selected for recommendation should be personalized to satisfy users' different preferences. They also conducted experiments to show the effectiveness of their method in enhancing the robustness of recommendations as well as improving recommendation accuracy under the non-i.i.d. setting.
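To make the FM idea concrete, here is a minimal sketch of how a standard second-order factorization machine scores a single feature vector. It only illustrates the general model family, not the causal variant from the paper: the personalized causal feature selection step is not shown, and the function name, parameters, and toy values are all hypothetical.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Score one feature vector with a standard second-order FM.

    x  : list of n feature values (e.g. user and item features together)
    w0 : global bias
    w  : list of n linear weights
    V  : list of n latent vectors, one per feature
    All names and values here are illustrative assumptions.
    """
    n = len(x)
    # Bias plus the linear contribution of each feature.
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Pairwise interactions: <V[i], V[j]> * x[i] * x[j] for every i < j.
    pairwise = 0.0
    for i, j in combinations(range(n), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        pairwise += dot * x[i] * x[j]
    return linear + pairwise


# Toy usage: features 0 and 2 are active, so only that one pair interacts.
score = fm_predict(
    x=[1.0, 0.0, 1.0],
    w0=0.5,
    w=[0.1, 0.2, 0.3],
    V=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
print(score)  # approximately 1.9: 0.9 from bias and linear terms, 1.0 from the pair
```

Because each feature interacts with others only through its latent vector, a feature selection step (causal or otherwise) would amount to deciding which entries of `x` are allowed to contribute to the score.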

Next, Sumanta Kashyapi from University of New Hampshire, USA presented their paper "Query-specific Subtopic Clustering". He described their new method named Query-Specific Siamese Similarity Metric (QS3M) that is used for query-specific clustering of text documents. When given a query and documents, their subtopic clustering model can be used to get better query-specific subtopic clusters than previous methods like sentence BERT, TF-IDF, and topic models. Their approach also generalizes to unseen queries and different domains.

The session ended with Sourav Saha from Indian Statistical Institute, India presenting their paper "On Modifying Evaluation Measures to Deal with Ties in Ranked Lists." He discussed a new tie-aware version of Hit@k that they proposed, named Tie-aware Hit@k (ta-Hit@k). Hit@k is an evaluation metric used to evaluate recommender systems and question answering systems. They also presented an alternative derivation of the tie-aware version of Reciprocal Rank (RR), named Tie-aware RR (ta-RR).
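To illustrate why ties matter for these measures, the sketch below computes the expected Hit@k and expected Reciprocal Rank for one relevant item when tied scores are ordered uniformly at random. This is one common way to make a metric tie-aware and is an assumption on our part; it is not necessarily the ta-Hit@k or ta-RR formulation from the paper, and the function names are hypothetical.

```python
def expected_hit_at_k(scores, relevant_index, k):
    """Expected Hit@k for one relevant item under random tie-breaking."""
    target = scores[relevant_index]
    higher = sum(1 for s in scores if s > target)  # items strictly ahead
    tied = sum(1 for s in scores if s == target)   # tie group, incl. the relevant item
    slots = k - higher  # top-k positions left for the tie group
    if slots <= 0:
        return 0.0
    # The relevant item is equally likely to land in any position of its group.
    return min(1.0, slots / tied)


def expected_reciprocal_rank(scores, relevant_index):
    """Expected 1/rank for one relevant item under random tie-breaking."""
    target = scores[relevant_index]
    higher = sum(1 for s in scores if s > target)
    tied = sum(1 for s in scores if s == target)
    # Average 1/rank over every position the tie group occupies.
    return sum(1.0 / (higher + p) for p in range(1, tied + 1)) / tied


# Toy usage: the relevant item (index 2) shares its score with two others.
scores = [0.9, 0.7, 0.7, 0.7, 0.2]
print(expected_hit_at_k(scores, relevant_index=2, k=2))
print(expected_reciprocal_rank(scores, relevant_index=2))
```

A tie-unaware implementation would report either 0 or 1 for Hit@2 in this example, depending on how the sort happened to order the three tied items.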

Paper Session 4

Following another coffee break, the fourth paper session of the day "Web Archives" started. This session was chaired by Thomas Risse (@risse691) from Goethe University Frankfurt, Germany.

The session began with Helge Holzmann (@helgeho) from Internet Archive presenting their paper "ABCDEF - The 6 key features behind scalable, multi-tenant web archive processing with ARCH". He discussed ABCDEF (Archive, Big data, Concurrent, Distributed, Efficient, and Flexible), six principles used to guide the design and development of a system that processes web archive data. ARCH (Archives Research Compute Hub), Sparkling, and their Web Archive Datasets were also discussed during this presentation. ARCH is a cloud-based system that was designed to meet all six principles of ABCDEF. It builds on past work by the Internet Archive (Archive-It) and the Archives Unleashed Project (Archives Unleashed Cloud). The Sparkling Data Processing Library is a multi-purpose generic toolkit that is designed for web archive processing and can work with temporal web data. They have published their Web Archive Datasets, which currently consist of three collections: Early Web Datasets, Friendster Datasets, and GeoCities Datasets.

Next, Martin Klein from Los Alamos National Laboratory, USA (WSDL alumnus) presented their paper "Investigating Bloom Filters for Web Archives Holdings." They tackled the problem that most archival holdings are largely unknown to the public (and to web archives) because archives do not share CDX files for privacy reasons. He discussed how Bloom Filters can be used for discovering archived resources and sharing entire archival holdings. A Bloom Filter is a probabilistic data structure that can reveal whether an element is possibly present in a set or definitely absent from it. For the Bloom Filter, they used a database of hashed URIs. The advantages of using Bloom Filters are that lookup is faster and that index sharing does not require publishing plain-text URIs. He mentioned that this approach is most likely suitable for smaller archives, individual collections, and live lookup of URLs during a distributed crawl of a topic collection.
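As a rough illustration of the idea, the following sketch builds a small Bloom filter over URIs. It is not the implementation from the paper: the class name, filter size, number of hash functions, and the use of salted SHA-256 are all assumptions, and no URI canonicalization is performed.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over URI strings (illustrative sketch only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, uri):
        # Derive one bit position per hash by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{uri}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, uri):
        for pos in self._positions(uri):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, uri):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(uri)
        )


# Toy usage: an archive publishes only the bit array, never the URIs.
holdings = BloomFilter()
holdings.add("https://example.com/")
print(holdings.might_contain("https://example.com/"))       # True
print(holdings.might_contain("https://example.org/other"))  # almost certainly False
```

Only the bit array needs to be shared, so a recipient can test whether a URI may be held without ever seeing the list of archived URIs. The trade-off is a tunable false-positive rate: a lookup can report a URI as possibly present when it is not, but never the reverse.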

The session ended with Yasith Jayawardana (@yasithdev) from Old Dominion University, USA (WSDL) presenting their paper "StreamingHub: Interactive Stream Analysis Workflows." He discussed StreamingHub, a framework for building metadata-propagating, interactive stream analysis workflows using visual programming. This framework was created to help make data and code reusable and analyses reproducible. They also proposed a metadata format and two platform heuristics. The metadata format, named Data Description System (DDS), was created to enable data reuse and is used to collectively describe datasets, streams, and analytics. The two platform heuristics are Fluidity (F) and Growth Factor (GF). Fluidity is a heuristic for evaluating computational bottlenecks in a transformation. Growth Factor is a heuristic for evaluating the change in data volume through a transformation. They conducted two case studies to show how StreamingHub simplified the research process by allowing users to build reproducible experiments that generate verifiable results. In the case studies, their platform heuristics helped to make workload distribution and chaining decisions.

Invited Talk 1

Next, Michael Nelson (@phonedude_mln) from Old Dominion University, USA (WSDL) and Herbert Van de Sompel (@hvdsomp) from Data Archiving and Networked Services (DANS), Netherlands presented “D-Lib Magazine pioneered Web-based Scholarly Communication”. They discussed the innovations pioneered by D-Lib Magazine, an experiment in electronic publishing that had neither peer review nor an editorial board, yet was technically innovative and frequently cited in peer-reviewed literature. The innovations that D-Lib helped pioneer are Open Access (OA), HTML-only publication, persistent identifiers and stable URLs, metadata discovery, rapid publication, and community engagement. Some of the experiments mentioned during the talk were the use of Screencams, MPEG animations, and JavaScript (to inject annotations on links) in the articles.

Day 3: 2022-06-22

The third day of the conference consisted of four paper sessions, the dataset and demos session, and an invited talk. The day started off with a replay of the minute madness videos.

Paper Session 5

The minute madness session was followed by the first paper session of the day "Biblio/Alt-Metrics". This session was chaired by Hermann Kroll from TU Braunschweig, Germany.

Paper Session 6

Next, the second paper session of the day "Information Extraction" started after a brief coffee break. This session was chaired by Thomas Mandl from University of Hildesheim, Germany.

Paper Session 7

Next, the third paper session of the day "Search and Recommendation II" started. This session was chaired by J. Stephen Downie from University of Illinois, USA.

Datasets and Demos Session

Paper Session 7 was followed by lunch and then the datasets and demos session, which featured a total of 11 datasets and demos.

Paper Session 8

Following another coffee break, the fourth paper session for the day "User Behavior" started. This session was chaired by Thomas Mandl from University of Hildesheim, Germany.

Invited Talk 2

Next up was an invited talk by Dr. Heike Winschiers-Theophilus from Namibia University of Science and Technology, Namibia, titled "Bridging Worlds: Indigenous Knowledge in the Digital World". In this talk she outlined the epistemological clashes between indigenous knowledge and technology and discussed methods for co-designing technology and digital presentations of indigenous knowledge, based on their community projects in Namibia. One of the main lessons learned from this talk was the need for indigenous knowledge holders and communities to join forces in the development of technologies and the digitalization of their own knowledge systems, as well as the need for digital libraries to grow in order to accommodate cutting-edge knowledge-sharing methods like virtual reality.

Day 4: 2022-06-23

The fourth day of the conference consisted of three paper sessions, one keynote, two workshops (EEKE and NKOS), and a satellite event (NFDI). It began with a replay of the minute madness videos and the NFDI satellite event in parallel.

Paper Session 9

This was followed by the first paper session of the day "Scholarly Communications I". This session was chaired by Helge Holzmann (@helgeho) from Internet Archive.

Paper Session 10

Next, the second paper session of the day "Scholarly Communications II" started after a brief coffee break. This session was chaired by Ralph Ewerth from TIB Hannover, Germany.

This session was followed by the Town Hall meeting and subsequently lunch.

Paper Session 11

Next, the third paper session of the day "Classification" started. The session was chaired by Dwaipayan Roy from GESIS, Germany.

Keynote 2

The second keynote of the conference, "German National Research Data Infrastructure (NFDI) - Structure and Perspective", was presented by Dr. York Sure-Vetter from Karlsruhe Institute of Technology, Germany and National Research Data Infrastructure (NFDI), Germany. He discussed the FAIR principles, NFDI, the NFDI consortia, and some projects that NFDI is involved in. The FAIR principles are Findable, Accessible, Interoperable, and Reusable. NFDI is the German National Research Data Infrastructure, and its goal is to make relevant data available according to the FAIR principles. The NFDI consortia are collaborations between different scientific institutions that focus on NFDI's goal for research data management. NFDI is also involved in international projects like Gaia-X and the European Open Science Cloud (EOSC).

Closing Ceremony and Awards

Following the keynote, it was time to bid farewell to JCDL 2022. The closing ceremony started by announcing the recipients of JCDL 2022 awards.

Best student paper award(s):

Vannevar Bush best paper award:

With this, JCDL 2022 came to an end.

-- Yasith Jayawardana (@yasithdev), Himarsha Jayanetti (@HimarshaJ), Travis Reid (@TReid803), Emily Escamilla (@EmilyEscamilla_)