2022-10-11: Theory and Practice of Digital Libraries (TPDL) 2022 Trip Report

Prato della Valle in Padua, Italy

This year, the 26th International Conference on Theory and Practice of Digital Libraries (#TPDL2022) returned to an in-person, hybrid format after taking place solely online for the last two years. TPDL was held in beautiful Padova, Italy at the Istituto Sant’Antonio Dottore from September 20-23, 2022. Emily Escamilla and Himarsha Jayanetti from the Web Science and Digital Libraries (WSDL) research group attended the conference in-person and presented four papers at TPDL 2022. There were four presentations from WSDL and one from a recent WSDL alumnus:

The full proceedings have been published in the Lecture Notes in Computer Science (LNCS).

Day 1: 2022-09-20

The first day of the conference was dedicated to workshops and the Doctoral Consortium which were conducted in parallel.

Linked Archives Workshop

For the keynote, Kerstin Arnold (@kerstarno) from Archives Portal Europe presented “No Archive is an Island - A Tale of Exploring a Brave New World”. Archives Portal Europe is a project that allows for aggregation and discovery across over 600,000 collections from over 7,100 institutions. In her keynote, Arnold explained the current International Standard Archival Description (General) (ISAD(G)) standards, the standards being implemented by Archives Portal Europe, and the lessons they have learned along the way.

Artificial Intelligence and Archives

Luís Filipe da Costa Cunha from the Department of Informatics at the University of Minho presented “Fine-Tuning BERT models to extract Named Entities from Archival Finding Aids”. Their work improves named entity recognition (NER) models specifically for the Portuguese language. They created an API, a web platform, and an automatic annotator.

Manual document mining of born-physical cultural heritage objects to create metadata is time-consuming. Mariana Dias from the University of Porto presented “Mining Typewritten Digital Representations to Support Archival Description”, part of their Entity and Property Inference for Semantic Archives (EPISA) Project. They proposed an architecture that combines optical character recognition (OCR), information extraction, and ontology population to conduct document mining for automatic metadata records.

Infrastructures for Archives and Linked Data

Can Web page titles be used to detect content drift? Brenda Reyes Ayala (@CamtheWicked) from the University of Alberta presented “Detecting content drift on the Web using Web archives and textual similarity”. She proposed leveraging Web page titles to detect content drift and found 92.1% recall across three collections. Additionally, the run time was short compared to other methods that have been used to detect content drift. Her work was inspired by “Scholarly context adrift: Three out of four URI references lead to changed content” by Shawn M. Jones (@shawnmjones), Herbert Van de Sompel (@hvdsompel), Harihar Shankar (@hariharshankar), Martin Klein (@mart1nkle1n), Richard Tobin, and Claire Grover.
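
The core idea can be sketched in a few lines. This is a minimal illustration only, assuming a simple sequence-based similarity measure and an arbitrary threshold; the paper's actual similarity metric and cutoff may differ:

```python
from difflib import SequenceMatcher

def title_similarity(live_title: str, archived_title: str) -> float:
    """Similarity ratio in [0, 1] between the live and archived page titles."""
    a = live_title.strip().lower()
    b = archived_title.strip().lower()
    return SequenceMatcher(None, a, b).ratio()

def has_drifted(live_title: str, archived_title: str, threshold: float = 0.5) -> bool:
    """Flag content drift when the two titles diverge below the threshold."""
    return title_similarity(live_title, archived_title) < threshold

# A live page replaced by an error page is flagged as drifted.
print(has_drifted("Page Title", "404 Not Found"))  # True
print(has_drifted("Page Title", "Page Title"))     # False
```

Comparing only titles avoids fetching and diffing full page bodies, which is why this family of approaches can run much faster than whole-content comparison.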

Sérgio Nunes from the University of Porto presented “EPISA Platform: A Technical Infrastructure to Support Linked Data in Archival Management”, another portion of the EPISA project mentioned above. Part of their presentation included a demonstration of the EPISA ArchClient, which they created to provide archivists with a graphical user interface to access, manage, and describe collections.

Models for Linked Archives

Architectural artifacts cannot be easily accessed or searched with traditional finding aids. Daria Mikhaylova from the University of Pisa presented “An extension of RiC-O for architectural archives”. Their solution models an architectural project including its different phases, types of records, and architectural artifacts. The extension also provides a structured and formal representation of the archive that is compatible with existing standards.

Alex Green and Faith Lawrence from The National Archives of the UK presented “The Shock of the New: Testing the Pan-Archival Linked Data Catalogue with Users”. The National Archives of the UK are in the process of creating a canonical Linked Data catalog that is user-focused. However, as with all major changes to crucial work products, the process has been full of discussions between stakeholders (i.e., users, archivists, developers). They talked about the goals they are working to achieve, the challenges they have faced, and the lessons learned.

Doctoral Consortium

Three doctoral students presented their doctoral research to senior researchers and other participants in an informal setting. First, Sefika Efeoglu, a Ph.D. student in the Corporate Semantic Web Group at Freie Universität Berlin, presented her preliminary research titled “A Continual Relation Extraction Approach for Knowledge Graph Completeness”. Next, Nicolò Pratelli, a Ph.D. student at the University of Pisa, presented his work titled “A Geographical Extension for NOnt Ontology”. Finally, Nikos Vasilogamvrakis from the National Documentation Centre in Greece presented his work titled “The Ontological Approach of Modern Greek Morphology” via Zoom to the audience. It was a great opportunity for doctoral students to receive constructive feedback on their preliminary research work.

Day 2: 2022-09-21

To kick off TPDL 2022, Roberto di Cosmo (@rdicosmo) presented a keynote titled “Why we must preserve the world’s software history, and how we can do it” based on his paper “Should We Preserve the World’s Software History, and Can We?”. In his keynote, di Cosmo emphasized the importance of software source code as precious knowledge that is necessary for Open Science, security, and transparency. However, repositories are fragile and forges are not archives. He presented Software Heritage, the largest software archive, as a solution to these problems and provided a demo of its features and functionality. He called on the audience to join the effort to preserve software source code by archiving and referencing source code in Software Heritage as well as contributing to the open source framework and raising awareness.

Session 1: Web Archiving

How do users and robots access the archives? Himarsha Jayanetti (@HimarshaJ) from Old Dominion University’s Web Science and Digital Libraries (WSDL) research group presented “Robots Still Outnumber Humans in Web Archives, But Less Than Before”, where they analyzed the differences between robot and human usage patterns and their temporal preferences. They found that robots accounted for a smaller share of the Internet Archive’s 2019 sample (70%) than of the 2012 sample (91%), and that robots accounted for 98% of all requests to arquivo.pt in 2019.
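
Studies like this typically combine several access-log heuristics to separate robots from humans. As a simplified illustration (an assumed sketch, not the classifier used in the paper), two common signals are a self-identifying user-agent string and a request for robots.txt:

```python
# Markers commonly found in self-identifying crawler user-agent strings.
BOT_UA_MARKERS = ("bot", "crawler", "spider", "curl", "wget")

def looks_like_robot(user_agent: str, requested_paths: set) -> bool:
    """Heuristic robot check based on two simple access-log signals."""
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in BOT_UA_MARKERS):
        return True
    # Fetching robots.txt is a strong signal of automated access,
    # since human browsers do not request it.
    return "/robots.txt" in requested_paths

print(looks_like_robot("Mozilla/5.0 (compatible; Googlebot/2.1)", set()))       # True
print(looks_like_robot("Mozilla/5.0 (Windows NT 10.0)", {"/", "/about.html"}))  # False
```

Real studies layer further signals on top of these, such as request rate and the ratio of page requests to embedded-resource requests.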

Emily Escamilla (@EmilyEscamilla_) presented “A Chromium-based Memento-aware Web Browser” on behalf of Abigail Mabe (@abigail_mabe) from Old Dominion University’s Web Science and Digital Libraries (WSDL) research group. Abigail used Chromium to create a prototype of a Memento-aware browser. The browser was able to detect the presence of Mementos in the open tab. She also enhanced the bookmark functionality by allowing users to archive web pages within the bookmarking process. The paper presented at TPDL 2022 was a shortened version of her Master’s Degree project, “A Chromium-based Memento-aware Web Browser”.

Sawood Alam (@ibnesayeed) from the Internet Archive (and an ODU WSDL research group alum) presented “CDX Summary: Web Archival Collection Insights”. CDX Summary is a tool that generates machine- and human-readable reports based on metadata like URLs, hosts, query parameters, status codes, and more.

Theresa Elstner from the Webis group (@webis_de) at Leipzig University presented “Visual Web Archive Quality Assessment”. She categorized the perceivable reproduction error types, including existence errors and positional errors. She also created and tested a system of visually aligning page segments to more accurately measure pixel difference and, as a result, the quality of an archived Web page.

Himarsha Jayanetti wrapped up the session with her presentation “Creating Structure in Web Archives with Collections: Different Concepts From Web Archivists”. She presented how eight Web archive platforms utilize collections as well as the different types of “collection structures” they follow. She emphasized, for instance, how some web archive platforms support private collections while others do not, and how some collections have sub-collections while others do not. She also identified two main types of navigational hierarchies followed by those web archive platforms: in the first, the original resource (URI-R) supports the collection’s theme, and in the second, the memento (URI-M) supports the collection’s theme. A much more detailed technical report is available on arXiv.

Booster Session 1

After the lunch break, the conference held a booster session with quick 3-minute presentations for each of the papers accepted to the “Accelerating Innovation Papers” track. Each of these papers also had a poster exhibited during the poster session at the end of the day. During this session, eight presenters shared their new ideas and late-breaking results.

Session 2: Cultural Heritage

Session 2 on “Cultural Heritage” began with a presentation by Agathi Papanoti from the National Documentation Centre in Greece on their work titled “Enriching the Greek National Cultural Aggregator with Key Figures in Greek History and Culture: Challenges, Methodology, Tools and Outputs”. She discussed their approach, challenges, and the technologies employed over the past two years to enhance the metadata of Cultural Heritage Objects (CHOs) collected by the Greek cross-domain Cultural Data Aggregator, SearchCulture.gr.

The second presenter was Hille Ruotsalainen from Tampere University who presented her paper titled “Searching Wartime Photograph Archive for Serious Leisure Purposes”. She described evaluating user success scores and user engagement using the user engagement scale (UES) in a recorded presentation that was played during the session. She also suggested research implications based on the results.

Pierre Cubaud from Le CNAM presented his paper titled “Overview visualizations for large digitized correspondence collections: a design study”. He introduced “overview visualization” as a useful alternative to search engines in digital libraries. The tool was built in the context of a large correspondence collection of 20K letters from the Godin-Moret archive. They also created a video mockup of the system.

Harry Halpin from the American University of Beirut presented his work titled “The Knowledge Trust: A Proposal for a Blockchain Consortium for Digital Archives”. He presented their model “The Knowledge Trust”, in which current digital libraries may use blockchain technology to leverage on the advantages of their own curation skills and do integrity checks that can assist in identifying data loss in digital archives.

Daniel Zilio from University of Padova presented “Design and evaluation of a mobile application for an Italian UNESCO site: Padova Urbs picta”. In this study, he described how a smartphone application that would promote Padua’s fourteenth-century fresco cycle (which has been registered in the UNESCO World Heritage List) was designed and evaluated. 

Session 3: Scholarly Communication I

As the first presentation of session 3, Asheesh Kumar from the Indian Institute of Technology Patna presented his work titled “Investigations on Meta Review Generation From Peer Review Texts Leveraging Relevant Sub-tasks in the Peer Review Pipeline”. He presented their novel method to automatically generate decision-aware meta-reviews that additionally take into account a number of pertinent sub-tasks in the peer-review process.

The second presentation was on “Whois? Deep Author Name Disambiguation Using Bibliographic Data”, presented by Zeyd Boukhers from the University of Koblenz-Landau. By utilizing co-authors and the research domain, this study proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities. They developed a neural network model that learned from the representations of the co-authors and titles.

The next presentation was by Tove Faber Frandsen from the University of Southern Denmark on their work titled “Exploring research fields through institutional contributions to academic journals”. She discussed how they looked at which institutions contributed to journals, whether institutional contributions to Library and Information Science journals have remained consistent over time, and whether there are variations among different journals. They found that, for some journals, only around 10% of the contributing institutions are continuants, meaning institutions that published in the journal in a given year and also published at least one paper in the same journal within the previous or following three years.

The fourth presentation of session 3 was by Rand Alchokr from Otto von Guericke University Magdeburg. She presented their work titled “A Closer Look into Collaborative Publishing at Software-Engineering Conferences”, where they studied two properties of research collaborations in software engineering: the number of authors and their research experience. They discovered that collaborative (multi-author) research is increasingly common today, with a decline in the percentage of single-author papers. Their research revealed that teams of two to four researchers were the most common and that, in order to publish at prestigious conferences, junior researchers seemed to require the support of experienced co-authors.

The final presentation of the “Scholarly Communication I” session was by Yusra Shakeel from Karlsruhe Institute of Technology on “Weighted Altmetric Scores to Facilitate Literature”. She discussed their most recent work, which proposes weighted altmetric scores for a more reliable and precise analysis of papers to support the labor-intensive manual literature analysis procedure. Overall, their method performed well with positive results, but further research would help validate the potential of weighted metrics.

Poster Session 1

At the end of the day, the presenters from the booster session had the opportunity to present their posters to the conference participants during the poster session, which was held in the venue's lobby.

Social Dinner

On Tuesday evening, TPDL hosted a social dinner at Caffè Pedrocchi, a historic café that opened in 1831. 

While at the dinner, the program committee presented the Best Paper Award. Three papers were nominated:

We were excited to receive the Best Student Paper Award for "Robots Still Outnumber Humans in Web Archives, But Less Than Before"; the award was accepted by Himarsha Jayanetti. The Best Paper Award was presented to Zeyd Boukhers for "Whois? Deep Author Name Disambiguation Using Bibliographic Data". Congratulations to all of the award-winning authors!

Day 3: 2022-09-22

The second keynote speaker of the conference was Georgia Koutrika from Athena Research Center in Greece. Her talk, titled “Democratizing Data Access: What if we could just talk to our data?”, was about making data easily accessible and useful to humans. She began by noting how important data has become in recent times and how the benefits of exploring data have grown increasingly prominent. As the presentation's title indicates, the main focus of the talk was on how a human user may engage with data through a system (referred to as an intelligent data assistant) in a natural way. Real-world data is not readily accessible: querying it takes complex SQL commands that require expertise and an understanding of the data schemas. Instead, imagine a system that enables users to interact and collaborate with it as if it were a human in order to explore data and find solutions. She also discussed the challenges of developing these intelligent data assistants, such as the fact that some words can have multiple meanings (for example, "movies" and "films") and that the same idea can be expressed in various ways (for example, "how many people live in" can refer to the "population" column in a dataset). These challenges show how hard it is to translate a natural language query into a structured query that the machine understands. She also talked about a conversational system where the data assistant could talk back and ask questions for clarification (SQL2NL) and also explain the results (QR2T). Through this keynote, she emphasized the significance of how much data we can explore, not just how much data is present.

Session 4: FAIR and Open Data

Leon Martin from the University of Bamberg presented “RDFtex: Knowledge Exchange between LaTeX-based Research Publications and Scientific Knowledge”. RDFtex is a tool that allows the import and export of contributions from and to Scientific Knowledge Graphs (SciKGs). It can be integrated into automated workflows and implemented with only four additional LaTeX commands.

Dagoberto Jose Herrera-Murillo and Abdul Aziz from ODECO (@ODECO_etn) and IAAA Lab (@IAAA_Lab) presented “Analyzing User Involvement in Open Government Data Initiatives”. They are working to shift from the traditional supplier-driven data catalogs currently used by Open Data Initiatives (ODIs) to a user-driven solution. They also explored user interactions with ODI portals in the EU. Ideally, their findings would influence portals to modify their approaches to be more user-driven.

Ian Bigelow from the University of Alberta Library presented “Conducting the Opera: The Evolution of the RDA Work to the Share-VDE Opus and BIBFRAME Hub”. His presentation focused on the developmental trajectory that has led to the current status of the RDA and BIBFRAME models.

Oral history collections contain a large variety of contents and metadata; however, users do not want to go through long interviews to find information. Maria Vrachliotou from the Department of Archives at Ionian University presented “Ontology-based metadata integration for Oral History Interviews”. She presented a solution that indexes and segments interviews as well as creates a model for semantic representation of interviews using ontologies. The next step in the project is to test the model on existing oral history collections.

How can organizations make FAIR principles, typically viewed as cumbersome, accessible and attractive to researchers to encourage adoption? Lyudmila Balakireva from Los Alamos National Laboratory presented “Making FAIR Practices Accessible and Attractive”. They created a FAIR-ready data management framework that makes it easier to require FAIR principles through automation.

Booster Session 2 

After the lunch break was the second booster session of the conference, with eight presentations of the papers accepted for the “Accelerating Innovation Papers” track. Similar to the first day of the conference, each of these papers had a poster exhibited during the poster session at the end of the day.

Session 5: Scholarly Communication II

The first presentation of session 5 was by Silvio Peroni, an associate professor at the University of Bologna in Italy. He presented their work titled “The way we cite: common metadata used across disciplines for defining bibliographic references” (slides). He discussed how they looked into various citation techniques used in articles to reference various types of entities. Although citations are standardized, numerous "standards" exist amongst journals and disciplines in practice. They examined around 34k bibliographic references extracted from a vast set of journal articles across 27 different subject areas, which enabled them to highlight the most used metadata for defining bibliographic references across the subject areas.

Next, Emily Escamilla (@EmilyEscamilla_) from Old Dominion University’s Web Science and Digital Libraries research group presented her work titled “The Rise of GitHub in Scholarly Publications” (slides), where she discussed how GitHub is increasingly being referenced in scholarly publications, highlighting the importance of archiving GitHub repositories for reproducibility. She emphasized that although links to Git Hosting Platforms (GHPs) are becoming more prevalent in scholarly publications, GHPs are not permanent. For instance, they found that 1 out of every 5 publications in arXiv in 2021 has at least one link to GitHub. She wrapped up her talk by mentioning the need for improved archiving strategies for GHPs to preserve the scholarly record.

Silvio Peroni, the session's first presenter, gave his second presentation, titled “Structured references from PDF articles: assessing the tools for bibliographic reference extraction and parsing” (slides). He pointed out that more literature means more data, which also translates into more metadata, and that publishers (big or small) have to put a lot of effort into extracting that metadata in structured forms. The authors' suggested remedy is to adopt off-the-shelf bibliographic reference extraction tools that automatically extract references from PDF files. They evaluated different tools that can be used to extract and parse bibliographic references of academic papers and found that Anystyle and Cermine worked the best overall.

The next presentation was by David Pride from The Knowledge Media Institute at The Open University on their work titled “Cui Bono? Cumulative Advantage in Open Access Publishing”. Their study looked at open access (OA) production, OA consumption, and who is benefiting the most from current OA publishing policies. He discussed how they investigated whether there is a correlation between institutional prestige variables and consumption of OA resources. Based on their data, they discovered that OA production and consumption have a moderate to strong correlation, with a stronger correlation for OA consumption by higher ranked institutions than lower ranked ones. This demonstrated that existing OA efforts are more beneficial to prestigious institutions.

Cesare Concordia from the Institute of Information Science and Technologies (IIST), Italian National Research Council (CNR) wrapped up the session with his presentation “The SSH Data Citation Service, a tool to explore and collect citation metadata”. He presented the SSH Data Citation Service (DCS), a piece of software that offers the ability to locate and assess metadata pertaining to digital objects, particularly datasets, that are referenced in citation strings. The DCS follows a traditional client-server architecture: the client, known as the Citation Metadata Viewer, displays the metadata and offers actionability functions, while the backend handles the discovery and management of metadata. The interaction protocol between client and server components is implemented through a REST API.

Session 6: Text Analysis and Extraction

Text analysis on digital libraries involves diverse data sources and complex processing tasks. Yannis Foufoulas from the National and Kapodistrian University of Athens presented “Declarative text analysis through SQL” (nominated for the Best Paper Award). He proposed DETEXA, a library of reusable User-Defined Functions (UDFs) for text analysis built on top of YeSQL. Their approach outperformed PySpark in most settings.

Jill Naiman from the University of Illinois, Urbana-Champaign presented “Figure and Figure Caption Extraction for Mixed Raster and Vector PDFs: Digitization of Astronomical Literature with OCR Features”. They built a model for extracting figures and captions from scientific literature. They achieved good precision with low false positive rates and 90.9% F1-scores, a significant improvement over other state-of-the-art methods.

Funding acknowledgments do not always appear in articles in a standardized format. How can we automatically identify funder recognition? Jonas Mielck from stackOcean presented “Extracting funder information from scientific papers - experiences with question answering”. The three main approaches typically used to solve this type of problem are rule-based methods, regular expressions, and machine learning with language models. They decided to use the question-answering approach and achieved ~0.8 accuracy.
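
To give a flavor of the simpler rule-based and regular-expression baselines mentioned above (an illustrative sketch, not the authors' system; the pattern and funder-name character class are assumptions), a regex can catch common acknowledgment phrasings:

```python
import re

# Matches phrases like "supported (in part) by the <Funder>".
FUNDER_PATTERN = re.compile(
    r"(?:funded|supported|financed)\s+(?:in\s+part\s+)?by\s+(?:the\s+)?"
    r"([A-Z][\w&. -]+?)(?=[.,;]|\s+under\b|$)",
    re.IGNORECASE,
)

def extract_funders(acknowledgment: str) -> list:
    """Return candidate funder names found in an acknowledgment sentence."""
    return [m.group(1).strip() for m in FUNDER_PATTERN.finditer(acknowledgment)]

text = "This work was supported in part by the National Science Foundation under Grant 1234."
print(extract_funders(text))  # ['National Science Foundation']
```

Such patterns are brittle against unusual phrasings, which is exactly the weakness that motivates the language-model and question-answering approaches.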

Triet Ho Anh Doan from GWDG presented “MINE - Workspace as a Service for Text Analysis”. They worked to create MINE, a search portal and a text analysis workspace for digital humanities scientists. The workspace allows users to build their own analysis workflows and run them within the workspace.

Maria Inês Bico from the University of Lisboa presented “Early Experiments on Automatic Annotation of Portuguese Medieval Texts”. They explained their early efforts to manually annotate a large text, train an automatic annotation model, and test this model through two iterations of experimentation. The results of the second automatic annotation model showed 77.3% precision with a textual variant of the same text and 82.4% precision with a new, unseen text.

Poster Session 2

The presenters from the earlier booster session had the opportunity to display their posters during the second poster session which was held in the conference lobby.

Day 4: 2022-09-23

Session 7: Open Science

The first presentation of session 7 was by Esteban González from the Polytechnic University of Madrid. He presented their work titled “FAIROs: Towards FAIR assessment in Research Objects”. He discussed research objects such as datasets, software, and publications that can be utilized to model the scientific production of research. When publishing their research results, academics are increasingly using the FAIR principles as guidance, but the results of scientific research are rarely published separately. He introduced FAIROs, a method for evaluating how well a Research Object adheres to the FAIR principles. They discussed the benefits and drawbacks of various scoring systems and validated FAIROs against 165 Research Objects.

Next, Patrick Hochstenbach from Ghent University in Belgium presented his work titled “Event Notification in Value-Adding Networks”, where he discussed the criteria for interoperability when utilizing Linked Data Notifications to exchange real-time life cycle information about web resources referred to as "artifacts" (any research outputs like datasets, software, preprints, and peer-reviewed articles). He also presented a use case where they demonstrated how to leverage a national service node to distribute Scholix data-literature links to a network of institutional repositories in Belgium.

Paula Oset Garcia from EOSC-Pillar presented her work titled “Developing the EOSC-Pillar RDM Training and Support Catalogue”. She talked about the proposed web application catalog that includes operational and training resources for research data management (RDM) and other FAIR and open science actors. She also mentioned the challenges we currently experience, such as metadata standards, curation, and quality control.

The next presentation was by Andrea Mannocci from the Institute of Information Science and Technologies (IIST), Italian National Research Council (CNR) on their work titled “Knock knock! Who’s there? A study on long-term availability of scholarly publications”. He discussed how scholarly repositories are quite dynamic and can often be updated, moved, merged, or discontinued, making them, like any other web resource, prone to link rot over time. Based on data extracted from four well-known scholarly registries and over 13k unique repository URLs, they found that one out of every four repositories registered in scholarly registries is inaccessible.

To wrap up the session, Ivan Heibi from the University of Bologna presented “Enabling Portability and Reusability of Open Science Infrastructures”. As implied by the title, the main topic of his presentation was how to create an open science infrastructure that is distributed and containerized to make it simpler to reuse, replicate, and port to different environments. He discussed their methodology's four key steps: analysis, design, definition, and managing and provisioning, accompanied by examples of potential applications on OpenCitations.

Session 8: NLP and Recommendation

Session 8 began with a presentation by Mónica Marrero from the Europeana Foundation titled “Implementation and Evaluation of a Multilingual Search Pilot in the Europeana Digital Library”. She discussed how the design and implementation of a multilingual information retrieval system based on the translation of queries and metadata to English is part of the strategy for the improvement of multilingual experiences in Europeana (a digital library that aggregates content from libraries, archives, and museums from all over Europe). In order to surface results that contain English metadata linked with them, their work tests query translation from Spanish to English for the website's Spanish-language version. 

Eman Abdelrahman from Virginia Tech presented her work titled “Improving Accessibility to Arabic ETDs Using Automatic Classification”. She talked about how they used data from the AskZad Digital Library to collect key metadata from Arabic Electronic Theses and Dissertations (ETDs). She also discussed the use of several machine learning and deep learning approaches for automatic subject classification of those ETDs.

The next presentation was by Elias Entrup from the Leibniz Information Centre for Science and Technology. He presented his work titled “B!SON: A Tool for Open Access Journal Recommendation”. B!SON is a web-based journal recommendation system that recommends the most applicable open-access journals based on the title, abstract, keywords, and references provided by the user. He pointed out that as more open-access journals become available, it is harder to locate the best venue for publishing research findings.

Saber Zerhoudi from the University of Passau was next, presenting “Simulating User Querying Behavior using Embedding Space Alignment”. When user interaction data is inadequate, simulation is used to provide Information Retrieval (IR) systems and digital libraries with more realistic directives. Over the course of his talk, he addressed the questions of whether we can explore embedding alignment approaches to simulate user querying behavior and to what extent simulated query search sessions can replace or complement sample-based ones.

The session concluded with the presentation titled “Automatic Generation of Coherent Image Galleries in Virtual Reality” by Florian Spiess from the University of Basel. He discussed the rapidly growing size of multimedia collections in archives and museums and the significance of making such vast collections not just available but also accessible. He presented their suggestion to use Self-Organizing Maps (SOMs) to automatically create coherent image galleries, enabling intuitive, user-driven exploration of massive multimedia collections in virtual reality (VR). More than 300 people participated in a successful pilot test of this proposed system at the Basel Historical Museum.

Session 9: Research and CH Data

Due to the vast nature of the Web, selecting a representative sample of web pages for research is a difficult task. Many researchers use the Alexa Top 1 Million Sites and other top websites lists in their research. Tom Alby from Humboldt University of Berlin presented “Analyzing the Web: Are Top Websites Lists a Good Choice for Research?”. They found that top sites lists miss frequently visited websites. As a result, they created a heuristic-driven alternative based on the Common Crawl.

Digital Object Identifiers (DOIs) are intended to be persistent identifiers; however, they are not always persistent. The authors refer to broken links to DOIs as deleted DOIs. Jiro Kikkawa from the University of Tsukuba presented “Analysis of the deletions of DOIs: What factors undermine their persistence and to what extent?”. They investigated the number and content of deleted DOIs and provided guidance for avoiding deleted DOIs and making DOIs more stable. They found over 708,000 DOIs that existed in March 2017 and did not exist in January 2021. They also found that typos and incorrect formatting contributed to the appearance of deleted DOIs.
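
Since typos and malformed identifiers are one source of such deletions, a quick syntactic sanity check can catch some of them before registration. This is a rough sketch based on the common "10.&lt;registrant&gt;/&lt;suffix&gt;" DOI shape, not the paper's methodology, and it cannot tell whether a well-formed DOI actually resolves:

```python
import re

# DOI names take the form "10.<registrant code>/<suffix>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def is_well_formed_doi(doi: str) -> bool:
    """Syntactic check only; a well-formed DOI can still fail to resolve."""
    return DOI_PATTERN.match(doi.strip()) is not None

print(is_well_formed_doi("10.1000/xyz123"))  # True
print(is_well_formed_doi("10,1000/xyz123"))  # False: comma typo in the prefix
```

Detecting actual deletions, as the paper does, additionally requires comparing registry snapshots over time and attempting resolution.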

Robert B. Allen presented “Implementation Issues for a Highly Structured Research Report”. He explained that research reports contain structured knowledge. He presented the application of this framework to Pasteur’s swan-neck flask experiment and the challenges they have faced. Overall, this study is a step toward direct representation of research reports and part of ongoing work.

Chiara Mannari from the University of Pisa presented “PH-remix Prototype: A non-relational approach for exploring AI-generated content in audiovisual archives”. They created PH-remix, a prototype platform containing a film archive, AI extraction, and a remix application. They leverage AI techniques for searching content in audiovisual archives. Their presentation also included a demonstration of the functionality of PH-remix.

To wrap up TPDL 2022, Hermann Kroll from the Institute for Information Systems at TU Braunschweig presented “On Dimensions of Plausibility for Narrative Information Access to Digital Libraries”. Narratives allow us to communicate information in a sequence so someone else can follow our line of thinking. In narrative information access, there is a need to bind the narrative to real-world data and ensure the bindings are context-compatible. This presentation dug into the challenges of determining the plausibility of a narrative and proposed a set of dimensions that need to be considered in narrative information access.

Closing and TPDL 2023

Following the Research and CH Data session, TPDL 2022 came to an end. The next TPDL will be held at the University of Zadar, Croatia. This was our first time attending an in-person academic conference. We had the opportunity to present our research in front of a live audience and meet a number of academic experts from around the globe. Overall, the TPDL 2022 conference in Padua, Italy, was a great experience and it was an honor to represent the WS-DL research group!

- Himarsha Jayanetti (@HimarshaJ) and Emily Escamilla (@EmilyEscamilla_)