2023-10-03: International Conference on Document Analysis and Recognition (ICDAR) 2023 Trip Report

 




The 2023 International Conference on Document Analysis and Recognition (ICDAR 2023) was held in person from August 21 to 26, 2023, and all authors were required to present their papers in person. Exceptions were made only for authors facing unavoidable circumstances that prevented their attendance; in those cases, synchronous video presentations were arranged for their oral sessions, and their posters were featured through prerecorded teaser videos in addition to physical poster displays. The main conference, which took place from August 21 to 23, 2023, featured three keynote talks by distinguished speakers: Marti Hearst from UC Berkeley, Vlad Morariu from Adobe Research, and Seiichi Uchida from Kyushu University. I attended all the sessions, representing the LAMP-SYS lab of the Web Science and Digital Library Research group (WS-DL).

Day 1 (Main Conference)

Day 1 of the main conference commenced with paper sessions chaired by Dr. Richard Zanibbi (Oral Session 1) and Dr. Rajiv Jain (Oral Session 2). Oral Session 1 began with a presentation by Juan C. Martinez from the University of Alicante, Spain, titled "A Holistic Approach for Aligned Music and Lyrics Transcription". This work addresses the automated transcription of musical notes and lyrics from music scores while ensuring precise alignment between these two types of information.


Following that, the same team from the University of Alicante presented their work titled "End-to-end Optical Music Recognition for Pianoform Sheet Music". This research examines the challenges of Optical Music Recognition (OMR), a field dedicated to transforming images of musical scores into symbolic representations, and proposes solutions to them. Their study introduces a neural method designed specifically for the end-to-end transcription of such scores. They also introduce the GRANDSTAFF dataset, a substantial collection of single-system piano scores. The proposed method is trained and evaluated on this dataset, demonstrating its effectiveness in transcribing pianoform notation.


Another interesting paper, by Haoyang Shen from Xidian University, China, titled "A multi-level synthesis strategy for online handwritten chemical equation recognition", addresses the task of recognizing handwritten chemical equations, a challenge exacerbated by the scarcity of accessible datasets. In response, the authors present a method for synthesizing handwritten equations from LaTeX expressions, framing the recognition task as the conversion of images to markup. Their approach parses LaTeX expressions into a symbol layout tree and generates the components while accommodating handwriting conventions. Furthermore, they augment local and global expression patterns to enrich the diversity of the synthesized data.

 

The first part of Session 1 concluded with a presentation by Filip Darmanovic on "SCI-3000: A Dataset for Figure, Table and Caption Extraction from Scientific PDFs". The authors introduce the SCI-3000 dataset, comprising 3,000 scientific publications in PDF format, totaling 34,791 pages, spanning various fields including computer science, biomedicine, chemistry, physics, and technology. These PDFs are annotated with the locations of figures, tables, and their corresponding captions. The authors use the dataset to evaluate two extraction approaches, one rule-based and one deep learning-based, making it a useful benchmark for scientific document analysis.


Keynote 1

The first keynote speech at ICDAR 2023, titled "Bringing Scientific Papers to Life", was given by Dr. Marti Hearst from the UC Berkeley School of Information. In her talk, Dr. Hearst mentioned that "publications are growing exponentially and papers can be challenging to understand". She also discussed the approaches her lab has implemented to make the knowledge in scholarly papers more widely accessible. These include the use of NLP, HCI, and document analysis to augment papers.

 


Following the lunch break, I presented our paper titled "A Study on Reproducibility and Replicability of Table Structure Recognition Methods", which I co-authored with Muntabir Hasan Choudhury, Sarah Rajtmajer, and Jian Wu. Our paper discusses concerns about reproducibility and replicability in the field of artificial intelligence, particularly in the context of table structure recognition (TSR). The study examines 16 papers on TSR and attempts to reproduce their results using the provided code and datasets. Additionally, it assesses replicability by using a dataset similar to the original dataset and a new dataset called GenTSR, which contains annotated tables from scientific papers. The findings indicate that out of the 16 papers, only four could be successfully reproduced, with main results consistent with the reported results. Among these, two papers were identified as replicable when using a similar dataset under certain Intersection over Union (IoU) thresholds; IoU measures the overlap between the ground truth and predicted bounding boxes of table cells. However, none of the papers were identified as replicable when using the new dataset. The paper offers insights into the reasons for irreproducibility and irreplicability in AI research, shedding light on challenges related to the consistency and reliability of research findings in this field. For more details about this paper, refer to our previous blog post. In addition, our paper can be accessed via this link.
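
To make the IoU criterion mentioned above concrete, here is a minimal sketch of how the overlap between a predicted and a ground-truth cell box can be computed. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the example threshold are assumptions for illustration; they are not taken from our paper's evaluation code.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A predicted cell counts as correct if its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


truth_cell = (0.0, 0.0, 10.0, 10.0)
pred_cell = (5.0, 0.0, 15.0, 10.0)  # shifted right by half a cell width
print(iou(truth_cell, pred_cell))
print(is_match(pred_cell, truth_cell, threshold=0.5))
```

Raising the threshold makes the match criterion stricter, which is why a method can look replicable at one IoU value and not at another.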


Another interesting paper within the realm of table understanding, titled "An End-to-End Local Attention Based Model for Table Recognition", was presented by Nam Tuan Ly. This paper examines the challenges that Transformer-based models face when dealing with large tables, mainly due to the constraints of the global attention mechanism. To tackle this issue, the authors propose a local attention mechanism. Furthermore, they introduce an end-to-end model designed to recognize both the structure and content of tables in images. This model comprises an encoder for feature extraction and three decoders, dedicated to table structure recognition, cell detection, and cell-content recognition, respectively. Their experiments used the FinTabNet and PubTabNet datasets.
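
The general idea behind local attention can be sketched as follows: instead of every position attending to every other position, each position attends only to neighbors inside a fixed window, so the number of attended pairs grows roughly linearly with sequence length rather than quadratically. The symmetric window, the function names, and the toy scores below are assumptions for illustration and may differ from the mechanism used in the paper.

```python
import math
from typing import List


def local_attention_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over the allowed positions only; masked positions get weight 0."""
    top = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


seq_len, window = 6, 1
mask = local_attention_mask(seq_len, window)

# Number of (query, key) pairs that are actually computed.
local_pairs = sum(sum(row) for row in mask)
global_pairs = seq_len * seq_len
print(local_pairs, global_pairs)

# Attention weights for position 2 with made-up raw scores.
weights = masked_softmax([0.1, 0.4, 0.9, 0.3, 0.2, 0.7], mask[2])
print(weights)
```

With a window of 1, position 2 only distributes weight over positions 1, 2, and 3; everything farther away is ignored.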


  

The final presentation during Session 3 on Day 1 of ICDAR 2023 was titled "Generalization of Fine Granular Extractions from Charts", delivered by Shubham Singh Paliwal from TCS Research, India. This research introduces an approach centered on attention and dynamic filtering for extracting chart elements and discerning text-role regions. Their method attains state-of-the-art results on the PlotQA dataset, surpassing current approaches with a 2.81% improvement in mean average precision (mAP) at an Intersection over Union (IoU) of 0.90.


Day 2 

Day 2 of ICDAR 2023 commenced with a paper presentation titled "Multi-Stage Fine-tuning Deep Learning Models Improves Automatic Assessment of the Rey-Osterrieth Complex Figure Test", delivered by Benjamin Schuster from the University Hospital Cologne, Germany. The paper concerns the Rey-Osterrieth Complex Figure Test (ROCFT), a neuropsychological assessment tool employed in the evaluation of various diseases. The test involves patients copying a complex illustration and subsequently redrawing it from memory after specific time intervals. Typically, human raters assess these reproductions, with the overall score indicating the severity of the illness. The paper also discusses existing algorithms designed to automate this manual evaluation; these often require extensive private datasets, which makes training deep learning models difficult. The authors tackle this challenge by devising a multi-stage fine-tuning strategy. They demonstrate that starting from ImageNet weights and then pre-training on a large-scale sketch dataset significantly reduces the mean absolute error compared with training from ImageNet weights alone.



Following that, Sagar Chakraborty presented "TransDocAnalyser: A framework for semi-structured offline handwritten documents analysis with an application to legal domain". Their paper introduces a novel dataset known as the FIR dataset, which contains First Information Report documents from Indian police stations. The authors propose an end-to-end OCR framework employing an Encoder-Decoder architecture that combines Faster-RCNN and Vision Transformers, along with a domain-specific tokenizer. Additionally, they propose a post-correction technique to address recognition errors. According to the authors, this framework achieves exceptional results on the FIR dataset, surpassing existing models and establishing a new state-of-the-art.


Before the poster sessions, a noteworthy paper titled "Line Extraction in Handwritten Documents via Instance Segmentation" was presented by Jason Wells on behalf of Adeela Islam from the University of Punjab, Pakistan. This paper introduces a deep learning approach for extracting text lines from handwritten documents. The authors treat lines as objects and apply object detection and segmentation frameworks, which provides flexibility in handling variations in spacing, skew, and layout.



Day 3 

Day 3 of ICDAR 2023 commenced with a presentation by Yejing Xie from Nantes Université, France, introducing the 7th edition of CROHME, titled "ICDAR 2023 CROHME: Competition on Recognition of Handwritten Mathematical Expressions". In their paper, they outlined the three tasks assigned to the participants, each with a different modality (on-line, off-line, and bimodal). They collected 3,905 new handwritten equations for training and evaluation purposes. Notably, this competition enabled the comparison of on-line and off-line systems on the same test set and saw participation from six teams, with one team achieving an expression recognition rate of over 80% in all three tasks.


The next competition presentation was delivered by Shangbang Long from the Google Research team on "ICDAR 2023 Competition on Hierarchical Text Detection and Recognition". In their paper, they provided an overview of the competition's organization, encompassing tasks, datasets, evaluations, and scheduling details. The competition received over 50 submissions from more than 20 participating teams, underscoring its success. Additionally, their paper presented the competition results and the insights drawn from them.


Another interesting competition presentation was titled "ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition", presented by George Tom from the Center for Visual Information Technology, India. This competition focused on evaluating and improving methods for detecting, tracking, and recognizing text in dash cam videos using the RoadText-1K dataset, which contains 1000 annotated videos. The paper presented a thorough examination of the methods submitted by participants and an insightful analysis of their outcomes, shedding light on the potential and constraints of video text analysis for dash cam videos.


Closing Ceremony 

The closing ceremony of ICDAR 2023 started with an award ceremony where the Best Poster, Best Paper, and Best Student Paper awards were announced.

  • Best Poster Award: "Computer Vision Techniques for Handwritten Optical Music Recognition" by Pau Torras.
  • Best Student Paper Award: "Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation" by Michael Jungo, Beat Wolf, Andrii Maksai, Claudiu Musat, and Andreas Fischer.
  • Best Paper Award: "A Hybrid Model for Multilingual OCR" by David Etter, Cameron Carpenter, and Nolan King.

Finally, ICDAR 2023 ended with the announcement that the next conference will be held in Athens, Greece, from September 6 to 11, 2024.

Wrap-up

Attending the ICDAR conference for the first time was an exciting milestone in my academic journey. It was a great privilege to share my research with a live audience and engage in meaningful discussions with experts from around the world. The ICDAR 2023 conference, held in San Jose, California, offered a fantastic experience, and I felt honored to represent the WS-DL research group on this global platform.

- Ajayi Kehinde Peter (@AjayiKehindep)
