2023-10-03: International Conference on Document Analysis and Recognition (ICDAR) 2023 Trip Report
Day 1 (Main Conference)
Day 1: Juan C. Martinez-Sevilla is presenting their paper on "A Holistic Approach for Aligned Music and Lyrics Transcription" at #ICDAR2023. pic.twitter.com/inV4UyOSxQ
— Ajayi Kehinde Peter (@AjayiKehindep) August 21, 2023
The next paper in session 1 #ICDAR2023 is presented by Anthonio Rios-Vila from the University of Alicante, Spain. He is presenting their paper on "End-to-end Optical Music Recognition for Pianoform Sheet Music"@icdar2023 pic.twitter.com/dhiwkblw0z
— Ajayi Kehinde Peter (@AjayiKehindep) August 21, 2023
Excited to share that our team is presenting two Optical Music Recognition papers at this year's International Conference on Document Analysis and Recognition #ICDAR2023
— Jorge Calvo-Zaragoza (@jcalvozaragoza) August 21, 2023
Another interesting paper, "A multi-level synthesis strategy for online handwritten chemical equation recognition", was presented by Haoyang Shen from Xidian University, China. It addresses the task of recognizing handwritten chemical equations, a challenge exacerbated by the scarcity of accessible datasets. In response, the authors present a method for synthesizing handwritten equations from LaTeX expressions, framing the recognition task as the conversion of images to markup. Their approach decomposes LaTeX expressions into a symbol layout tree and generates the components while accommodating handwriting conventions. They also augment local and global expression patterns to enrich the diversity of the synthesized data.
Their method consists of 3 steps which are: Latex Decomposition, Symbol Layout, and Data Augmentation.@icdar2023 pic.twitter.com/IbxyvhikdZ
— Ajayi Kehinde Peter (@AjayiKehindep) August 21, 2023
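To make the decomposition step more concrete, here is a minimal sketch of turning a LaTeX string into a symbol layout tree. It is not the authors' implementation: the `Node` class, the `parse` function, and the restriction to plain symbols, backslash commands, `_`, `^`, and braces are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One symbol plus its script relations (hypothetical structure)."""
    symbol: str
    sub: List["Node"] = field(default_factory=list)  # subscript children
    sup: List["Node"] = field(default_factory=list)  # superscript children


def parse(latex: str) -> List[Node]:
    """Parse a tiny LaTeX subset into a left-to-right list of Nodes."""
    nodes, _ = _parse_sequence(latex, 0, stop=None)
    return nodes


def _read_symbol(s: str, pos: int) -> Tuple[Node, int]:
    """Read one symbol: a backslash command or a single character."""
    end = pos + 1
    if s[pos] == "\\":
        while end < len(s) and s[end].isalpha():
            end += 1
    return Node(s[pos:end]), end


def _parse_sequence(s: str, pos: int, stop: Optional[str]) -> Tuple[List[Node], int]:
    nodes: List[Node] = []
    while pos < len(s) and s[pos] != stop:
        ch = s[pos]
        if ch.isspace():
            pos += 1
        elif ch in "_^":
            if not nodes:
                raise ValueError("script without a base symbol")
            group, pos = _parse_group(s, pos + 1)
            # Attach the script to the symbol immediately to its left.
            (nodes[-1].sub if ch == "_" else nodes[-1].sup).extend(group)
        else:
            node, pos = _read_symbol(s, pos)
            nodes.append(node)
    return nodes, pos


def _parse_group(s: str, pos: int) -> Tuple[List[Node], int]:
    """Parse a braced group {...} or a single symbol after _ or ^."""
    while pos < len(s) and s[pos].isspace():
        pos += 1
    if pos >= len(s):
        raise ValueError("missing script argument")
    if s[pos] == "{":
        group, pos = _parse_sequence(s, pos + 1, stop="}")
        if pos >= len(s):
            raise ValueError("unclosed brace")
        return group, pos + 1  # skip the closing brace
    node, pos = _read_symbol(s, pos)
    return [node], pos


if __name__ == "__main__":
    for node in parse("2H_2 + O_2 \\rightarrow 2H_2O"):
        print(node.symbol, [c.symbol for c in node.sub], [c.symbol for c in node.sup])
```

In a synthesis pipeline of the kind described, each node of such a tree could then be paired with a handwritten symbol sample and positioned according to its relation to its parent, but those later stages are outside this sketch.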
The first part of Session 1 concluded with a presentation by Filip Darmanović on "SCI-3000: A Dataset for Figure, Table and Caption Extraction from Scientific PDFs". The authors introduce the SCI-3000 dataset, comprising 3,000 scientific publications in PDF format, totaling 34,791 pages, spanning various fields including computer science, biomedicine, chemistry, physics, and technology. These PDFs are annotated with information regarding figures, tables, and their corresponding captions. The dataset serves as a resource for evaluating two kinds of extraction approaches, rule-based and deep learning-based, and for benchmarking such methods in scientific document analysis.
The last paper presentation for the first session #ICDAR2023 is given by Filip Darmanović on "SCI-3000". In this work, he proposed a novel dataset for the task of figure, table, and caption extraction from scientific PDFs.@icdar2023 @WebSciDL pic.twitter.com/JpW4EoaRaK
— Ajayi Kehinde Peter (@AjayiKehindep) August 21, 2023
Keynote 1
The second half of Day 1 of #ICDAR2023 starts with a keynote speech by Dr. Marti Hearst, a Professor and Head of School at UC Berkeley, School of Information and the Computer Science Division.@icdar2023 @WebSciDL pic.twitter.com/OzX0wnG9Te
— Ajayi Kehinde Peter (@AjayiKehindep) August 21, 2023
Marti Hearst keynotes #ICDAR2023 on Papers that Come to Life pic.twitter.com/RimRDXy82w
— Alexy 🤍💙🤍 (@ChiefScientist) August 21, 2023
Following the lunch break, we presented our paper titled "A Study on Reproducibility and Replicability of Table Structure Recognition Methods", which I co-authored with Muntabir Hasan Choudhury, Sarah Rajtmajer, and Jian Wu. Our paper discusses concerns about reproducibility and replicability in the field of artificial intelligence, particularly in the context of table structure recognition (TSR). The study examines 16 papers on TSR and attempts to reproduce their results using the provided code and datasets. Additionally, it assesses replicability by using a dataset similar to the original dataset and a new dataset called GenTSR, which contains annotated tables from scientific papers. The findings indicate that out of the 16 papers, only four could be successfully reproduced, with main results consistent with the reported results. Among these, two papers were identified as replicable when using a similar dataset under certain Intersection over Union (IoU) thresholds, where IoU measures the overlap between ground truth and predicted bounding boxes of table cells. However, none of the papers were identified as replicable when using the new dataset. The paper offers insights into the reasons for irreproducibility and irreplicability in AI research, shedding light on challenges related to the consistency and reliability of research findings in this field. For more details about this paper, refer to our previous blog. In addition, our paper can be accessed via this link.
For more details about the reproducibility and replicability of the Table Structure Recognition Methods described in this blog, please check out this explainer thread:https://t.co/SRMIRI9QKo@icdar2023
— Ajayi Kehinde Peter (@AjayiKehindep) August 10, 2023
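For readers unfamiliar with Intersection over Union, the measure used above to compare predicted and ground truth table cells, here is a minimal sketch. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the default threshold of 0.6 are assumptions chosen for illustration, not values taken from our paper.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted: Box, truth: Box, threshold: float = 0.6) -> bool:
    """A predicted cell counts as correct if its IoU reaches the threshold."""
    return iou(predicted, truth) >= threshold


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```

Raising the threshold makes the criterion stricter, which is why a method can look replicable at one IoU value and not at another.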
Another interesting paper on table understanding, titled "An End-to-End Local Attention Based Model for Table Recognition", was presented by Nam Tuan Ly. This paper examines the challenges Transformer-based models face when dealing with large tables, mainly due to the constraints of the global attention mechanism. To tackle this issue, the authors propose a local attention mechanism. They also introduce an end-to-end model for recognizing both the structure and the content of tables in images. This model comprises an encoder for feature extraction and three decoders, dedicated to table structure recognition, cell detection, and cell-content recognition, respectively. Their experiments used the FinTabNet and PubTabNet datasets.
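As a rough illustration of the general idea behind local attention, the sketch below restricts each position to a fixed window of neighbors before applying softmax. This is a generic windowed attention over a 1D sequence, not the formulation from the paper; the function name and the `window` parameter are assumptions.

```python
import math
from typing import List


def local_attention_weights(scores: List[List[float]], window: int) -> List[List[float]]:
    """Row-wise softmax over scores, keeping only positions with |i - j| <= window.

    scores[i][j] is the raw attention score of query i for key j.
    Positions outside the window receive a weight of exactly 0.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        row_max = max(scores[i][lo:hi])  # subtract the max for numerical stability
        exps = [math.exp(scores[i][j] - row_max) for j in range(lo, hi)]
        total = sum(exps)
        row = [0.0] * n
        for offset, value in enumerate(exps):
            row[lo + offset] = value / total
        weights.append(row)
    return weights


if __name__ == "__main__":
    uniform = [[0.0] * 5 for _ in range(5)]
    for row in local_attention_weights(uniform, window=1):
        print([round(w, 3) for w in row])
```

With a window of size w, each query attends to at most 2w + 1 keys instead of all n, so the work per sequence grows with n times w rather than n squared, which is what makes long token sequences from large tables more tractable.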
Tahira Shehzadi is presenting their work titled "Towards End-to-End Semi-Supervised Table Detection with Deformable Transformer" at #ICDAR2023.@ICDAR2023 @WebSciDL pic.twitter.com/lwry6YOg2t
— Ajayi Kehinde Peter (@AjayiKehindep) August 22, 2023
The final presentation during Session 3 on Day 1 of ICDAR 2023 was titled "Generalization of Fine Granular Extractions from Charts", delivered by Shubham Singh Paliwal from TCS Research, India. This research introduces an approach based on attention and dynamic filtering for extracting chart elements and identifying text-role regions. Their method attains state-of-the-art results on the PlotQA dataset, surpassing current approaches by 2.81% in mean average precision (mAP) at an Intersection over Union (IoU) of 0.90.
Shubham Singh Paliwal from TCS Research, India is now presenting the last paper for Day 1 of #ICDAR2023, titled "Generalization of Fine Granular Extractions from Charts".@icdar2023 @WebSciDL pic.twitter.com/xldN8fXG92
— Ajayi Kehinde Peter (@AjayiKehindep) August 22, 2023
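To clarify what a figure such as "mAP at an IoU of 0.90" summarizes, here is a minimal sketch of average precision. It assumes each detection has already been labeled as a true or false positive by checking its IoU against the chosen threshold, and it uses the simple non-interpolated definition; benchmarks, including PlotQA, may use a different interpolation scheme. All names are illustrative.

```python
from typing import Dict, List, Tuple

# Each detection is (confidence, is_true_positive), where is_true_positive was
# decided beforehand by comparing the detection's IoU to the chosen threshold.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_ground_truth: int) -> float:
    """Non-interpolated AP: mean of the precision values at each true positive."""
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this rank
    return precision_sum / num_ground_truth


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Average the per-class AP values over all classes."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    bars = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision(bars, num_ground_truth=2))
```

At a strict threshold such as 0.90, fewer detections qualify as true positives, so gains at that setting reflect tighter localization rather than just finding the right elements.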
Day 2
The goal of their paper was to train a deep learning mode to predict ROCFT scores, but they had very limited data. Therefore, they employed data augmentation techniques.@ICDAR2023 pic.twitter.com/KkznRIMVr2
— Ajayi Kehinde Peter (@AjayiKehindep) August 22, 2023
To fine-tune the pretrained models, they explored different strategies including training only the last layers, training whole network with small learning rate, and ensembling multiple architectures.@icdar2023 pic.twitter.com/uH4u1n20Jr
— Ajayi Kehinde Peter (@AjayiKehindep) August 22, 2023
Following that, Sagar Chakraborty presented "TransDocAnalyser: A framework for semi-structured offline handwritten documents analysis with an application to legal domain". Their paper introduces a novel dataset known as the FIR dataset, which contains First Information Report documents from Indian police stations. The authors propose an end-to-end OCR framework employing an Encoder-Decoder architecture that combines Faster-RCNN and Vision Transformers, along with a domain-specific tokenizer. Additionally, they propose a post-correction technique to address recognition errors. According to the authors, this framework achieves exceptional results on the FIR dataset, surpassing existing models and establishing a new state-of-the-art.
Sagar Chakraborty @Wipro, India is now presenting their paper on "TransDocAnalyser: A framework for semi-structured offline handwritten documents analysis with an application to legal domain" #ICDAR2023.@icdar2023 @WebSciDL pic.twitter.com/Bpw9Vxv8iI
— Ajayi Kehinde Peter (@AjayiKehindep) August 22, 2023
Before the poster sessions, a noteworthy paper titled "Line Extraction in Handwritten Documents via Instance Segmentation" was presented by Jason Wells on behalf of Adeela Islam from the University of Punjab, Pakistan. This paper introduces a deep learning approach for extracting text lines from handwritten documents. The authors treat lines as objects and apply object detection and segmentation frameworks, which gives flexibility in handling variations in spacing, skew, and layout.
Jason Wells is presenting their work on behalf of Adeela Islam from the University of Punjab, Pakistan on "Line Extraction in Handwritten Documents via Instance Segmentation".@icdar2023 @WebSciDL pic.twitter.com/z8JzGDboUx
— Ajayi Kehinde Peter (@AjayiKehindep) August 22, 2023
Their approach involved a four-stage utilization of MaskRCNN, encompassing multi-scale feature extraction, identification of line regions at each scale, extraction of fixed-size convolutional features from line regions, and generation of bounding boxes and masks.@icdar2023 pic.twitter.com/kJNfQwkgKh
— Ajayi Kehinde Peter (@AjayiKehindep) August 22, 2023
Day 3
The competition comprises three primary tasks: online tasks involving images in InkML format, offline tasks utilizing scanned PNG images, and a bimodal task that combines both modes.@icdar2023 pic.twitter.com/mfZjrqME5B
— Ajayi Kehinde Peter (@AjayiKehindep) August 23, 2023
The following competition presentation was delivered by Shangbang Long from the Google Research team on "ICDAR 2023 Competition on Hierarchical Text Detection and Recognition". In their paper, they provided an overview of the competition's organization, encompassing tasks, datasets, evaluations, and scheduling details. The competition received over 50 submissions from more than 20 participating teams, underscoring its success. Additionally, their paper presented the competition results and offered valuable insights.
Shangbang Long from the @GoogleResearch is now presenting their competition titled "ICDAR 2023 Competition on Hierarchical Text Detection and Recognition"@icdar2023 @WebSciDL pic.twitter.com/fwPoYaVM5u
— Ajayi Kehinde Peter (@AjayiKehindep) August 23, 2023
Another interesting competition presentation was titled "ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition", presented by George Tom from the Center for Visual Information Technology, India. This competition focused on evaluating and improving methods for detecting, tracking, and recognizing text in dash cam videos using the RoadText-1K dataset, which contains 1000 annotated videos. The paper presented a thorough examination of the methods submitted by participants and an insightful analysis of their outcomes, shedding light on the potential and constraints of video text analysis for dash cam videos.
George Tom from the Center for Visual Information Technology, India is now presenting their competition on "ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition" at #ICDAR2023.@ICDAR2023 @WebSciDL pic.twitter.com/keisByXDTc
— Ajayi Kehinde Peter (@AjayiKehindep) August 23, 2023
Closing Ceremony
- Best Poster Award: "Computer Vision Techniques for Handwritten Optical Music Recognition" by Pau Torras.
- Best Student Paper Award: "Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation" by Michael Jungo, Beat Wolf, Andrii Maksai, Claudiu Musat, and Andreas Fischer.
- Best Paper Award: "A Hybrid Model for Multilingual OCR" by David Etter, Cameron Carpenter, and Nolan King.
✈️#CVCmoves! our researchers @sanket10rony, Sergi Garcia, @RostarFreeman & @JosepLlados presented their work on #DocumentAnalysis at #ICDAR2023!
— CVC_UAB (@CVC_UAB) August 30, 2023
Congratulations to all authors and, especially, to Pau Torras (@RostarFreeman) for his Best Poster Award at the Doctoral Consortium🏆 pic.twitter.com/4qtzMJoXoq
Finally, ICDAR 2023 closed with the announcement that the next conference will be held in Athens, Greece, on September 6-11, 2024.
Wrap-up
Attending the ICDAR conference for the first time was an exciting milestone in my academic journey. It was a great privilege to share my research with a live audience and engage in meaningful discussions with experts from around the world. The ICDAR 2023 conference held in San Jose, California, offered a fantastic experience, and I felt honored to represent the WS-DL research group on this global platform.
- Ajayi Kehinde Peter (@AjayiKehindep)