2021-09-15: The 21st ACM Symposium on Document Engineering (DocEng 2021) Trip Report
The 21st ACM Symposium on Document Engineering (DocEng 2021) was organized by the University of Limerick, Ireland, and held virtually (due to the COVID-19 pandemic) from August 24th to 27th. This year, the conference received a total of 24 full paper submissions and 41 short paper submissions, of which 8 full papers (33%) and 20 short papers (49%) were accepted. The conference was sponsored by ACM SIGWEB (ACM Special Interest Group on Hypertext and the Web) in cooperation with ACM SIGDOC (ACM Special Interest Group on Design of Communication). The general chairs of DocEng 2021 were Dr. Patrick Healy and Dr. Mihai Bilauca of the University of Limerick, Ireland, and the program chair was Dr. Alexandra Bonnici of the University of Malta, Malta.
@ACMDocEng 2021 is now happening!
— Yasith Jayawardana (@yasithmilinda) August 25, 2021
This year, the conference is held virtually and hosted by the University of Limerick, Ireland. #DocEng2021 pic.twitter.com/N5ELEX4voO
I submitted two short papers to this year's conference, of which one was accepted. This paper, titled "Metadata-Driven Eye Tracking for Real-Time Applications", was co-authored by Gavindya Jayawardena and Dr. Sampath Jayarathna from Old Dominion University, USA, and Dr. Andrew T. Duchowski from Clemson University, USA.
Day 1
The first day of the conference was dedicated to two tutorial sessions. One tutorial session, "Domain-specific Modelling in Document Engineering", was conducted by Dr. Verislav Djukic from Djukic Software GmbH, Germany, and Dr. Juha-Pekka Tolvanen from Metacase, Finland. The other tutorial session, "Document Engineering Issues in Malware Analysis", was conducted by Dr. Charles Nicholas and Dr. Robert Joyce from the University of Maryland Baltimore County, USA, and Dr. Steven Simske from Colorado State University, USA.
Day 2
Keynote I
The second day of the conference started off with the keynote "Searching Harsh Documents" by Dr. Ophir Frieder from Georgetown University, USA. Dr. Frieder focuses on scalable health information processing systems. He is a member of the Computer Science faculty at Georgetown University and of the biostatistics, bioinformatics and biomathematics faculty at the Georgetown University Medical Center. He is also the Lead Science and Technology Advisor for Aurora: The Business Forge and the Chief Scientific Officer of Invaryant, Inc.
In this keynote, Dr. Frieder delved into the challenges of searching "harsh" document collections, such as natively non-digital documents, multilingual documents, documents that include non-textual components, corrupted documents, or any combination thereof. He discussed several techniques for image binarization, how to cope with poor spelling, and handling standard and non-standard documents. He also discussed existing solutions for different aspects of machine readability, and the importance of integrating them together for better machine readability.
Before ending the keynote, Dr. Frieder discussed ongoing research on engineering chat-bots to mimic certain individuals by analyzing their social data, such as writings, conversations, and images. This raised the question of what adverse effects may arise from doing so.
Session 1 - Document Content Analysis
Dr. Frieder's keynote was followed by the first paper session of the day, "Document Content Analysis", chaired by Dr. Bessat Kassaie from University of Waterloo, Canada. This session consisted of 5 papers.
- Md. Rashadul Hasan Rakib from Dalhousie University, Canada presented their paper "Efficient Clustering of Short Text Streams using Online-Offline Clustering".
- Johannes Knittel from University of Stuttgart, Germany presented their paper "Efficient Sparse Spherical k-Means for Document Clustering".
- Marcel Schaeben from Cologne Center for eHumanities, Germany, and Gioele Barabucci from Norwegian University of Science and Technology, Norway presented their paper "Small-step Pipelines Reduce the Complexity of XSLT/XPath Programs".
- Fatemeh Rahimi from Dalhousie University, Canada presented their paper "MTLV: A Library for Building Deep Multi-task Learning Architectures".
- Johannes Knittel from University of Stuttgart, Germany presented their paper "ELSKE: Efficient Large-Scale Keyphrase Extraction".
Session 2 - Generation, Manipulation, and Presentation
The second paper session of the day, "Generation, Manipulation, and Presentation", was chaired by Dr. Steven Bagley from the University of Nottingham, UK, and consisted of 3 papers.
- Rémi Calizzano from DFKI GmbH, Germany presented their paper "Ordering Sentences and Paragraphs with Pre-trained Encoder-Decoder Transformers and Pointer Ensembles".
- Athar Sefid from Pennsylvania State University, USA presented their paper "SlideGen: An Abstractive Section-Based Slide Generator for Scholarly Documents".
- Kevin Fenton from Colorado State University, USA presented their paper "Engineering of An Artificial Intelligence Safety Data Sheet Document Processing System for Environmental, Health, and Safety Compliance".
Day 3
Keynote II
Day 3 of DocEng 2021 started off with the keynote "20 Years of Physical Document and Product Protection Using Digital Methods" by Dr. Justin Picard. Dr. Picard is the Chief Technology Officer (CTO) of Scantrust SA, Switzerland, a product authentication and traceability company that he co-founded in 2013, and the inventor of the copy detection pattern, a digital authentication technology for detecting product and document counterfeiting. He is also a co-founder of the NGO Black Market Watch, where he developed a methodology to assess the impacts of illicit trade.
In this keynote, Dr. Picard reviewed some of the counterfeit detection techniques developed in the last 20 years, including printed digital watermarks, copy detection patterns and secure QR Codes. He explained how these technologies allow users to verify the authenticity of digitized content with their smartphones, and in turn, help to digitize physical documents, packaging and products. He provided real-world examples of counterfeit detection from industrial applications, and discussed some of the current research problems in this area.
Session 3 - Security and Sensitive Documents
Dr. Picard's keynote was followed by the first paper session of the day, "Security and Sensitive Documents". This session was chaired by Dr. Charles Nicholas from the University of Maryland Baltimore County, USA, and consisted of 4 papers.
- Fabian Singhofer from University of Ulm, Germany presented their paper "A Novel Approach on the Joined De-Identification of Textual and Relational Data with a Modified Mondrian Algorithm".
- Andre Tabone from University of Malta, Malta presented their paper "Pornographic Content Classification Using Deep-Learning".
- Justin Picard from Scantrust SA, Switzerland presented their paper "Counterfeit Detection with QR Codes".
- Francisco Jáñez-Martino from University of León, Spain presented their paper "Trustworthiness of Spam Email Addresses Using Machine Learning".
Session 4 - Applications and User Experiences
The next session was "Applications and User Experiences", chaired by Dr. Dick Bulterman from the Centrum Wiskunde & Informatica (CWI), Netherlands. This session consisted of 5 papers.
- Ajit Jain from Texas A&M University, USA presented their paper "Recognizing Creative Visual Design: Multiscale Design Characteristics in Free-Form Web Curation Documents".
- Odunayo Ogundepo from University of Waterloo, Canada presented their paper "Rescuing Historical Climate Observations to Support Hydrological Research: A Case Study of Solar Radiation Data".
- Rajkumar Ramamurthy from Fraunhofer IAIS, Germany presented their paper "ALiBERT - Improved Automated List Inspection (ALI) with BERT".
- Soundarya Nurani Sundareswara from Pennsylvania State University, USA presented their paper "A Large-Scale Exploration of Terms of Service Documents on the Web".
- I (Yasith Jayawardana from Old Dominion University, USA) presented our paper "Metadata-Driven Eye Tracking for Real-Time Applications".
The goal of our paper "Metadata-Driven Eye Tracking for Real-Time Applications" was to demonstrate the benefits of adapting FAIR metadata standards to collect data, build workflows, and validate results in eye tracking research. Here, we introduced the issues encountered when conducting eye tracking research in the vast landscape of proprietary and vendor-specific eye tracking software, and proposed an approach to work around these issues. In this approach, we first use our DFS metadata format to describe eye trackers and the datasets collected using them. Next, we use this metadata to "replay" datasets and effectively simulate real-time data streams.
To verify that this approach indeed works, we created DFS metadata for two eye trackers (Tobii Pro X2-60 and SR Research EyeLink-1000) and two datasets collected using them (ADHD-SIN and N-BACK), replayed the datasets using this metadata, and conducted several real-time eye movement analysis and synthesis tasks using the replayed data. Based on our results, we discussed how well this approach works, and how it could be generalized beyond eye tracking applications.
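To make the "replay" idea more concrete, here is a minimal Python sketch of how a recorded dataset could be turned back into a paced stream using only information stored in its metadata. This is an illustration under stated assumptions and not the implementation from our paper: the metadata keys (`sampling_rate_hz`, `fields`), the sample values, and the `replay` function are all hypothetical and do not reflect the actual DFS schema.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Sequence

# Hypothetical metadata record. The keys below are illustrative only and are
# NOT the actual DFS schema.
metadata = {
    "device": "example-eye-tracker",
    "sampling_rate_hz": 60,
    "fields": ["t", "x", "y"],
}

# Hypothetical recorded samples: (timestamp in seconds, gaze x, gaze y).
recorded = [
    (0.000, 0.51, 0.48),
    (0.017, 0.52, 0.47),
    (0.033, 0.55, 0.45),
]


def replay(
    rows: Iterable[Sequence[float]],
    meta: Dict,
    speed: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, float]]:
    """Yield recorded rows one by one, paced by the metadata sampling rate."""
    # Time between consecutive samples, scaled by the playback speed.
    interval = 1.0 / (meta["sampling_rate_hz"] * speed)
    for i, row in enumerate(rows):
        if i > 0:
            sleep(interval)  # wait before every sample except the first
        # Label each value using the field names declared in the metadata.
        yield dict(zip(meta["fields"], row))


if __name__ == "__main__":
    for sample in replay(recorded, metadata):
        print(sample)
```

Because the consumer only sees an iterator of labeled samples, the same analysis code could in principle run on a replayed dataset or on a live device feed. The `sleep` parameter is injectable so that the pacing can be stubbed out when testing.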
ACM Town Hall
The next session was a discussion on the ACM Special Interest Group on Hypertext and the Web (SIGWEB). This session, named "ACM Town Hall", was conducted by Dr. Peter Brusilovsky of the University of Pittsburgh, USA. Dr. Brusilovsky introduced SIGWEB and its four major conferences: 1) ACM Conference on Hypertext and Social Media, 2) ACM Symposium on Document Engineering (i.e., this conference), 3) ACM Web Science Conference, and 4) The Web Conference. He also explained why one should become a SIGWEB member.
In the ACM Town Hall of #DocEng2021, Peter Brusilovsky of @PittTweet explained about SIGWEB. pic.twitter.com/tb5T8549Lz
— Yasith Jayawardana (@yasithmilinda) August 26, 2021
Day 4
The fourth and last day of the conference began with an information session on the upcoming ACM DocEng 2022 by Dr. Matthew Hardy, the Director of Engineering at Adobe. Dr. Hardy mentioned that the next DocEng conference will be hosted by Adobe at the Adobe Campus in San Jose, California, USA. He added that this conference would hopefully be in-person rather than virtual.
Day 4 of @ACMDocEng 2021 starting off with information on @ACMDocEng 2022 provided by Matthew Hardy.
— Yasith Jayawardana (@yasithmilinda) August 27, 2021
- @ACMDocEng 2022 will be in Adobe San Jose Campus, California
- Host: @Adobe
- Probably an in-person conference
- Dates are not fixed yet #DocEng2021 #DocEng2022 pic.twitter.com/GEeXCpy4WP
Session 5 - Systems for Visual Document Analysis
The session by Dr. Hardy was followed by the first paper session of the day, "Systems for Visual Document Analysis". It was chaired by Dr. Tamir Hassan from Round-Trip PDF Solutions, Vienna, Austria, and consisted of 7 papers.
- Manabu Ohta from Okayama University, Japan presented their paper "Table-structure Recognition Method Using Neural Networks for Implicit Ruled Line Estimation and Cell Estimation".
- Lucas Kirsten from HP Inc. R&D, Brazil presented their paper "Evaluating Deep Neural Networks for Image Document Enhancement".
- Shrey Mishra from Inria Research Center of PSL University, France presented their paper "Towards Extraction of Theorems and Proofs in Scholarly Articles".
- Daniela Costa from Federal University of Pernambuco, Brazil presented their paper "A Comparative Study on Methods and Tools for Handwritten Mathematical Expression Recognition".
- After a short break, Raid Saabni from The Academic College of Tel-Aviv Yaffo, Israel presented their paper "Text Line Extraction Using Deep Learning and Minimal Sub Seams".
- Rafael Lins from Rural Federal University of Pernambuco, Brazil presented their paper "Direct Binarisation A Quality-and-Time Efficient Binarisation Strategy".
- Jennil Thiyam from Indian Institute of Technology Guwahati, India presented their paper "Challenges in Chart Image Classification: A Comparative Study of Different Deep Learning Methods".
Session 6 - Collections, Systems, and Management
The next session, "Collections, Systems, and Management", was the last paper session of the conference. It was chaired by Dr. Angelo Di Iorio from the University of Bologna, Italy and consisted of 4 papers.
- Eugene Yang from Georgetown University, USA presented their paper "On Minimizing Cost in Legal Document Review Workflows".
- Eugene Yang also presented their paper "Heuristic Stopping Rules For Technology-Assisted Review".
- Maxime Cauz from University of Namur, Belgium presented their paper "Shock Wave: a Graph Layout Algorithm for Text Analyzing".
- Maksim Eren, Nick Solovyev, Chris Hamer, and Charles Nicholas from University of Maryland Baltimore County, USA presented their paper "COVID-19 Multidimensional Kaggle Literature Organization".
Binarization Challenge Summary
At the end of the last paper session of the conference, Dr. Steven Simske from Colorado State University, USA presented the results of the Binarization Challenge, "Time-Quality Competition on Binarizing Photographed Documents". The competition was carried out using four smartphone cameras. Images taken with each smartphone were first binarized using state-of-the-art algorithms. The results were then evaluated using two measures: the black-to-white pixel proportion of the binarized image compared against that of the ground truth image, and the Levenshtein distance between the OCR-extracted text (via Google Cloud Vision) and the ground truth text.
Dr. Steve Simske from @ColoradoStateU presenting the "Binarisation challenge summary" #DocEng2021 pic.twitter.com/zlP8op6UHq
— Yasith Jayawardana (@yasithmilinda) August 27, 2021
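For readers unfamiliar with these two kinds of measures, the following Python sketch shows one plausible way to compute them. It is only an illustration: the exact formulas used in the challenge were not spelled out in the summary, so the `black_fraction` helper and the 0/1 pixel encoding are my own assumptions. The `levenshtein` function is the standard dynamic-programming edit distance.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def black_fraction(image: Sequence[Sequence[int]]) -> float:
    """Fraction of black pixels in a binary image.

    Assumed encoding (hypothetical): 0 = black, 1 = white.
    """
    total = sum(len(row) for row in image)
    black = sum(1 for row in image for px in row if px == 0)
    return black / total


if __name__ == "__main__":
    # Compare OCR output against the ground truth text.
    print(levenshtein("kitten", "sitting"))  # prints 3

    # Compare pixel proportions of a binarized image and its ground truth.
    binarized = [[0, 1], [1, 1]]
    ground_truth = [[0, 0], [1, 1]]
    print(abs(black_fraction(binarized) - black_fraction(ground_truth)))  # prints 0.25
```

A lower edit distance means the OCR engine recovered text closer to the ground truth from the binarized image, while a smaller gap in pixel proportions suggests the binarization neither dropped nor added much ink.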
Awards
The binarization challenge summary was followed by the DocEng 2021 Awards Ceremony. Here, the best student paper award was given to the paper "On Minimizing Cost in Legal Document Review Workflows" by Eugene Yang et al.
Congratulations to the @ACMDocEng 2021 Best Student Paper Award winners, Eugene Yang et al. on their paper titled "On Minimizing Cost in Legal Document Review Workflows"! #DocEng2021 pic.twitter.com/9BzDhbVBLv
— Yasith Jayawardana (@yasithmilinda) August 27, 2021
The best paper award was given to the paper "A Novel Approach on the Joined De-Identification of Textual and Relational Data with a Modified Mondrian Algorithm" by Fabian Singhofer et al.
Congratulations to the @ACMDocEng 2021 Best Paper Award winners, Fabian Singhofer et al. on their paper titled "A Novel Approach on the Joined De-Identification of Textual and Relational Data with a Modified Mondrian Algorithm"! #DocEng2021 pic.twitter.com/y0f2xiyK3a
— Yasith Jayawardana (@yasithmilinda) August 27, 2021
Birds of a Feather
The awards ceremony was followed by the Birds of a Feather (BoF) presentation session. While this session was originally scheduled to have two presentations, only one was given: Dr. Charles Nicholas from the University of Maryland Baltimore County, USA presented "How to get a book published in Document Engineering".
Dr. Charles Nicholas from @UMBC presenting the Birds of a Feather presentation, under the topic "How to get a book published in Document Engineering". #DocEng2021 pic.twitter.com/DAD6IzD0wD
— Yasith Jayawardana (@yasithmilinda) August 27, 2021
Following this, Dr. Steven Simske, Dr. Alexandra Bonnici, Dr. Patrick Healy, and Dr. Mihai Bilauca delivered the closing remarks, bringing DocEng 2021 to an end.
Steve Simske, Alexandra Bonnici, Paddy Healy, and Mihai Bilauca delivering the Closing remarks of @ACMDocEng 2021.
— Yasith Jayawardana (@yasithmilinda) August 27, 2021
Thank you and the entire organizing committee and all participants for marking #DocEng2021 a success! pic.twitter.com/m45rKlwDgF
-- Yasith Jayawardana (@yasithmilinda)