2025-02-25: A Rollercoaster of Deadlines, Discoveries, and Late-Night Snacks: A Reflection on My Ph.D. Journey

It feels like yesterday when I first enrolled at Old Dominion University (ODU) in 2019 to pursue a Ph.D. in Computer Science (Muntabir's introduction). Fortunately, I was accepted into the Lab for Applied Machine Learning and Natural Language Processing Systems (LAMP-SYS), part of the Web Science and Digital Libraries Research Group (WSDL) in the Department of Computer Science at ODU. LAMP-SYS is led by Dr. Jian Wu and focuses on building intelligent systems that solve real-world problems using natural language processing (NLP), computer vision (CV), machine learning (ML), and deep learning (DL) techniques.

I was overwhelmed with joy on the day I had my first meeting with my professor to discuss project briefs. I was assigned to a collaborative project with Virginia Tech called "Mining Electronic Theses and Dissertations (ETDs)," supported by the Institute of Museum and Library Services (IMLS). ETDs are scholarly documents that usually serve as partial requirements of academic degrees for students pursuing higher education. These ETDs are hosted by commercial (e.g., ProQuest) or university digital library repositories. However, the digital libraries of ETDs lack computational models and services for accessing and discovering the knowledge buried in ETDs. For example, library-provided metadata often exhibit incomplete, inconsistent, and incorrect values, which harms the discoverability of ETDs. Additionally, ETDs can be either scanned (produced by scanning physical copies of theses and dissertations) or born-digital. To index quality metadata, we first need to extract the metadata accurately from both document types, which was one of the key research challenges I explored at the beginning of my Ph.D. journey. I briefly described some of my earlier research in the following blog posts: Optical Character Recognition (OCR) Experiment, and Heuristic Rules to Extract Metadata.

Research & Publications

In an era where information is abundant yet often inaccessible, efficiently retrieving and utilizing scholarly knowledge is crucial. ETDs represent a vast repository of academic research, yet their complex structures and inconsistent metadata often hinder discoverability and integration into digital libraries. My research is driven by the imperative to bridge this gap, leveraging applied ML, NLP, and CV to transform ETDs into structured, accessible, and valuable components of scholarly big data.

Dissertation Defense

I developed ETDSuite, a toolkit designed to mine ETDs and their structured components. ETDSuite addresses critical challenges in digital libraries by providing machine learning-based methods for page-level segmentation, metadata extraction, citation parsing, and metadata enhancement. This toolkit has been instrumental in improving the accessibility and quality of ETD repositories, facilitating more efficient knowledge discovery and retrieval.

On November 6, 2024, I successfully defended my dissertation in front of a live audience and my committee:

I appreciate the input of all committee members toward making my dissertation better.

ETDSuite: A Toolkit to Mine Electronic Theses and Dissertation to Enrich Scholarly Big Data Using Natural Language Processing and Computer Vision -- PhD Defense from Muntabir Hasan Choudhury

My dissertation is also publicly available from ODU Digital Commons.

Choudhury, Muntabir H. "ETDSuite: A Toolkit to Mine Electronic Theses and Dissertations to Enrich Scholarly Big Data Using Natural Language Processing and Computer Vision" (2024). Doctor of Philosophy (PhD), Dissertation, Computer Science, Old Dominion University, DOI: 10.25777/h6qt-1p64

When addressing the research problems of mining ETDs, I raised the following four research questions (RQs):

RQ1: Can we develop an AI method to extract metadata from the cover pages of scanned and born-digital ETDs?
RQ2: Library-provided metadata often exhibits incomplete, inconsistent, and incorrect values. How can we leverage AI methods to improve metadata quality?
RQ3: Will latent features that encode text and vision modalities outperform latent features obtained from a single modality in ETD page classification?
RQ4: Is it possible to design a universal parser that accurately parses metadata from multi-style and multi-type citations as they appear in ETDs?

To address the four RQs, I developed the following frameworks:

AutoMeta -- This is a framework to extract metadata fields from ETDs by leveraging NLP techniques. It uses ML-based methods such as Conditional Random Fields (CRFs) that incorporate both text and visual features. The model was trained and evaluated using AutoMeta-ETD500 and achieved F1 scores ranging from 83% to 96%. More details can be found in my two published works: a) heuristic rules to extract metadata and b) automatic metadata extraction.

Metadata Extraction Pipeline Using Heuristic Rules

Automatic Metadata Extraction System
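
To make the idea of combining text and visual features concrete, here is a minimal sketch of how a cover-page token might be turned into a feature dictionary for a CRF. It is not the actual AutoMeta code: the `Token` class, the feature names, and the choice of layout cues are all assumptions made for illustration. The output is a list of per-token dictionaries, the input shape commonly accepted by CRF toolkits.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical OCR token: its text plus bounding-box geometry in pixels."""
    text: str
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    height: float  # box height, a rough proxy for font size


def token_features(tokens, i, page_width, page_height):
    """Build one feature dict for tokens[i], mixing text and layout cues."""
    tok = tokens[i]
    feats = {
        # Text features
        "lower": tok.text.lower(),
        "is_upper": tok.text.isupper(),
        "is_title": tok.text.istitle(),
        "is_year": tok.text.isdigit() and len(tok.text) == 4,
        # Visual features, normalized so the page size does not matter
        "rel_x": round(tok.x / page_width, 2),
        "rel_y": round(tok.y / page_height, 2),
        "rel_height": round(tok.height / page_height, 3),
    }
    # Neighboring tokens give the CRF some local context
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].text.lower()
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].text.lower()
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def page_to_features(tokens, page_width, page_height):
    """Convert a whole page (one token sequence) into CRF-ready features."""
    return [
        token_features(tokens, i, page_width, page_height)
        for i in range(len(tokens))
    ]


# Made-up tokens from an imaginary 1000x1000 pixel cover page
example_tokens = [Token("DISSERTATION", 100, 50, 20), Token("2019", 500, 900, 10)]
example_features = page_to_features(example_tokens, 1000, 1000)
```

In a sketch like this, the position and height features let the model learn patterns such as titles sitting near the top of the page in larger type, which text-only features cannot capture.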

MetaEnhance -- This is a framework to improve the metadata quality of ETDs by filling in missing values, correcting incorrect values and misspellings, and canonicalizing surface values by leveraging state-of-the-art (SOTA) ML and DL models. The framework was evaluated against MetaEnhance-ETDQual500 and achieved nearly perfect F1 scores in detecting errors and F1 scores ranging from 85% to 100% for correcting five of seven key metadata fields. More details can be found in my published work -- metadata quality improvement.

MetaEnhance Framework to Improve Metadata Quality
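
As a small illustration of what canonicalizing a surface value can look like, the sketch below maps several date spellings onto one ISO-style form. It is a rule-based stand-in rather than the ML and DL models that MetaEnhance relies on, and the list of accepted formats is an assumption chosen for the example.

```python
from datetime import datetime

# Hypothetical (input format, output format) pairs; a real repository
# would likely need many more surface forms than these.
DATE_FORMATS = [
    ("%Y-%m-%d", "%Y-%m-%d"),
    ("%m/%d/%Y", "%Y-%m-%d"),
    ("%B %d, %Y", "%Y-%m-%d"),
    ("%B %Y", "%Y-%m"),
    ("%Y", "%Y"),
]


def canonicalize_date(raw):
    """Return the date in ISO-style form, or None if missing or unparseable."""
    if raw is None or not raw.strip():
        return None  # missing value: leave it for a separate fill-in step
    text = raw.strip()
    for in_fmt, out_fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, in_fmt).strftime(out_fmt)
        except ValueError:
            continue  # try the next known surface form
    return None  # unknown form: flag for review instead of guessing


example_dates = ["May 2019", "5/3/2019", "May 3, 2019", "2019", ""]
example_canonical = [canonicalize_date(d) for d in example_dates]
```

Returning `None` for anything unrecognized keeps the detection step (is this value wrong or missing?) separate from the correction step, which mirrors the detect-then-correct framing described above.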

The main contribution of this work:
1. MetaEnhaces uses AI methods
2. New evaluation benchmark
3. Remarkable Performance to improve metadata quality pic.twitter.com/kyuys3Bnid
— Yasasi (@Yasasi_Abey) June 27, 2023

ETDPC -- This is a novel two-stream multimodal classification model with cross-attention that uses a vision encoder (ResNet50v2) and a text encoder (BERT with Talking-Heads Attention) to classify ETD pages into 13 categories. The model was trained and evaluated using ETDPC-ETD500 and achieved F1 scores ranging from 84% to 96%. More details can be found in my published work -- classifying ETD pages.

Multimodal Framework to Classify ETD Pages
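
For readers unfamiliar with cross-attention, the following is a bare-bones sketch of the core operation: queries from one modality (here, text) attend over keys and values from the other (here, image patches). It is a plain scaled dot-product version with toy 2-dimensional vectors, and it deliberately omits the learned projections, multiple heads, and encoder outputs that a real model such as ETDPC would use.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one modality
    and keys/values from the other.

    Returns (outputs, weights): one output vector and one row of
    attention weights per query.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy embeddings (made up): 2 text tokens attend over 3 image patches
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn_weights = cross_attention(text_tokens, image_patches, image_patches)
```

Each row of `attn_weights` sums to 1 and shows how strongly a text token draws on each image patch, which is the mechanism that lets one stream condition on the other.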

Professor @VT_CS Edward Fox (@edwardafox), core faculty @SanghaniCtrVT, is among coauthors of “ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations” presented by Jian Wu, assistant professor @WebSciDL, at IAAI 2024 co-located w/#AAAI2024. https://t.co/xYH9Pujz6T
— The Sanghani Center at Virginia Tech (@SanghaniCtrVT) February 28, 2024

Excited to announce that our paper titled "ETDPC: A Multimodality Framework for Classifying Pages in ETDs" has been accepted to IAAI-24 in the Emerging Applications of AI track. Many thanks to all the co-authors (@fanchyna @liya_lamia @edwardafox @sudobear).
cc/ @WebSciDL @oducs
— Muntabir Hasan Choudhury (@TasinChoudhury) October 23, 2023

To overcome the challenge of segmenting ETDs or classifying ETDs at the page level, we propose ETDPC, a two stream multimodal model (BERT with Talking Heads Attention and ResNet-50v2) with a cross-attention network to classify ETD pages into 13 categories. pic.twitter.com/6PoYz1pOJ7
— Muntabir Hasan Choudhury (@TasinChoudhury) October 23, 2023

Further, we demonstrated our system's robustness, proposed augmentation method for generating training samples for minority ETD pages, and contributed ETD500 dataset with 92,371-page annotations of ETDs, including PNGs, text, and bounding boxes.
— Muntabir Hasan Choudhury (@TasinChoudhury) October 23, 2023

LMParsCit -- This is a framework built on large language models (e.g., Llama-3-8b-instruct, GPT-3.5 Turbo, and GPT-4o-mini) to extract key metadata fields (title, author, venue, and year) from references across a range of bibliography types (e.g., journals, conference proceedings, technical reports). It also supports multiple bibliography styles (e.g., IEEE, ACM, APA) and achieved an F1 score of 99% on CORA-ref and ETDCite.
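
The general pattern of LLM-based citation parsing can be sketched as two steps: build a prompt asking for the four fields, then parse the model's reply defensively. The code below shows only those two steps and makes no model call. The prompt wording, the JSON reply format, and the function names are assumptions for illustration and are not taken from LMParsCit itself.

```python
import json

FIELDS = ("title", "author", "venue", "year")


def build_prompt(reference):
    """Assemble a hypothetical instruction asking for the four fields as JSON."""
    return (
        "Extract the title, author, venue, and year from the reference below. "
        "Reply with a single JSON object using exactly these keys: "
        + ", ".join(FIELDS)
        + ". Use null for a missing field.\n\n"
        + "Reference: " + reference
    )


def parse_reply(reply):
    """Pull a JSON object out of a model reply and keep only the known fields.

    Models often wrap JSON in extra prose, so this looks for the outermost
    braces and falls back to all-None if nothing parseable is found.
    """
    empty = {field: None for field in FIELDS}
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    return {field: data.get(field) for field in FIELDS}


# A made-up reference and a made-up model reply, for illustration only
example_reference = 'J. Doe and R. Roe, "An Example Title," in Proc. Example Conf., 2020.'
example_reply = (
    'Here is the result: {"title": "An Example Title", '
    '"author": "J. Doe and R. Roe", "venue": "Proc. Example Conf.", "year": "2020"}'
)
example_parsed = parse_reply(example_reply)
```

Because the instruction stays the same regardless of citation style or bibliography type, a prompt-based parser of this kind avoids the per-style rules that traditional citation parsers depend on.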

Publications

My research contributions have been published in several peer-reviewed journals and conferences, where I have served as both first author and co-author.

IAAI 2024: ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations.
JCDL 2023: MetaEnhance: Metadata Quality Improvement for ETDs of University Libraries. (Best Short Paper Award)
ICDAR 2023: A Study on Reproducibility and Replicability of Table Structure Recognition Methods.
Sci-K 2022: A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software.
SDU@AAAI 2022: Segmenting Technical Drawing Figures in US Patents.
JCDL 2021: Automatic Metadata Extraction Incorporating Visual Features from Scanned ETDs.
JCDL 2020: A Heuristic Baseline Method for Metadata Extraction from Scanned ETDs. (Best Poster Honorable Mention)
IJDL: Building Datasets to Support Information Extraction and Structure Parsing from ETDs.

Internship Opportunities

My early work on extracting metadata from scanned ETDs, where I applied NLP and CV techniques, especially OCR, helped me land my first internship in the Summer of 2020 at Los Alamos National Laboratory (LANL) in New Mexico. During my internship at LANL, I developed a framework for Offline Handwritten Mathematical Equation Recognition, where the core architecture relied on a Convolutional Neural Network (CNN) called LeNet-5. Working as a Research Intern at LANL not only expanded my skills in Computer Vision but also opened the door to another internship the following year (Summer 2021) with Bihrle Applied Research Inc. (BAR), an aerospace and aerodynamics company in Hampton, Virginia. During my internship at BAR, I developed and enhanced algorithms for the Train Detection Model used by Rail-Inspector, a cloud-based software product that processes aerial imagery of railroad tracks using Machine Learning and Deep Learning.

Internships are a crucial stepping stone in building a successful career. However, research experience also played a significant role during my Ph.D. My AI-driven research has been deeply application-focused, allowing me to develop cutting-edge applications that industries value, such as TechDrawFinder, a vector search engine for finding segmented patent figures, and ETDPC, a multimodal AI framework to segment ETDs. During my Ph.D., I always strove to adopt new technology and propose frameworks that could solve complex problems with state-of-the-art (SOTA) results. For example, during the 5th year of my Ph.D., a sudden shift happened in the NLP domain: people began using Large Language Models (LLMs) due to their SOTA performance in both NLP and Natural Language Understanding (NLU). I explored this area in my research, which allowed me to develop a language model-based citation parser called LMParsCit, where the core architecture relies on LLMs (e.g., Llama-3-8b-instruct). Having a good understanding of language models and vision models through my research work helped me land another internship in the Summer of 2024 at the U.S. Food & Drug Administration as an ORISE Fellow (i.e., Research Fellow), where I enhanced machine learning-based algorithms for one of the regulatory projects (Analytics Driven Supplement Evaluation Model) in the Center for Drug Evaluation and Research (CDER).

Internship Blogs:

Summer 2020 Internship Report -- Los Alamos National Laboratory
Summer 2021 Internship Report -- Bihrle Applied Research Inc.
Summer 2024 Internship Report -- U.S. Food & Drug Administration

Academic and Professional Journey

I was the first in my family to earn a Ph.D. Before enrolling in the Ph.D. program at ODU, I received my Bachelor of Science in Computer Engineering from Elizabethtown College (Etown) in 2018. While studying at Etown, I worked with Elizabethtown College Institutional Advancement as a Database Assistant. Additionally, I served as a Teaching Assistant in the Department of Computer Science under the supervision of Dr. Joseph T. Wunderlich. At Etown, I was an executive member of the National Society of Black Engineers and a member of The National Society of Leadership and Success. I also received the Sigma Pi Sigma from the American Honor Society in Physics and was awarded the Dean's List Honor in 2016. After my graduation, I worked as an Application Performance Engineer at Resource9 Group, Inc. in New York.

After completing one year of professional experience, I enrolled in the Ph.D. program at ODU. During my time at ODU, I served as a Teaching Assistant, where I had the privilege of mentoring undergraduate and graduate students in courses such as Machine Learning (CS 722/822) and Web Programming (CS 418/518). My responsibilities included delivering lectures on applying AI frameworks and technologies, guiding students through complex concepts, and supervising research projects related to NLP and digital libraries. Moreover, I mentored high school students in developing an AI-based search engine, called TechDrawFinder, which utilized vector search techniques to retrieve segmented patent images.

Through dedication and perseverance, I have consistently improved myself, as reflected in the following achievements:

Dominion Scholar Award (2023) – Old Dominion University.
Best Short Paper Award (2023) – ACM/IEEE Joint Conference on Digital Libraries (JCDL).
Outstanding Teaching Assistant Award (2022) – Old Dominion University.
Best Poster Honorable Mention (2020) – ACM/IEEE Joint Conference on Digital Libraries (JCDL).
Dr. Hussain Abdel-Wahab Graduate Fellowship (2020) – Old Dominion University.
AML Summer Research Fellowship (2020) – Los Alamos National Laboratory.

Congratulations to @TasinChoudhury @fanchyna from @WebSciDL @oducs and @sudobear @edwardafox @virginia_tech for best post/demo honorable mention at #JCDL2020! pic.twitter.com/nU6mduPK0x
— Shawn M. Jones, PhD / @shawnmjones@hachyderm.io (@shawnmjones) August 5, 2020

Best Short Paper Award won by “MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries" #jcdl2023 🏆
Congratulations @TasinChoudhury, @liya_lamia, @HimarshaJ, @fanchyna, William A. Ingram, and @edwardafox 👏@WebSciDL @ODUSCI pic.twitter.com/Kla2by2sMf
— Yasasi (@Yasasi_Abey) June 29, 2023

In addition, I was honored to receive an invitation to serve as a reviewer for the following peer-reviewed conferences and journals:

PeerJ Computer Science – One Manuscript Review (2024).
Scientometrics – One Manuscript Review (2023).
ACM/IEEE Joint Conference on Digital Libraries 2023 – One Paper Review.
ACM/IEEE Joint Conference on Digital Libraries 2022 – One Paper Review.
ACM/IEEE Joint Conference on Digital Libraries 2020 – 10 Poster Abstracts Review.

Postdoctoral Journey

I accepted an offer from the U.S. Food & Drug Administration (FDA) to serve as a Research Fellow in the Center for Drug Evaluation and Research (CDER). In this role, I will act as a subject matter expert, enhancing algorithms for a key regulatory project aimed at assessing drug products. My work will focus on leveraging state-of-the-art AI techniques to optimize drug product quality analysis.

Wrap Up

To wrap up this blog post, I want to share a few lessons I learned during my rollercoaster journey at ODU.

Work Hard, but Work Smart: One of the key lessons I learned from my advisor was the importance of working efficiently. To streamline my workflow, I developed a variety of automated scripts to handle repetitive tasks. For example, when compiling a large-scale ETD dataset for students to develop an ETD Search Engine in Dr. Wu’s Web Programming class, I automated processes such as converting large batches of PDFs to images and extracting text from those images. These scripts saved me valuable time, allowing me to focus on more critical aspects of my research rather than redoing previously completed tasks.

Effective Communication with Peers: Strong communication skills are essential, particularly when serving as a first author or co-author on research projects. Constructive criticism from peers should be taken seriously rather than personally, as it plays a crucial role in refining and improving the quality of research. Embracing feedback with an open mind fosters collaboration and ultimately leads to the production of high-quality research work.

Breadth and Depth: A successful Ph.D. journey requires balancing both breadth and depth in research. While specializing in a particular domain is essential for making novel contributions, having a broad understanding of related fields enables interdisciplinary thinking and innovative problem-solving. Expanding knowledge across multiple areas can lead to new perspectives and research opportunities.

Persevere and Push Yourself: Research is filled with challenges, setbacks, and moments of uncertainty. Perseverance is key to overcoming obstacles and making progress. It is essential to push beyond your comfort zone, take on difficult problems, and remain committed to long-term goals, even when faced with failures or unexpected hurdles.

Professional Networking is Key: Building a strong professional network can open doors to collaborations, job opportunities, and mentorship. Attending conferences, engaging with researchers in the field, and actively participating in academic discussions help establish meaningful connections that can be valuable throughout one's career.

Find an Internship: Internships provide hands-on industry experience, exposure to real-world applications of research, and opportunities to work with experts outside academia. They not only enhance technical skills but also foster professional growth, making the transition from academia to industry smoother and more impactful.

Muntabir Choudhury, Ph.D. (@TasinChoudhury)

Research Fellow, U.S. FDA / CDER

Email: muntabirc@gmail.com

LinkedIn: https://www.linkedin.com/in/muntabirchoudhury/
