2024-12-24: Summer 2024 Internship Report -- ORISE Fellow with U.S. Food & Drug Administration
During my senior year as a Ph.D. candidate at Old Dominion University (ODU), I accepted an internship opportunity with the Center for Drug Evaluation and Research (CDER) / U.S. Food and Drug Administration (FDA) in the Division of Quality Intelligence III under the Office of Pharmaceutical Quality (OPQ) in Silver Spring, Maryland. I joined the FDA as a summer fellow through the Oak Ridge Institute for Science and Education (ORISE). This was my third remote internship, following two consecutive internships with Los Alamos National Laboratory and Bhirle Applied Research Inc. Although the internship was remote, I was called in for two days to provide fingerprints, collect my FDA badge, and meet with my supervisor on-site. I was assigned to one of the regulatory projects to enhance a site selection model algorithm that leverages a machine learning technique for comprehensive quality surveillance. I was supervised by John Wan (Supervisory Operations Research Analyst) and Lisa Hughey (Data Scientist), who guided me to better understand the project and helped me expand my knowledge of pharmaceutical manufacturing and regulatory science.
🎉 Excited to start as an ORISE Fellow (@ExperienceORISE) with @US_FDA for summer research! 🌟 Enhancing quality surveillance for regulated products using data mining & NLP. Let's dive in! 💼 #QualitySurveillance #DataScience 📊
— Muntabir Hasan Choudhury (@TasinChoudhury) May 7, 2024
cc/ @WebSciDL @ORISEconnect
Figure: Food and Drug Administration White Oak Campus, Silver Spring, MD
FDA's CDER & OPQ's Mission
CDER plays a crucial role in public health by ensuring that safe and effective medications are available to enhance health across the U.S. Operating under the FDA, CDER regulates prescription and over-the-counter drugs, including biologics and generics. Its responsibilities go beyond conventional medicines, covering chemical products such as fluoride toothpaste, antiperspirants, shampoos, and sunscreens. Additionally, CDER aims to modernize the drug review process, improving efficiency and transparency to give patients quicker access to safe and effective medications while maintaining the highest scientific standards.
What is the FDA's role in regulating drugs?
Within CDER, I worked with OPQ, whose mission is to provide quality medicines for the American public. OPQ integrates assessment, inspection, surveillance, policy, and research activities to strengthen pharmaceutical quality globally. OPQ is responsible for the following tasks:
- Establish consistent, patient-focused quality standards.
- Integrate drug application assessment with manufacturing facility evaluation for a unified quality review.
- Identify quality issues and collaborate with FDA offices on enforcement if needed.
- Balance quality risks with the risk of drug unavailability.
- Anticipate and prevent potential quality issues to avoid drug shortages.
Team Meetings & Remote Work
I was assigned to one of the teams at OPQ, called ASE after the Analytics-Driven Supplement Evaluation (ASE) model it supports. The team researched how best to analyze and evaluate the applications (i.e., cover letters) submitted by pharmaceutical companies and to assess the quality and risk of drug supplements before distribution in the market. From the beginning of my internship, I was involved with three different teams to learn more about the various ongoing projects that support the ASE model. In addition, I had a bi-weekly meeting with my two supervisors to update them on my progress and discuss administrative matters.
I was given access to one of the scientific computers so that I could work from home. The computer was powerful enough to train machine learning and deep learning-based models using two NVIDIA T1000 GPUs. I set up the machine and installed all the necessary software patches and Python packages to conduct my research on enhancing the current ASE model for human drug products.
Project Description & Objective
I was assigned to review the documentation of the ASE model and identify its limitations. To provide more context on this project, pharmaceutical companies must notify the FDA about changes to approved applications following all statutory and regulatory requirements. All post-approval chemistry, manufacturing, and controls (CMC) changes beyond variations provided for in approved new drug applications (NDA) and abbreviated new drug applications (ANDA) are categorized into three reporting categories depending on submission severity and/or urgency.
Although the CMC changes associated with these supplements can be relatively minor, they need systematic evaluation using an analytics-driven approach that categorizes the submissions tagged "lower severity". Submissions in this category have increased substantially since 2015. With the help of artificial intelligence and natural language processing (NLP) techniques, the ASE model predicts whether a proposed change necessitates in-depth manual assessment. However, the earlier model generated a substantial fraction of false positives when assessing lower-severity submissions, that is, cases where the reviewer concluded "no further assessment" but the model predicted "in-depth assessment needed".
Data Preprocessing
I was provided with supplemental submission documents from the past several years for model training and validation. The dataset was highly skewed toward Class II (moderate severity), which made it unsuitable for directly training a machine learning model, so I downsampled the majority class to address the imbalance. An NLP toolkit was then applied to tokenize the cover letters and remove stop words (e.g., a, the, am). A vocabulary was built from the resulting tokens, and each token was mapped to an index. These indexes were then used as input to the model.
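The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used during the internship: the stop-word list, the regular-expression tokenizer, the `<pad>`/`<unk>` entries, the function names, and the sample records are all assumptions made for the example.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list; a real NLP toolkit ships a much longer one.
STOP_WORDS = {"a", "an", "the", "am", "is", "are", "to", "of", "and", "in", "for"}


def downsample(records, seed=0):
    """Randomly drop records until every class matches the smallest one."""
    by_label = {}
    for text, label in records:
        by_label.setdefault(label, []).append((text, label))
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def tokenize(text):
    """Lowercase, keep alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each token to an integer index; 0 is padding and 1 is unknown."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    for token, _ in counts.most_common():
        vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Convert tokens to the index sequence a model would consume."""
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]


# Made-up cover-letter snippets, skewed toward Class II.
records = [
    ("Change to the packaging supplier", "II"),
    ("Update of the batch size", "II"),
    ("Revision to a test method", "II"),
    ("Editorial correction in the label", "I"),
]
balanced = downsample(records)
token_lists = [tokenize(text) for text, _ in balanced]
vocab = build_vocab(token_lists)
encoded = [encode(tokens, vocab) for tokens in token_lists]
```

Downsampling every class to the size of the smallest one is only one way to balance the data; it is simple, but it discards majority-class examples.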
Embedding
The earlier ASE model's embeddings did not account for out-of-vocabulary words. To address this limitation, I proposed using a neural network-based approach for word embeddings. An embedding array sized to the vocabulary, indexed by the 'word-to-index' mapping built from the cover letter content, was filled with vectors from the embedding model. The resulting embedding matrix was then converted to a PyTorch tensor, which was used to train the ML model.
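To make the idea concrete, here is a small sketch of filling an embedding matrix from a word-to-index mapping in a way that still yields a vector for unseen words. The post does not name the embedding model, so the hashed character n-gram vectors below are purely a stand-in (in the spirit of subword embeddings) and not the method actually used; `DIM`, `ngram_vector`, `word_vector`, and `build_embedding_matrix` are hypothetical names.

```python
import hashlib

DIM = 8  # hypothetical embedding width


def ngram_vector(ngram, dim=DIM):
    """Deterministic pseudo-embedding for one character n-gram (stand-in for trained weights)."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(digest[i] / 255.0) - 0.5 for i in range(dim)]


def word_vector(word, dim=DIM, n=3):
    """Average the n-gram vectors of a word, so unseen words still get a vector."""
    padded = f"<{word}>"
    ngrams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
    vectors = [ngram_vector(g, dim) for g in ngrams]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_embedding_matrix(word_to_index, dim=DIM):
    """Return one row per vocabulary index; the padding row stays all zeros."""
    matrix = [[0.0] * dim for _ in range(len(word_to_index))]
    for word, index in word_to_index.items():
        if word != "<pad>":
            matrix[index] = word_vector(word, dim)
    return matrix


# Illustrative vocabulary; indexes must run from 0 to len(word_to_index) - 1.
word_to_index = {"<pad>": 0, "<unk>": 1, "packaging": 2, "supplier": 3}
embedding_matrix = build_embedding_matrix(word_to_index)
# With PyTorch installed, torch.tensor(embedding_matrix) would give the
# tensor that is handed to the model.
```

Because a word's vector is composed from its character n-grams, a word that never appeared in the training vocabulary still maps to a meaningful row instead of being dropped.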
Training & Fine-tuning Hyper-parameters
The dataset was split into training (70%) and testing (30%) sets, and the model was initialized with the PyTorch framework. The hyperparameters were first tuned heuristically. The final model was then trained with GPU acceleration for 80 epochs with a learning rate of 0.01, a dropout rate of 0.4, and Adadelta as the optimizer with an average decay rate of 0.95. The loss function was cross-entropy with the reduction mode set to mean.
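As a rough illustration of two pieces of this setup, the sketch below shows a 70/30 split and what a cross-entropy loss with mean reduction computes on a batch of raw scores (the same quantity `torch.nn.CrossEntropyLoss(reduction="mean")` returns when no class weights are set). It is written in plain Python so it runs without PyTorch; the function names, the `config` keys, and the sample logits are assumptions for the example, and only the hyperparameter values come from the text.

```python
import math
import random


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it at the requested fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def cross_entropy_mean(logits, targets):
    """Average negative log-likelihood of the true class over a batch of raw scores."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[target]
    return total / len(targets)


# Hyperparameter values as reported in the text; the key names are illustrative.
config = {
    "epochs": 80,
    "learning_rate": 0.01,
    "dropout": 0.4,
    "optimizer": "Adadelta",
    "decay_rate": 0.95,
    "loss_reduction": "mean",
}

train, test = split_train_test(range(10))
loss = cross_entropy_mean([[2.0, 0.0], [0.0, 2.0]], [0, 1])
```

With mean reduction the loss is averaged over the batch, so its scale does not depend on the batch size.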
Results & Discussion
For validation, we used a ground truth dataset of 399 records with assessments (Class II: 323 samples; Class I: 76 samples). Table 1 shows the results of classifying the cover letters with the proposed, enhanced version of the ASE model, which achieved an F1-score of 0.95 for classifying Class I and II. Since the ground truth data had no Class III samples, the model did not produce classification results for this category.
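For readers unfamiliar with how a classification report is derived, the following sketch computes per-class precision, recall, F1-score, and support from true and predicted labels. The labels are toy values chosen for illustration only and are not the internship's data or results; `classification_report` here is a hypothetical helper, not a library call.

```python
def classification_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label in y_true."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report[label] = (precision, recall, f1, tp + fn)
    return report


# Toy labels for illustration only.
y_true = ["II", "II", "II", "II", "I", "I"]
y_pred = ["II", "II", "II", "I", "I", "I"]
report = classification_report(y_true, y_pred)

for label, (precision, recall, f1, support) in report.items():
    print(f"Class {label}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={support}")
```

The F1-score is the harmonic mean of precision and recall, so a class scores well only when the model both finds most of its samples and rarely assigns the label by mistake.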
Table 1: Classification Report
The final prediction of the model was then fed through a binary classifier, together with an assessment of the application product (e.g., inherent product risk based on dosage form, route of administration, etc.) and keywords from the cover letter. Using a threshold value, the classifier mapped the deep-learning model's prediction to one of two outcomes: "no further assessment needed" or "review".
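The thresholding step can be pictured with a very small sketch. It assumes the deep-learning model's output is available as a probability that in-depth assessment is needed; the function name, the input name, and the default threshold of 0.5 are hypothetical, and the product-risk and keyword inputs mentioned above are left out because the post does not describe how they are combined.

```python
def final_decision(p_in_depth, threshold=0.5):
    """Map a model probability to one of the two outcomes using a cut-off."""
    if not 0.0 <= p_in_depth <= 1.0:
        raise ValueError("p_in_depth must be a probability between 0 and 1")
    if p_in_depth >= threshold:
        return "review"
    return "no further assessment needed"


# Hypothetical probabilities for three submissions.
decisions = [final_decision(p) for p in (0.12, 0.50, 0.93)]
```

Raising the threshold sends fewer submissions to review, which trades false positives against the risk of missing a submission that does need in-depth assessment.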
Accomplishments
The agency collects abstracts from FDA summer students annually. My research was selected for presentation at the 2024 FDA Student Scientific Research Day under the theme "Unleashing the Power of Data." On the presentation day, 108 students presented during the poster session, where each student was given two minutes to summarize the research work.
Acknowledgment
I would like to express my sincere gratitude to my supervisors, John Wan and Lisa Hughey, for their continuous support throughout this research project. I also thank the ORISE Research Participation Program at the U.S. FDA, which was made possible through an interagency agreement between the U.S. Department of Energy and the FDA. I truly appreciate this valuable opportunity and look forward to continuing my work with the U.S. FDA after graduating in Fall 2024.
-- Muntabir Choudhury (@TasinChoudhury)