2021-09-23: Real-time Header Extraction from Scientific PDF Documents: Summer Research Internship Experience at Los Alamos National Laboratory

This summer, I was accepted as a Graduate Student in the Institutional Scientific Content (ISC) Team, a subdivision of the Research Library at Los Alamos National Laboratory (LANL). LANL is a United States Department of Energy national laboratory located in Los Alamos, New Mexico, in the southwestern United States. Its mission is to solve national security challenges through scientific excellence. This year, LANL continued its student internship program for Summer 2021, and approximately 1500 students joined LANL to work on various projects. Due to social distancing restrictions, most internships were limited to remote work off laboratory property. My internship was a 12-week program that started on June 7, 2021. During this program, I worked remotely as a Research Intern under the supervision of Brian Cain. Throughout the program, I attended meetings with my supervisor, Brian Cain, and the development team, development sessions, and meetings

2021-09-20: Digging Up a Gem Through the Web Archives

As we commemorate the Internet Archive turning 25, I decided to unearth some memories from the most precious days of my life. I attended Devi Balika Vidyalaya, Colombo, Sri Lanka for my high school education (2004-2012). In 2004, I joined the Junior Western Band of our school, which paved the way for me to join "DBVSBB", the Devi Balika Vidyalaya Senior Brass Band, the following year. As a member of the senior brass band for seven years (Figure 01), I attended many concerts, received many certificates, and won numerous competitions. Fast forward to 2021: as a Ph.D. student working in web archiving, I was keen to find any online record of our band's achievements from that time in the web archives. Figure 01: A few pictures taken at band practices and concerts over the years. As a first step in discovering mementos in the Internet Archive's Wayback Machine, I tried to recall a time when our band was featured in a newspaper or on a website. Unfortun

2021-09-19: Conditional Random Field with Textual and Visual Features to Extract Metadata From Scanned ETDs

Our previous blog post described how Electronic Theses and Dissertations (ETDs) from before 1997, and a significant fraction of those from after 1997, are scanned from physical copies. These ETDs are valuable for digital library preservation, but to make them accessible, they must be indexed. Many ETD repositories provide incomplete, sparse, or no metadata, which hinders accessibility. For example, advisor names that appear on scanned ETDs may be missing from the metadata in the library repository. Thus, an automatic approach is needed to extract metadata from scanned ETDs. We proposed a conditional random field (CRF)-based sequence tagging model that combines textual and visual features. The source code can be found in our GitHub repository.

Introduction

Automatic metadata extraction is important for building scalable digital library search engines. Most existing tools, such as GROBID [1], CERMINE [2], and ParsCit [3], were developed for and applied to born-di

2021-09-16: Train Detection - 2021 Summer Internship at Bihrle Applied Research Inc

At the end of my second year as a Ph.D. student at Old Dominion University (ODU), I was fortunate to get a remote internship at Bihrle Applied Research Inc. (BAR), an aerospace and aerodynamics company near NASA Langley Research Center in Hampton, VA. This was my second remote internship in the United States, following an excellent internship experience at Los Alamos National Laboratory in Summer 2020. Although the internship was remote, I was called in for three days of on-site training to familiarize myself with the project and the tasks I had to accomplish during the summer. I worked as a summer intern on the Rail-Inspector project, cloud-based software that automatically processes aerial imagery of railroad tracks using machine learning and deep learning to identify track components, take measurements, and detect defects. During the internship, I worked with the technical development team of BAR's Ardenna business unit to develop and enhance the AI-based algo