2024-10-02: MS Thesis: Surfacing Text Changes in Archived Webpages

Thesis defense, July 29, 2024. Picture courtesy of Dr. Michele Weigle.


My master’s thesis, “Surfacing Text Changes in Archived Webpages,” explores how users can better find and view changes on webpages in web archives. The thesis contributes to the area of information seeking behavior in web archives and addresses three research questions.

1. How can we make changes in webpages discoverable and understandable?

We presented a change text search interface for web archives that allows users to find changes in webpages. This interface also includes an animated deletion tool and a sliding difference tool, which help users view the changes in context. This part of the thesis was informed by our formative investigation “User Tasks of Journalists.” We presented this work in our paper, “Making Changes in Webpages Discoverable: A Change-Text Search Interface for Web Archives” at JCDL 2023, and the paper earned the best student paper award.


2. How can we increase efficiency in web archive user navigation for viewing change over time?

We introduced a prototype banner for the Wayback Machine that increases user efficiency when viewing changes in webpages. This part of the thesis was informed by a formative investigation: a close study of a Wayback Machine access log.


3. How can aggregated webpage changes of a corpus be used computationally to provide compelling evidence for edit intentions?

We evaluated the change text search backend using EDGI’s dataset of US federal environmental webpages from 2016-2020. The first part of the evaluation showed how the change text search index can be used to compute commonly deleted terms across the corpus, and verified many of EDGI’s tracked terms. This part of the evaluation appeared in the JCDL paper. We also aligned the EDGI dataset with the ORCAS dataset of user queries and associated clicks from the same time frame. This allowed us to analyze the relationship between query terms and deleted terms. We showed that users were searching for deleted terms, and that the pages underwent the opposite of search engine optimization. We will present this work in our paper, “Retrogressive Document Manipulation of US Federal Environmental Websites,” at CIKM 2024.
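As a rough illustration of the idea of computing commonly deleted terms across a corpus, here is a minimal Python sketch. It is not the thesis's actual index or backend: the function names (`tokenize`, `deleted_terms`, `commonly_deleted`) and the toy page texts are invented for this example, and it assumes that a "deleted term" is simply a word that appears in an earlier capture of a page but not in a later one.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a page's text and return its set of word terms."""
    return set(re.findall(r"[a-z]+", text.lower()))

def deleted_terms(old_text, new_text):
    """Terms present in the earlier capture but absent from the later one."""
    return tokenize(old_text) - tokenize(new_text)

def commonly_deleted(page_pairs):
    """Count, for each term, how many pages deleted it between captures."""
    counts = Counter()
    for old_text, new_text in page_pairs:
        counts.update(deleted_terms(old_text, new_text))
    return counts

# Toy (old capture, new capture) pairs, invented for illustration only;
# these are not taken from the EDGI dataset.
pairs = [
    ("Climate change affects water quality", "Water quality is affected"),
    ("Learn about climate science", "Learn about science"),
]

print(commonly_deleted(pairs).most_common(1))  # [('climate', 2)]
```

Counting each term once per page, rather than once per occurrence, keeps a single long page from dominating the corpus-wide totals. That is a design choice of this sketch, not necessarily of the thesis.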



My thesis committee included my advisor, Dr. Michele Weigle, my co-advisor, Dr. Michael Nelson, and Dr. Sampath Jayarathna. I am grateful to everyone who provided me support during this time, as a thesis is a challenging experience.


Adding a license to my thesis


Incorporating two papers into my thesis meant that I had to follow the copyright restrictions imposed by IEEE (JCDL 2023) and ACM (CIKM 2024). For the IEEE paper, I had to transfer my copyright of the publication. Because of this, there are various copyright symbols throughout my thesis, for example on some of my figures. However, because of ACM’s transition to full open access, I was able to choose a Creative Commons license for the publication. I chose a non-commercial, share-alike license, which allows derivative works such as my thesis. To comply with the paper’s share-alike condition, I also had to apply the same license to my derivative work, the thesis. This proved to be more complicated than just adding the Creative Commons license on the copyright page of the thesis, but eventually the College of Sciences approved the license as an appendix. All authors should have the ability to add the license of their choice to their thesis if they desire, so hopefully there is now a procedure in place for future theses.


My journey and next steps


I started taking graduate computer science courses as a non-degree student in Spring 2021 in order to become qualified to teach dual enrollment computer science. I would describe myself as a self-taught programmer with little academic background in the subject, so completing the necessary graduate coursework to become qualified to teach at the community college level seemed like a reach to me. ODU was the right choice for me because it supports students who work full time and also has provisions for students missing any background coursework. I finished the required 18 credits in Summer 2022, and taught dual enrollment computer science (discrete structures and computer organization) for two years. I am grateful to the Virginia Department of Education and my employer, Fairfax County Public Schools, for funding the majority of my tuition during this time. Starting in Fall 2025, ODU will offer a graduate certificate for computer science teachers who want to become qualified to teach dual enrollment.


By providence, I ended up in Dr. Weigle’s Web Science course during my first semester. I was not planning on pursuing any additional coursework beyond the required 18 credits for my job, as I truly felt my vocation was teaching. However, Dr. Weigle saw me as a researcher, and I am grateful to ODU for offering a thesis-based master’s program so that I could see if research was for me. Ultimately, I would now like to transition to a job where I have time for both research and teaching. I am currently taking a teaching sabbatical for the 2024-25 school year to continue at ODU as a PhD student.


-Lesley Frew
