2023-08-21: Animating Changes in Webpages: Virginia Department of Health
In July 2023, Virginia Governor Glenn Youngkin received national press coverage for deleting webpages on the Virginia Department of Health website that contained LGBTQ+ youth resources. This isn't the first time the Youngkin administration has deleted webpages: within a week of his inauguration, it deleted the Virginia Mathematics Pathways Initiative website. The July 2023 Washington Post article also discussed additional webpages whose content was removed in 2022. Journalists learned about these changes through emails between state employees, but the emails didn't include the addresses of the pages. Because the articles don't link to mementos (archived copies) of the pages in a web archive, readers can't see the changes for themselves. This is consistent with what we found in 2022: 20% of journalists who used web archives as evidence linked to neither the archived page nor the live page in their news story.
Web archives don't currently support searching for terms and phrases that have been deleted, so we had to meticulously browse the Virginia Department of Health website in the Internet Archive's Wayback Machine to find the three pages that changed in 2022. Once we had their addresses, we could visualize the changes with the Internet Archive's Changes tool and also view them as an animation. We used MemGator to retrieve a list of all mementos for each page, which let us manually narrow down when the terms were added and removed.
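We narrowed down the dates by hand; the sketch below shows one way that step could be automated, not the procedure we actually used. It assumes the TimeMap is in the RFC 7089 link format (one of the TimeMap formats MemGator can return), parses out each memento's datetime and URI-M, and then binary-searches for the two adjacent mementos between which a term appears or disappears, so only about log2(n) mementos need to be checked instead of all of them. The `has_term` callback is hypothetical: you would supply one that fetches a memento and checks its text. The search also assumes the term changed state only once in the page's history.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Callable, Optional

# One link-value: <URI> followed by ;-separated key="value" parameters.
_ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
_PARAM = re.compile(r'([\w-]+)="([^"]*)"')

Memento = tuple[datetime, str]  # (Memento-Datetime, URI-M)


def parse_link_timemap(text: str) -> list[Memento]:
    """Return (datetime, URI-M) pairs from an RFC 7089 link-format TimeMap, oldest first."""
    mementos = []
    for match in _ENTRY.finditer(text):
        uri, raw_params = match.groups()
        params = dict(_PARAM.findall(raw_params))
        # rel may hold several space-separated values, e.g. "first memento".
        if "memento" in params.get("rel", "").split() and "datetime" in params:
            mementos.append((parsedate_to_datetime(params["datetime"]), uri))
    return sorted(mementos)


def find_transition(
    mementos: list[Memento], has_term: Callable[[str], bool]
) -> Optional[tuple[Memento, Memento]]:
    """Binary-search for two adjacent mementos where has_term() flips.

    Assumes the term changes state at most once between the first and last
    memento (e.g. present, then removed). Returns None if the first and last
    mementos agree or there are fewer than two mementos.
    """
    if len(mementos) < 2:
        return None
    lo, hi = 0, len(mementos) - 1
    start = has_term(mementos[lo][1])
    if has_term(mementos[hi][1]) == start:
        return None
    # Invariant: mementos[lo] matches the starting state, mementos[hi] does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_term(mementos[mid][1]) == start:
            lo = mid
        else:
            hi = mid
    return mementos[lo], mementos[hi]
```

The same call works whether the term was added or removed, because `find_transition` only looks for the point where the answer flips. If a term was removed and later restored, the single-transition assumption breaks and the result is just one of the transitions.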
Viewing Changes on Virginia Department of Health webpages
The first change is on the main Family Planning webpage. The definition of reproductive justice was removed between April and May 2022; it had been added between March and June 2021.
Figure 2: Animation of "reproductive justice" removed from https://www.vdh.virginia.gov/family-planning between 2022-04-01 and 2022-05-22
The second change is on the webpage about Virginia state funding of abortions. Links to organizations that fund abortions were removed between April and May 2022; they had been added between March and May 2020. In this case, the Changes tool shows more context than the animated difference tool.
Figure 3: Animation of "provide" removed from https://www.vdh.virginia.gov/pregnancy/state-funding-of-certain-abortions/ between 2022-04-04 and 2022-05-03. Additional changes that don't contain the term "provide" are not animated.

The third change is on the adolescent sexual health FAQ webpage. Links to organizations with additional information about teen sexual health were removed between February and June 2022. These links were already present in the earliest version of this page in any public web archive, dated April 2020.
Figure 4: Animation of "assistance" removed from https://www.vdh.virginia.gov/adolescent-health/sexual-health-faqs between 2022-02-03 and 2022-06-24
All of these deletions remove information added during the previous governor's administration.
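The caption for Figure 3 notes that each animation follows one term and that changes without that term are not animated. We don't reproduce how the animation tool selects changes; the sketch below is only a line-level approximation using Python's `difflib`, and the function name `removed_lines_with_term` is hypothetical. Given the text of two mementos, it reports lines from the older one that were deleted or rewritten in the newer one and that mention the term.

```python
import difflib


def removed_lines_with_term(old_text: str, new_text: str, term: str) -> list[str]:
    """Lines of old_text that were deleted or changed in new_text and mention term.

    Comparison is line by line and the term match is case-insensitive. A line
    that was edited but still contains the term is also reported.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    needle = term.lower()
    removed: list[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" drops old lines; "replace" swaps them for different ones.
        if tag in ("delete", "replace"):
            removed.extend(line for line in old_lines[i1:i2] if needle in line.lower())
    return removed
```

Working on lines keeps the example short but coarse: an HTML-aware differ such as EDGI's compares document structure and can isolate the changed words within a line.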
Challenges with Animating Webpages
The animation tool uses EDGI's HTML difference library, which parses webpages with BeautifulSoup. BeautifulSoup expects the HTML on a page to be at least mostly valid. Although the W3C provides a free HTML validator for website creators, in practice most webpages aren't 100% valid HTML. In this case, the Virginia Department of Health webpages all contained invalid HTML comments.
Figure 5: Source code of a Virginia Department of Health webpage with invalid HTML comments, which Chrome parses correctly.

While web browsers such as Chrome can parse these invalid comments, BeautifulSoup couldn't, which affected the animation. We added code to the EDGI HTML difference library that converts these invalid comments into valid ones before BeautifulSoup parses the HTML, so that the animation displays properly. Webpages with HTML errors severe enough to cause problems for BeautifulSoup are uncommon, but we'll have to keep watching for them when animating webpages.
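The sketch below illustrates the idea of normalizing comments before parsing; it is not the code we added to the EDGI library. It assumes the malformed comments are of two kinds that the WHATWG HTML specification lists as parse errors which browsers recover from: a comment closed with `--!>` (incorrectly-closed-comment) and an empty comment written as `<!-->` or `<!--->` (abrupt-closing-of-empty-comment). The names `normalize_comments`, `TextAndComments`, and `parse` are hypothetical, and Python's standard `html.parser` stands in for BeautifulSoup so the example has no dependencies.

```python
import re
from html.parser import HTMLParser

# "--!>" ends a comment in browsers (WHATWG parse error: incorrectly-closed-comment).
_BANG_CLOSE = re.compile(r"--!>")
# "<!-->" and "<!--->" are empty comments in browsers
# (WHATWG parse error: abrupt-closing-of-empty-comment).
_ABRUPT_EMPTY = re.compile(r"<!---?>")


def normalize_comments(html: str) -> str:
    """Rewrite two kinds of malformed comments into valid "<!-- ... -->" form."""
    html = _BANG_CLOSE.sub("-->", html)
    return _ABRUPT_EMPTY.sub("<!---->", html)


class TextAndComments(HTMLParser):
    """Collect text nodes and comments so we can see what the parser kept."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.comments: list[str] = []

    def handle_data(self, data: str) -> None:
        self.text.append(data)

    def handle_comment(self, data: str) -> None:
        self.comments.append(data)


def parse(html: str) -> TextAndComments:
    """Normalize comments first, then parse."""
    parser = TextAndComments()
    parser.feed(normalize_comments(html))
    parser.close()
    return parser
```

Normalizing before parsing, rather than patching the parser, keeps the fix independent of which parser (and which parser version) ends up reading the page, since parsers differ in how they recover from these errors.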
Outlook
Both the Wayback Machine's Changes tool and the animated difference tool let users view how webpages have changed over time, but users can't use either tool without already knowing the original addresses of the pages. Being able to search for changes in web archives would save users time and make changes easier to discover.
-Lesley