2026-05-26: URL Arguments in API Calls Can Cause Intermittent Temporal Violations While Replaying Archived Web Pages
Michael L. Nelson
2026-05-26
Just over two months ago, I was at the Information Stewardship Forum 2026 at the Internet Archive, where I was fortunate enough to present a lightning talk about making copies of copies, entitled "The Disintegration Loops: Generational Loss in Web Archives". During one of the breaks, Mark Graham asked Sawood Alam to take a look at a problem that had stumped the Wayback Machine support team. I was sitting next to Sawood, and Mark, knowing my love for web archiving investigations, invited me to take a look too. The original inquiry:
Hi, everyone! Got a concerning report from a patron alleging that WBM "URLs were intermittently displaying the current version of the website instead of the archived version." The URLs in question are:
https://web.archive.org/web/20240222221058/https://www.victoriassecret.com/us/site-terms-and-notices
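https://web.archive.org/web/20241228224626/https://www.victoriassecret.com/us/site-terms-and-notices
https://web.archive.org/web/20250531013827/https://www.victoriassecret.com/us/site-terms-and-notices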
A quick check shows that when replaying these URLs, the content does resemble what is on the live web. For example, the text shown on the page references 2025 and 2026 updates, even though the captures are from 2024 - 2025. I've attached a screenshot of the 2025 capture appearing to show live web content as well as a printout/capture the patron provided of the same URL appearing to show the "actual" archive.
Sawood and I discovered that the problem is not that these URLs are sometimes displaying the live web (or at least not directly). The problem is that this seemingly simple "Terms of Use" page is unnecessarily complex, with the boilerplate legal text included via an API call. The JavaScript that makes the call includes a number of superfluous URL arguments, including "screenWidth" and "screenHeight", which are probably appended to all API calls "just in case they are needed" (presumably the "Terms of Use" do not actually vary with the size of the browser). Thus, depending on the size of your browser, the legal text included in the page may have been archived at a different time, sometimes resulting in a temporal violation: a replay of an archived web page whose subresources appear in a combination that did not exist at the time the top-level page was archived.
Although there are potentially a countably infinite number of archived "Terms of Use" pages, for the examples above there are two semantically interesting versions: one is marked (near the top, left-hand side) "Last Updated: January 18, 2024" and the other is marked "Last Updated: September 22, 2025". Taking these "Last Updated" strings at face value, we would not expect the three URLs above (archived at "20240222221058" (February 22, 2024), "20241228224626" (December 28, 2024), and "20250531013827" (May 31, 2025)) to display "Last Updated: September 22, 2025". But sometimes they do – and sometimes they don't – and which archived version you get depends on the size of your browser.
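Taken at face value, the "Last Updated" strings reduce this reasoning to a date comparison: a capture cannot legitimately display legal text that was last updated after the capture itself was made. A minimal Python sketch, assuming the 14-digit Wayback Machine timestamps (YYYYMMDDhhmmss) and the two "Last Updated" dates quoted above; the function names are hypothetical:

```python
from datetime import date, datetime

def parse_wayback_timestamp(ts: str) -> datetime:
    """Parse a 14-digit Wayback Machine timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")

def is_future_violation(capture_ts: str, last_updated: date) -> bool:
    """True if the displayed "Last Updated" date is later than the capture date,
    which cannot happen in a temporally coherent replay."""
    return last_updated > parse_wayback_timestamp(capture_ts).date()

captures = ["20240222221058", "20241228224626", "20250531013827"]
for ts in captures:
    print(ts,
          is_future_violation(ts, date(2024, 1, 18)),   # plausible version: False
          is_future_violation(ts, date(2025, 9, 22)))   # impossible version: True
```

This check can only prove violations from the future; a replay that shows an older version than it should cannot be detected from the capture date alone.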
First, as of the time of this writing, the live web still has the "Last Updated: September 22, 2025" version:
https://www.victoriassecret.com/us/site-terms-and-notices
What appears to be a relatively simple HTML page is unnecessarily complex, with nearly 200 subresources. The figure below shows the relevant portion of the call stack: the HTML page calls the cheekily named JavaScript "brastrap.js", which in turn calls the API at "api.victoriassecret.com".
https://api.victoriassecret.com/categories/v15/page?...
For me, right now, the full live web URL is (note the last two arguments):
https://api.victoriassecret.com/categories/v15/page?categoryId=4b1ed4b3-5965-4a4d-a3d5-1e5ad379445a&brand=vs&isPersonalized=true&activeCountry=US&platform=mobile&deviceType=phone&platformType=ios&perzConsent=true&cid=&tntId=&screenWidth=701&screenHeight=605
Guessing at the URL arguments:
categoryId=4b1ed4b3-5965-4a4d-a3d5-1e5ad379445a: I guess this UUID identifies the "Terms of Use" page?
brand=vs: "vs" = Victoria's Secret? I believe the parent company operates several affiliated brands, and perhaps the API serves all of them.
isPersonalized=true: should this be "false"? I don't have an account here.
activeCountry=US: I'm definitely coming from a US IP address.
platform=mobile, deviceType=phone, platformType=ios: none of these are accurate; I'm on a MacBook Air laptop.
perzConsent=true: looks like a GDPR-related argument.
cid=, tntId=: tracking arguments (currently empty).
screenWidth=701, screenHeight=605: these are the current dimensions of the active window in my Chrome browser.
It's the last two arguments, "screenWidth" and "screenHeight", that cause the intermittent behavior described in the original report.
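Those two arguments matter because an archive looks up subresources by URL, and to an archive "screenWidth=701&screenHeight=605" and "screenWidth=1440&screenHeight=900" are simply different URLs, even if the server returns identical JSON for both. The Python sketch below makes that concrete using only the standard library. The helper canonical_key and the assumption that only these two arguments are irrelevant to the response are illustrative; this is not something the Wayback Machine does:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for illustration: these arguments do not change the JSON response.
ASSUMED_IRRELEVANT = {"screenWidth", "screenHeight"}

def canonical_key(url: str) -> str:
    """Drop the assumed-irrelevant arguments and sort the rest, so that
    requests differing only in window size map to the same key."""
    parts = urlsplit(url)
    args = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ASSUMED_IRRELEVANT]
    return urlunsplit(parts._replace(query=urlencode(sorted(args))))

base = ("https://api.victoriassecret.com/categories/v15/page"
        "?categoryId=4b1ed4b3-5965-4a4d-a3d5-1e5ad379445a&brand=vs"
        "&isPersonalized=true&activeCountry=US&platform=mobile"
        "&deviceType=phone&platformType=ios&perzConsent=true&cid=&tntId=")
mine = base + "&screenWidth=701&screenHeight=605"
other = base + "&screenWidth=1440&screenHeight=900"

print(mine == other)                                # False: two distinct URLs
print(canonical_key(mine) == canonical_key(other))  # True: same key once stripped
```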
First, let's consider the page archived on February 22, 2024 ("20240222221058"), which clearly shows the "Last Updated: September 22, 2025" string:
https://web.archive.org/web/20240222221058/https://www.victoriassecret.com/us/site-terms-and-notices
And since the live web still has "Last Updated: September 22, 2025", this is what led people to think they were getting the live web version (more on that in a bit). Unfortunately, the Wayback Machine's "About this capture" link does not help; it shows only some of the subresources (improving its function is a task for another time):
"About this capture" lists only some of the subresources, and not the problematic api.victoriassecret.com page.
Sawood discovered the API URL first. It's well obfuscated, so it's not surprising that the support staff did not find it immediately. We were sitting side by side, each using our own laptop, and he's much smarter than me, so he's always going to win that race. But I noticed that, for me, the API response appeared to have been archived right then, just a minute or two before, whereas for him it had been archived a few days earlier (it was then March 19, 2026). That was odd, but the next session started and I had to stop.
The 2024 archived version of the page uses a "/v12/" version of the API endpoint (note: this is a common but wrong way to version an API), but it's similar to the 2026 live web example above:
Importantly, the "/v12/" endpoint still works on the live web, even though the live web HTML and brastrap.js now use the "/v15/" version. Checking the Wayback Machine directly confirmed that this was indeed the first time that URL had been archived:
Although Sawood found the problem URL, and we confirmed it was archived in March 2026 (and thus displayed the "Last Updated: September 22, 2025" string), it bothered me that he saw an earlier archival time than I did (March 14, 2026 vs. March 19, 2026). After the next session ended, I returned to this problem. I changed the size of my browser window and was able to force yet another newly archived version (reproduced on March 22, 2026 below):
The highlighted text shows:
Although it's beyond the scope of this post, the Wayback Machine's Save Page Now has a "/save/_embed/" API that allows the Wayback Machine to "patch" the archive with missing URLs from the live web. In this case, the version of the API response ending with "&screenWidth=565&screenHeight=605" was "missing" from the Wayback Machine, so the Wayback Machine patched the archive from the live web, which still displays the "Last Updated: September 22, 2025" string, even though the main HTML page was archived in February 2024. So in essence, the Wayback Machine was displaying the live web version, immediately after saving it to the Wayback Machine. Presumably the "Terms of Use" page changes slowly, but this behavior would be more noticeable if the "Last Updated" string were updated, say, every minute.
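The patching behavior can be modeled in a few lines. The sketch below is a deliberately simplified, hypothetical Python model (ToyArchive, replay, and the live stand-in are invented for illustration; this is not how the Wayback Machine or "/save/_embed/" is actually implemented): replay looks up the exact URL, returns the capture closest in time to the root page, and on a miss first fetches and stores the live version, stamped with the current time.

```python
from datetime import datetime
from typing import Callable

class ToyArchive:
    """Hypothetical archive model: captures are keyed by the *exact* URL string."""

    def __init__(self) -> None:
        self.captures: dict[str, list[tuple[datetime, str]]] = {}

    def save(self, url: str, when: datetime, body: str) -> None:
        """Record a capture of `url` made at `when`."""
        self.captures.setdefault(url, []).append((when, body))

    def replay(self, url: str, at: datetime,
               live_fetch: Callable[[str], str], now: datetime) -> tuple[datetime, str]:
        """Return the capture of `url` closest in time to `at`. If this exact URL
        was never archived, first "patch" the archive from the live web at `now`."""
        if url not in self.captures:
            self.save(url, now, live_fetch(url))
        return min(self.captures[url], key=lambda cap: abs(cap[0] - at))

def live(url: str) -> str:
    """Stand-in for the live web, which currently serves the 2025 text."""
    return "Last Updated: September 22, 2025"

# Toy data: other API arguments omitted; the saved timestamp is illustrative.
API = "https://api.victoriassecret.com/categories/v12/page?"
archive = ToyArchive()
archive.save(API + "screenWidth=1400&screenHeight=900",
             datetime(2024, 2, 22), "Last Updated: January 18, 2024")

root = datetime(2024, 2, 22, 22, 10, 58)  # the 20240222221058 capture
today = datetime(2026, 3, 22)

print(archive.replay(API + "screenWidth=1400&screenHeight=900", root, live, today))
print(archive.replay(API + "screenWidth=565&screenHeight=605", root, live, today))
```

The first call finds an existing capture; the second misses on the exact URL, so the "archived" JSON it returns is the live version captured in 2026, replayed inside a 2024 page.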
A call to the CDX API confirmed that a variety of screenWidth and screenHeight combinations had been archived (the combinations appear at the far right of each line in the gist below):
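The CDX query itself can be scripted. A minimal Python sketch, assuming the public CDX server at web.archive.org/cdx/search/cdx and its url, matchType, output, and fl parameters; the helper names are hypothetical, and the network call is isolated in fetch_rows so that the grouping logic can be checked offline:

```python
import json
from collections import defaultdict
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(url_prefix: str) -> str:
    """Build a CDX query for every capture whose URL begins with url_prefix."""
    params = {"url": url_prefix, "matchType": "prefix",
              "output": "json", "fl": "timestamp,original"}
    return CDX_ENDPOINT + "?" + urlencode(params)

def fetch_rows(url_prefix: str) -> list[list[str]]:
    """Run the query (network access required). With output=json the
    response is a list of rows whose first row holds the field names."""
    with urlopen(cdx_query_url(url_prefix)) as resp:
        return json.loads(resp.read().decode("utf-8"))

def group_by_screen(rows: list[list[str]]) -> dict[tuple[str, str], list[str]]:
    """Map each (screenWidth, screenHeight) pair to its capture timestamps."""
    if not rows:  # no captures at all
        return {}
    header, *data = rows
    ts_col, url_col = header.index("timestamp"), header.index("original")
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for row in data:
        args = parse_qs(urlsplit(row[url_col]).query)
        size = (args.get("screenWidth", ["?"])[0], args.get("screenHeight", ["?"])[0])
        groups[size].append(row[ts_col])
    return dict(groups)

# e.g. group_by_screen(fetch_rows("api.victoriassecret.com/categories/v12/page"))
```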
In fact, by inspection, there are at least two ways to get the wrong version. If your screen size is "screenWidth=1600&screenHeight=1000", you will get a version of the page that has the string "Last Updated: February 7, 2023", a temporal violation reaching into the past rather than the previously described temporal violation from the future. A screen size of "screenWidth=1400&screenHeight=900" will produce the right result ("Last Updated: January 18, 2024"), and a screen size of "screenWidth=1440&screenHeight=900" will produce a different wrong result ("Last Updated: September 22, 2025"). And as shown above, a screenWidth and screenHeight combination not already archived will cause the Wayback Machine to be patched from the live web. Furthermore, if/when the "/v12/" API endpoint is removed from the live web, unarchived size combinations will simply cause the page replay to fail silently, and most people won't understand why.
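The three outcomes can be tabulated by comparing the replayed "Last Updated" date against the version that should have been current when the root page was captured. A minimal Python sketch, assuming (as described above) that "Last Updated: January 18, 2024" is the right result for the February 22, 2024 capture; the function classify is hypothetical:

```python
from datetime import date

def classify(shown: date, expected: date) -> str:
    """Compare the "Last Updated" date actually replayed with the one that
    was current when the root page was captured."""
    if shown == expected:
        return "coherent"
    return "violation from the past" if shown < expected else "violation from the future"

expected = date(2024, 1, 18)  # assumed correct version for the 20240222221058 capture
observed = {  # screen size -> "Last Updated" date replayed, as described above
    (1600, 1000): date(2023, 2, 7),
    (1400, 900): date(2024, 1, 18),
    (1440, 900): date(2025, 9, 22),
}
for (w, h), shown in observed.items():
    print(f"{w}x{h}: {classify(shown, expected)}")
# 1600x1000: violation from the past
# 1400x900: coherent
# 1440x900: violation from the future
```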
In summary, this seemingly simple "Terms of Use" page is really quite challenging in practice:
The API call is not easily discovered, and the "About this capture" service does not show the API URL (or many of the other nearly 200 subresource URLs in this page).
The API has a raft of (arguably) unnecessary URL arguments that do not change the response but do cause the Wayback Machine to patch the archive from the live web whenever a new combination is requested.
Because the temporally violative subresource is JSON and not, say, a JPEG, one can't simply right-click on the subresource and inspect when it was archived.
We've encountered synchronization problems with HTML and JSON before (e.g., "Right HTML, Wrong JSON" (JCDL 2023), "Challenges in replaying archived Twitter pages" (IJDL 2024)), but the implementation complexity found in news outlets and social media was to be expected: the advanced UI features that make these sites engaging (e.g., auto-updating, infinite scroll, embedded media, personalized content) are the same features that make archival replay difficult. Without the "Last Updated: …" string, the problem would have been much harder to notice and diagnose. Its seemingly intermittent nature, where you got a temporally coherent replay only if your browser happened to be the same size as one used for a previously archived response, made the investigation especially challenging.
Who pays attention to their browser's exact width and height? In this case, they were the keys to solving this puzzle.
–Michael