2014-07-14: "Refresh" For Zombies, Time Jumps
We've blogged before about "zombies", or archived pages that reach out to the live web for images, ads, movies, etc. You can also describe it as the live web "leaking" into the archive, but we prefer the more colorful metaphor of a mixture of undead and living pages. Most of the time JavaScript is to blame (for example, see our TPDL 2013 paper "On the Change in Archivability of Websites Over Time"), but in this example the blame rests with the HTML <meta http-equiv="refresh" content="..."> tag, whose behavior in the archives I discovered quite by accident.

First, the meta refresh tag is a nasty bit of business that allows HTML to specify the HTTP headers you should have received. This is occasionally useful (like when loading a file from local disk), but more often than not it seems to create situations in which the HTML and the HTTP disagree about header values, leading to surprisingly complicated things like MIME ty