Posts

Showing posts with the label zombie resources

2014-07-14: "Refresh" For Zombies, Time Jumps

We've blogged before about "zombies", or archived pages that reach out to the live web for images, ads, movies, etc. You can also describe it as the live web "leaking" into the archive, but we prefer the more colorful metaphor of a mixture of undead and living pages. Most of the time JavaScript is to blame (for example, see our TPDL 2013 paper "On the Change in Archivability of Websites Over Time"), but in this example the blame rests with the HTML `<meta http-equiv="refresh" content="...">` tag, whose behavior in the archives I discovered quite by accident. First, the meta refresh tag is a nasty bit of business that allows HTML to specify the HTTP headers you should have received. This is occasionally useful (like loading a file from local disk), but more often than not it seems to create situations in which the HTML and the HTTP disagree about header values, leading to surprisingly complicated things like MIME ty…
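As a rough illustration of the tag in question, the following sketch (using only Python's standard-library `html.parser`; the sample page and URL are made up for the example) extracts the target URI from a `<meta http-equiv="refresh">` tag, which is the URI an archived page would "refresh" to, potentially landing on the live web or causing a time jump:

```python
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Collect the target URIs of <meta http-equiv="refresh"> tags."""
    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            # content typically looks like "5; url=http://example.com/"
            for part in a.get("content", "").split(";"):
                part = part.strip()
                if part.lower().startswith("url="):
                    self.targets.append(part[4:])

# A made-up page with an immediate refresh to a live-web URI
page = ('<html><head>'
        '<meta http-equiv="refresh" content="0; url=http://example.com/live">'
        '</head></html>')
finder = MetaRefreshFinder()
finder.feed(page)
print(finder.targets)  # ['http://example.com/live']
```

If an archive replays the page without rewriting that `url=` value, the browser obeys the refresh and jumps out of the archive to the live resource.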

2012-10-10: Zombies in the Archives

Image provided from http://www.taxhelpattorney.com/

In our current research, the WS-DL group has observed leakage in archived sites. Leakage occurs when archived resources include current content. I enjoy referring to such occurrences as "zombie" resources (which is appropriate given the upcoming Halloween holiday). That is to say, these resources are expected to be archived ("dead") but still reach into the current Web. In the examples below, this reach into the live Web is caused by URIs contained in JavaScript not being rewritten to be relative to the Web archive; the page in the archive is not pulling from past archived content but is "reaching out" (zombie-style) from the archive to the live Web. We provide two examples with humorous juxtaposition of past and present content. Because of JavaScript, rendering a page from the past will include advertisements from the present Web. 2008 memento of cnn.com f…
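The rewriting failure described above can be sketched as a simple check: a URI embedded in an archived page "reaches out" if it is absolute and does not point back into the archive. This is a minimal illustration, not actual archive code; the archive prefix and sample URIs are assumptions for the example:

```python
from urllib.parse import urlparse

# Hypothetical replay prefix; real archives rewrite URIs under a prefix like this
ARCHIVE_PREFIX = "http://web.archive.org/web/"

def is_zombie(uri: str, archive_prefix: str = ARCHIVE_PREFIX) -> bool:
    """Return True if a URI in an archived page would fetch from the live Web.

    Relative URIs resolve against the archived page's base, so they stay in
    the archive; absolute URIs escape unless they were rewritten to point
    back under the archive's replay prefix.
    """
    if not urlparse(uri).scheme:
        return False  # relative: resolves within the archive
    return not uri.startswith(archive_prefix)

# URIs as they might appear in an archived page's JavaScript (made up)
print(is_zombie("http://ads.example.com/banner.js"))  # True: reaches the live Web
print(is_zombie("http://web.archive.org/web/2008/http://cnn.com/"))  # False: rewritten
print(is_zombie("/images/logo.png"))  # False: relative, stays in the archive
```

Archive rewriters try to push every embedded URI into the third or second case; URIs constructed dynamically inside JavaScript (as with the ad loaders in the examples) often slip through as the first case, producing the zombie behavior.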