Thursday, December 19, 2013

2013-12-19: 404 - Your interview has been depublished

In early November 2013 I gave an invited presentation at the EcoCom conference (picture left) and at the Spreeforum, an informal gathering of researchers to facilitate knowledge exchange and foster collaborations. EcoCom was organized by Prof. Dr. Michael Herzog and his SPiRIT team, and the Spreeforum was hosted by Prof. Dr. Jürgen Sieck, who leads the INKA research group. Both events were supported by the Alcatel-Lucent Stiftung for Communications research. In my talks I gave a high-level overview of the state of the art in web archiving, outlined the benefits of the Memento protocol, pointed at issues and challenges web archives face today, and gave a demonstration of the Memento for Chrome extension.

Following the talk at the Spreeforum I was asked to give an interview for the German radio station Inforadio (you may think of it as Germany's NPR). The piece was aired on Monday, November 18th at 7.30am CET. As I had already left Germany I was not able to listen to it live, but I was happy to find the corresponding article online. It essentially contained the transcript of the aired report, with an audio file embedded in the document. I immediately bookmarked the page.

A couple of weeks later I revisited the article at its original URI only to find it was no longer available (screenshot left). Now, we all know that the web is dynamic and hence links break, and we have even seen odd dynamics at other media companies before, but in this case, as I was about to find out, it was higher powers that caused the detrimental effect. Inforadio is a public radio station and therefore, like many others in Germany and throughout Europe, financed to a large extent by the public (as of 2013 the broadcast receiving license fee is 17.98 Euros (almost USD 25) per month per household). As such it is subject to the "Rundfunkstaatsvertrag", a contract between the German states that regulates broadcasting rights. The 12th amendment to this contract, from 2009, mandates that most online content must be removed after 7 days of publication. Huh? Yeah, I know, it sounds like a very bad joke but it is not. It even led to the coining of the term "depublish", a paradox in itself. I had considered public radio stations to be "memory organizations", in league with libraries, museums, etc. How wrong was I, and how ironic is this, given my talk's topic!? For what it's worth though, the content does not have to be deleted from the repository; it only has to be taken offline.

I can only speculate about the reasons for this mandate, but opinions that I find believable indicate that private broadcasters and news publishers complained about unfair competition. The claim was that "eternal" availability of broadcast content on the web is unfair because the private sector is not given the funds to match that competitive advantage. Another point that supposedly was made is that this online service goes beyond the mandate of public radio stations and hence would constitute a misguided use of public money. To me personally, none of this makes any sense. Broadcasters of all sorts have realized that content (text, audio, and video) is increasingly consumed online and hence are adjusting their offerings. How this can be seen as unfair competition is unclear to me.

But back to my interview. Clearly, one can argue (or not) whether the document is worth preserving but my point here is a different one:
Not only did I bookmark the page when I saw it, I also immediately tried to push it into as many web archives as I could. I tried the Internet Archive's new "save page now" service but, to add insult to injury, Inforadio also has a robots.txt file in place that prohibits the IA from crawling the page. To the best of my knowledge this is not part of the 12th amendment to the "Rundfunkstaatsvertrag" so the broadcaster could actually take action to preserve their online content. Other web sites of public radio and TV stations such as Deutschlandfunk or ZDF do not prohibit archives from crawling their pages.
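For readers who want to check how a site's robots.txt treats an archive crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example URL are hypothetical (I am not reproducing Inforadio's actual file), and I am assuming the crawler identifies itself as `ia_archiver`, the user agent name commonly associated with the Internet Archive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one archive crawler, allows everyone else.
# This is NOT the actual file served by any broadcaster.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://radio.example.org/interview.html"  # hypothetical URL
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(HYPOTHETICAL_ROBOTS_TXT, agent, page))
```

With a file like this, a well-behaved archive crawler stays away while all other robots remain free to crawl, which is the situation I ran into.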

Fortunately, the archiving service was able to grab the page (screenshot left) but the audio feed is lost.

Just one more thing (Peter Falk style):
Note that the original URI of the page:

when requested in a web browser redirects (200-style) to:

The good news here: it is not a soft 404, so the error is somewhat robot friendly. The bad news is that the original URI is thrown away. As the original URI is the only key for a search in web archives, we cannot retrieve any archived copies (such as the one I created) without it. Unfortunately, this is true not only for manual searches; it also undermines automatic retrieval of archived copies by clients such as the browser extension Memento for Chrome. As stressed in our recent talk at CNI, this is very bad practice and unnecessarily makes life harder for those interested in obtaining archived copies of web pages at large, not only of my radio interview.
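To illustrate why the original URI matters so much, here is a rough sketch of how a Memento client might build its TimeGate request. The `Accept-Datetime` header comes from the Memento protocol (RFC 7089); everything else is an assumption for illustration: the TimeGate endpoint and both page URIs are hypothetical placeholders, and `timegate_request` is a helper name I made up.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical TimeGate endpoint; substitute the aggregator or archive you use.
TIMEGATE_BASE = "http://timegate.example.org/timegate/"

# Hypothetical URIs standing in for the article and the error page it redirects to.
ORIGINAL_URI = "http://radio.example.org/interview.html"
ERROR_URI = "http://radio.example.org/error/404.html"


def timegate_request(uri, when):
    """Return (request URI, headers) for a Memento TimeGate lookup of `uri`.

    The archive is asked for the copy of `uri` closest to `when`; the URI
    itself is the only lookup key the client has.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE_BASE + uri, {"Accept-Datetime": http_date}


if __name__ == "__main__":
    aired = datetime(2013, 11, 18, 6, 30, tzinfo=timezone.utc)  # 7.30am CET
    print(timegate_request(ORIGINAL_URI, aired))
    # After the redirect, a client that only sees the error page URI asks
    # for a different resource and will not find the archived article.
    print(timegate_request(ERROR_URI, aired))
```

The two requests differ only in the URI appended to the TimeGate, and that is exactly the piece of information the redirect discards.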

