The article "Why Websites Are Lost (and How They're Sometimes Found)" has finally been published in the November 2009 issue of Communications of the ACM. Co-written with Frank McCown and Cathy Marshall, it was accepted for publication in the fall of 2007. Although we've had a pre-print available since 2008, it just isn't the same until you see it in print.
Except we won't be seeing this in print; it is instead published in the "Virtual Extension" part of the CACM. So even though it has page numbers (pp. 141-145), this article won't be among those that arrive in your mailbox in a few weeks. As someone who has spent his entire career trying to transform the scholarly communication process with the web and digital libraries, I completely understand this move by the CACM, but I have to admit I'm disappointed that I won't see a printed, bound copy. Even though in the long term all discovery will come from the web (e.g., Google Scholar or personal publication lists), the short-term thrill of receiving the hard copy in the mail is hard to replace.
The article itself is a very nice summary of the problem area. The idea to write the paper came from our involvement in Warrick, a tool for reconstructing lost websites. Warrick was very successful, and interest in it was so high that we eventually became distracted from the mechanics of reconstruction and turned our focus to the question "why are people losing all these sites?!" We learned quite a bit.
Interested readers might also like: our paper in Archiving 2007, Frank's dissertation, or any of the several papers by Cathy on personal (digital) archiving.