I was recently working on a talk to present at the Southeast Women in Computing Conference about telling stories with web archives (slideshare). In addition to our Hurricane Katrina story, I wanted to include my own academic story, as told through the archive.
I was a grad student at UNC from 1996 to 2003, and I found that my personal webpage there had been very well preserved. It was captured 162 times between June 1997 and October 2013 (https://web.archive.org/web/*/http://www.cs.unc.edu/~clark/), so I was able to come up with several great snapshots of my time in grad school.
Aside: My UNC page was archived 20 times in 2013, but the archived pages don't have the standard Wayback Machine banner, nor are their outgoing links re-written to point to the archive. For example, see https://web.archive.org/web/20130203101303/http://www.cs.unc.edu/~clark/
Before I joined ODU, I was an Assistant Professor at Clemson University (2004-2006). The Wayback Machine shows that my Clemson home page was only crawled 2 times, both in 2011 (https://web.archive.org/web/*/www.cs.clemson.edu/~mweigle/). Unfortunately, I no longer worked at Clemson in 2011, so those both return 404s.
Sadly, there is no record of my Clemson home page. But I can use the archive to prove that I worked there: the CS department's faculty page was captured in April 2006 and lists my name.
Wouldn't it be cool if when I request a page that 404s, like http://www.cs.clemson.edu/~mweigle/, the archive could figure out that there is a similar page (http://www.cs.unc.edu/~clark/) that links to the requested page?
The only memento from 2014 is on Aug 9, 2014, but it returns a 302 redirecting to an earlier memento from 2013.
It appears that Heritrix crawled http://www.cs.odu.edu/~mweigle (note the lack of a trailing /), which resulted in a 302, but http://www.cs.odu.edu/~mweigle/ was never crawled. The Wayback Machine's canonicalization is likely the reason that the redirect points to the most recent capture of http://www.cs.odu.edu/~mweigle/. (That is, the Wayback Machine knows that http://www.cs.odu.edu/~mweigle and http://www.cs.odu.edu/~mweigle/ are really the same page.)
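To make the idea of canonicalization concrete, here is a minimal sketch of how two URLs that differ only by a trailing slash can collapse to the same lookup key. This is not the Wayback Machine's actual rule set; the function name `canonical_key` and the specific rules (drop the scheme, lowercase the host, strip the trailing slash) are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Reduce a URL to a simplified lookup key (illustrative rules only)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()            # host names are case-insensitive
    path = parts.path.rstrip("/") or "/"   # treat /~mweigle and /~mweigle/ alike
    key = host + path                      # the scheme is deliberately dropped
    if parts.query:
        key += "?" + parts.query
    return key


with_slash = canonical_key("http://www.cs.odu.edu/~mweigle/")
without_slash = canonical_key("http://www.cs.odu.edu/~mweigle")
print(with_slash == without_slash)
```

Under these assumed rules both forms of my home page URL share one key, which is the kind of behavior that would explain the redirect landing on a capture of the other form.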
My home page is managed by wiki software and the web server does some URL re-writing. Another way to get to my home page is through http://www.cs.odu.edu/~mweigle/Main/Home/, which has been saved 3 times between 2008 and 2010. (I switched to the wiki software sometime in May 2008.) See https://web.archive.org/web/*/http://www.cs.odu.edu/~mweigle/Main/Home/
Since these two pages point to the same thing, should these two timemaps be merged? What happens if at some point in the future I decide to stop using this particular wiki software and end up with http://www.cs.odu.edu/~mweigle/ and http://www.cs.odu.edu/~mweigle/Main/Home/ being two totally separate pages?
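One way to think about merging is sketched below: combine the two lists of mementos into a single chronological list, but keep the original URL on every entry so the merge can be undone if the two URLs ever diverge. The `Memento` type, the `merge_timemaps` function, and the capture dates are all made up for illustration; they are not taken from the real timemaps.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Memento(NamedTuple):
    captured: datetime   # when the capture was made
    original_url: str    # which URL was actually crawled


def merge_timemaps(*timemaps: Iterable[Memento]) -> List[Memento]:
    """Combine several timemaps into one list ordered by capture time."""
    merged = [memento for timemap in timemaps for memento in timemap]
    return sorted(merged, key=lambda memento: memento.captured)


# Placeholder capture dates, not real data.
root = [
    Memento(datetime(2007, 3, 1), "http://www.cs.odu.edu/~mweigle/"),
    Memento(datetime(2013, 6, 1), "http://www.cs.odu.edu/~mweigle/"),
]
wiki = [
    Memento(datetime(2009, 5, 1), "http://www.cs.odu.edu/~mweigle/Main/Home/"),
]

for memento in merge_timemaps(root, wiki):
    print(memento.captured.date(), memento.original_url)
```

Because each entry still records which URL it came from, filtering the merged list by `original_url` recovers the separate timemaps.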
Finally, although my main ODU webpage itself is fairly well-archived, several of the links are not. For example, http://www.cs.odu.edu/~mweigle/Resources/WorkingWithMe is not archived.
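A check like this can be scripted against the Wayback Machine's availability endpoint. The sketch below assumes the JSON layout that endpoint returns, with an `archived_snapshots` object that is empty when nothing has been captured; the helper names are my own, and the network call is kept separate from the parsing so the parsing can be tried on a sample response.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability lookup URL for a page."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest memento URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def check(url: str) -> Optional[str]:
    """Ask the live service about a page (requires network access)."""
    with urlopen(availability_query(url)) as reply:
        return closest_snapshot(json.load(reply))


# An empty "archived_snapshots" object is assumed to mean "not archived".
print(closest_snapshot({"archived_snapshots": {}}))
```

Calling `check("http://www.cs.odu.edu/~mweigle/Resources/WorkingWithMe")` would query the live service, so its result depends on what has been archived since this was written.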
Also, several of the links that are archived have not been recently captured. For instance, the page with my list of students was last archived in 2010 (https://web.archive.org/web/20100621205039/http://www.cs.odu.edu/~mweigle/Main/Students), but none of these students are still at ODU.
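The capture date is readable straight from the archive URL: the 14 digits after `/web/` are a `YYYYMMDDhhmmss` timestamp, followed by the original URL. Here is a small sketch that splits a memento URL into those two parts. The function name and the regular expression are my own, and the result is left as a naive datetime with no time zone attached.

```python
import re
from datetime import datetime
from typing import Tuple

# 14-digit timestamp, then the original URL.
MEMENTO_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_memento_url(memento_url: str) -> Tuple[datetime, str]:
    """Split a Wayback Machine memento URL into (capture time, original URL)."""
    match = MEMENTO_PATTERN.match(memento_url)
    if match is None:
        raise ValueError("not a timestamped Wayback Machine URL: " + memento_url)
    timestamp, original_url = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url


captured, original = parse_memento_url(
    "https://web.archive.org/web/20100621205039/"
    "http://www.cs.odu.edu/~mweigle/Main/Students"
)
print(captured.isoformat())  # 2010-06-21T20:50:39
print(original)              # http://www.cs.odu.edu/~mweigle/Main/Students
```

Calendar-view URLs that use `*` in place of the timestamp do not match this pattern and raise a `ValueError`.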
Now, I'm off to submit my pages to the Internet Archive's "Save Page Now" service!