
Showing posts with the label JCDL 2011

2011-07-05: JCDL 2011 Trip Report

JCDL 2011 (#jcdl2011) was held June 13–16 in Ottawa, Ontario, Canada. The weather was beautiful and the conference sessions were wonderful. The ODU Web Sciences and Digital Libraries team was fortunate enough to have six of its members attend, present three short papers, and demonstrate the Synchronicity Firefox extension.

Our Contributions to JCDL 2011

Ahmed AlSum presented How Much of the Web is Archived? This paper approximates the amount of the Web that is archived using four URI sources. From this data, we observe significant variation in archival rates among URIs from different sources. So, how much of the web is archived? It depends on which web you mean. (pdf, slides)

Martin Klein presented Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures, which details a method for discovering missing web pages (the dreaded 404). Martin also demonstrated Synchronicity, a Firefox ...

2011-06-23: How Much of the Web is Archived?

There are many questions to ask about web archiving and digital preservation: Why is archiving important? What should be archived? What is currently being archived? How often should pages be archived? The short paper "How Much of the Web is Archived?" (Scott G. Ainsworth, Ahmed AlSum, Hany SalahEldeen, Michele C. Weigle, and Michael L. Nelson), published at JCDL 2011, is our first step at determining to what extent the web is being archived and by which archives.

To address this question, we sampled URIs from four sources to estimate the percentage of archived URIs and the number and frequency of archived versions. We chose 1000 URIs from each of the following sources:

Open Directory Project (DMOZ): sampled from all URIs (July 2000 - Oct 2010)
Delicious: random URIs from the Recent Bookmarks list
Bitly: random hash values generated and dereferenced
Search engine caches (Google, Bing, Yahoo!): random sample of URIs from queries of 5-grams (using Google...
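The sampling idea described above can be illustrated with a minimal sketch. This is not the authors' code: the hash alphabet, the hash length of 6, and the `is_archived` predicate are all hypothetical stand-ins chosen for illustration, and no network requests are made. It shows only the two steps the excerpt names, generating random hash values and estimating the percentage of a sample that is archived.

```python
import random
import string

# Assumption: short-link hashes use letters and digits. The real Bitly
# alphabet and hash length are not stated in the post.
ALPHABET = string.ascii_letters + string.digits


def random_hash(rng, length=6):
    """Return one random hash-like string of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def archived_percentage(uris, is_archived):
    """Return the percentage of URIs for which is_archived(uri) is true.

    is_archived is a hypothetical predicate supplied by the caller; in
    practice it would consult one or more web archives.
    """
    if not uris:
        return 0.0
    hits = sum(1 for uri in uris if is_archived(uri))
    return 100.0 * hits / len(uris)


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so the sample is repeatable
    sample = [random_hash(rng) for _ in range(1000)]
    # Stand-in archive lookup: treat hashes starting with a digit as archived.
    print(archived_percentage(sample, lambda h: h[0].isdigit()))
```

In a real study each hash would first be dereferenced to its target URI, and hashes that do not resolve would be discarded before the percentage is computed.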