Posts

2011-08-28: Fall 2011 WS-DL Classes

The Web Science and Digital Libraries Research Group is offering two classes for the fall 2011 semester. CS 895 Web-Based Information Retrieval will be offered on Tuesdays, 4:20-7:00 in room 2120 of the ECS building. This class will use the recent Croft, Metzler & Strohman book as the required text, and the Manning, Raghavan, & Schütze book as the recommended text. By choosing the former book as the primary guide for the course, we are intentionally providing a strong engineering component to the class (i.e., a level of coding and development is expected) as opposed to just a theoretical exploration of information retrieval. CS 751/851 Introduction to Digital Libraries is not a prerequisite, but it would help to be familiar with the material covered in that class. Dr. Weigle will be teaching CS 795/895 Information Visualization on Thursdays, 9:30-12:15 in room 2120 of the ECS building. This class is a follow-on to the CS 796/896 Visual Analytics Seminar from last

2011-07-26: Universal Access to All Knowledge

On July 26, 2011, the Web Science and Digital Libraries group at Old Dominion University hosted Kris Carpenter Negulescu, Director of the Web Group at the Internet Archive, who gave a talk entitled “Universal Access to All Knowledge”. The presentation started with an introduction to the Internet Archive. She then described the materials currently archived there: Text (+2.9M books), Moving Images (+542,500 items), Audio (+950,000 items), Television broadcast (+1M hours), Web Pages (+150 billion pages). She also gave an overview of some of the special collections, such as K-12 students and NASA images. After that, Kris explained the common collection strategies that the Internet Archive uses to crawl the web. Frequently, they do a broad survey of wide-ranging domains such as .com, .net, .org, etc. They also considered the frequency of change for these websites and gave more support to the sites without

2011-07-28: Web Video Discussing Preservation Disappears After 24 Hours

One week ago (July 21, 2011) I was fortunate enough to be invited to speak about Web Archiving on Canada AM, sort of like the Today Show or Good Morning America in the US. I was asked to appear on the program in part because of the July 17, 2011 article in the Washington Post, which followed a July 6, 2011 blog post for the Chronicle of Higher Education, which was based on a June 23, 2011 blog post about our JCDL 2011 paper "How Much of the Web is Archived?". In other words, the process went like this: step 1 - get lucky & step 2 - let preferential attachment do its thing. I was able to do the appearance in Washington, DC, while attending the NDSA/NDIIPP 2011 Partner Meetup. The morning of July 21, I took a taxi to an ABC studio in DC, did the interview (about 4 minutes) and took a taxi back to the conference in time to make the morning session. I had not been on TV before and was both nervous and excited. The local and Canadian crew made the entire exp

2011-07-25: NDSA/NDIIPP Partner Meetup 2011 Trip Report

The NDSA/NDIIPP (@ndiipp) Partner Meetup took place July 19-21 at the Hyatt Regency Washington on Capitol Hill in Washington, DC. Technical and non-technical attendees joined together to form an aggregated consortium of archivists, librarians, digital media specialists and concerned parties. Three representatives from the ODU Web Science and Digital Libraries group attended to make archivists aware of tools they had developed to accomplish the common goal of web archiving.

WS-DL’s Contributions to the NDSA/NDIIPP Meetup

Mat Kelly presented the Mozilla Firefox add-on Archive Facebook in a breakout group of presentations specifically targeting web archiving. The redesigned and re-architected add-on allows a user to archive the content of his/her Facebook account, with the result being truly WYSIWYG, in contrast to the content dump that Facebook offers natively.

Vivens Ndatinya showed the workings of a tool he is currently buildin

2011-07-21: Towards a Machine-Actionable Scholarly Communication System

I've told all the members of my research group they should watch this, so I thought I might as well make the same recommendation to the rest of the world... Herbert Van de Sompel presented "Towards a Machine-Actionable Scholarly Communication System" at LIBER 2011 in Barcelona, Spain on June 30, 2011. You really have to watch the video and review the slides at the same time to get the full impact of the presentation. The first part is a succinct review of various projects, but starting at slide 16 ("nanopublications") things really get interesting. Well worth the 40-minute investment. --Michael

2011-07-05: JCDL 2011 Trip Report

JCDL 2011 (#jcdl2011) was held June 13–16 in Ottawa, Ontario, Canada. The weather was beautiful and the conference sessions wonderful. The ODU Web Science and Digital Libraries team was fortunate enough to have six of its members attend, present three short papers, and demonstrate the Synchronicity Firefox extension.

Our Contributions to JCDL 2011

Ahmed AlSum presented "How Much of the Web is Archived?" This paper approximates the amount of the Web that is archived using four URI sources. From this data, we observe significant variation in archival rate in URIs from different sources. So, how much of the web is archived? It depends on which web you mean. (pdf, slides).

Martin Klein presented "Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures", which details a method for discovering missing web pages (the dreaded 404). Martin also demonstrated Synchronicity, a Firefox

2011-06-23: How Much of the Web is Archived?

There are many questions to ask about web archiving and digital preservation: Why is archiving important? What should be archived? What is currently being archived? How often should pages be archived? The short paper "How Much of the Web is Archived?" (Scott G. Ainsworth, Ahmed AlSum, Hany SalahEldeen, Michele C. Weigle, and Michael L. Nelson), published at JCDL 2011, is our first step at determining to what extent the web is being archived and by which archives. To address this question, we sampled URIs from four sources to estimate the percentage of archived URIs and the number and frequency of archived versions. We chose 1000 URIs from each of the following sources:
Open Directory Project (DMOZ) - sampled from all URIs (July 2000 - Oct 2010)
Delicious - random URIs from the Recent Bookmarks list
Bitly - random hash values generated and dereferenced
Search engine caches (Google, Bing, Yahoo!) - random sample of URIs from queries of 5-grams (using Google&#
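
The sampling idea in this excerpt (generate random short-link hashes, then measure what share of a URI sample has at least one archived copy) can be sketched roughly as below. This is a toy illustration, not the code used for the paper: the 6-character alphanumeric hash format, the function names, and the made-up archive counts are all assumptions, and nothing here contacts Bitly or any archive.

```python
import random
import string

# Assumption: short-link hashes are 6 characters drawn from [A-Za-z0-9].
ALPHABET = string.ascii_letters + string.digits


def random_hash(rng, length=6):
    """Return one random candidate hash (hypothetical format)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def sample_candidates(n, seed=0):
    """Generate n distinct candidate short URIs to dereference later."""
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n:
        seen.add("http://bit.ly/" + random_hash(rng))
    return sorted(seen)


def percent_archived(uris, memento_counts):
    """Percentage of sampled URIs with at least one archived copy.

    memento_counts maps URI -> number of archived versions found.
    It is stand-in data here; a real study would query the archives.
    """
    if not uris:
        return 0.0
    archived = sum(1 for u in uris if memento_counts.get(u, 0) > 0)
    return 100.0 * archived / len(uris)


if __name__ == "__main__":
    sample = sample_candidates(1000)
    # Made-up counts, used only to exercise the function.
    fake_counts = {u: i % 3 for i, u in enumerate(sample)}
    print(f"{percent_archived(sample, fake_counts):.1f}% archived (toy data)")
```

In practice most randomly generated hashes would not resolve to anything, so a real sampler would keep generating and dereferencing candidates until it had collected enough live URIs.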

2011-06-29: OAC Demo of SVG and Constrained Targets

An online annotation service is a tool that lets different authors annotate different resources and gives each annotation its own URI, which can be shared in a Facebook post, blog post, tweet, etc. A web annotation can be described as a relation between resources of different media types, such as text, image, audio, or video. The web annotation service will be able to:
Provide a unique URI for every annotation.
Keep annotations persistent.
Annotate a specific part of a media resource.
Keep track of the resources.
Present annotations in the browser.
Meet the OAC model requirements (alpha3 release).

Open Annotation Model: This service will generate annotations that meet the OAC model specification. In an annotation that contains different resources, the OAC model introduces a new resource that describes the relationships between the resources that make up the annotation. Example: A user who is interested in wildlife is browsing a page about elephants in Africa, and he was interested in the m
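
The idea that an annotation is itself a resource, with its own URI, that ties other resources together can be sketched as a small data structure. This is only an illustration: the class and field names are hypothetical rather than taken from the OAC specification, and the example.org URIs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Any web resource taking part in an annotation."""
    uri: str
    media_type: str  # e.g. "text/html", "image/png"


@dataclass
class ConstrainedTarget:
    """A specific part of a resource (hypothetical representation)."""
    source: Resource
    constraint: str  # e.g. an SVG shape or a text range


@dataclass
class Annotation:
    """The new resource that relates a body to one or more targets."""
    uri: str
    body: Resource
    targets: List[ConstrainedTarget] = field(default_factory=list)

    def resources(self):
        """URIs of all resources this annotation ties together."""
        return [self.body.uri] + [t.source.uri for t in self.targets]


if __name__ == "__main__":
    page = Resource("http://example.org/page", "text/html")
    note = Resource("http://example.org/note", "text/plain")
    region = ConstrainedTarget(page, "svg:rect")  # placeholder constraint
    anno = Annotation("http://example.org/anno/1", note, [region])
    print(anno.uri, "relates", anno.resources())
```

Giving the annotation its own URI, separate from the body and the targets, is what makes it shareable on its own.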