Posts

2011-08-28: KDD 2011 Trip Report

Author: Carlton Northern

The SIGKDD 2011 conference took place August 21-24 at the Hyatt Manchester in San Diego, CA. Researchers from all over the world interested in knowledge discovery and data mining were in attendance. This conference in particular has a heavy statistical analysis flavor, and many presentations were math intensive. I was invited to present my master's project research at the Mining Data Semantics (MDS2011) Workshop of KDD. In this paper, we present an approach to find social media profiles of people from an organization. This is possible due to the links created between members of an organization. For instance, co-workers or students will likely friend each other, creating hyperlinks between their respective accounts. These links, if public, can be mined and used to disambiguate other profiles that may share the same names as those individuals we are searching for. The following figure shows the amount of pro...

2011-08-28: Fall 2011 WS-DL Classes

The Web Science and Digital Libraries Research Group is offering two classes for the fall 2011 semester. CS 895 Web-Based Information Retrieval will be offered on Tuesdays, 4:20-7:00 in room 2120 of the ECS building. This class will use the recent Croft, Metzler & Strohman book as the required text, and the Manning, Raghavan, & Schütze book as the recommended text. By choosing the former book as the primary guide for the course, we are intentionally providing a strong engineering component to the class (i.e., a level of coding and development is expected) as opposed to just a theoretical exploration of information retrieval. CS 751/851 Introduction to Digital Libraries is not a prerequisite, but it would help to be familiar with the material covered in that class. Dr. Weigle will be teaching CS 795/895 Information Visualization on Thursdays, 9:30-12:15 in room 2120 of the ECS building. This class is a follow-on to the CS 796/896 Visual Analytics Seminar from last ...

2011-07-26: Universal Access to All Knowledge

On July 26, 2011, the Web Science and Digital Libraries group at Old Dominion University hosted Kris Carpenter Negulescu, Director of the Web Group at the Internet Archive, who gave a talk entitled “Universal Access to All Knowledge”. The presentation started with an introduction to what the Internet Archive is. She then described the materials currently archived there: Text (+2.9M books), Moving Images (+542,500 items), Audio (+950,000 items), Television broadcast (+1M hours), Web Pages (+150 billion pages). She also gave an overview of some of the special collections, such as K-12 students and NASA images. After that, Kris explained the common collection strategies that the Internet Archive uses to crawl the web. Frequently, they do a broad survey of wide-ranging domains such as .com, .net, .org, etc. They also considered the frequency of change for these websites and gave more support to the sites without ar...

2011-07-28: Web Video Discussing Preservation Disappears After 24 Hours

One week ago (July 21, 2011) I was fortunate enough to be invited to speak about Web Archiving on Canada AM, sort of like the Today Show or Good Morning America in the US. I was asked to appear on the program in part because of the July 17, 2011 article in the Washington Post, which followed a July 6, 2011 blog post for the Chronicle of Higher Education, which was based on a June 23, 2011 blog post about our JCDL 2011 paper "How Much of the Web is Archived?". In other words, the process went like this: step 1 - get lucky & step 2 - let preferential attachment do its thing. I was able to do the appearance in Washington DC, while attending the NDSA/NDIIPP 2011 Partner Meetup. The morning of July 21, I took a taxi to an ABC studio in DC, did the interview (about 4 minutes) and took a taxi back to the conference in time to make the morning session. I had not been on TV before and was both nervous and excited. The local and Canadian crew made the entire exp...

2011-07-25: NDSA/NDIIPP Partner Meetup 2011 Trip Report

The NDSA/NDIIPP (@ndiipp) Partner Meetup took place July 19-21 at the Hyatt Regency Washington on Capitol Hill in Washington, DC. Technical and non-technical attendees came together, forming a consortium of archivists, librarians, digital media specialists and concerned parties. Three representatives from the ODU Web Science and Digital Libraries group attended to make archivists aware of tools they had developed to accomplish the common goal of web archiving.

WS-DL’s Contributions to the NDSA/NDIIPP Meetup

Mat Kelly presented the Mozilla Firefox add-on Archive Facebook in a breakout session of presentations focused on web archiving. The redesigned and re-architected add-on allows a user to archive the content of his/her Facebook account, with the result being truly WYSIWYG, unlike the content dump that Facebook natively offers. Vivens Ndatinya showed the workings of a tool he is currently bui...

2011-07-21: Towards a Machine-Actionable Scholarly Communication System

I've told all the members of my research group they should watch this, so I thought I might as well make the same recommendation to the rest of the world... Herbert Van de Sompel presented "Towards a Machine-Actionable Scholarly Communication System" at LIBER 2011 in Barcelona, Spain on June 30, 2011. You really have to watch the video and review the slides simultaneously to get the full impact of the presentation. The first part is a succinct review of various projects, but starting at slide 16 ("nanopublications") things really get interesting. Well worth the 40-minute investment. --Michael

2011-07-05: JCDL 2011 Trip Report

JCDL 2011 (#jcdl2011) was held June 13–16 in Ottawa, Ontario, Canada. The weather was beautiful and the conference sessions wonderful. The ODU Web Science and Digital Libraries team was fortunate enough to have six of its members attend, present three short papers, and demonstrate the Synchronicity Firefox extension.

Our Contributions to JCDL 2011

Ahmed Alsum presented How Much of the Web is Archived? This paper approximates the amount of the Web that is archived using four URI sources. From this data, we observe significant variation in archival rate among URIs from different sources. So, how much of the web is archived? It depends on which web you mean. (pdf, slides). Martin Klein presented Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures, which details a method for discovering missing web pages (the dreaded 404). Martin also demonstrated Synchronicity, a Firefox ...