Showing posts from July, 2009

2009-07-30: Position Paper Published in Educause Review

The July/August 2009 issue of Educause Review has a position paper of mine entitled "Data Driven Science: A New Paradigm?" This invited paper is essentially a cleaned-up version of my position paper at the 2007 NSF/JISC Workshop on Data-Driven Science and Scholarship, held in Arizona, April 17-19, 2007. Prior to the workshop, we were each assigned a topic on which to write a short position paper. My topic was to address the question of whether "data-driven science is becoming a new scientific paradigm – ranking with theory, experimentation, and computational science?" You can judge my response by the original paper's more cheeky title of "I Don't Know and I Don't Care". My argument can be summed up as "we've always had data-driven science at whatever was the largest feasible scale; it just happens that the scale is now very large." Scale is important; in fact, some days I might argue that scale is all there is. But part

2009-07-17: Technical Report "Evaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure"

This week I uploaded the technical report, co-authored with Michael L. Nelson, to the e-print service. The underlying idea of this research is to utilize the web infrastructure (search engines, their caches, the Internet Archive, etc.) to rediscover missing web pages, that is, pages that return the 404 "Page not Found" error. We apply various methods to generate search engine queries based on the content of the web page and on user-created annotations about the page. We then compare the retrieval performance of all methods and introduce a framework to combine such methods to achieve the optimal retrieval performance. The applied methods are: 5- and 7-term lexical signatures of the page; the title of the page; the tags users annotated the page with; and 5- and 7-term lexical signatures of the page neighborhood (up to 50 pages linking to the missing page). We query the big three search engines (Google, Yahoo and MSN Live) with the outcome of all methods and analyze t
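To make the idea of a lexical signature a bit more concrete, here is a minimal sketch in Python. It assumes a signature is simply the k terms of a page with the highest TF-IDF score, and it estimates document frequency from a small local list of background documents. The function names (`tokenize`, `lexical_signature`), the tokenizer, the smoothing, and the background collection are all assumptions made for illustration and are not taken from the technical report.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_signature(page_text, background_docs, k=5):
    """Return the k terms of page_text with the highest TF-IDF score.

    background_docs is a hypothetical local collection used only to
    estimate document frequency for this illustration.
    """
    term_freq = Counter(tokenize(page_text))
    doc_term_sets = [set(tokenize(doc)) for doc in background_docs]
    n_docs = len(doc_term_sets)

    scores = {}
    for term, count in term_freq.items():
        doc_freq = sum(1 for terms in doc_term_sets if term in terms)
        # Add-one smoothing keeps the IDF finite for terms unseen in the background.
        idf = math.log((n_docs + 1) / (doc_freq + 1))
        scores[term] = count * idf

    # Sort by descending score; break ties alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


if __name__ == "__main__":
    page = "memento memento archive web page missing"
    background = ["web page about cats", "another web page", "archive of news"]
    signature = lexical_signature(page, background, k=5)
    # The signature terms, joined by spaces, would serve as the search engine query.
    print(" ".join(signature))
```

Passing `k=5` or `k=7` gives the two signature lengths mentioned above. Terms that are frequent in the page but rare in the background collection rise to the top, which is what makes them useful as a query.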

2009-07-16: The July issue of D-Lib Magazine has JCDL and InDP reports.

The July/August 2009 issue of D-Lib Magazine has just published reports for the 2009 ACM/IEEE JCDL (written by me) and InDP (written by Frank and his co-organizers), as well as several other reports for JCDL workshops and other conferences (such as Open Repositories 2009). Whereas my previous entry about JCDL & InDP was focused on our group's experiences, these reports give a broader summary of the events. --Michael

2009-07-07: Hypertext 2009

From June 30th through July 1st I attended Hypertext 2009 (HT 2009) in Torino, Italy. The conference saw a 70% increase in submissions (117 total) compared to last year, but thanks to an equally increased number of accepted papers (26 long and 11 short) and posters it maintained last year's acceptance rate of roughly 32%. HT 2009 also had a record 150 registered attendees. I presented our paper titled "Comparing the Performance of US College Football Teams in the Web and on the Field" (DOI), which was joint work with Olena Hunsicker under the supervision of Michael L. Nelson. The paper describes an extensive study of the correlation between expert rankings of real-world entities and search engine rankings of their representative resources on the web. We published a poster, "Correlation of Music Charts and Search Engine Rankings" (DOI), with the resu
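As a rough illustration of what comparing an expert ranking with a search engine ranking can look like, here is a small Python sketch that computes Spearman's rank correlation over the items two rankings have in common. The choice of Spearman's rho, the function name `spearman_rho`, and the example team labels are assumptions for illustration only; the excerpt above does not say which correlation measure the paper uses.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation for two rankings without ties or duplicates.

    Each ranking is a list of items ordered from best to worst. Only items
    that appear in both lists are compared (an assumption of this sketch).
    """
    in_b = set(ranking_b)
    common_a = [item for item in ranking_a if item in in_b]
    in_common = set(common_a)
    common_b = [item for item in ranking_b if item in in_common]

    n = len(common_a)
    if n < 2:
        raise ValueError("need at least two items present in both rankings")

    # Re-rank the shared items so both lists use positions 0..n-1.
    pos_a = {item: i for i, item in enumerate(common_a)}
    pos_b = {item: i for i, item in enumerate(common_b)}

    squared_diffs = sum((pos_a[item] - pos_b[item]) ** 2 for item in common_a)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical labels standing in for real-world entities.
    expert = ["Team A", "Team B", "Team C", "Team D"]
    search_engine = ["Team A", "Team C", "Team B", "Team D"]
    print(spearman_rho(expert, search_engine))
```

A value near 1 means the two rankings largely agree, a value near -1 means one is roughly the reverse of the other, and a value near 0 means there is little relationship between them.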