Posts

2012-07-05: Exploring the WAC: Challenges in Providing Access to the World's Web Archives

The Web Archive Cooperative (WAC) held its 2012 Summer Workshop June 29–30 at Stanford University in Palo Alto, California. The workshop focused on the challenges (and some solutions) of providing easy access to the world’s web archives. The WS-DL Research Group had six members in attendance. Memento and Source Code Repositories: Harihar Shankar (LANL). Memento allows temporal access to web resources using datetime. Version control services such as GitHub also allow temporal access, but using a version number instead of datetime. Harihar Shankar of the Los Alamos National Laboratory (LANL) Research Library presented Memento and a Memento/GitHub proxy prototyped at LANL. The proxy enables access to GitHub projects through datetime. For many use cases, datetime is much simpler than Git’s 40-hex-character commit id. A Research Agenda for “Obsolete Data or Resources”: Michael Nelson (ODU). Old Dominion University’s Michael Nelson presented WAC’s research agenda for obso

2012-07-05: Web Crawler Animation

In this post I'm revisiting a publication from the pre-blog era that has really cool animations. Most of my work is at the protocol and architecture level (e.g., PMH, ORE, Memento, ResourceSync) and while I enjoy that, it does leave me with a serious case of visualization envy that was made worse by attending a Tufte lecture ca. 2004. While we don't have anything close to Minard's "Napoleon's March to Moscow", we do have a couple of things of which I'm especially proud. One of the things I find myself showing to people every month or two is Joan Smith's animations of web crawlers visiting a series of synthetic web sites over the course of a year (February 2007 to February 2008). Joan's dissertation was on the topic of web servers assisting the task of digital preservation, both by enumerating the valid URIs at a web site and by providing preservation metadata about the resource representations at the web site. One of the sub-question

2012-07-05: R Package Recommendation

Much of my research is focused on data-mining the Collective Intelligence of the Internet to see if any type of group intelligence emerges out of the vast amount of data present on the Web. Sifting through even a portion of the data available is a daunting task, and I frequently rely on Python to handle most of the heavy lifting. Over the past year I have been attempting to increase my use of R and the many interesting packages available on CRAN. In my quest to become more proficient in R and to use it in more of my research, I am continually experimenting with new and interesting code and data examples. Of particular interest to me are data mashups where an example of real-world data is collected, some form of intelligence is extracted via data-mining or machine learning, and then an informative graphic is produced that shares the information obtained from the data. A recently published book that grabbed my attention is Machine Learning for Hackers. This book is a down to ear

2012-06-17: JCDL 2012 Conference

On Saturday, my colleague, Justin Brunelle, and I took off on a road trip to attend this year’s JCDL conference in Washington, D.C. We arrived at the nation’s capital that evening and began preparing our presentations after settling in at the George Washington University Inn. Both of us were accepted to present our work at the conference’s Doctoral Consortium. Justin has already blogged about the consortium and our experience in his brilliant blog post. The conference started on the following Monday (June 11th). The registration went smoothly and we all took our seats at the Betts Theatre in the Marvin Center, which sits in the heart of George Washington University. Barrie Howard and Karim Boughida (the conference co-chairs) gave the welcoming remarks and were followed by Leo M. Chalupa, the Vice President of Research at the university. Michael Nelson opened the session and introduced the keynote speaker, Jason Scott. Winning the award for the Most “

2012-06-12: JCDL 2012 Doctoral Consortium

The ODU WS-DL research group kicked off JCDL 2012 at The George Washington University by presenting the first two Doctoral Consortium papers on June 10th, 2012. The Doctoral Consortium is a workshop for PhD students who are in the early stages of defining their research. It is a venue for presenting a potential path through the PhD, as well as a way to receive feedback from peers and other researchers. Past WS-DL students have benefited from the workshop, including Joan Smith, Frank McCown, Martin Klein, Chuck Cartledge, and Ahmed Alsum. Hany SalahEldeen and I (Justin F. Brunelle) were honored and excited to be the next class of WS-DL students to participate. The first session was the Data Preservation and Curation section, chaired by Maristella Agosti. I presented the first paper, entitled "Filling in the Blanks: Capturing Dynamically Generated Content". My work will study capturing, sharing, and archiving Web 2.0 resources that traditional crawlers canno

2012-06-04: Glue Conference 2012

Glue Conference 2012 took place at the Omni Interlocken Hotel in Broomfield, CO on May 23rd and 24th. Gluecon is an information-packed developer conference that focuses on cloud, mobile, APIs, big data, and most importantly, developers. Some of the topics included NoSQL, node.js, HTML5, backend-as-a-service, cloud management and security, cloud storage, Hadoop, DevOps, mobile app development, and cloud platforms. I attended the conference with sponsorship (full ride) from FullContact. These guys were unbelievably gracious and showed me a great time while I was out there. I came in contact with them when Bart Lorang, CEO of FullContact, contacted me over e-mail and wanted to set up a time for me to talk with him and his engineering team about a paper I had published at a KDD'11 workshop. After meeting with the guys and talking shop, I found out that they are solving the same real-world problems (at world scale) that I was working on in my graduate research (at individual scale

2012-04-30: IIPC 2012 GA, A Week with Archivists!

The International Internet Preservation Consortium (IIPC) held its annual general assembly meeting for 2012 from April 30 to May 4, 2012, at the Library of Congress in Washington, D.C. I compiled this report from tweets using #IIPC12 from the meeting and from my personal notes. In this report, I tried to assess how complete a view of an event its tweets can give. More details about the new approach will be in the blog comment. The first day, April 30, 2012, was open to the public. It was entitled "The Broad Value of Web Archives: Demonstrated Use". @hhockx: #iipc12 opened just now by Martha Anderson. Laura Campbell welcomes the participants. @netpreserve: IIPC starts with 11 members, now has 42. @gregorylisa: Apropos quote at #IIPC12 "The great use of life is to spend it for something that will outlast it." William James. @netpreserve: Gildas Illien from BnF sets the stage for researcher use case panel. @cleymour: Gildas Il