Posts

2016-10-23: Institutional Repositories, OAI-PMH, and Anonymous FTP

Richard Poynder's recent blog post "Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository?" has generated a lot of discussion, including a second post from Richard to address the comments and the always insightful commentary from David Rosenthal ("Why Did Institutional Repositories Fail?"). There surely have been enough articles about institutional repositories to fill an institutional repository, but of particular interest to me are discussions about the technical and aspirational goals of OAI-PMH. A year ago Herbert and I reflected on OAI-PMH and other projects ("Reminiscing About 15 Years of Interoperability Efforts"), which I wish Richard had referenced in his discussion (although Cliff does allude to this in his interview; MLN edit: Richard points out that I missed his quoting of that paper in his second blog post), as well as the original SFC and UPS papers. For his response to Richard,

2016-10-13: Dodging The Memory Hole 2016 Trip Report (#dtmh2016)

Dodging the Memory Hole 2016, held at UCLA's Charles Young Research Library in Los Angeles, California, was a two-day event to discuss and highlight potential solutions to the issue of preserving born-digital news. Organized by Edward McCain (digital curator of journalism at the Donald W. Reynolds Journalism Institute and University of Missouri Libraries), this event brought together technologists, archivists, librarians, journalists, and fourteen graduate students who had won travel scholarships to attend. Among the attendees were four members of the WS-DL group (l-r): Mat Kelly, John Berlin, Dr. Michael Nelson, and Shawn Jones. The event was made possible by support from the Reynolds Journalism Institute, the Journalism Digital News Archive (JDNA), UCLA Library, the Educopia Institute, and the Institute of Museum and Library Services (IMLS).

Day 1 (October 13, 2016)

Day one started off at 9am with Edward McCain welcoming everyone to

2016-10-03: Which States and Topics did the Two Presidential Candidates Mention?

"Team Turtle" at Archives Unleashed in Washington DC (from left to right: N. Chah, S. Marti, M. Aturban, and I. Amin)

The first presidential debate (H. Clinton v. D. Trump) took place last Monday, September 26, 2016, at Hofstra University, New York. The questions were about topics like the economy, taxes, jobs, and race. During the debate, the candidates mentioned those topics (and other issues) and, in many cases, associated a topic with a particular place or US state (e.g., shootings in Chicago, Illinois, and the crime rate in New York). This reminded me of the work that we had done in the second Archives Unleashed Hackathon, held at the Library of Congress in Washington DC. I worked with "Team Turtle" (Niel Chah, Steve Marti, Mohamed Aturban, and Imaduddin Amin) on analyzing an archived collection, provided by the Library of Congress, about the 2004 Presidential Election (G. Bush v. J. Kerry). The collection contained hundreds of archived w

2016-10-03: Summary of “Finding Pages on the Unarchived Web”

by: Hugo C. Huurdeman, Anat Ben-David, Jaap Kamps, Thaer Samar, and Arjen P. de Vries
Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries 2014

In this paper, the authors detail their approach to recovering the unarchived Web based on the links and anchors of crawled pages. The data came from the Dutch 2012 Web archive at the National Library of the Netherlands (KB), totaling about 38 million webpages. The library selected the collection based on categories related to Dutch history and social and cultural heritage. Each website is categorized using a UNESCO code. The authors address three research questions: Can we recover a significant fraction of unarchived pages? How rich are the representations for the unarchived pages? Are these representations rich enough to characterize the content? The link extraction used Hadoop MapReduce and Apache Pig to process all archived webpages and used JSoup to extract links from their content.
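The idea of recovering unarchived pages from the links and anchors of crawled pages can be illustrated with a small sketch. This is not the authors' pipeline (they used Hadoop MapReduce, Apache Pig, and JSoup at the scale of millions of pages); it is a single-machine stand-in built on Python's standard library, and the names `AnchorExtractor`, `extract_links`, and `unarchived_targets` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorExtractor(HTMLParser):
    """Collect (absolute target URL, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, anchor_text) pairs
        self._href = None    # target of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the crawled page.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def extract_links(html, base_url):
    """Return every (target URL, anchor text) pair found in the page."""
    parser = AnchorExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def unarchived_targets(pages, archived_urls):
    """Group anchor texts by link target, keeping only targets not in the archive.

    `pages` maps an archived page URL to its HTML (an assumed input shape).
    """
    found = {}
    for page_url, html in pages.items():
        for target, anchor in extract_links(html, page_url):
            if target not in archived_urls:
                found.setdefault(target, []).append(anchor)
    return found
```

Each key of the returned dictionary is a URL that the archive links to but does not hold, and the collected anchor texts form a rough representation of what that missing page was about.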

2016-09-27: Introducing Web Archiving in the Summer Workshop

For the last few years, the Department of Computer Science at Old Dominion University has invited a group of undergraduate students from India and hosted them over the summer. They work closely with a research group on relevant projects. Additionally, researchers from different research groups in the department present their work to the guest students twice a week and introduce the various projects they are working on. The goal of this practice is to allow the students to collaborate with graduate students of the department and to encourage them to pursue research. The invited students also act as ambassadors, sharing their experience with their colleagues and spreading the word when they go back to India. This year a group of 16 students from Acharya Institute of Technology and B.N.M. Institute of Technology visited Old Dominion University, hosted under the supervision of Ajay Gupta. They worked in the areas of Sensor Networks and Mobile Application Development. The

2016-09-26: IIPC Building Better Crawlers Hackathon Trip Report

Trip Report for the IIPC Building Better Crawlers Hackathon in London, UK.

On September 22-23, 2016, I attended the IIPC Building Better Crawlers Hackathon (#iipchack) at the British Library in London, UK. Having been to London almost exactly two years ago for the Digital Libraries 2014 conference, I was excited to go back, but I was even more excited to collaborate with some folks I had long been in contact with during my tenure as a PhD student researcher at ODU. The event was a well-organized yet loosely scheduled meeting that resembled an "Unconference" more than a Hackathon, in that the discussion topics were defined as the event progressed rather than a larger portion being devoted to implementation (see the recent Archives Unleashed 1.0 and 2.0 trip reports). The represented organizations were: British Library, Bibliothèque nationale de France (BnF), Internet Archive, National Library of Denmark, State and University Library of