Showing posts from September, 2016

2016-09-27: Introducing Web Archiving in the Summer Workshop

For the last few years, the Department of Computer Science at Old Dominion University has invited a group of undergraduate students from India and hosted them for the summer. They work closely with a research group on relevant projects. Additionally, researchers from different research groups in the department present their work to the guest students twice a week and introduce the various projects they are working on. The goal of this practice is to allow the students to collaborate with graduate students of the department and to encourage them to pursue research. The invited students also act as ambassadors, sharing their experience with their colleagues and spreading the word when they go back to India.

This year a group of 16 students from Acharya Institute of Technology and B.N.M. Institute of Technology visited Old Dominion University, where they were hosted under the supervision of Ajay Gupta. They worked in the areas of Sensor Networks and Mobile Application Development. They resear…

2016-09-26: IIPC Building Better Crawlers Hackathon Trip Report

Trip Report for the IIPC Building Better Crawlers Hackathon in London, UK.

On September 22-23, 2016, I attended the IIPC Building Better Crawlers Hackathon (#iipchack) at the British Library in London, UK. Having been to London almost exactly 2 years ago for the Digital Libraries 2014 conference, I was excited to go back, but I was even more eager to collaborate with some folks I had long been in contact with during my tenure as a PhD student researcher at ODU.

The event was a well-organized yet loosely scheduled meeting that resembled an "Unconference" more than a Hackathon, in that the discussion topics were defined as the event progressed and a smaller portion of the time was devoted to implementation (see the recent Archives Unleashed 1.0 and 2.0 trip reports). The represented organizations were: British Library, Bibliothèque nationale de France (BnF), Internet Archive, National Library of Denmark, State and University Library of Denmark, Icelandic Web Ar…

2016-09-20: The promising scene at the end of Ph.D. trail

August 26th marked my last day as a Ph.D. student in the Computer Science department at ODU, while September 26 marks my first day as a Postdoctoral Scholar in Data Curation for the Sciences and Social Sciences at UC Berkeley. I will lead research in the areas of software curation, data science, and digital research methods. I will be honored to work under the supervision of Dr. Erik Mitchell, the Associate University Librarian and Director of Digital Initiatives and Collaborative Services at the University of California, Berkeley. I will have an opportunity to collaborate with many institutions across UC Berkeley, including the Berkeley Institute for Data Science (BIDS) research unit. It is amazing to see the light at the end of the long tunnel. Below, I talk about the long trail I took to reach my academic dream position. I'll recap the topic of my dissertation, then I'll summarize lessons learned at the end.

I started my Ph.D. in January 2011 at the same time that the upri…

2016-09-20: Carbon Dating the Web, version 3.0

Due to API changes, the old carbondate tool is out of date and some modules, such as topsy, no longer work. I have taken up the responsibility of maintaining and extending the service, beginning with the following changes, now available in Carbon Date v3.0.

Carbon Date 3.0

What's new
New services have been added, such as Bing search, Twitter search, and pubdate parsing.

The new software architecture enables us to load specific scripts or disable specific services at runtime.

The server framework has been changed from CherryPy to Tornado, which is still a minimalist Python web server but offers better performance.
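The idea of loading or disabling services at runtime can be illustrated with a small registry. This is only a minimal sketch, not the actual Carbon Date code: the `ServiceRegistry` class, its method names, and the stub finders are all hypothetical, and it assumes that each service returns a candidate date for a URL (or nothing) and that the earliest candidate is taken as the estimate.

```python
from datetime import datetime
from typing import Callable, Dict, Optional, Set

# Hypothetical type: a service takes a URL and returns a candidate date or None.
DateFinder = Callable[[str], Optional[datetime]]


class ServiceRegistry:
    """Illustrative registry of date-finding services (not the real tool's API)."""

    def __init__(self) -> None:
        self._finders: Dict[str, DateFinder] = {}
        self._disabled: Set[str] = set()

    def register(self, name: str, finder: DateFinder) -> None:
        # Services can be added while the program is running.
        self._finders[name] = finder

    def disable(self, name: str) -> None:
        # A disabled service stays registered but is skipped.
        self._disabled.add(name)

    def enable(self, name: str) -> None:
        self._disabled.discard(name)

    def estimate(self, url: str) -> Optional[datetime]:
        # Assumption: the estimate is the earliest date any enabled service reports.
        dates = []
        for name, finder in self._finders.items():
            if name in self._disabled:
                continue
            try:
                found = finder(url)
            except Exception:
                # One broken service (for example, a retired API) should not
                # prevent the remaining services from answering.
                continue
            if found is not None:
                dates.append(found)
        return min(dates) if dates else None


# Stub finders with made-up dates, standing in for real lookups.
registry = ServiceRegistry()
registry.register("pubdate", lambda url: datetime(2016, 9, 20))
registry.register("twitter", lambda url: datetime(2016, 9, 1))
print(registry.estimate("http://example.com/"))

registry.disable("twitter")
print(registry.estimate("http://example.com/"))
```

Keeping each service behind a common callable signature means a module that stops working can be switched off without touching the others.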
How to use the Carbon Date service
Through the website: Given that carbon dating is computationally intensive, the site can only hold 50 concurrent requests, and thus the web service should be used just for small tests as a courtesy to other users. If you need to Carbon Date a large number of URLs, you should install the application…

2016-09-13: Memento and Web Archiving Colloquium at UVa

Yesterday, September 12, I went to the University of Virginia to give a colloquium at the invitation of Robin Ruggaber to talk with her staff about Memento, Web Archiving, and related technologies. I also had the pleasure of meeting with Worthy Martin of the CS department and the Institute for Advanced Technology in the Humanities. I met Robin at CNI Spring 2016, where she was intrigued by our work on using storytelling to summarize archival collections and was hoping to apply it to their Archive-It collections (which are currently not public). My presentation yesterday was more of an overview of web archiving, although the discussion did cover various details, including a proposal for Memento versioning in Fedora.

The Memento Protocol and Research Issues With Web Archiving from Michael Nelson


2016-09-11: Web Archiving in Popular Media

@jefferson_bail perhaps a panel idea for the @NetPreserve WAC? “2016: The Year Politics Drove People to Finally Use Web Archives" — Abbie Grotke (@agrotke) August 11, 2016
At the Old Dominion University Web Science and Digital Libraries Research Group we have been studying web archiving for a long time. In the past few years, we have noticed a significant uptick in the use of web archives in mainstream media, both to support stories and as the subject of stories. This post presents articles from the popular media that use web archive holdings (mementos) as evidence and concludes with articles about web archives.

Articles that Reference Web Archives
'Fake News' And How The Washington Post Rewrote Its Story On Russian Hacking Of The Power Grid
What the Washington Post's rush to be the first to report on Russian hackers breaching the US power grid teaches us about how "breaking news" can all too often become "fake news" when we over-trust government sources and …

2016-09-09: Summer Fellowship at the Harvard Library Innovation Lab Trip Report

I was honored with the great opportunity of collaborating with the Harvard Library Innovation Lab (LIL) as a Fellow this summer. Located at Langdell Hall, Harvard Law School, the Library Innovation Lab develops solutions to serious problems facing libraries. It consists of an eclectic group of lawyers, librarians, and software developers engaged in projects such as the Caselaw Access Project (CAP) and the Nuremberg Project, among many others. To help prevent link rot, the lab creates permanent, reliable links for web resources. The Caselaw Access Project is an ambitious project that strives to make all US case law freely accessible online. The current collection to be digitized stands at over 42,000 volumes (nearly 40 million pages). The Nuremberg Project is concerned with the digitization of LIL's collection about the Nuremberg trials.

How Harvard digitized nearly 40 million pages of case law: — WBUR (@WBUR) August 3…