Posts

2016-04-19: IIPC General Assembly 2016 Trip Report

"9°C & wind 0 m/s! Our host @Landsbokasafn & @kristsi seem to have booked an early spring for #iipcga16. Thank you! pic.twitter.com/jvyORkzsRo" (IIPC, @NetPreserve, April 12, 2016)

"Our host @kristsi opening #iipcGA16 pic.twitter.com/YEWPnJPIRI" (IIPC, @NetPreserve, April 11, 2016)

The 2016 IIPC General Assembly and the separate-but-related IIPC Web Archiving Conference 2016 were held in Reykjavík, Iceland, April 11-15. The General Assembly was open to IIPC members only, and the conference was open to the public. Unfortunately, my trip report will be incomplete because I had to leave midday on Wednesday. The first day was devoted primarily to IIPC business: introducing the new officers, covering project status, budgets, new bylaws, etc. Jason gave a brief overview of our IIPC-funded Web Archive Profiling Via Sampling Project, which is now coming to a close. In addition to the resources and deliverables linked from the IIPC project page, Sawood Alam has developed the Me

2016-04-17: A Summary of "What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalysts for Collective Memory in Wikipedia"

Authors Nattiya Kanhabua, Ngoc Tu Nguyen, and Claudia Niederée from L3S published the following study at JCDL 2014. While reviewing possible topics for my PhD research, I analyzed their findings, and I share that analysis here. The full citation and presentation for the paper are below.

Kanhabua, N., Nguyen, T. N., & Niederée, C. (2014, September). What triggers human remembering of events?: a large-scale analysis of catalysts for collective memory in Wikipedia. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 341-350). IEEE Press.

Slides: "What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalysts for Collective Memory in Wikipedia" from Nattiya Kanhabua

The article focuses on identifying patterns that trigger the recollection of events in collective memory. Because the number of possible event categories is effectively unlimited, the authors restrict their study to natural and man-made disasters, accidents, and terrorism. Their analysis

2016-04-15: How I learned not to work full-time and get a PhD

ODU's commencement on May 7th marks the last day of my academic career as a student. I began my career at ODU in the Fall of 2004 and graduated with my BS in CS in the Spring of 2008, at which point I immediately began my Master's work under Dr. Levinstein. I completed my MS in Spring 2010, spent the summer with June Wright (now June Brunelle), and started my Ph.D. under Dr. Nelson in the Fall of 2010 (which is referred to as the Great Bait-and-Switch in our family). I will finish in the Spring of 2016, only to return as an adjunct instructor teaching CS418/518 at ODU in the Fall of 2016. On February 5th, I defended my dissertation, "Scripts in a Frame: A Framework for Archiving Deferred Representations" (above picture courtesy of Dr. Danette Allen, video courtesy of Mat Kelly). My research in the WS-DL group focused on understanding, measuring, and mitigating the impact of client-side technologies like JavaScript on the archives. In short, we showed that JavaS

2016-04-05: CNI Spring 2016 Trip Report

The CNI Spring 2016 Members Meeting was held in San Antonio, TX, April 4-5, 2016. As usual, the presentations were excellent, but with six or more simultaneous sessions you are forced to make hard choices about which ones to attend. This year Martin Halbert and Katherine Skinner arranged the "Digital Preservation of Federal Information Summit", convening 30+ people to discuss "...the topic of preservation and access to at-risk digital government information." It was quite a collaborative exercise, and I know Martin produced some summary slides that I will link here when they are posted. There were only a few presentations at the Summit (all in Pecha Kucha format), and I was fortunate enough to give one for Herbert and me, entitled "Why We Need Multiple Archives". The answer is probably obvious to the crowd that Martin assembled, but we often run into people who don't understand the role of archives beyond that of t

2016-03-22: Language Detection: Where to start?

Language detection is not a simple task, and no method achieves 100% accuracy. Various packages available online can detect different languages, and I have used several methods and tools to detect the language of both websites and plain text. Here is a review of the methods I came across while working on my JCDL 2015 paper, How Well are Arabic Websites Archived? In this post I discuss detecting a webpage's language using the HTTP language header and the HTML language tag. I also review several language detection packages, including Guess-Language, Python-Language Detector, LangID, and the Google Language Detection API. Since Python is my favorite programming language, I looked for tools written in Python. I found that a primary way to detect the language of a webpage is to use the HTTP language header and the HTML language tag. However, only a small percentage of pages include the language tag, and sometimes the detected language is affected by the browser setti
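As a rough illustration of the header-and-tag approach, here is a minimal Python sketch using only the standard library. It assumes that "HTTP language header" means the `Content-Language` response header and that "HTML language tag" means the `lang` attribute on the `<html>` element. The function name `declared_languages` is hypothetical and does not come from any of the packages above. It only reports what a page declares about itself; it does not analyze the text.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional


class _HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    self.lang = value.strip()


def declared_languages(headers: Mapping[str, str], html: str) -> dict:
    """Hypothetical helper: return the languages a page declares, if any.

    headers: HTTP response headers (assumed to include Content-Language
             when the server sets one).
    html:    the page body as a string.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    header = lowered.get("content-language")
    # Content-Language may list several comma-separated language tags.
    header_langs = (
        [t.strip() for t in header.split(",") if t.strip()] if header else []
    )

    parser = _HtmlLangParser()
    parser.feed(html)
    parser.close()
    return {"http": header_langs, "html": parser.lang}
```

For example, `declared_languages({"Content-Language": "ar, en"}, '<html lang="ar" dir="rtl"><body></body></html>')` returns `{"http": ["ar", "en"], "html": "ar"}`. When a page declares neither, both fields come back empty, which is the common case noted above and the point where a content-based detector such as LangID would have to take over.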

2016-03-07: Archives Unleashed Web Archive Hackathon Trip Report (#hackarchives)

The Thomas Fisher Rare Book Library (University of Toronto)

From March 3 to March 5, 2016, librarians, archivists, historians, computer scientists, and others came together for the Archives Unleashed Web Archive Hackathon at the University of Toronto's Robarts Library in Toronto, Ontario, Canada. The event gave researchers the opportunity to collaboratively develop open-source tools for web archives. It was organized by Ian Milligan (assistant professor of Canadian and digital history in the Department of History at the University of Waterloo), Nathalie Casemajor (assistant professor in communication studies in the Department of Social Sciences at the University of Québec in Outaouais, Canada), Jimmy Lin (the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo), Matthew Weber (Assistant Professor in the School of Communication and Information at Rutgers University), and Nicholas Worby (the Government Information &