2017-07-24: Replacing Heritrix with Chrome in WAIL, and the release of node-warc, node-cdxj, and Squidwarc

I have written posts detailing how an archive's modifications to the JavaScript of a replayed web page collided with the JavaScript libraries used by the page, and how JavaScript + CORS is a deadly combination during replay. Today I am here to announce the release of a suite of high-fidelity web archiving tools that help to mitigate the problems surrounding web archiving and a dynamic, JavaScript-powered web. To demonstrate this, consider the image above: the left-hand screen shot shows today's archived and replayed in WAIL, whereas the right-hand screen shot shows in the Internet Archive on 2017-07-24T16:00:02. In this post, I will be covering: updates to WAIL; release of node-warc; release of node-cdxj; release of Squidwarc. WAIL: Let me begin by announcing that WAIL has transitioned away from using Heritrix as the primary preservation method. Instead, WAIL now directly uses a full Chrome browser (Electron provided) as the pres

2017-07-19: Archives Unleashed 4.0: Web Archive Datathon Trip Report

They: Hey Sawood, nice to see you again. Me: Hi, I am glad to see you too. They: Did you attend all hackathons, I mean datathons? Me: Yes, I have attended all four Archives Unleashed events so far. They: How did you like it? Me: Well, there is a reason why I attended all of them, despite being a seemingly busy PhD researcher. They: So, what is your research about? Me: I am trying to profile various web archives to build a high-level understanding of their holdings, primarily for the sake of efficiently routing Memento aggregation requests, but there can be many more use cases for such profiles... [and the conversation continues...] On day zero of Archives Unleashed 4.0 in London, conversations among many familiar and unfamiliar faces started with travel and lodging related questions, but soon evolved into mass storage challenges, scaling issues, quality and coverage of web archives, long-term maintenance of archival tools, documentation and d

2017-07-06: Web Science 2017 Trip Report

I was fortunate enough to have the opportunity to present Yasmin AlNoamany's work at Web Science 2017. Dr. Nelson offers an excellent class on Web Science, but it had been years since I took it and I was still uncertain about the current state of the art. Web Science 2017 took place in Troy, a small city in upstate New York that is home to Rensselaer Polytechnic Institute (RPI). The RPI team organized an excellent conference focused on a variety of Web Science topics, including cyber bullying, taxonomies, social media, and ethics. Keynote Speakers, Day One: The opening keynote by Steffen Staab from the Institute for Web Science and Technologies (WeST) was entitled "The Web We Want". He discussed how we need to determine what values we want to meet before deciding on the web we want. Dr. Staab defined three key values: accessibility for the disabled, freedom from harassment, and a useful semantic web. Staab detailed the MAMEM project wh

2017-07-04: Web Archiving and Digital Libraries (WADL) Workshop Trip Report From JCDL2017

The Web Archiving and Digital Libraries (WADL) Workshop was held after JCDL 2017 from June 22, 2017, to June 23, 2017. I live-tweeted both days, and you can follow along on Twitter with this blog post using the hashtag wadl2017 or via the notes/minutes of WADL2017. I also created a Twitter list of the speakers' and presenters' Twitter handles; give them a follow to keep up to date with their exciting work. Day 1 (June 22): WADL2017 kicked off at 2 pm with Martin Klein and Edward Fox welcoming us to the event by giving an overview and introduction to the presenters and panelists. @mart1nkle1n kicks off a #JCDL2017 attached session by scrbblinging #WADL2017 hashtag on the blackboard. - Sawood Alam (@ibnesayeed) June 22, 2017. Keynote: The opening keynote of WADL2017 was National Digital Platform (NDP), Funding Opportunities, and Examples Of Currently Funded Projects by Ashley Sands (IMLS). @as