2017-07-24: Replacing Heritrix with Chrome in WAIL, and the release of node-warc, node-cdxj, and Squidwarc

I have written posts detailing how an archive's modifications to the JavaScript of a web page being replayed collided with the JavaScript libraries used by the page, and how JavaScript + CORS is a deadly combination during replay. Today I am here to announce the release of a suite of high-fidelity web archiving tools that help to mitigate the problems surrounding web archiving and a dynamic, JavaScript-powered web.

To demonstrate this, consider the image above: the left-hand screenshot shows today's cnn.com archived and replayed in WAIL, whereas the right-hand screenshot shows cnn.com in the Internet Archive on 2017-07-24T16:00:02.

In this post, I will be covering:
Updates to WAIL
Release of node-warc
Release of node-cdxj
Release of Squidwarc

WAIL

Let me begin by announcing that WAIL has transitioned away from using Heritrix as the primary preservation method. Instead, WAIL now directly uses a full Chrome browser (provided by Electron) as the pres