2015-06-26: PhantomJS+VisualEvent or Selenium for Web Archiving?

My research and niche within the WS-DL research group focuses on understanding how the adoption of JavaScript and Ajax is impacting our archives. I leave the details as an exercise to the reader ( D-Lib Magazine 2013 , TPDL2013 , JCDL2014 , IJDL2015 ), but the proverbial bumper sticker is that JavaScript makes archiving more difficult because the traditional archival tools are not equipped to execute JavaScript. For example, Heritrix (the Internet Archive 's automatic archival crawler) executes HTTP GET requests for archival target URIs on its frontier and archives the HTTP response headers and the content returned from the server when the URI is dereferenced. Heritrix "peeks" into embedded JavaScript and extracts any URIs it can discover, but does not execute any client-side scripts. As such, Heritrix will miss any URIs constructed in the JavaScript or any embedded resources loaded via Ajax. For example, the Kelly Blue Book Car Values website (Figure 1) uses