Posts

Showing posts with the label phantomjs

2017-11-22: Deploying the Memento-Damage Service

Many web services, such as archive.is, Archive-It, Internet Archive, and UK Web Archive, provide archived web pages, or mementos, for us to use. Web archivists have recently shifted their focus from how to make a good archive to measuring how well an archive preserved a page. This raises the question of how to objectively measure the damage of a memento in a way that correctly emulates user (human) perception. To that end, Justin Brunelle devised a prototype for measuring the impact of missing embedded resources (the damage) on a web page. In his IJDL paper (and the earlier JCDL version), Brunelle explains that the quality of a memento depends on the availability of its resources. The straight percentage of missing resources in a memento is not always a good indicator of how "damaged" it is. For example, one page could be missing several small icons whose absence users never even notice, and a second pag...
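To make the point about straight percentages concrete, here is a minimal Python sketch. It is not the Memento-Damage algorithm: the resource lists, the `weight` values, and the function names are all hypothetical, chosen only to show how a raw missing-resource ratio and an importance-weighted score can disagree.

```python
# Illustrative only: weights are made-up stand-ins for "visual importance",
# not values produced by any real damage-measurement tool.

# Page A: the "several small icons" case; the icons are missing.
page_a = [
    {"uri": "style.css", "weight": 0.94, "missing": False},
    {"uri": "icon1.png", "weight": 0.02, "missing": True},
    {"uri": "icon2.png", "weight": 0.02, "missing": True},
    {"uri": "icon3.png", "weight": 0.02, "missing": True},
]

# Page B: a hypothetical contrasting case; one heavily weighted resource is missing.
page_b = [
    {"uri": "style.css", "weight": 0.94, "missing": True},
    {"uri": "icon1.png", "weight": 0.02, "missing": False},
    {"uri": "icon2.png", "weight": 0.02, "missing": False},
    {"uri": "icon3.png", "weight": 0.02, "missing": False},
]


def missing_ratio(resources):
    """Straight fraction of embedded resources that are missing."""
    missing = sum(1 for r in resources if r["missing"])
    return missing / len(resources)


def weighted_damage(resources):
    """Fraction of total (assumed) importance carried by missing resources."""
    total = sum(r["weight"] for r in resources)
    lost = sum(r["weight"] for r in resources if r["missing"])
    return lost / total


for name, page in (("A", page_a), ("B", page_b)):
    print(name, missing_ratio(page), round(weighted_damage(page), 2))
```

Under these assumed weights, page A loses more resources by count, yet page B scores as far more damaged once importance is taken into account.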

2015-11-06: iPRES2015 Trip Report

From November 2nd through November 5th, Dr. Nelson, Dr. Weigle, and I attended the iPRES2015 conference at the University of North Carolina at Chapel Hill. This served as a return visit for Drs. Nelson and Weigle; Dr. Nelson worked at UNC through a NASA fellowship and Dr. Weigle received her PhD from UNC. We also met with Martin Klein, a WS-DL alumnus now at the UCLA Library. While the last ODU contingent to visit UNC was not so lucky, we returned to Norfolk relatively unscathed. Cal Lee and Helen Tibbo opened the conference with a welcome on November 3rd, followed by Nancy McGovern's keynote address, delivered with Leo Konstantelos and Maureen Pennock. This was not a traditional keynote, but instead an interactive dialogue in which several challenge areas were presented to the audience, and the audience responded -- live and on Twitter -- with significant achievements or advances in those challenge areas from #lastyear. For example, Dr. Nelson identified the #iCanHazMemento...

2015-06-26: PhantomJS+VisualEvent or Selenium for Web Archiving?

My research and niche within the WS-DL research group focus on understanding how the adoption of JavaScript and Ajax is impacting our archives. I leave the details as an exercise to the reader (D-Lib Magazine 2013, TPDL2013, JCDL2014, IJDL2015), but the proverbial bumper sticker is that JavaScript makes archiving more difficult because the traditional archival tools are not equipped to execute JavaScript. For example, Heritrix (the Internet Archive's automatic archival crawler) executes HTTP GET requests for archival target URIs on its frontier and archives the HTTP response headers and the content returned from the server when the URI is dereferenced. Heritrix "peeks" into embedded JavaScript and extracts any URIs it can discover, but does not execute any client-side scripts. As such, Heritrix will miss any URIs constructed in the JavaScript or any embedded resources loaded via Ajax. For example, the Kelley Blue Book Car Values website (Figure 1) uses...
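The difference between "peeking" into a script and executing it can be sketched in a few lines of Python. This is not Heritrix's actual extractor: the page source, the regular expression, and the `peek_uris` helper are hypothetical simplifications, meant only to show why a URI assembled at run time is invisible to static extraction.

```python
import re

# Hypothetical page source: one literal URI, and one URI built by the script
# at run time from a base string plus query parameters.
PAGE = """
<script>
  var logo = "http://example.com/img/logo.png";
  var base = "http://example.com/api/";
  var model = "civic";
  fetch(base + "cars?model=" + model);
</script>
"""

# Simplified pattern for URI-like strings that appear literally in the text.
URI_PATTERN = re.compile(r"""https?://[^\s"'<>)]+""")


def peek_uris(source):
    """Return URI-like strings found in the source without running any script."""
    return URI_PATTERN.findall(source)


found = peek_uris(PAGE)

# The URI a browser would actually request after executing the script.
constructed = "http://example.com/api/cars?model=civic"

print(found)
print(constructed in found)
```

Static extraction recovers the literal logo URI and the bare API base, but never the full request URI, because that string only exists once the concatenation runs in a JavaScript engine.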