Showing posts with the label On-Demand

2017-12-11: Difficulties in timestamping archived web pages

Figure 1: A web page from nasa.gov archived by Michael's Evil Wayback in July 2017.
Figure 2: When we revisited the same archived page in October 2017, we found that its content had been tampered with.

The 2016 Survey of Web Archiving in the United States shows a growing trend toward using public and private web archives in addition to the Internet Archive (IA). Given this trend, we should consider the validity of the archived web pages these archives deliver. Consider an example in which the important web page https://climate.nasa.gov/vital-signs/carbon-dioxide/, which keeps a record of the carbon dioxide (CO2) level in the Earth’s atmosphere, is captured by a private web archive, “Michael’s Evil Wayback”, on July 17, 2017 at 18:51 GMT. At that time, as Figure 1 shows, the CO2 level was 406.31 ppm. When revisiting the same archived page in October 2017, we should be presented with the same content. Surpris...
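One simple way to make "the same content" checkable is to compare cryptographic digests of an archived page as retrieved at two different times. The sketch below is a generic illustration, not the method the post goes on to describe. The helper names `content_digest`, `fetch`, and `unchanged` are hypothetical, and the only library calls used are Python's standard `hashlib` and `urllib.request`.

```python
import hashlib
import urllib.request


def content_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of an archived page's raw bytes."""
    return hashlib.sha256(body).hexdigest()


def fetch(uri: str) -> bytes:
    """Download an archived page (network call); hypothetical helper."""
    with urllib.request.urlopen(uri, timeout=60) as resp:
        return resp.read()


def unchanged(first: bytes, later: bytes) -> bool:
    """True if two retrievals of the same archived page are byte-for-byte identical."""
    return content_digest(first) == content_digest(later)
```

In the example above, one would record `content_digest(fetch(memento_uri))` when the page is captured in July 2017 and compare it with a later retrieval. Here `memento_uri` is a placeholder for the archived page's address. Two caveats apply. First, archives may inject banners or rewrite links differently between replays, so a mismatch is a signal to investigate rather than proof of tampering. Second, the recorded digest is only trustworthy if it is kept somewhere the archive itself cannot modify.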

2017-02-22: Archive Now (archivenow): A Python Library to Integrate On-Demand Archives

Examples: Archive Now (archivenow) CLI

A small part of my research is to ensure that certain web pages are preserved in public web archives so that, hopefully, they remain available and retrievable whenever needed in the future. As archivists say, "lots of copies keep stuff safe", so I have created a Python library (Archive Now) to push web resources into several on-demand archives, such as The Internet Archive, WebCite, Perma.cc, and Archive.is. If one archive stops serving, temporarily or permanently, for any reason, copies can likely still be fetched from the other archives. With Archive Now, a single command such as:

    $ archivenow --all www.cnn.com

is sufficient for the current CNN homepage to be captured and preserved by all of the archives configured in this Python library. Archive Now allows you to accomplish the following major tasks:

- A web page can be pushed into one archive
- A web page can be pushed into multiple archives
- A web page can be pushed into all archi...
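The fan-out idea behind `--all` can be sketched in a few lines of Python. This is not Archive Now's actual implementation. `push_to_all`, `ia_save_url`, and `push_ia` are hypothetical helpers, and the only real endpoint used is the Internet Archive's Save Page Now address (`https://web.archive.org/save/` followed by the page URL). The sketch assumes that requesting that address triggers a capture and that the final redirect target is the new capture's URI. The service's behavior (redirects, rate limits) may differ from this assumption.

```python
import urllib.request
from typing import Callable, Dict

# Hypothetical type: a "pusher" submits a URL to one archive and returns
# the URI of the resulting capture, or raises an exception on failure.
Pusher = Callable[[str], str]


def ia_save_url(url: str) -> str:
    """Build the Internet Archive Save Page Now request URL for `url`."""
    return "https://web.archive.org/save/" + url


def push_ia(url: str) -> str:
    """Ask the Internet Archive to capture `url` (network call).

    Assumption: urlopen follows redirects, and geturl() reports the final
    address, which this sketch treats as the new capture's URI.
    """
    with urllib.request.urlopen(ia_save_url(url), timeout=120) as resp:
        return resp.geturl()


def push_to_all(url: str, pushers: Dict[str, Pusher]) -> Dict[str, str]:
    """Push `url` to every configured archive, one after another.

    A failure in one archive is recorded as an "Error: ..." string and does
    not stop the others, which is the point of keeping several copies.
    """
    results: Dict[str, str] = {}
    for archive_id, push in pushers.items():
        try:
            results[archive_id] = push(url)
        except Exception as exc:
            results[archive_id] = f"Error: {exc}"
    return results
```

Calling `push_to_all("https://www.cnn.com", {"ia": push_ia})` would return a dictionary that maps `"ia"` either to the capture's URI or to an `"Error: ..."` string if the request failed. Supporting WebCite, Perma.cc (which requires an API key), or Archive.is would mean adding one pusher per archive to the dictionary.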