Monday, December 11, 2017

2017-12-11: Difficulties in timestamping archived web pages

Figure 1: A web page is archived by Michael's Evil Wayback in July 2017.
Figure 2: When we visited the same archived page in October 2017, we found that its content had been tampered with.
The 2016 Survey of Web Archiving in the United States shows an increasing trend of using public and private web archives in addition to the Internet Archive (IA). Given this trend, we should consider how to establish the validity of archived web pages delivered by these archives.
Consider an example in which an important web page, one that keeps a record of the carbon dioxide (CO2) level in the Earth’s atmosphere, is captured by a private web archive, “Michael’s Evil Wayback”, on July 17, 2017 at 18:51 GMT. At that time, as Figure 1 shows, the CO2 level was 406.31 ppm.
When revisiting the same archived page in October 2017, we should be presented with the same content. Surprisingly, the CO2 level had changed to 270.31 ppm, as Figure 2 shows. So which one is the “real” archived page?
We can detect that the content of an archived web page has been modified by generating a cryptographic hash value on the returned HTML code. For example, the following command downloads the web page and generates a SHA-256 hash value on its HTML content:
$ curl -s | shasum -a 256
b87320c612905c17d1f05ffb2f9401ef45a6727ed6c80703b00240a209c3e828  -
The next figure illustrates how this simple approach of generating hashes can detect tampering with the content of archived pages. In this example, the "black hat" in the figure (i.e., Michael’s Evil Wayback) has changed the CO2 level to a lower value (i.e., in favor of individuals or organizations who deny that CO2 is one of the main causes of global warming).
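The comparison can be sketched in Python. This is a minimal illustration, not the tooling behind the figures: the function names sha256_hex and has_changed are hypothetical, and the two HTML snippets are made-up stand-ins for the bytes an archive might return on two different visits.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the raw bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def has_changed(earlier: bytes, later: bytes) -> bool:
    """Report whether two retrievals of the same memento hash differently."""
    return sha256_hex(earlier) != sha256_hex(later)


# Hypothetical page bodies standing in for the July and October retrievals.
july = b"<html><body>CO2: 406.31 ppm</body></html>"
october = b"<html><body>CO2: 270.31 ppm</body></html>"

print(has_changed(july, july))     # identical bytes give identical hashes
print(has_changed(july, october))  # the altered value gives a different hash
```

Because the hash covers every byte, the check flags any difference between the two retrievals, whether it is a changed CO2 value or a harmless change elsewhere in the HTML.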
Another possible way to validate archived web pages is trusted timestamping. If a trusted timestamp is issued on an archived web page, anyone can verify that a particular representation of the web page existed at a specific time in the past.
As of today, several services, such as OriginStamp and OpenTimestamps, offer free generation of trusted timestamps for digital documents using blockchain networks such as Bitcoin. These tools perform multiple steps to create a timestamp. One of these steps requires computing a hash value that represents the content of the resource (e.g., with the cURL command above). Next, this hash value is converted to a Bitcoin address, and a Bitcoin transaction is made in which one of the two sides of the transaction (i.e., the source or the destination) is the newly generated address. Once the transaction is approved by the blockchain, its creation datetime is considered to be a trusted timestamp. Shawn Jones describes in "Trusted Timestamping of Mementos" how to create trusted timestamps of archived web pages using blockchain networks.
In our technical report "Difficulties of Timestamping Archived Web Pages", we show that trusted timestamping of archived web pages is not an easy task, for several reasons. The main reason is that a hash value calculated on the content of an archived web page (i.e., a memento) should be repeatable; that is, we should always obtain the same hash value each time we retrieve the memento. In addition to describing these difficulties, we introduce requirements that must be fulfilled in order to generate repeatable hash values of mementos.
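A repeatability check could look like the following sketch. It assumes a caller-supplied fetch function that returns the memento's bytes; is_repeatable, stable_fetch, and varying_fetch are hypothetical names, and the request counter in varying_fetch only simulates a response that differs on every retrieval. It is not taken from the technical report.

```python
import hashlib
from itertools import count
from typing import Callable


def is_repeatable(fetch: Callable[[], bytes], attempts: int = 3) -> bool:
    """Fetch the memento several times and check that every hash matches."""
    digests = {hashlib.sha256(fetch()).hexdigest() for _ in range(attempts)}
    return len(digests) == 1


def stable_fetch() -> bytes:
    """Simulated archive that returns identical bytes on every request."""
    return b"<html><body>CO2: 406.31 ppm</body></html>"


_requests = count(1)


def varying_fetch() -> bytes:
    """Simulated archive that embeds a per-request counter in the page."""
    body = f"<html><body>CO2: 406.31 ppm<!-- request {next(_requests)} --></body></html>"
    return body.encode("utf-8")


print(is_repeatable(stable_fetch))   # hashes agree across retrievals
print(is_repeatable(varying_fetch))  # hashes differ across retrievals
```

In practice, fetch would wrap an HTTP request for the memento; passing it in as a parameter keeps the check independent of any particular archive or client library.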

--Mohamed Aturban

Mohamed Aturban, Michael L. Nelson, Michele C. Weigle, "Difficulties of Timestamping Archived Web Pages." 2017. Technical Report. arXiv:1712.03140.
