2020-03-26: Memento Compliance Audit of PyWB

This document is an audit report of the latest development version of PyWB, a Web archive replay system, for its Memento (RFC 7089) compliance. As a growing number of public Web archives move towards deploying PyWB, it becomes critical for it to comply with standards so that tools in the archiving ecosystem continue to function as expected.
To audit the Memento compliance of PyWB, I established the following setup:
  • Captured example.com five times, a few minutes apart, into separate WARC files using warcio
  • Created various test instances of PyWB's develop branch, which is one commit ahead of the v-2.4.0-rc6-test version (commit hash: 92e459bda52a2b03f33a4b0b8094ed424248d2a5)
  • Initialized a collection named example and loaded the freshly captured WARC files into it for replay
  • Placed multiple custom configuration files, each loaded by setting the PYWB_CONFIG_FILE environment variable for the corresponding test instance
  • Preserved the state of the relevant folder tree in pywbtest.tar.gz for replication and reproducibility
  • Made the various test instances publicly accessible at:
Notes:
  • The keywords MUST and MUST NOT indicate strict compliance issues, while SHOULD and SHOULD NOT suggest established practices and recommendations that some existing tools may rely on
  • These test services run behind a reverse proxy that handles HTTPS and HTTP/2 (not PyWB itself), so the --http1.1 flag is used in the curl commands below to enforce HTTP/1.1
  • Commands below can be run in a terminal while the test services remain live
  • Each command is prefixed with a counter in the form of [C<NUMBER>] to allow referencing
tree pywbtest
pywbtest
├── collections
│   └── example
│       ├── archive
│       │   ├── example-20200323133704.warc.gz
│       │   ├── example-20200323133917.warc.gz
│       │   ├── example-20200323134145.warc.gz
│       │   ├── example-20200323134509.warc.gz
│       │   └── example-20200323134606.warc.gz
│       ├── indexes
│       │   └── index.cdxj
│       ├── static
│       └── templates
├── config-nofr-tgrd.yaml
├── config-nofr.yaml
├── config-tgrd.yaml
├── static
└── templates

8 directories, 9 files

Table of Contents

  1. Default Mode
    1. Banner Memento
    2. TimeMap
    3. Main Page Memento
    4. TimeGate
  2. No Frame Replay Mode
    1. Memento
    2. TimeMap
    3. TimeGate
  3. TimeGate Redirect Mode
    1. Banner Memento
    2. Main Page Memento
    3. TimeMap
    4. TimeGate
  4. No Frame Replay With TimeGate Redirect Mode
    1. Memento
    2. TimeMap
    3. TimeGate
  5. Summary
  6. Acknowledgements

1. Default Mode

In this mode we do not use any custom configuration file.

1.1. Banner Memento

After navigating via the Web UI I selected the third of the five mementos and inspected it using cURL:
curl -iL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/
HTTP/1.1 200 OK
Content-Length: 1573
Content-Type: text/html
Date: Mon, 23 Mar 2020 17:44:17 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT

<!DOCTYPE html>
<html>
<head>
<style>
html, body
{
  height: 100%;
  margin: 0px;
  padding: 0px;
  border: 0px;
  overflow: hidden;
}

</style>
<script src='https://pywbtest.ws-dl.cs.odu.edu/static/wb_frame.js'> </script>

<script>
window.banner_info = {
    is_gmt: true,

    liveMsg: decodeURIComponent("Live on"),

    calendarAlt: decodeURIComponent("Calendar icon"),
    calendarLabel: decodeURIComponent("View All Captures"),
    choiceLabel: decodeURIComponent("Language:"),
    loadingLabel: decodeURIComponent("Loading..."),
    logoAlt: decodeURIComponent("Logo"),

    locale: "en",
    curr_locale: "",
    locales: [],
    locale_prefixes: {},
    prefix: "https://pywbtest.ws-dl.cs.odu.edu/example/",
    staticPrefix: "https://pywbtest.ws-dl.cs.odu.edu/static"
};
</script>

<!-- default banner, create through js -->
<script src='https://pywbtest.ws-dl.cs.odu.edu/static/default_banner.js'> </script>
<link rel='stylesheet' href='https://pywbtest.ws-dl.cs.odu.edu/static/default_banner.css'/>




</head>
<body style="margin: 0px; padding: 0px;">

<div id="wb_iframe_div">
<iframe id="replay_iframe" frameborder="0" seamless="seamless" scrolling="yes" class="wb_iframe" allow="autoplay; fullscreen"></iframe>
</div>
<script>
  var cframe = new ContentFrame({"url": "https://example.com/" + window.location.hash,
                                 "prefix": "https://pywbtest.ws-dl.cs.odu.edu/example/",
                                 "request_ts": "20200323134145",
                                 "iframe": "#replay_iframe"});

</script>
</body>
</html>
This is the banner container that loads the main page memento inside an iframe. It exposes the original, timegate, and timemap link relations, along with a single memento relation whose URI-M carries the mp_ suffix.
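The URI-Ms seen so far follow a simple shape: the collection prefix, a 14-digit timestamp, an optional modifier such as mp_, and then the URI-R. The sketch below assumes this observed shape (it is not a documented PyWB contract) and derives the main page URI-M from a banner container URI-M; URIM_RE and main_page_urim are hypothetical names.

```python
import re

# Assumed URI-M shape observed in these responses (not a documented PyWB contract):
# <collection prefix><14-digit timestamp><optional two-letter modifier + "_">/<URI-R>
URIM_RE = re.compile(r"^(?P<prefix>.+?/)(?P<ts>\d{14})(?P<mod>[a-z]{2}_)?/(?P<urir>.+)$")


def main_page_urim(urim: str) -> str:
    """Return the mp_ (main page) variant of a banner container URI-M."""
    m = URIM_RE.match(urim)
    if m is None:
        raise ValueError(f"not a timestamped URI-M: {urim}")
    # Replace whatever modifier was present (if any) with mp_
    return f"{m['prefix']}{m['ts']}mp_/{m['urir']}"


print(main_page_urim("https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/"))
# https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/
```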
Now, let's make a request with a datetime one second earlier (i.e., 20200323134144 instead of 20200323134145), for which no memento exists at that exact moment.
curl -IL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/20200323134144/https://example.com/
HTTP/1.1 200 OK
Content-Length: 1573
Content-Type: text/html
Date: Mon, 23 Mar 2020 17:45:41 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/20200323134144mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:44 GMT"
Memento-Datetime: Mon, 23 Mar 2020 13:41:44 GMT
And another request to a domain name that does not exist in the archive:
curl -IL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/20200323134144/https://missing.example.com/
HTTP/1.1 200 OK
Content-Length: 1581
Content-Type: text/html
Date: Mon, 23 Mar 2020 17:46:47 GMT
Link: <https://missing.example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://missing.example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://missing.example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/20200323134144mp_/https://missing.example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:44 GMT"
Memento-Datetime: Mon, 23 Mar 2020 13:41:44 GMT
These two requests SHOULD have returned 302 and 404 status codes, respectively, but both return 200. More importantly, they return Memento-Datetime headers that echo the datetime string in the request URI, which means a Memento client may wrongly treat these responses as actual mementos.
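A client can guard against such phantom mementos by cross-checking the Memento-Datetime of a response against the capture datetimes listed in the TimeMap. This is a minimal sketch that assumes the TimeMap has already been fetched and its datetimes parsed; uri_timestamp and suspicious_memento are hypothetical helpers, not part of any Memento library.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# 14-digit datetime path segment of a URI-M, optionally followed by a modifier like mp_
TS_RE = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def uri_timestamp(urim):
    """Return the datetime embedded in a URI-M as an aware UTC datetime, or None."""
    m = TS_RE.search(urim)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def suspicious_memento(urim, memento_datetime, capture_datetimes):
    """Flag a response whose Memento-Datetime merely echoes the URI-M datetime
    while matching none of the capture datetimes listed in the TimeMap."""
    claimed = parsedate_to_datetime(memento_datetime)
    echoed = claimed == uri_timestamp(urim)
    return echoed and claimed not in capture_datetimes


# Capture datetimes taken from the TimeMap of https://example.com/ in this archive
captures = [
    parsedate_to_datetime(d)
    for d in (
        "Mon, 23 Mar 2020 13:37:04 GMT",
        "Mon, 23 Mar 2020 13:39:17 GMT",
        "Mon, 23 Mar 2020 13:41:45 GMT",
        "Mon, 23 Mar 2020 13:45:09 GMT",
        "Mon, 23 Mar 2020 13:46:06 GMT",
    )
]
print(suspicious_memento(
    "https://pywbtest.ws-dl.cs.odu.edu/example/20200323134144/https://example.com/",
    "Mon, 23 Mar 2020 13:41:44 GMT",
    captures,
))  # True
```

For the missing.example.com request the TimeMap would contain no captures at all, so the same check flags that response as well.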

1.2. TimeMap

Now, let's fetch the TimeMap as reported in the Link header of the first request above (command C2).
curl -iL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/
HTTP/1.1 200 OK
Content-Length: 1097
Content-Type: application/link-format
Date: Mon, 23 Mar 2020 17:48:10 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format"
Vary: accept-datetime

<https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="self"; type="application/link-format"; from="Mon, 23 Mar 2020 13:37:04 GMT",
<https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate",
<https://example.com/>; rel="original",
<https://pywbtest.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"; collection="example",
<https://pywbtest.ws-dl.cs.odu.edu/example/20200323133917mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:39:17 GMT"; collection="example",
<https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example",
<https://pywbtest.ws-dl.cs.odu.edu/example/20200323134509mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:45:09 GMT"; collection="example",
<https://pywbtest.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"; collection="example"
This has at least two issues:
  • It returns a Vary: accept-datetime header, but datetime-based content negotiation on a TimeMap endpoint is not defined in Memento
  • Memento links in the response payload introduce a collection attribute. It looks harmless, but Web Linking (RFC 5988) does not allow such arbitrary attributes unless they are defined by an extension specification. If this information is important, it could be expressed using the Item and Collection Link Relations (RFC 6573) instead
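The collection attribute can be detected mechanically. The following sketch parses a link-format body with a deliberately simple regular expression (it assumes every attribute value is quoted and contains no "<", which holds for the TimeMaps shown here but not for link-format in general) and reports attributes outside an allow-list. The allow-list is my assumption: the target attributes of RFC 5988 plus the datetime, from, and until attributes used by Memento.

```python
import re

# Assumed allow-list: RFC 5988 target attributes plus Memento's datetime/from/until
KNOWN_ATTRIBUTES = {
    "rel", "anchor", "rev", "hreflang", "media", "title", "title*", "type",
    "datetime", "from", "until",
}


def parse_link_format(text):
    """Parse a link-format body into (target URI, {attribute: value}) pairs.

    Simplification: assumes all attribute values are quoted and that no value
    contains "<", which holds for the TimeMaps in this report."""
    links = []
    for m in re.finditer(r"<([^>]*)>([^<]*)", text):
        attrs = {
            name.lower(): value
            for name, value in re.findall(r';\s*([\w*-]+)\s*=\s*"([^"]*)"', m.group(2))
        }
        links.append((m.group(1), attrs))
    return links


def nonstandard_attributes(links):
    """Return the sorted attribute names that are not in the allow-list."""
    return sorted({name for _, attrs in links for name in attrs if name not in KNOWN_ATTRIBUTES})


timemap = (
    '<https://example.com/>; rel="original",\n'
    '<https://pywbtest.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/>; '
    'rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"; collection="example"'
)
print(nonstandard_attributes(parse_link_format(timemap)))  # ['collection']
```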

1.3. Main Page Memento

Now, let's fetch the middle memento entry from the reported TimeMap above (command C5), which is the main page memento, not the banner container.
curl -iL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Location: https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:41:45 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT
X-Archive-Orig-Age: 510746
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:41:45 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7EA4)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
Transfer-Encoding: chunked

<!doctype html>
<html>
<head><!-- WB Insert -->
<script>
  wbinfo = {};
  wbinfo.top_url = "https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/";
  // Fast Top-Frame Redirect
  if (window == window.top && wbinfo.top_url) {
    var loc = window.location.href.replace(window.location.hash, "");
    loc = decodeURI(loc);
 
    if (loc != decodeURI(wbinfo.top_url)) {
        window.location.href = wbinfo.top_url + window.location.hash;
    }
  }
  wbinfo.url = "https://example.com/";
  wbinfo.timestamp = "20200323134145";
  wbinfo.request_ts = "20200323134145";
  wbinfo.prefix = decodeURI("https://pywbtest.ws-dl.cs.odu.edu/example/");
  wbinfo.mod = "mp_";
  wbinfo.is_framed = true;
  wbinfo.is_live = false;
  wbinfo.coll = "example";
  wbinfo.proxy_magic = "";
  wbinfo.static_prefix = "https://pywbtest.ws-dl.cs.odu.edu/static/";
  wbinfo.enable_auto_fetch = false;
</script>
<script src='https://pywbtest.ws-dl.cs.odu.edu/static/wombat.js'> </script>
<script>
  wbinfo.wombat_ts = "20200323134145";
  wbinfo.wombat_sec = "1584970905";
  wbinfo.wombat_scheme = "https";
  wbinfo.wombat_host = "example.com";

  wbinfo.wombat_opts = {};

  if (window && window._WBWombatInit) {
    window._WBWombatInit(wbinfo);
  }
</script>


<script>
window.banner_info = {
    is_gmt: true,

    liveMsg: decodeURIComponent("Live on"),

    calendarAlt: decodeURIComponent("Calendar icon"),
    calendarLabel: decodeURIComponent("View All Captures"),
    choiceLabel: decodeURIComponent("Language:"),
    loadingLabel: decodeURIComponent("Loading..."),
    logoAlt: decodeURIComponent("Logo"),

    locale: "en",
    curr_locale: "",
    locales: [],
    locale_prefixes: {},
    prefix: "https://pywbtest.ws-dl.cs.odu.edu/example/",
    staticPrefix: "https://pywbtest.ws-dl.cs.odu.edu/static"
};
</script>

<!-- default banner, create through js -->
<script src='https://pywbtest.ws-dl.cs.odu.edu/static/default_banner.js'> </script>
<link rel='stylesheet' href='https://pywbtest.ws-dl.cs.odu.edu/static/default_banner.css'/>




<!-- End WB Insert -->

    <title>Example Domain</title>

    <meta charset="utf-8"/>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1"/>
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
This response has the following issues:
  • The value of the Memento-Datetime header is also reported in the Date header (this behavior occurs in many other places where the main page memento is returned), although the two headers have different semantics: Date indicates when the response message was originated, whereas Memento-Datetime indicates when the archived state of the original resource was captured
  • The Link header reports only one memento relation (i.e., the current memento)
    • While providing other memento relations such as first, prev, next, and last is not mandatory, doing so has been the norm, and tools were built with these expectations; for example, the ReconstructiveBanner Custom Element relies on these relations to enable navigational links
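Both observations can be checked per response. This sketch assumes the headers are available as a plain dict keyed by canonically capitalized names (real HTTP client libraries usually offer case-insensitive mappings) and treats relation types as space-separated tokens, so that a relation such as "first memento" counts as first; rel_tokens and memento_header_issues are hypothetical names.

```python
import re

NAVIGATIONAL_RELS = ("first", "prev", "next", "last")


def rel_tokens(link_header):
    """Collect all relation type tokens from a Link header value."""
    tokens = set()
    for value in re.findall(r'rel="([^"]*)"', link_header):
        tokens.update(value.split())
    return tokens


def memento_header_issues(headers):
    """Report the two issues discussed above for a memento response."""
    issues = []
    memento_datetime = headers.get("Memento-Datetime")
    if memento_datetime is not None and headers.get("Date") == memento_datetime:
        issues.append("Date duplicates Memento-Datetime")
    tokens = rel_tokens(headers.get("Link", ""))
    missing = [rel for rel in NAVIGATIONAL_RELS if rel not in tokens]
    if missing:
        issues.append("missing navigational relations: " + ", ".join(missing))
    return issues


# Headers of the main page memento response above (Link shortened)
headers = {
    "Date": "Mon, 23 Mar 2020 13:41:45 GMT",
    "Memento-Datetime": "Mon, 23 Mar 2020 13:41:45 GMT",
    "Link": '<https://example.com/>; rel="original", '
            '<https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/>; '
            'rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"',
}
for issue in memento_header_issues(headers):
    print(issue)
```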
Now, let's make a request with a datetime one second earlier (i.e., 20200323134144 instead of 20200323134145), for which no memento exists at that exact moment.
curl -IL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/20200323134144mp_/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Location: https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:41:45 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT
X-Archive-Orig-Age: 510746
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:41:45 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7EA4)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
Traditionally, we would expect a 302 redirect to the closest memento in this case, but PyWB returns the payload of the closest memento and indicates the actual URI-M in the Content-Location header, both when the request URI is an exact match (e.g., the earlier example) and when the datetime is merely nearby. The Content-Location header is generally used when content negotiation is involved or when a resource is created or updated, but I have not seen it used in plain GET requests.
One could argue that URI-Ms with datetimes different from the exact matches are a form of implicit content negotiation (albeit using a path parameter rather than a header), and that this approach avoids unnecessary round-trips. My concern, however, is that too many URIs point to the same resource, and I do not think the Content-Location header can be used to convey the canonical link relation. Also, the Content-Location header suggests that the user agent use its value in place of the request URI in the future, which is problematic if a more appropriate (closer) memento becomes available later. At the very least, a Cache-Control header should make it explicit that this response is not cacheable. Many researchers have relied on the traditional behavior to identify a terminal memento by following redirects until a Memento-Datetime header appears in the response; PyWB's behavior will force them to reconsider their scripts.
Another issue in the Link header of both of the above requests (commands C6 and C7) is the inclusion of the non-standard collection attribute (as in the TimeMap payload discussed earlier in section 1.2).
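The terminal-memento heuristic described above can be adapted to both behaviors by also honoring Content-Location once Memento-Datetime appears. In this sketch, fetch is a hypothetical caller-supplied function that issues a request without following redirects and returns the status code and a header dict; resolve_terminal_memento and the hop limit are illustrative choices, not an existing tool's API.

```python
from urllib.parse import urljoin


def resolve_terminal_memento(fetch, uri, max_hops=10):
    """Follow redirects until a response carries Memento-Datetime.

    `fetch(uri)` is a hypothetical caller-supplied function that sends a request
    WITHOUT following redirects and returns (status_code, headers_dict).
    Returns (URI-M, Memento-Datetime), or None if no memento is reached."""
    for _ in range(max_hops):
        status, headers = fetch(uri)
        if "Memento-Datetime" in headers:
            # 302-style archives land here on the URI-M itself; PyWB's 200-style
            # responses identify the selected URI-M only via Content-Location.
            return urljoin(uri, headers.get("Content-Location", uri)), headers["Memento-Datetime"]
        if 300 <= status < 400 and "Location" in headers:
            uri = urljoin(uri, headers["Location"])
            continue
        return None  # e.g. a 404 for a URI-R that is not archived
    raise RuntimeError(f"gave up after {max_hops} hops")
```

With PyWB's mp_ URI-Ms the loop ends on the first response and reports the Content-Location value; with a 302-style archive it ends on the redirect target, so scripts built on either assumption receive the same (URI-M, Memento-Datetime) pair.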
Now, a request to a domain name that does not exist in the archive:
curl -IL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/20200323134144mp_/https://missing.example.com/
HTTP/1.1 404 Not Found
Content-Length: 1084
Content-Type: text/html
Date: Mon, 23 Mar 2020 22:13:44 GMT
This looks good; in contrast, the corresponding banner container URI-M (i.e., the one without the mp_ suffix) returns 200, as shown earlier.

1.4. TimeGate

Now, let's interact with the TimeGate endpoint reported in the Link headers of many earlier responses. The corresponding PyWB documentation claims that its behavior is consistent with Memento Pattern 2.2 (a TimeGate separate from the original resource, using 200-style negotiation).
curl -iL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/
HTTP/1.1 200 OK
Content-Length: 1559
Content-Type: text/html
Date: Tue, 24 Mar 2020 00:15:36 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format"
Vary: accept-datetime

<!DOCTYPE html>
<html>
<head>
<style>
html, body
{
  height: 100%;
  margin: 0px;
  padding: 0px;
  border: 0px;
  overflow: hidden;
}

</style>
<script src='https://pywbtest.ws-dl.cs.odu.edu/static/wb_frame.js'> </script>

<script>
window.banner_info = {
    is_gmt: true,

    liveMsg: decodeURIComponent("Live on"),

    calendarAlt: decodeURIComponent("Calendar icon"),
    calendarLabel: decodeURIComponent("View All Captures"),
    choiceLabel: decodeURIComponent("Language:"),
    loadingLabel: decodeURIComponent("Loading..."),
    logoAlt: decodeURIComponent("Logo"),

    locale: "en",
    curr_locale: "",
    locales: [],
    locale_prefixes: {},
    prefix: "https://pywbtest.ws-dl.cs.odu.edu/example/",
    staticPrefix: "https://pywbtest.ws-dl.cs.odu.edu/static"
};
</script>

<!-- default banner, create through js -->
<script src='https://pywbtest.ws-dl.cs.odu.edu/static/default_banner.js'> </script>
<link rel='stylesheet' href='https://pywbtest.ws-dl.cs.odu.edu/static/default_banner.css'/>




</head>
<body style="margin: 0px; padding: 0px;">

<div id="wb_iframe_div">
<iframe id="replay_iframe" frameborder="0" seamless="seamless" scrolling="yes" class="wb_iframe" allow="autoplay; fullscreen"></iframe>
</div>
<script>
  var cframe = new ContentFrame({"url": "https://example.com/" + window.location.hash,
                                 "prefix": "https://pywbtest.ws-dl.cs.odu.edu/example/",
                                 "request_ts": "",
                                 "iframe": "#replay_iframe"});

</script>
</body>
</html>
curl -IL --http1.1 -H "Accept-Datetime: Fri, 01 Jan 1999 12:34:56 GMT" https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/
HTTP/1.1 200 OK
Content-Length: 1573
Content-Type: text/html
Date: Tue, 24 Mar 2020 00:15:09 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/19990101123456mp_/https://example.com/>; rel="memento"; datetime="Fri, 01 Jan 1999 12:34:56 GMT"
Memento-Datetime: Fri, 01 Jan 1999 12:34:56 GMT
Vary: accept-datetime
The first of these two requests (without an explicit Accept-Datetime header) SHOULD resolve to the most recent memento, and the second (with an explicit Accept-Datetime header) MUST resolve to the first memento, because the requested datetime is far in the past, well before the very first capture of the URI-R in the test archive.
The first response does not include any memento relation in the Link header and does not provide a Memento-Datetime header. The second response includes both, but the datetime value merely echoes the requested Accept-Datetime value instead of the datetime of the actual corresponding memento, and the URI-M in its memento relation (19990101123456mp_) does not correspond to any capture. Additionally, neither response includes navigational memento relations (as discussed earlier in section 1.3).
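For comparison, here is a sketch of the selection a TimeGate is expected to perform over the five captures in the test archive. RFC 7089 leaves the exact selection policy to the server, so choosing the memento with the smallest absolute distance from Accept-Datetime (ties going to the earlier one), and the most recent memento when no Accept-Datetime is given, is an assumption of this sketch; make_memento and select_memento are hypothetical names.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

PREFIX = "https://pywbtest.ws-dl.cs.odu.edu/example/"


def make_memento(ts, uri_r="https://example.com/"):
    """Build a (URI-M, aware UTC datetime) pair from a 14-digit timestamp."""
    dt = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return f"{PREFIX}{ts}mp_/{uri_r}", dt


def select_memento(mementos, accept_datetime=None):
    """Pick the memento a TimeGate would resolve to under this sketch's policy:
    the most recent one without Accept-Datetime, otherwise the closest one
    (ties broken in favor of the earlier memento)."""
    if not mementos:
        return None  # a TimeGate should answer 404 here
    if accept_datetime is None:
        return max(mementos, key=lambda m: m[1])
    target = parsedate_to_datetime(accept_datetime)
    return min(mementos, key=lambda m: (abs(m[1] - target), m[1]))


captures = [make_memento(ts) for ts in (
    "20200323133704", "20200323133917", "20200323134145", "20200323134509", "20200323134606",
)]
uri_m, dt = select_memento(captures, "Fri, 01 Jan 1999 12:34:56 GMT")
print(uri_m)  # https://pywbtest.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/
print(format_datetime(dt, usegmt=True))  # Mon, 23 Mar 2020 13:37:04 GMT
```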
Requesting a non-existing resource returns a 200 status code, with the same issues discussed above:
curl -IL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/https://missing.example.com/
HTTP/1.1 200 OK
Content-Length: 1567
Content-Type: text/html
Date: Tue, 24 Mar 2020 00:41:22 GMT
Link: <https://missing.example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://missing.example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://missing.example.com/>; rel="timemap"; type="application/link-format"
Vary: accept-datetime
It turned out that this reported TimeGate endpoint belongs to the banner container, not the main page memento. Unfortunately, this is the only TimeGate endpoint that is discoverable from any other response, be it a TimeMap, banner memento, or main page memento.
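To make the Pattern 2.2 claim testable, the sketch below lists the features I would expect in a 200-style TimeGate response, based on my reading of RFC 7089: a 200 status, a Vary header including accept-datetime, a Content-Location pointing to the selected URI-M, a Memento-Datetime header, and a memento relation in the Link header. Header names are assumed to be lowercased beforehand; pattern_22_problems is a hypothetical helper.

```python
import re


def pattern_22_problems(status, headers):
    """List deviations from what this sketch expects of a 200-style (Pattern 2.2)
    TimeGate response. `headers` maps lowercased header names to values."""
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    vary = [v.strip().lower() for v in headers.get("vary", "").split(",")]
    if "accept-datetime" not in vary:
        problems.append("Vary does not include accept-datetime")
    for name in ("content-location", "memento-datetime"):
        if name not in headers:
            problems.append(f"missing {name} header")
    # Relation types are space-separated tokens, so "first memento" also counts
    if not re.search(r'rel="[^"]*\bmemento\b', headers.get("link", "")):
        problems.append("Link header has no memento relation")
    return problems


# Headers of the banner TimeGate response shown above (Link shortened)
banner_timegate = {
    "vary": "accept-datetime",
    "link": '<https://example.com/>; rel="original", '
            '<https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; '
            'rel="timemap"; type="application/link-format"',
}
for problem in pattern_22_problems(200, banner_timegate):
    print(problem)
```

Applied to the first TimeGate response above, it reports the missing Content-Location and Memento-Datetime headers and the missing memento relation.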
Out of curiosity, I tested a potential TimeGate endpoint with mp_ as a path parameter, which turned out to comply with the documented behavior. However, neither the PyWB documentation nor the Link headers in responses acknowledge this endpoint.
curl -IL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/mp_/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Location: https://pywbtest.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:46:06 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:46:06 GMT
Vary: accept-datetime
X-Archive-Orig-Age: 590951
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:46:06 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7FA7)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
curl -IL --http1.1 -H "Accept-Datetime: Fri, 01 Jan 1999 12:34:56 GMT" https://pywbtest.ws-dl.cs.odu.edu/example/mp_/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Location: https://pywbtest.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:37:04 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:37:04 GMT
Vary: accept-datetime
X-Archive-Orig-Age: 351038
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:37:04 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7F13)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
curl -IL --http1.1 https://pywbtest.ws-dl.cs.odu.edu/example/mp_/https://missing.example.com/
HTTP/1.1 404 Not Found
Content-Length: 1084
Content-Type: text/html
Date: Mon, 23 Mar 2020 13:39:29 GMT
These look good, except for the following issues (also discussed earlier):
  • The Date and Memento-Datetime headers have the same value
  • The non-standard collection attribute is present in the Link header
  • The Link header reports only one memento relation (i.e., the current memento) and omits navigational memento relations such as first, prev, next, and last. Providing these has been the norm, and tools were built with these expectations; for example, MemGator relies on these relations to provide consolidated navigational mementos in its TimeGate and other related endpoints. If PyWB instances deployed in various public archives omit them, MemGator's responses will be less accurate unless it performs a more costly TimeMap request to establish the truth
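The navigational relations are cheap for the archive to compute, since it already knows all captures of a URI-R. This sketch derives them from a list of (URI-M, datetime) pairs such as the TimeMap entries above; merging the tokens when one URI-M plays several roles (e.g., "first prev memento") is my assumption about a reasonable serialization, and navigational_links is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def navigational_links(mementos, selected):
    """Build Link header entries (first/prev/current/next/last) for `selected`.

    `mementos` is a list of (URI-M, aware UTC datetime) pairs, e.g. from a TimeMap;
    `selected` must be one of the URI-Ms. Tokens are merged when roles overlap."""
    ordered = sorted(mementos, key=lambda m: m[1])
    i = [uri for uri, _ in ordered].index(selected)
    roles = [(0, "first"), (i - 1, "prev"), (i, None), (i + 1, "next"), (len(ordered) - 1, "last")]
    tokens = {}  # index into `ordered` -> relation tokens other than "memento"
    for j, role in roles:
        if 0 <= j < len(ordered):
            tokens.setdefault(j, [])
            if role:
                tokens[j].append(role)
    entries = []
    for j in sorted(tokens):
        uri, dt = ordered[j]
        rel = " ".join(tokens[j] + ["memento"])
        entries.append(f'<{uri}>; rel="{rel}"; datetime="{format_datetime(dt, usegmt=True)}"')
    return ", ".join(entries)


PREFIX = "https://pywbtest.ws-dl.cs.odu.edu/example/"
TIMESTAMPS = ("20200323133704", "20200323133917", "20200323134145", "20200323134509", "20200323134606")
mementos = [
    (f"{PREFIX}{ts}mp_/https://example.com/",
     datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc))
    for ts in TIMESTAMPS
]
print(navigational_links(mementos, f"{PREFIX}20200323134145mp_/https://example.com/"))
```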

2. No Frame Replay Mode

In this mode, we disable framed replay so that the archival banner is embedded directly in mementos.
cat config-nofr.yaml
framed_replay: false

2.1. Memento

First, a request to an existing memento with an exactly matching datetime:
curl -iL --http1.1 https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Location: https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:41:45 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT
X-Archive-Orig-Age: 510746
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:41:45 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7EA4)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
Transfer-Encoding: chunked

<!doctype html>
<html>
<head><!-- WB Insert -->
<script>
  wbinfo = {};
  wbinfo.top_url = "https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/";
  wbinfo.url = "https://example.com/";
  wbinfo.timestamp = "20200323134145";
  wbinfo.request_ts = "20200323134145";
  wbinfo.prefix = decodeURI("https://pywbtest-nofr.ws-dl.cs.odu.edu/example/");
  wbinfo.mod = "";
  wbinfo.is_framed = false;
  wbinfo.is_live = false;
  wbinfo.coll = "example";
  wbinfo.proxy_magic = "";
  wbinfo.static_prefix = "https://pywbtest-nofr.ws-dl.cs.odu.edu/static/";
  wbinfo.enable_auto_fetch = false;
</script>
<script src='https://pywbtest-nofr.ws-dl.cs.odu.edu/static/wombat.js'> </script>
<script>
  wbinfo.wombat_ts = "20200323134145";
  wbinfo.wombat_sec = "1584970905";
  wbinfo.wombat_scheme = "https";
  wbinfo.wombat_host = "example.com";

  wbinfo.wombat_opts = {};

  if (window && window._WBWombatInit) {
    window._WBWombatInit(wbinfo);
  }
</script>


<script>
window.banner_info = {
    is_gmt: true,

    liveMsg: decodeURIComponent("Live on"),

    calendarAlt: decodeURIComponent("Calendar icon"),
    calendarLabel: decodeURIComponent("View All Captures"),
    choiceLabel: decodeURIComponent("Language:"),
    loadingLabel: decodeURIComponent("Loading..."),
    logoAlt: decodeURIComponent("Logo"),

    locale: "en",
    curr_locale: "",
    locales: [],
    locale_prefixes: {},
    prefix: "https://pywbtest-nofr.ws-dl.cs.odu.edu/example/",
    staticPrefix: "https://pywbtest-nofr.ws-dl.cs.odu.edu/static"
};
</script>

<!-- default banner, create through js -->
<script src='https://pywbtest-nofr.ws-dl.cs.odu.edu/static/default_banner.js'> </script>
<link rel='stylesheet' href='https://pywbtest-nofr.ws-dl.cs.odu.edu/static/default_banner.css'/>




<!-- End WB Insert -->

    <title>Example Domain</title>

    <meta charset="utf-8"/>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1"/>
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
Next, a request to a memento with a nearby datetime:
curl -IL --http1.1 https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134144/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Location: https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:41:45 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT
X-Archive-Orig-Age: 510746
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:41:45 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7EA4)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
Finally, a request to a non-existing memento:
curl -IL --http1.1 https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134144/https://missing.example.com/
HTTP/1.1 404 Not Found
Content-Length: 1104
Content-Type: text/html
Date: Tue, 24 Mar 2020 01:09:17 GMT
This behavior is similar to the main page memento of the default configuration (section 1.3) and inherits the same issues as discussed earlier.
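In the exchange above, a request for 20200323134144 gets a 200 response whose Content-Location points at the 20200323134145 capture. In other words, the request is resolved to a nearby memento. Below is a Python sketch of that selection step. It assumes "nearest" means the smallest absolute time difference, with ties going to the earlier capture. That rule is an illustrative assumption, not PyWB's documented algorithm, and the names `closest_memento` and `CAPTURES` are hypothetical.

```python
from datetime import datetime

# Capture timestamps of the example collection (as listed in the TimeMap of this report).
CAPTURES = [
    "20200323133704",
    "20200323133917",
    "20200323134145",
    "20200323134509",
    "20200323134606",
]


def parse_ts(ts):
    """Parse a 14-digit Wayback-style timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def closest_memento(requested, captures):
    """Return the capture nearest to `requested`; ties go to the earlier capture (assumed rule)."""
    target = parse_ts(requested)
    return min(captures, key=lambda ts: (abs(parse_ts(ts) - target), ts))


print(closest_memento("20200323134144", CAPTURES))  # 20200323134145
print(closest_memento("19990101123456", CAPTURES))  # 20200323133704
```

The second call mirrors the `Accept-Datetime: Fri, 01 Jan 1999 12:34:56 GMT` request in the TimeGate section, which also lands on the earliest capture. Whatever the selection rule, the concern raised in this report is what happens afterwards. PyWB answers with 200 and Content-Location rather than redirecting to the selected URI-M.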

2.2. TimeMap

curl -iL --http1.1 https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/
HTTP/1.1 200 OK
Content-Length: 1117
Content-Type: application/link-format
Date: Tue, 24 Mar 2020 01:19:51 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format"
Vary: accept-datetime

<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="self"; type="application/link-format"; from="Mon, 23 Mar 2020 13:37:04 GMT",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate",
<https://example.com/>; rel="original",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323133704/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"; collection="example",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323133917/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:39:17 GMT"; collection="example",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134509/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:45:09 GMT"; collection="example",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134606/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"; collection="example"
This behavior is similar to the TimeMap of the default configuration (section 1.2) and inherits the same issues as discussed earlier.
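The link-format TimeMap body can be parsed programmatically to compare TimeMaps across configurations, for example to check whether entries carry the mp_ suffix or the collection attribute. Below is a minimal regex-based Python sketch. It handles the quoted, comma-containing datetime values seen above, but it is not a complete RFC 5988 parser. The function names `parse_links` and `mementos` are hypothetical.

```python
import re

# A link-value is <URI> followed by ;-separated parameters. Quoted values
# (e.g. datetime="Mon, 23 Mar 2020 13:37:04 GMT") may contain commas, so they
# are matched as whole quoted strings instead of splitting the text on ",".
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*(?:=\s*(?:"[^"]*"|[^;,\s]*))?)*)')
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*(?:=\s*(?:"([^"]*)"|([^;,\s]*)))?')


def parse_links(text):
    """Return (uri, params) tuples from a Link header value or a TimeMap body."""
    links = []
    for match in LINK_RE.finditer(text):
        params = {}
        for p in PARAM_RE.finditer(match.group(2)):
            value = p.group(2) if p.group(2) is not None else (p.group(3) or "")
            params[p.group(1).lower()] = value
        links.append((match.group(1), params))
    return links


def mementos(timemap_text):
    """Return (URI-M, datetime) pairs for entries whose rel includes "memento"."""
    return [
        (uri, params.get("datetime"))
        for uri, params in parse_links(timemap_text)
        if "memento" in params.get("rel", "").split()
    ]


# First entries of the no frame TimeMap shown above.
TIMEMAP = '''<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="self"; type="application/link-format"; from="Mon, 23 Mar 2020 13:37:04 GMT",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate",
<https://example.com/>; rel="original",
<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323133704/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"; collection="example"'''

if __name__ == "__main__":
    for uri, dt in mementos(TIMEMAP):
        print(dt, uri)
```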

2.3. TimeGate

curl -iL --http1.1 https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Location: https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134606/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:46:06 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134606/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:46:06 GMT
Vary: accept-datetime
X-Archive-Orig-Age: 590951
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:46:06 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7FA7)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
Transfer-Encoding: chunked

<!doctype html>
<html>
<head><!-- WB Insert -->
<script>
  wbinfo = {};
  wbinfo.top_url = "https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/";
  wbinfo.url = "https://example.com/";
  wbinfo.timestamp = "20200323134606";
  wbinfo.request_ts = "";
  wbinfo.prefix = decodeURI("https://pywbtest-nofr.ws-dl.cs.odu.edu/example/");
  wbinfo.mod = "";
  wbinfo.is_framed = false;
  wbinfo.is_live = false;
  wbinfo.coll = "example";
  wbinfo.proxy_magic = "";
  wbinfo.static_prefix = "https://pywbtest-nofr.ws-dl.cs.odu.edu/static/";
  wbinfo.enable_auto_fetch = false;
</script>
<script src='https://pywbtest-nofr.ws-dl.cs.odu.edu/static/wombat.js'> </script>
<script>
  wbinfo.wombat_ts = "20200323134606";
  wbinfo.wombat_sec = "1584971166";
  wbinfo.wombat_scheme = "https";
  wbinfo.wombat_host = "example.com";

  wbinfo.wombat_opts = {};

  if (window && window._WBWombatInit) {
    window._WBWombatInit(wbinfo);
  }
</script>


<script>
window.banner_info = {
    is_gmt: true,

    liveMsg: decodeURIComponent("Live on"),

    calendarAlt: decodeURIComponent("Calendar icon"),
    calendarLabel: decodeURIComponent("View All Captures"),
    choiceLabel: decodeURIComponent("Language:"),
    loadingLabel: decodeURIComponent("Loading..."),
    logoAlt: decodeURIComponent("Logo"),

    locale: "en",
    curr_locale: "",
    locales: [],
    locale_prefixes: {},
    prefix: "https://pywbtest-nofr.ws-dl.cs.odu.edu/example/",
    staticPrefix: "https://pywbtest-nofr.ws-dl.cs.odu.edu/static"
};
</script>

<!-- default banner, create through js -->
<script src='https://pywbtest-nofr.ws-dl.cs.odu.edu/static/default_banner.js'> </script>
<link rel='stylesheet' href='https://pywbtest-nofr.ws-dl.cs.odu.edu/static/default_banner.css'/>




<!-- End WB Insert -->

    <title>Example Domain</title>

    <meta charset="utf-8"/>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1"/>
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
curl -IL --http1.1 -H "Accept-Datetime: Fri, 01 Jan 1999 12:34:56 GMT" https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Location: https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323133704/https://example.com/
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:37:04 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323133704/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:37:04 GMT
Vary: accept-datetime
X-Archive-Orig-Age: 351038
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:37:04 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7F13)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
curl -IL --http1.1 https://pywbtest-nofr.ws-dl.cs.odu.edu/example/https://missing.example.com/
HTTP/1.1 404 Not Found
Content-Length: 1104
Content-Type: text/html
Date: Tue, 24 Mar 2020 01:44:26 GMT
This behavior is similar to the TimeGate endpoint of main page mementos with the default configuration (section 1.4) and inherits the same issues as discussed earlier.
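This TimeGate uses 200-style negotiation. It answers directly with 200 and includes Vary: accept-datetime, Memento-Datetime, and Content-Location. The redirect mode below instead answers with a redirect whose Location is the selected memento. The following Python sketch classifies a captured TimeGate response as one style or the other, based on its status code and a header dict. The labels and rules are a simplified reading of RFC 7089, not an API of PyWB or of any Memento tool. The name `negotiation_style` is hypothetical.

```python
def negotiation_style(status, headers):
    """Classify a TimeGate response using a simplified reading of RFC 7089."""
    h = {k.lower(): v for k, v in headers.items()}
    # Vary may list several header names, e.g. "accept-datetime, accept-encoding".
    vary = {v.strip().lower() for v in h.get("vary", "").split(",")}
    negotiated = "accept-datetime" in vary
    if status == 302 and negotiated and "location" in h:
        return "302-style"
    if (status == 200 and negotiated
            and "memento-datetime" in h and "content-location" in h):
        return "200-style"
    if 300 <= status < 400 and "location" in h:
        return f"other redirect ({status})"
    return "unknown"


# Abridged headers of the TimeGate response above.
print(negotiation_style(200, {
    "Content-Location": "https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134606/https://example.com/",
    "Memento-Datetime": "Mon, 23 Mar 2020 13:46:06 GMT",
    "Vary": "accept-datetime",
}))  # 200-style
```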

3. TimeGate Redirect Mode

In this mode, we enable the redirect behavior of the TimeGate.
cat config-tgrd.yaml
redirect_to_exact: true

3.1. Banner Memento

The banner memento behaves the same way as in the default configuration (section 1.1).

3.2. Main Page Memento

The main page memento returns an intermediate 307 response if the datetime value in the URI-M does not exactly match an existing memento, as shown below:
curl -IL --http1.1 https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134144mp_/https://example.com/
HTTP/1.1 307 Temporary Redirect
Content-Length: 0
Date: Tue, 24 Mar 2020 03:31:19 GMT
Link: <https://example.com/>; rel="original"
Location: https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:41:45 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134145mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT
X-Archive-Orig-Age: 510746
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:41:45 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7EA4)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
The Date and Memento-Datetime headers have the same value in the terminal response.
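Date is meant to say when the response was generated, while Memento-Datetime says when the memento was captured. The following Python sketch checks for this symptom. It assumes the response headers are available as a dict, and the function name is hypothetical. The check cannot distinguish the legitimate edge case of a memento replayed within one second of its capture.

```python
from email.utils import parsedate_to_datetime


def date_equals_memento_datetime(headers):
    """Return True if the Date and Memento-Datetime headers carry the same instant."""
    h = {k.lower(): v for k, v in headers.items()}
    if "date" not in h or "memento-datetime" not in h:
        return False
    # Compare parsed (timezone-aware) datetimes rather than raw strings.
    return parsedate_to_datetime(h["date"]) == parsedate_to_datetime(h["memento-datetime"])


# Terminal response of the redirect above (headers abridged).
terminal = {
    "Date": "Mon, 23 Mar 2020 13:41:45 GMT",
    "Memento-Datetime": "Mon, 23 Mar 2020 13:41:45 GMT",
}
print(date_equals_memento_datetime(terminal))  # True
```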

3.3. TimeMap

TimeMap behaves the same way as in the default configuration, except that it lists mementos without the mp_ suffix.

3.4. TimeGate

The corresponding PyWB documentation states that this behavior is consistent with Memento Pattern 2.3. However, the description suggests that Memento Pattern 2.1 is what is actually meant.
PyWB documentation states:
As this approach always includes a redirect, use of this system is discouraged when the intent is to render mementos. However, this approach is useful when the goal is to determine the URI-M and to provide backwards compatibility.
I think this mutual exclusion is problematic. It leaves the choice between one configuration and the other to archive administrators, even though the choice concerns clients more, and administrators have no way to enable both. Ideally, two endpoints should be available simultaneously to cater to both scenarios, without the need for an extra configuration option.
Let's make some requests to the advertised TimeGate endpoint that belongs to the frame memento:
curl -iL --http1.1 https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/
HTTP/1.1 307 Temporary Redirect
Content-Length: 0
Date: Tue, 24 Mar 2020 01:59:55 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"
Location: https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134606/https://example.com/
Vary: accept-datetime

HTTP/1.1 200 OK
Content-Length: 1603
Content-Type: text/html
Date: Tue, 24 Mar 2020 01:59:55 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"
Memento-Datetime: Mon, 23 Mar 2020 13:46:06 GMT

<!DOCTYPE html>
<html>
<head>
<style>
html, body
{
  height: 100%;
  margin: 0px;
  padding: 0px;
  border: 0px;
  overflow: hidden;
}

</style>
<script src='https://pywbtest-tgrd.ws-dl.cs.odu.edu/static/wb_frame.js'> </script>

<script>
window.banner_info = {
    is_gmt: true,

    liveMsg: decodeURIComponent("Live on"),

    calendarAlt: decodeURIComponent("Calendar icon"),
    calendarLabel: decodeURIComponent("View All Captures"),
    choiceLabel: decodeURIComponent("Language:"),
    loadingLabel: decodeURIComponent("Loading..."),
    logoAlt: decodeURIComponent("Logo"),

    locale: "en",
    curr_locale: "",
    locales: [],
    locale_prefixes: {},
    prefix: "https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/",
    staticPrefix: "https://pywbtest-tgrd.ws-dl.cs.odu.edu/static"
};
</script>

<!-- default banner, create through js -->
<script src='https://pywbtest-tgrd.ws-dl.cs.odu.edu/static/default_banner.js'> </script>
<link rel='stylesheet' href='https://pywbtest-tgrd.ws-dl.cs.odu.edu/static/default_banner.css'/>




</head>
<body style="margin: 0px; padding: 0px;">

<div id="wb_iframe_div">
<iframe id="replay_iframe" frameborder="0" seamless="seamless" scrolling="yes" class="wb_iframe" allow="autoplay; fullscreen"></iframe>
</div>
<script>
  var cframe = new ContentFrame({"url": "https://example.com/" + window.location.hash,
                                 "prefix": "https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/",
                                 "request_ts": "20200323134606",
                                 "iframe": "#replay_iframe"});

</script>
</body>
</html>
curl -IL --http1.1 -H "Accept-Datetime: Fri, 01 Jan 1999 12:34:56 GMT" https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/
HTTP/1.1 307 Temporary Redirect
Content-Length: 0
Date: Tue, 24 Mar 2020 02:03:58 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"
Location: https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323133704/https://example.com/
Vary: accept-datetime

HTTP/1.1 200 OK
Content-Length: 1603
Content-Type: text/html
Date: Tue, 24 Mar 2020 02:03:58 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"
Memento-Datetime: Mon, 23 Mar 2020 13:37:04 GMT
curl -IL --http1.1 https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://missing.example.com/
HTTP/1.1 404 Not Found
Content-Length: 1104
Content-Type: text/html
Date: Tue, 24 Mar 2020 02:05:07 GMT
Now, some requests to the non-advertised TimeGate endpoint that belongs to the main page memento:
curl -IL --http1.1 https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/mp_/https://example.com/
HTTP/1.1 307 Temporary Redirect
Content-Length: 0
Date: Tue, 24 Mar 2020 02:07:42 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"
Location: https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/
Vary: accept-datetime

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:46:06 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323134606mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:46:06 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:46:06 GMT
X-Archive-Orig-Age: 590951
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:46:06 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7FA7)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
curl -IL --http1.1 -H "Accept-Datetime: Fri, 01 Jan 1999 12:34:56 GMT" https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/mp_/https://example.com/
HTTP/1.1 307 Temporary Redirect
Content-Length: 0
Date: Tue, 24 Mar 2020 02:15:29 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"
Location: https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/
Vary: accept-datetime

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:37:04 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:37:04 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:37:04 GMT
X-Archive-Orig-Age: 351038
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:37:04 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7F13)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
curl -IL --http1.1 https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/mp_/https://missing.example.com/
HTTP/1.1 404 Not Found
Content-Length: 1104
Content-Type: text/html
Date: Tue, 24 Mar 2020 02:16:29 GMT
On a positive note, unlike in the default configuration, non-existing resources are detected even for the banner memento and return a 404 status code. Other than this, these responses have numerous Memento compliance issues:
  • The Date and Memento-Datetime headers have the same value in the terminal main page memento responses after redirects
  • The memento relation in the Link header always points to a main page memento (i.e., one with the mp_ suffix), even when the banner memento is requested and the payload represents the banner page
  • Both the PyWB documentation and the Memento RFC describe a 302 status code when content negotiation is performed in this style, but the implementation returns 307 instead, which is against the specification and breaks tools like MemGator and Memento Validator
  • If the 307 redirect were to be considered a temporary resource, then it MUST NOT include a Vary header with the accept-datetime value, and there SHOULD be a usual 302 response somewhere in the redirection chain
  • If the purpose of replacing 302 with 307 is to support methods like POST and OPTIONS, then the matter must be discussed with the community and resolved collaboratively in a transparent manner, because the Memento RFC does not support it
  • Presence of an arbitrary collection attribute and absence of navigational memento relations, as discussed earlier
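The redirect-related issues listed above can be checked mechanically against a chain of responses as printed by `curl -IL`. The Python sketch below assumes each hop has already been split into a status code and a header dict. It flags a 307 where RFC 7089 describes a 302 TimeGate response, and it flags a 307 that carries Vary: accept-datetime, which would not be allowed if the 307 is treated as a temporary resource. The function name and messages are hypothetical.

```python
def audit_redirect_hops(hops):
    """Flag 307 hops in a TimeGate redirect chain.

    `hops` is a list of (status, headers) pairs in the order `curl -L` follows them.
    """
    issues = []
    for i, (status, headers) in enumerate(hops):
        h = {k.lower(): v for k, v in headers.items()}
        vary = {v.strip().lower() for v in h.get("vary", "").split(",")}
        if status == 307:
            issues.append(f"hop {i}: 307 where RFC 7089 describes a 302 TimeGate response")
            if "accept-datetime" in vary:
                issues.append(f"hop {i}: 307 carries Vary: accept-datetime")
    return issues


# The mp_ TimeGate chain with Accept-Datetime shown above (headers abridged).
chain = [
    (307, {"Location": "https://pywbtest-tgrd.ws-dl.cs.odu.edu/example/20200323133704mp_/https://example.com/",
           "Vary": "accept-datetime"}),
    (200, {"Memento-Datetime": "Mon, 23 Mar 2020 13:37:04 GMT"}),
]
for issue in audit_redirect_hops(chain):
    print(issue)
```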

4. No Frame Replay With TimeGate Redirect Mode

In this mode, we disable the banner container frame and enable the redirect behavior of the TimeGate.
cat config-nofr-tgrd.yaml
framed_replay: false
redirect_to_exact: true

4.1. Memento

curl -iL --http1.1 https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:41:45 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT
X-Archive-Orig-Age: 510746
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:41:45 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7EA4)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
Transfer-Encoding: chunked

<!doctype html>
<html>
<head><!-- WB Insert -->
<script>
  wbinfo = {};
  wbinfo.top_url = "https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/";
  wbinfo.url = "https://example.com/";
  wbinfo.timestamp = "20200323134145";
  wbinfo.request_ts = "20200323134145";
  wbinfo.prefix = decodeURI("https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/");
  wbinfo.mod = "";
  wbinfo.is_framed = false;
  wbinfo.is_live = false;
  wbinfo.coll = "example";
  wbinfo.proxy_magic = "";
  wbinfo.static_prefix = "https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/static/";
  wbinfo.enable_auto_fetch = false;
</script>
<script src='https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/static/wombat.js'> </script>
<script>
  wbinfo.wombat_ts = "20200323134145";
  wbinfo.wombat_sec = "1584970905";
  wbinfo.wombat_scheme = "https";
  wbinfo.wombat_host = "example.com";

  wbinfo.wombat_opts = {};

  if (window && window._WBWombatInit) {
    window._WBWombatInit(wbinfo);
  }
</script>


<script>
window.banner_info = {
    is_gmt: true,

    liveMsg: decodeURIComponent("Live on"),

    calendarAlt: decodeURIComponent("Calendar icon"),
    calendarLabel: decodeURIComponent("View All Captures"),
    choiceLabel: decodeURIComponent("Language:"),
    loadingLabel: decodeURIComponent("Loading..."),
    logoAlt: decodeURIComponent("Logo"),

    locale: "en",
    curr_locale: "",
    locales: [],
    locale_prefixes: {},
    prefix: "https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/",
    staticPrefix: "https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/static"
};
</script>

<!-- default banner, create through js -->
<script src='https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/static/default_banner.js'> </script>
<link rel='stylesheet' href='https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/static/default_banner.css'/>




<!-- End WB Insert -->

    <title>Example Domain</title>

    <meta charset="utf-8"/>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1"/>
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134145/https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
curl -IL --http1.1 https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134144/https://example.com/
HTTP/1.1 307 Temporary Redirect
Content-Length: 0
Date: Tue, 24 Mar 2020 03:25:18 GMT
Link: <https://example.com/>; rel="original"
Location: https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 Mar 2020 13:41:45 GMT
Link: <https://example.com/>; rel="original", <https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/https://example.com/>; rel="timegate", <https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/timemap/link/https://example.com/>; rel="timemap"; type="application/link-format", <https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/>; rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"
Memento-Datetime: Mon, 23 Mar 2020 13:41:45 GMT
X-Archive-Orig-Age: 510746
X-Archive-Orig-Cache-Control: max-age=604800
X-Archive-Orig-Content-Encoding: gzip
X-Archive-Orig-Content-Length: 648
X-Archive-Orig-Etag: "3147526947"
X-Archive-Orig-Expires: Mon, 30 Mar 2020 13:41:45 GMT
X-Archive-Orig-Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
X-Archive-Orig-Server: ECS (dcb/7EA4)
X-Archive-Orig-Vary: Accept-Encoding
X-Cache: HIT
curl -IL --http1.1 https://pywbtest-nofr-tgrd.ws-dl.cs.odu.edu/example/20200323134144/https://missing.example.com/
HTTP/1.1 404 Not Found
Content-Length: 1124
Content-Type: text/html
Date: Tue, 24 Mar 2020 03:27:46 GMT
The behavior here is similar to that of the main page memento in the TimeGate redirect mode (section 3.2).

4.2. TimeMap

TimeMap behaves the same way as in the default configuration (section 1.2), except that it lists mementos without the mp_ suffix.

4.3. TimeGate

The TimeGate endpoint behaves the same way as in the TimeGate redirect mode (section 3.4) and inherits the same issues discussed earlier. The only difference is the payload of the final response, which is the same as in the no frame replay mode (section 2.3).

Summary

I audited the latest development version of PyWB (as of March 23, 2020) with a number of different configurations for its Memento compliance and found numerous issues of varying severity levels that may break various tools of the Web archiving and Memento ecosystem.
Critical violations (MUST be fixed):
  1. TimeGate in redirect mode MUST use 302-style content negotiation, not 307, which is not part of the Memento RFC; should 307-style negotiation be mandatory, the matter must be discussed with the community and resolved collaboratively in a transparent manner (see section 3.4) -- [Reported in ipwb#545]
  2. As per RFC 5988, arbitrary attributes are not allowed in Link; hence, the collection attribute in the Link header and TimeMap entity MUST be removed or incorporated as per RFC 6573 (see sections 1.2 and 1.3) -- [Reported in ipwb#546]
  3. When accessing a main page memento, entries in the Link header MUST correspond to the main page memento and not to the corresponding banner memento (see section 1.4) -- [Reported in ipwb#547]
  4. In main page mementos, the value of the Memento-Datetime header overwrites the Date header; since these headers have distinct semantics, their values MUST NOT be the same, except in the rare case when a memento is replayed within one second of its capture (see section 1.3) -- [Reported in ipwb#548]
  5. Banner mementos blindly echo back the requested datetime in the Memento-Datetime header with a 200 status code, irrespective of whether an exactly matching memento, or any memento at all, exists (see section 1.4) -- [Reported in ipwb#549]
  6. During datetime-based content negotiation, a temporary resource MUST NOT include a Vary header with the accept-datetime value (see section 3.4) -- [Reported in ipwb#550]
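A quick way to spot the collection attribute from item 2 is to list the Link header parameters that fall outside a known set. For this sketch, the known set is the parameters RFC 5988 names explicitly (rel, anchor, rev, hreflang, media, title, title*, type) plus the datetime, from, and until attributes used by RFC 7089. The set and the function name are assumptions for illustration, and a URI that itself contains ";name=" could confuse this simple regex.

```python
import re

# Parameters named explicitly in RFC 5988 plus Memento's datetime/from/until (assumed set).
KNOWN_PARAMS = {"rel", "anchor", "rev", "hreflang", "media", "title", "title*",
                "type", "datetime", "from", "until"}


def unknown_link_params(link_header):
    """Return sorted parameter names in a Link header that are not in KNOWN_PARAMS."""
    # Blank out quoted strings first so ';' or '=' inside values are ignored.
    unquoted = re.sub(r'"[^"]*"', '""', link_header)
    names = re.findall(r';\s*([\w*-]+)\s*=', unquoted)
    return sorted({name.lower() for name in names} - KNOWN_PARAMS)


# Abridged from the no frame replay memento response in section 2.
LINK_HEADER = ('<https://example.com/>; rel="original", '
               '<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/>; '
               'rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"')
print(unknown_link_params(LINK_HEADER))  # ['collection']
```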
Moderate issues (SHOULD be revisited):
  1. TimeMaps SHOULD NOT support content negotiation based on Accept-Datetime header (see section 1.2) -- [Reported in ipwb#551]
  2. If there are variations of mementos (e.g., banner, rewritten, raw), the community SHOULD discuss how to report them in Link header and TimeMaps and which ones should be reported in certain responses (see section 1.4) -- [Reported in ipwb#552]
  3. In case of implicit datetime content negotiation (i.e., using the datetime string in the URI-M path rather than the Accept-Datetime header), a 302 redirect to the closest memento should be returned instead of a 200 response that relies on the Content-Location header; this avoids polluting the URI-M space and ensures that caches and the many tools that rely on this behavior function properly (see section 1.3) -- [Reported in ipwb#553]
  4. Expose both styles of content negotiation (i.e., 200 and 302) simultaneously, so that user agents, not the webmaster, get to decide which one to consume (see section 3.4) -- [Reported in ipwb#554]
  5. Navigational memento link relations (i.e., first, prev, next, and last) are recommended to be included in Link header of TimeGate and memento responses as many tools rely on them (see sections 1.3 and 1.4) -- [Reported in ipwb#555]
  6. Fix PyWB documentation to align with the implementation (see section 3.4) -- [Reported in ipwb#556]
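Item 5 of the moderate issues can be checked by collecting every rel value in a Link header. The following Python sketch assumes the header is available as a single string. It treats rel values as space-separated, so that a value like rel="first memento" counts as "first". The function name is hypothetical.

```python
import re

NAV_RELS = ("first", "prev", "next", "last")


def missing_nav_rels(link_header):
    """Return the navigational relation types absent from a Link header."""
    present = set()
    # rel values may be quoted and space-separated, e.g. rel="first memento".
    for quoted, bare in re.findall(r'\brel\s*=\s*(?:"([^"]*)"|([^;,\s]+))', link_header):
        present.update((quoted or bare).lower().split())
    return [rel for rel in NAV_RELS if rel not in present]


# Abridged from the no frame replay memento response in section 2.
LINK_HEADER = ('<https://example.com/>; rel="original", '
               '<https://pywbtest-nofr.ws-dl.cs.odu.edu/example/20200323134145/https://example.com/>; '
               'rel="memento"; datetime="Mon, 23 Mar 2020 13:41:45 GMT"; collection="example"')
print(missing_nav_rels(LINK_HEADER))  # ['first', 'prev', 'next', 'last']
```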

Acknowledgements

This audit report has greatly benefited from the feedback of Herbert Van de Sompel, Martin Klein, and Michael L. Nelson. I am grateful for their contributions, but I am responsible for any errors that may be present.
--
Sawood Alam
