2022-03-30: GitHub is not an archive - GitHub Pages
Most GitHub users know *.github.io as the domain for GitHub-hosted websites. But before there was *.github.io (https://elescamilla.github.io), there was *.github.com (https://elescamilla.github.com), which had exactly the same functionality. What caused the change?
On April 5, 2013, GitHub announced that it would deprecate *.github.com for security reasons. In that post, GitHub said that "all traffic will be redirected to the new *.github.io location indefinitely, so you won't have to change any links". However, on January 29, 2021, GitHub published an updated statement: to further address security concerns, it would stop redirecting *.github.com to *.github.io starting April 15, 2021. GitHub recommended that users "remove any external references to *.github.com". To encourage users to update external links, GitHub scheduled two "brown out" dates and notified users of the upcoming change.
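For documents that can still be edited, following GitHub's advice amounts to rewriting each *.github.com host to its *.github.io equivalent. The shell function below is a minimal sketch of that rewrite, not GitHub's own redirect logic; the name to_github_io and the short list of GitHub service hosts it leaves alone (www, api, gist) are illustrative assumptions, and that list is not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a legacy GitHub Pages URL of the form
# <name>.github.com to <name>.github.io. Other URLs are printed unchanged.
# Assumes lowercase host names.
to_github_io() {
  url=$1
  host=${url#*://}   # drop the scheme, if present
  host=${host%%/*}   # keep only the host part (up to the first '/')
  case $host in
    *.*.github.com) ;;                                # nested subdomain: leave alone
    www.github.com|api.github.com|gist.github.com) ;; # assumed non-Pages hosts (not exhaustive)
    *.github.com)
      # The host is the first place ".github.com" appears in the URL
      url=$(printf '%s\n' "$url" | sed 's/\.github\.com/.github.io/')
      ;;
  esac
  printf '%s\n' "$url"
}
```

For example, `to_github_io https://aplpy.github.com` prints https://aplpy.github.io, while https://github.com/aplpy/aplpy is left untouched. A rewritten link only helps if the site actually moved to *.github.io, so the result is still worth checking.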
But...
In most situations, it is difficult to modify URLs in a publication whose content is permanent. For example, in the arXiv corpus that I am studying as part of the CoSAI project, 335 PDF publications from 2011 to 2021 reference "aplpy.github.com". However, this page no longer exists, and users are no longer automatically redirected to the "aplpy.github.io" page that replaced it. An example of a publication referencing "aplpy.github.com" is shown below.
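A count like this can be approximated by searching the text extracted from each PDF. The sketch below is not the CoSAI project's actual pipeline; it assumes the PDFs were already converted to plain text (for example with poppler's pdftotext), one .txt file per publication in a single directory, and the function names are hypothetical. URLs that the PDF layout splits across lines will be missed.

```shell
#!/bin/sh
# Sketch only: DIR holds one extracted-text .txt file per publication.

# Print how many .txt files in DIR mention HOST at least once.
count_docs_mentioning() {
  host=$1
  dir=$2
  # -l lists each matching file once; -F matches HOST literally; -i ignores case
  grep -liF -- "$host" "$dir"/*.txt 2>/dev/null | wc -l | tr -d ' '
}

# Tally every distinct <name>.github.com host in DIR, counting each file once.
github_com_hosts() {
  dir=$1
  for f in "$dir"/*.txt; do
    [ -f "$f" ] || continue
    # -o prints each match on its own line; sort -u dedupes within a file
    grep -oiE '[a-z0-9-]+\.github\.com' "$f" | tr 'A-Z' 'a-z' | sort -u
  done | sort | uniq -c | sort -rn
}
```

Running `count_docs_mentioning aplpy.github.com papers` would print the number of files in papers/ that cite the host, and `github_com_hosts papers` ranks every *.github.com host by how many files mention it.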
Captured from https://arxiv.org/abs/2011.08829, page 18
This is just one example of a broken GitHub Pages link. This publication now permanently contains a broken link even though the Web page was available when the publication was submitted. For more on the topic of URL integrity in the academic corpus, see Klein et al., 2014 and Jones et al., 2016.
https://aplpy.github.com now returns a 404 HTTP response
$ curl -is https://aplpy.github.com | head -10
HTTP/1.1 404 Not Found
Connection: keep-alive
Content-Length: 9581
Server: GitHub.com
Content-Type: text/html; charset=utf-8
x-pages-interstitial: 1
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; img-src data:; connect-src 'self'
X-GitHub-Request-Id: 9662:239D:F8B449:177A80F:62437811
Accept-Ranges: bytes
Date: Tue, 29 Mar 2022 21:20:17 GMT
and serves an HTML error page from GitHub in place of the old site.
By contrast, https://aplpy.github.io, which replaced it, returns a 200 HTTP response:
$ curl -is https://aplpy.github.io | head -15
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 6912
Server: GitHub.com
Content-Type: text/html; charset=utf-8
permissions-policy: interest-cohort=()
Last-Modified: Sat, 16 Jun 2018 20:43:22 GMT
Access-Control-Allow-Origin: *
ETag: "5b25766a-1b00"
expires: Tue, 29 Mar 2022 15:14:26 GMT
Cache-Control: max-age=600
x-proxy-cache: MISS
X-GitHub-Request-Id: 51BE:3C52:5658F8:C5C838:62431FFA
Accept-Ranges: bytes
Date: Tue, 29 Mar 2022 21:22:24 GMT
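The checks above can be scripted for any list of links. This sketch asks curl for the status code only and maps it to a coarse label; the helper names and labels are our own, and redirects are deliberately not followed so that a legacy host answering 301 is reported as redirected rather than silently resolved.

```shell
#!/bin/sh
# Hypothetical helpers for a simple link-rot check.

# Print only the HTTP status code for a URL. -s hides progress output,
# -o /dev/null discards the body, and redirects are not followed (no -L),
# so a legacy host that answers 301 is reported as such.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Map a status code onto a coarse label (the labels are our own).
classify_status() {
  case $1 in
    2??) echo live ;;
    3??) echo redirected ;;
    4??|5??) echo broken ;;
    000) echo unreachable ;;   # curl reports 000 when no HTTP response arrived
    *) echo unknown ;;
  esac
}
```

Given the responses shown above, `classify_status "$(http_status https://aplpy.github.com)"` would print broken and the same check on https://aplpy.github.io would print live; actual results depend on when the check is run.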
The Internet Archive's Wayback Machine records the same break. Its memento of http://aplpy.github.com/ from February 23, 2022 replays GitHub's original 404 response (see the x-archive-orig-* headers):
$ curl -is https://web.archive.org/web/20220223200804/http://aplpy.github.com/ | head -25
HTTP/1.1 404 Not Found
Server: nginx/1.19.5
Date: Tue, 29 Mar 2022 21:28:15 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 19335
Connection: keep-alive
x-archive-orig-server: GitHub.com
x-archive-orig-x-pages-interstitial: 1
x-archive-orig-content-security-policy: default-src 'none'; style-src 'unsafe-inline'; img-src data:; connect-src 'self'
x-archive-orig-x-github-request-id: 79C2:39BE:AB8B4:150E77:62169424
x-archive-orig-content-length: 9581
x-archive-orig-accept-ranges: bytes
x-archive-orig-date: Wed, 23 Feb 2022 20:08:04 GMT
x-archive-orig-via: 1.1 varnish
x-archive-orig-age: 0
x-archive-orig-connection: keep-alive
x-archive-orig-x-served-by: cache-sjc10046-SJC
x-archive-orig-x-cache: MISS
x-archive-orig-x-cache-hits: 0
x-archive-orig-x-timer: S1645646885.714892,VS0,VE68
x-archive-orig-vary: Accept-Encoding
x-archive-orig-x-fastly-request-id: bf0b2be8a3fdc757390c0898be26536e09655876
x-archive-guessed-content-type: text/html
x-archive-guessed-charset: utf-8
memento-datetime: Wed, 23 Feb 2022 20:08:04 GMT
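The 14-digit timestamp in a Wayback Machine URI-M encodes the same instant as the memento-datetime header, in YYYYMMDDhhmmss form (UTC). The hypothetical helpers below convert an HTTP date such as "Wed, 23 Feb 2022 20:08:04 GMT" into that timestamp and build a URI-M from it; they assume a well-formed RFC 1123 date in GMT and do no validation.

```shell
#!/bin/sh
# Hypothetical helpers: convert an RFC 1123 HTTP date (as in the
# Memento-Datetime header) to a 14-digit Wayback timestamp, and build a URI-M.
# Assumes a well-formed date in GMT; there is no input validation.
http_date_to_wayback_ts() {
  printf '%s\n' "$1" | awk '{
    # Fields: $1 weekday, $2 day, $3 month name, $4 year, $5 hh:mm:ss, $6 GMT
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
    for (i = 1; i <= 12; i++) month[names[i]] = i
    split($5, t, ":")
    printf "%04d%02d%02d%02d%02d%02d\n", $4, month[$3], $2, t[1], t[2], t[3]
  }'
}

# URI-M = archive prefix + 14-digit timestamp + original URL (URI-R)
wayback_urim() {
  printf 'https://web.archive.org/web/%s/%s\n' "$1" "$2"
}
```

For example, `wayback_urim "$(http_date_to_wayback_ts 'Wed, 23 Feb 2022 20:08:04 GMT')" http://aplpy.github.com/` prints the URI-M requested above.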
An earlier memento, captured on March 27, 2021, before the redirects ended on April 15, 2021, instead records a 301 Moved Permanently redirect to aplpy.github.io:
$ curl -Is -H "Accept-datetime: Sat, 27 Mar 2021 03:27:32 GMT" https://web.archive.org/web/20210327032732/https://aplpy.github.com/ | head -25
HTTP/1.1 301 Moved Permanently
Server: nginx/1.19.5
Date: Tue, 29 Mar 2022 21:32:42 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
x-archive-orig-connection: keep-alive
x-archive-orig-content-length: 162
x-archive-orig-server: GitHub.com
location: https://web.archive.org/web/20210327032732/http://aplpy.github.io/
x-archive-orig-x-github-request-id: 5F6C:49AA:10BCA9:1DB3E7:605EA623
x-archive-orig-accept-ranges: bytes
x-archive-orig-date: Sat, 27 Mar 2021 03:27:32 GMT
x-archive-orig-via: 1.1 varnish
x-archive-orig-age: 0
x-archive-orig-x-served-by: cache-sjc10072-SJC
x-archive-orig-x-cache: MISS
x-archive-orig-x-cache-hits: 0
x-archive-orig-x-timer: S1616815652.059438,VS0,VE22
x-archive-orig-vary: Accept-Encoding
x-archive-orig-x-fastly-request-id: 212700e23721274eb9d9b595922151667aee92c7
cache-control: max-age=1800
memento-datetime: Sat, 27 Mar 2021 03:27:32 GMT
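Telling an archived redirect apart from an archived 404 only needs the status line and the Location header. The sketch below reads a header dump like the ones above from standard input and prints the status code and the redirect target, if any; the function name is hypothetical, and it assumes a single response (no curl -L), since following redirects would produce several header blocks.

```shell
#!/bin/sh
# Hypothetical helper: read one raw HTTP header block (e.g. from `curl -sI URL`)
# on stdin and print "<status> <location>", using "-" when there is no Location.
status_and_location() {
  tr -d '\r' | awk '
    NR == 1 { status = $2 }                    # status line, e.g. HTTP/1.1 301 ...
    tolower($1) == "location:" { loc = $2 }    # header names are case-insensitive
    END { if (loc == "") loc = "-"; print status, loc }'
}
```

Piping the headers above through it would print 301 followed by the URI-M for http://aplpy.github.io/ for the March 2021 capture, and "404 -" for the February 2022 capture.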