2022-11-02: David DePape's blog on "frenlyfrens.com" was crawled by Bing no later than October 24, 2022 -- four days before the attack on Paul Pelosi


Screen shot from: https://www.thegatewaypundit.com/2022/10/exclusive-two-far-right-websites-attributed-david-depape-fabricated-created-friday-deleted-today/

An October 29, 2022 article in the Gateway Pundit about the October 28, 2022 attack on Paul Pelosi stated:

It looks like this is all another far-left conjured-up lie.  The websites reportedly connected to Paul Pelosi’s “friend” in his underwear are likely fakes.

Part of the author's reasoning is "[t]he only activity on this site as well was in the last two days", referring to the screen shot above of the TimeMap of "frenlyfrens.com/blog" at the Internet Archive's Wayback Machine. The author conflates the Wayback Machine's crawling of the web page with the page's creation, and so concludes that the page was created on October 28, 2022, the day of the attack. This is most clearly stated in the slug used in the article's URL (emphasis mine):

https://www.thegatewaypundit.com/2022/10/exclusive-two-far-right-websites-attributed-david-depape-fabricated-created-friday-deleted-today/ 

In the URL above, "friday" refers to October 28, 2022 and "today" refers to October 29, 2022. Of course, there's no guarantee that web sites will be crawled by the Internet Archive (or Google, Bing, etc.) the day they are created.  While the presence of a URL in the index of the Internet Archive is proof that it existed (or at least returned an HTTP response) at the time it was crawled, creation time can and frequently does predate the time of archiving.
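To make the distinction concrete, here is a minimal sketch that reads the capture datetimes out of a link-format TimeMap (the RFC 7089 format the Wayback Machine serves at web.archive.org/web/timemap/link/<URL>). The sample TimeMap below is hypothetical and uses example.com; it is not real capture data. The earliest capture only tells us the page existed no later than that moment; it says nothing about when the page was created.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# One memento entry in an RFC 7089 link-format TimeMap. The rel value may be
# "memento", "first memento", "last memento", or "first last memento".
MEMENTO_RE = re.compile(r'<([^>]+)>;\s*rel="[^"]*\bmemento\b[^"]*";\s*datetime="([^"]+)"')
ATTACK = datetime(2022, 10, 28, tzinfo=timezone.utc)


def memento_datetimes(timemap_text):
    """Return the capture datetimes (UTC) of all mementos in a TimeMap, oldest first."""
    return sorted(parsedate_to_datetime(dt) for _uri, dt in MEMENTO_RE.findall(timemap_text))


# Hypothetical, abbreviated TimeMap for illustration only (not real capture data).
SAMPLE_TIMEMAP = """<https://example.com/blog>; rel="original",
<http://web.archive.org/web/20221028203812/https://example.com/blog>; rel="first memento"; datetime="Fri, 28 Oct 2022 20:38:12 GMT",
<http://web.archive.org/web/20221029101500/https://example.com/blog>; rel="last memento"; datetime="Sat, 29 Oct 2022 10:15:00 GMT",
"""

captures = memento_datetimes(SAMPLE_TIMEMAP)
first = captures[0]
# The first capture is only an upper bound: the page existed *no later than* this.
print("existed no later than:", first.isoformat())
# True here, but that does not imply the page was *created* on or after the attack.
print("first capture on/after attack date:", first >= ATTACK)
```

In other words, a first capture on October 28 is equally consistent with a page created that day and with a page created weeks earlier that no archive had yet discovered.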

Using web archives, WHOIS, and search engine (SE) caches, we can prove:

  1. The domain "frenlyfrens.com" was registered September 8, 2022.
  2. Web pages in the blog were crawled by Bing as early as October 24, 2022.

This negates the premise of the Gateway Pundit article that the web sites were fabricated after the attack as "another far-left farce" to associate the attacker with QAnon and other far-right conspiracy theories. The web archives cannot establish that the David DePape who registered "frenlyfrens.com" is the same David DePape who attacked Paul Pelosi, nor can they address the claims that have no web component, but they do let us establish a sequence of web-based events prior to October 28, 2022.

First, we can see from whois.com that the domain "frenlyfrens.com" was registered on September 8, 2022 (2022-09-08).  

https://www.whois.com/whois/frenlyfrens.com 
https://archive.ph/qlsDl 

Since WHOIS records can change in the future, I've archived the above page to preserve the registration record as it was on 2022-10-31.
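As a small illustration of the WHOIS step, the sketch below pulls the creation date out of raw WHOIS text and counts the days until the attack. The field labels ("Creation Date", as in typical registry output, and "Registered On", as displayed by whois.com) are assumptions, and the sample record is abbreviated and hypothetical; only the 2022-09-08 date comes from the lookup above.

```python
import re
from datetime import date

# Assumption: the record labels the field "Creation Date" (typical of registry
# WHOIS output) or "Registered On" (as displayed by whois.com).
CREATED_RE = re.compile(
    r"^\s*(?:Creation Date|Registered On):\s*(\d{4}-\d{2}-\d{2})",
    re.MULTILINE | re.IGNORECASE,
)
ATTACK = date(2022, 10, 28)


def registration_date(whois_text):
    """Return the domain's creation date parsed from raw WHOIS text."""
    match = CREATED_RE.search(whois_text)
    if match is None:
        raise ValueError("no creation date found in WHOIS text")
    return date.fromisoformat(match.group(1))


# Abbreviated, hypothetical record; only the date comes from the lookup above.
sample = "Domain Name: FRENLYFRENS.COM\n   Creation Date: 2022-09-08\n"
registered = registration_date(sample)
print(registered, (ATTACK - registered).days)  # 2022-09-08 50
```

So the domain was registered 50 days before the attack.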

Next, we can prove that search engines were crawling the site prior to October 28, 2022. Google presumably crawled the site as well, but Google is quick to purge its cache when pages are removed, and I found no evidence of frenlyfrens.com in Google's index when I checked on October 31, 2022. There is, however, evidence of it being indexed in both Bing and Yandex. For each SE below, I provide the live web link to the SE cache, a copy of the SE cache URL in the Wayback Machine, and a copy of the SE cache URL in archive.today. The live web SE caches will eventually disappear.

Bing cache (live web)
Bing cache 2022-10-31 (Wayback Machine)
Bing cache 2022-10-31 (archive.today)
Note: the live web Bing cache looks slightly different from the archived versions, likely because Bing varies the SERP based on the HTTP User-Agent request header.


Yandex cache (live web)
Yandex cache 2022-10-31 (Wayback Machine)
Yandex cache 2022-10-31 (archive.today)
Note: the live web Yandex SERP seems to give a different ordering to the URLs on each access.

In both the Bing and Yandex SE result pages (SERPs), you can see dates that predate October 28, 2022, such as Bing's "Sep 28, 2022", "Sep 04, 2022", and "Aug 23, 2022".  Two of these dates are prior to the domain being registered (2022-09-08).  There are two possible explanations for why some posts have earlier dates:
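The date comparison above can be checked mechanically. This sketch parses the three Bing SERP dates quoted above and compares them against the registration date and the attack date; it assumes the dates are rendered as "Mon DD, YYYY" in an English (C) locale.

```python
from datetime import date, datetime

REGISTERED = date(2022, 9, 8)  # WHOIS creation date
ATTACK = date(2022, 10, 28)
serp_dates = ["Sep 28, 2022", "Sep 04, 2022", "Aug 23, 2022"]  # shown by Bing


def parse_serp_date(text):
    """Parse a SERP date such as 'Sep 04, 2022' (assumes an English/C locale)."""
    return datetime.strptime(text, "%b %d, %Y").date()


before_registration = [s for s in serp_dates if parse_serp_date(s) < REGISTERED]
before_attack = [s for s in serp_dates if parse_serp_date(s) < ATTACK]
print(before_registration)  # ['Sep 04, 2022', 'Aug 23, 2022']
print(len(before_attack))   # 3
```

Note that these SERP dates are whatever the page's HTML claims, not the dates the SE crawled the page, which is why the two explanations below are needed.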

  1. The site was being published at http://wixsite.com prior to having a custom domain registered.
  2. The HTML meta tags and Schema.org JSON block were altered by the content creator.

Regarding explanation #2, it is important to note that SEs retrieve the dates displayed in the SERPs by extracting them from the HTML.  For example, looking at:

https://www.frenlyfrens.com/post/pizzagate 

(Bing cache live web, Bing cache 2022-10-31 (Wayback Machine), Bing cache 2022-10-31 (archive.today))

The meta Open Graph and Schema.org sections look like:

The HTML meta data for https://www.frenlyfrens.com/post/pizzagate.

It would be possible for a skilled author of the blog to reset the value of, for example:

    <meta property="article:published_time" content="2022-08-23T22:50:24.744Z"/>

This would cause the SEs to display "Aug 23, 2022" next to the search result. While this is technically possible, it would require technical skills beyond those of the average blogger. In my assessment, explanation #1 is far more likely: the blog originated at wixsite.com (Wix.com is a web hosting and blogging platform), and the owner later decided to register a custom domain for the site. This is a common approach for bloggers; see, for example, this article with exemplary wix.com sites, each of which has a custom domain.
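To see which values an SE could pick up, here is a minimal sketch, using only Python's standard html.parser, that collects the Open Graph article dates and any Schema.org JSON-LD "datePublished" values from a page. The sample page head is simplified and hypothetical; only the meta value is the one quoted above. Backdating a post convincingly would mean changing both kinds of values consistently.

```python
import json
from html.parser import HTMLParser

META_DATE_PROPS = ("article:published_time", "article:modified_time")


class PublishedDateParser(HTMLParser):
    """Collect Open Graph article dates and Schema.org JSON-LD datePublished values."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.ld_dates = []
        self._in_ld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") in META_DATE_PROPS:
            self.meta[a["property"]] = a.get("content")
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_ld, self._buf = True, []

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self._in_ld = False
            try:
                doc = json.loads("".join(self._buf))
            except ValueError:
                return  # malformed JSON-LD: skip it
            for item in (doc if isinstance(doc, list) else [doc]):
                if isinstance(item, dict) and "datePublished" in item:
                    self.ld_dates.append(item["datePublished"])


# Simplified, hypothetical page head; the meta value is the one quoted above.
html = """<html><head>
<meta property="article:published_time" content="2022-08-23T22:50:24.744Z"/>
<script type="application/ld+json">{"@type": "BlogPosting", "datePublished": "2022-08-23T22:50:24.744Z"}</script>
</head><body></body></html>"""

parser = PublishedDateParser()
parser.feed(html)
print(parser.meta)
print(parser.ld_dates)
```

Real Wix pages may nest their JSON-LD differently; this sketch only looks at top-level objects.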

Fortunately, we can examine the HTML source of the Bing cached pages, where Bing inserts a banner telling the user that this is a cached page, the date on which it was cached, etc. Unfortunately, the Bing banner doesn't display in the live web or archived versions, because it competes with the JavaScript from wixsite.com (when replaying the Wayback Machine version, you'll see the banner flash momentarily and then disappear). However, the banner remains in the archived HTML source:

The Bing cached version of https://www.frenlyfrens.com/post/pizzagate, archived at the Wayback Machine (2022-10-31).

The image above demonstrates conclusively that SEs, in this case Bing, were crawling the site as early as October 24, 2022 -- four days before the attack. Even if we allow for sophisticated web site fabrication with HTML meta tag values backdated before October 28, 2022, there is no plausible explanation for the "10/24/2022" value added by Bing other than that the page was in fact crawled on October 24, 2022.
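Extracting the cache date from an archived copy of the HTML source can be sketched as below. This is a hypothetical helper, not a tool used here; it assumes the banner renders its date as M/D/YYYY (as in "10/24/2022"). The snippet is a placeholder, not Bing's actual banner markup, and on a full page the pattern can match unrelated dates, so in practice you would first narrow the search to the banner element.

```python
import re
from datetime import date

# Assumption: the cache banner renders its date as M/D/YYYY (e.g. "10/24/2022").
# On a full page this pattern can match unrelated dates, so in practice you
# would first narrow the search to the banner's own markup.
US_DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
ATTACK = date(2022, 10, 28)


def banner_dates(html):
    """Return all valid M/D/YYYY dates found in the text, oldest first."""
    found = []
    for month, day, year in US_DATE_RE.findall(html):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:  # e.g. 13/45/2022 is not a real date
            continue
    return sorted(found)


# Placeholder snippet, NOT Bing's actual banner markup.
snippet = "<div>cached page ... 10/24/2022 ...</div>"
cached = banner_dates(snippet)[0]
print(cached, (ATTACK - cached).days, "days before the attack")  # 2022-10-24 4 days before the attack
```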

Because archive.today transforms the HTML when it archives a page, the source for the Bing banner is lost. However, archive.today does use the cache date to establish the original date of the page. In the picture below, you can see the value of "24 Oct 2022 00:00:00 UTC" in the upper right-hand corner; this value is derived from the Bing cache date.
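The same datetime can also be checked programmatically, assuming archive.today returns the RFC 7089 Memento-Datetime response header for its mementos, as Memento-compliant archives do. The sketch below parses that header; fetch_memento_datetime is a hypothetical helper that needs network access (archive.today may throttle or challenge automated clients), and the User-Agent string is made up. The midnight time suggests the Bing cache date carries only day-level precision.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def memento_datetime(headers):
    """Parse the RFC 7089 Memento-Datetime header, or return None if absent."""
    value = headers.get("Memento-Datetime")
    return parsedate_to_datetime(value) if value else None


def fetch_memento_datetime(uri_m, timeout=30):
    """HEAD a memento URL and return its Memento-Datetime (network access required)."""
    # The User-Agent value is a hypothetical placeholder.
    request = Request(uri_m, method="HEAD", headers={"User-Agent": "memento-check/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return memento_datetime(response.headers)


# Header value corresponding to the "24 Oct 2022 00:00:00 UTC" shown above.
dt = memento_datetime({"Memento-Datetime": "Mon, 24 Oct 2022 00:00:00 GMT"})
print(dt.isoformat())  # 2022-10-24T00:00:00+00:00
print(dt == datetime(2022, 10, 24, tzinfo=timezone.utc))  # True
```

Usage against the live archive would be fetch_memento_datetime("https://archive.ph/2sROt").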

https://archive.ph/2sROt

A similar review of:

https://www.frenlyfrens.com/post/mccarthyism

yields a value of "10/25/2022":

Archived version of https://www.frenlyfrens.com/post/mccarthyism at the Wayback Machine (2022-10-31)

It is possible that other pages at frenlyfrens.com could establish even earlier dates of being crawled by Bing, but these were the first two I found, and they are sufficient to disprove the claim from the Gateway Pundit article. Furthermore, having disproved the claim with frenlyfrens.com, I did not explore the second site mentioned, Godisloving.wordpress.com, but I fully expect that it would yield similar results.

In summary, we have proved:
  1. The domain "frenlyfrens.com" was registered on 2022-09-08.
  2. Some of its pages were crawled by Bing as early as 2022-10-24, four days before the attack.

Why wasn't frenlyfrens.com crawled by the Wayback Machine prior to October 28, 2022? While the Internet Archive is the preeminent web archive, it is not resourced like Google and Bing. If no one uses "Save Page Now", the Wayback Machine will not find a URL until someone links to it from a page that the Wayback Machine is already crawling. Simply put, being archived correlates with popularity, and the Wayback Machine will not necessarily find new and obscure pages that few have linked to. The corollary is: if you see something strange or interesting on the web, push it into the Wayback Machine and archive.today (and perma.cc, if you have an account), because you can't assume that the web archives will find it on their own.
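For the Wayback Machine, "Save Page Now" can be triggered by requesting https://web.archive.org/save/ followed by the page URL. The sketch below builds that request and, in a separate hypothetical helper, sends it; anonymous requests are rate-limited, the service's response behavior may change, and the User-Agent string is made up. archive.today and perma.cc have their own submission interfaces, which this sketch does not cover.

```python
from urllib.request import Request, urlopen

SPN_PREFIX = "https://web.archive.org/save/"


def spn_url(url):
    """Build the anonymous Save Page Now request URL for a page."""
    return SPN_PREFIX + url


def save_page_now(url, timeout=120):
    """Ask the Wayback Machine to capture `url`; returns the final URL after redirects.

    Network access is required, anonymous requests are rate-limited, and the
    returned URL may or may not be the new memento, depending on how the
    service responds at the time.
    """
    # The User-Agent value is a hypothetical placeholder.
    request = Request(spn_url(url), headers={"User-Agent": "spn-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()


print(spn_url("https://www.frenlyfrens.com/post/pizzagate"))
```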


--Michael


Appendices

I first learned of this story from this Charlie Warzel tweet:
Which is itself a commentary on this Lara Logan tweet:
After publishing my tweet thread (covering the material in this blog):
I learned of this Michael Biesecker thread:

The examples above use the source code of the Bing cached pages, but not of the Yandex cached pages. Even though Yandex shows the pages as indexed, clicking on "Saved Copy" (their term for cached):

Clicking on "Saved Copy"

We see that all the pages (or at least the several that I tried) have been purged from the cache and return HTTP 404:

Clicking "Saved Copy" in the above image produces this 404 page.  I had similar results for all the Yandex SERP results I tried.

Ideally we would be able to retrieve the HTML source from the Yandex cache and have a second source of pre-October 28, 2022 SE crawl dates. 

Regarding archiving correlating with popularity, our research group has explored this, directly and indirectly, in a number of publications, including:
  1. Lulwah Alkwai, Michael L. Nelson, and Michele C. Weigle, Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages, ACM Transactions on Information Systems, 36(1), 2017.
  2. Lulwah Alkwai, Michael L. Nelson, and Michele C. Weigle, How Well Are Arabic Websites Archived?, Proceedings of JCDL 2015.
  3. Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, and Michael L. Nelson, Who and What Links to the Internet Archive, International Journal on Digital Libraries, 14(3), pp. 101-115, April 2014.
  4. Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, and Michael L. Nelson, Who and What Links to the Internet Archive, Proceedings of TPDL 2013. (Also available as Technical Report arXiv:1309.4016.)
  5. Scott Ainsworth, Ahmed AlSum, Hany SalahEldeen, Michele C. Weigle, and Michael L. Nelson, How Much of the Web Is Archived?, Proceedings of JCDL 2011, pp. 133-136. (Also available as Technical Report arXiv:1212.6177.)


