Wednesday, April 17, 2019

2019-04-17: Russell Westbrook, Shane Keisel, Fake Twitter Accounts, and Web Archives


On March 11, 2019 in the NBA, the Utah Jazz hosted their Northwest Division rivals, the Oklahoma City Thunder.  During the game, a Utah fan (Shane Keisel) and an Oklahoma City player (Russell Westbrook) engaged in a verbal exchange, with the player stating the fan was directing racist comments to him and the fan admitting to heckling but denying that his comments were racist.  The event was well documented (see, for example, this Bleacher Report article), and the following day the fan received a lifetime ban from all events at the Vivint Smart Home Arena and the player received a $25k fine from the NBA.

Disclaimer: I have no knowledge of what the fan said during the game, nor do I have an opinion regarding the appropriateness of the respective penalties.  My interest is that after the game, the fan gave at least one interview with a TV station reporter in which he exposed his identity.  That set off a rapidly evolving series of events with both real and fake Twitter accounts, which we unravel with the aid of multiple web archives.  The initial analysis was performed by Justin Whitlock as a project in my CS 895 "Web Archiving Forensics" class; prior to Justin proposing it as a project topic, my only knowledge of this event was via the Daily Show.


First, let's establish a timeline of events.  The timeline is a little complicated because although the game was played in the Mountain time zone, most media reports are relative to Eastern time, and the web crawlers report their time in UTC (or GMT).  Furthermore, daylight saving time began on Sunday, March 10, and the game was played on Monday, March 11.  This means there is a four hour differential between UTC and EDT, and a six hour differential between UTC and MDT.  Although most events occur after the switch to daylight saving time, some occur before it (where there would be a five hour differential between UTC and EST).
  • 2019-03-12T01:00:00Z -- the game is scheduled to begin on March 11 at 9pm EDT (March 12, 1am UTC).  An NBA game will typically last 2--2.5 hours, and at least one tweet shows Westbrook talking to someone in the bleachers midway through the second quarter (there may be other videos in circulation as well).
  • 2019-03-12T03:58:00Z -- based on the empty seats and the timestamp on the tweet (11:58pm EDT), the post-game interview with a KSL reporter embedded above reveals the fan's name and face.  The uncommon surname of "Keisel" combined with a closeup of his face enables people to quickly find his Twitter account: "@skeisel391".
  • 2019-03-12T04:57:34Z -- Within an hour of the KSL interview being posted, Keisel's Twitter account is "protected". This means we can see his banner and avatar photos and his account metadata, but not his tweets.
  • 2019-03-12T12:23:42Z -- Less than 9 hours after the KSL interview, his Twitter account is "deleted". No information is available from his account at this time.
  • 2019-03-12T15:29:47Z -- Although his Twitter account is deleted, the first page (i.e., first 20 tweets) is still in Google's cache and someone has pushed Google's cached version of the page into a web archive.  The banner of the web archive (archive.is) obscures the banner inserted by Google's cache, but a search of the source code of http://archive.is/K6gP4 reveals: 
    "It is a snapshot of the page as it appeared on Mar 6, 2019 11:29:08 GMT." 
In other words, an archived version of Google's cached page reveals Keisel's tweets (the most recent 20 tweets anyway) from nearly a week before (i.e., 2019-03-06T11:29:08Z) the game on March 11, 2019.
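As a sanity check on the offsets used in the timeline above, here is a minimal Python sketch (not part of the original analysis) that converts the UTC timestamps to Eastern and Mountain time.  The helper name to_local and the choice of the "America/New_York" and "America/Denver" zones are my own assumptions for illustration; the timestamps themselves are taken from the timeline.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs system tz data

# Assumed representative zones for "Eastern" and "Mountain" time.
EASTERN = ZoneInfo("America/New_York")
MOUNTAIN = ZoneInfo("America/Denver")


def to_local(iso_utc: str, tz: ZoneInfo) -> datetime:
    """Parse a UTC timestamp such as '2019-03-12T01:00:00Z' and convert it to tz."""
    utc_dt = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(tz)


# Timestamps copied from the timeline above.
timeline = [
    ("2019-03-06T11:29:08Z", "Google cache snapshot (before DST began)"),
    ("2019-03-12T01:00:00Z", "scheduled tip-off"),
    ("2019-03-12T03:58:00Z", "KSL interview tweet"),
    ("2019-03-12T04:57:34Z", "real account protected"),
    ("2019-03-12T12:23:42Z", "real account deleted"),
]

for stamp, label in timeline:
    east = to_local(stamp, EASTERN).strftime("%Y-%m-%d %H:%M %Z")
    mountain = to_local(stamp, MOUNTAIN).strftime("%Y-%m-%d %H:%M %Z")
    print(f"{stamp}  {east}  {mountain}  {label}")
```

The first row falls before the daylight saving switch, so it comes out with a five hour Eastern offset; the remaining rows use the four and six hour offsets described above.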

Although Keisel quickly protected and then ultimately deleted his account, until it was deleted his photos and account metadata were available and allowed a number of fake accounts to proliferate.  The most successful fake is "@skeiseI391", which is also now deleted but stayed online until at least 2019-03-17T04:18:48Z.  "@skeiseI391" replaces the lowercase L ("l") with an uppercase I ("I").  Depending on the font of your browser, the two characters can be all but indistinguishable (here they are side-by-side: lI).  I'm not sure who created this account, but we discovered it in this tweet, where the user provides not only screen shots but also a video of scrolling and clicking through the @skeiseI391 account before it was deleted.
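Since the two handles are hard to tell apart by eye, a small Python sketch can make the difference explicit.  This is only an illustration; the function name diff_positions is hypothetical, and the two handles are the ones discussed above.

```python
import unicodedata


def diff_positions(a: str, b: str):
    """Return (index, char_a, char_b) for each position where a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


real = "skeisel391"  # ends in lowercase L + 391
fake = "skeiseI391"  # uppercase I in place of the lowercase L

for index, real_char, fake_char in diff_positions(real, fake):
    print(index, unicodedata.name(real_char), "vs", unicodedata.name(fake_char))

# Lowercasing both handles shows they are still different strings.
print(real.lower(), fake.lower(), real.lower() == fake.lower())
```

Printing the Unicode character names sidesteps the font problem entirely: the single differing position is reported as a small letter L in the real handle and a capital letter I in the fake one.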






The video has significant engagement: originally posted at 2019-03-12T10:55:00Z, it now has more than 1k RTs, 3k likes, and 381k views.  There are many other accounts circulating these screen shots, some of which are provably true, some of which are provably false, and some of which cannot be verified using public web archives.  The screen shots have had an impact in the news as well, showing up in, among others, The Root, News One, and BET.  BET even quoted a provably fake tweet in the headline of their article:

This article's headline references a fake tweet.
The Internet Archive has mementos (archived web pages) for both the fake @skeiseI391 and the real @skeisel391 accounts, but the Twitter account metadata (e.g., when the account was created, how many followers, how many tweets) is in Chinese for the fake account and in Kannada for the real account.  This is admittedly confusing, but is a result of how the Internet Archive's crawler and Twitter's cookies interact; see our research group's posts from 2018-03 and 2019-03 on these topics for further information.  Fortunately, archive.is does not have the same problems with cookies, so we use their mementos for the following screen shots (two from the real account at archive.is and one from the fake account at archive.is).

real account, 2019-03-06T11:29:08Z (Google cache)
real account, 2019-03-12T04:57:34Z
From the account metadata, we can see this was not an especially active account: established in October 2011, it has 202 total tweets, 832 likes, following 51 accounts, and from March 6 to March 12, it went from 41 to 53 followers.  The geographic location is set to "Utah, USA", and the bio has no linked URL and has three flag emojis.

fake account; note the difference in the account metadata
The fake account has notably different metadata: the bio has only two flag emojis, plus a link to "h.cm", a page for a parked domain that appears to have never had actual content (the Internet Archive has mementos back to 2012). Furthermore, this account is far more active with 7k tweets, 23k likes, 1500 followers and following 1300 accounts, all since being created in August 2018.

Twitter allows users to change their username (or "handle") without losing followers, old tweets, etc.  Since the handle is reflected in the URL and web archives only index by URL, we cannot know what the original handle of the fake @skeiseI391 account was, but at some point after the game the owner changed from the original handle to "skeiseI391".  Since the account is no longer live, we cannot use the Twitter API to extract more information about the account (e.g., followers and following, tweets prior to the game), but given the link to a parked/spam web page and the high level of engagement in a short amount of time, this was likely a burner bot account designed to amplify legitimate accounts (cf. "The Follower Factory"), and then was adapted for this purpose.

We can pinpoint when the fake @skeiseI391 account was changed.  By examining the HTML source from the IA mementos of the fake and real accounts, we can determine the URLs of the profile images:

Real: https://pbs.twimg.com/profile_images/872289541541044225/X6vI_-xq_400x400.jpg

Fake: https://pbs.twimg.com/profile_images/1105325330347249665/YHcWGvYD_400x400.jpg

Both images are 404 now, but they are archived at those URLs in the Internet Archive:

Archived real image, uploaded 2017-06-07T03:08:07Z
Archived fake image, uploaded 2019-03-12T04:29:09Z
Also note that the tool used to download the real image and then upload it as the fake image kept the circular profile pic instead of the original square.

For those familiar with curl, I include just the portion of the command line output that shows the original "Last-Modified" HTTP response header from twitter.com.  It is those dates that record when the image changed at Twitter; these are separate from the dates when the image was archived at the Internet Archive.  The relevant response headers are shown below:

Real image:
$ curl -I http://web.archive.org/web/20190312045057/https://pbs.twimg.com/profile_images/872289541541044225/X6vI_-xq_400x400.jpg
HTTP/1.1 200 OK
Server: nginx/1.15.8
Date: Wed, 17 Apr 2019 15:12:02 GMT
Content-Type: image/jpeg
...

X-Archive-Orig-last-modified: Wed, 07 Jun 2017 03:08:07 GMT
...

Memento-Datetime: Tue, 12 Mar 2019 04:50:57 GMT
...


Fake image:
$ curl -I http://web.archive.org/web/20190312061306/https://pbs.twimg.com/profile_images/1105325330347249665/YHcWGvYD_400x400.jpg
HTTP/1.1 200 OK
Server: nginx/1.15.8
Date: Wed, 17 Apr 2019 15:13:21 GMT
Content-Type: image/jpeg
...

X-Archive-Orig-last-modified: Tue, 12 Mar 2019 04:29:09 GMT
...

Memento-Datetime: Tue, 12 Mar 2019 06:13:06 GMT
...


The "Memento-Datetime" response header is when the Internet Archive crawled/created the memento (real = 2019-03-12T04:50:57Z; fake = 2019-03-12T06:13:06Z), and the "X-Archive-Orig-last-modified" response header is the Internet Archive echoing the "Last-Modified" response header it received from twitter.com at crawl time.  From this we can establish that the image was uploaded to the fake account at 2019-03-12T04:29:09Z, not quite 30 minutes before we can establish that the real account was set to "protected" (2019-03-12T04:57:34Z).
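The interval arithmetic here is easy to get wrong by hand, so the following Python sketch recomputes it from the header values shown in the curl output.  It is a minimal illustration that works on the copied header strings rather than making a live request; the dictionary, constant, and function names (fake_image_headers, FAKE_URIM, wayback_timestamp) are hypothetical.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Header values copied from the curl output for the fake account's image.
fake_image_headers = {
    "X-Archive-Orig-last-modified": "Tue, 12 Mar 2019 04:29:09 GMT",
    "Memento-Datetime": "Tue, 12 Mar 2019 06:13:06 GMT",
}
FAKE_URIM = (
    "http://web.archive.org/web/20190312061306/"
    "https://pbs.twimg.com/profile_images/1105325330347249665/YHcWGvYD_400x400.jpg"
)

# When the real account is first observed as "protected" (from the timeline).
real_protected = datetime(2019, 3, 12, 4, 57, 34, tzinfo=timezone.utc)


def wayback_timestamp(urim: str) -> datetime:
    """Extract the 14-digit timestamp embedded in a Wayback Machine memento URL."""
    match = re.search(r"/web/(\d{14})/", urim)
    if match is None:
        raise ValueError("no 14-digit timestamp found in URL")
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


uploaded = parsedate_to_datetime(fake_image_headers["X-Archive-Orig-last-modified"])
archived = parsedate_to_datetime(fake_image_headers["Memento-Datetime"])

print("upload -> real account protected:", real_protected - uploaded)
print("upload -> image archived:        ", archived - uploaded)
print("URL timestamp matches header:    ", wayback_timestamp(FAKE_URIM) == archived)
```

The first difference is the "not quite 30 minutes" mentioned above (28 minutes and 25 seconds), and the last line confirms that the timestamp in the memento URL agrees with the "Memento-Datetime" header.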

We've presented a preponderance of evidence that the account @skeiseI391 is fake and that this fake account is responsible for the "come at me _____ boy" tweet referenced in multiple news outlets.  But what about some of the other screen shots referenced in social media and the news?  Are they real?  Are they photoshopped?  Are they from other, yet-to-be-uncovered fake accounts?

First, any tweet that is a reply to another tweet will be difficult to verify with web archives unless we know the direct URL of the original tweet or the reply itself (e.g., twitter.com/[handle]/status/[many numbers]).  Unfortunately, the deep links for individual tweets are rarely crawled and archived for less popular accounts.  While the top level page will be crawled and the most recent 20 tweets included, one has to be logged in to Twitter to see the tweets included in the "Tweets & replies" tab, and public web archives are not logged in when they crawl, so those contents are typically not available.  As such, it is hard to establish via web archives if the screen shot of the reply below is real or fake.  The original thread is still on the live web, but of the 45 replies, two of them are marked "This Tweet is unavailable".  One of those could be a reply from the real @skeisel391, but we don't have enough information to determine whether that is true.  The particular tweet shown below ("#poorloser") is at issue because even though it was from nearly a year ago, it would contradict the "we were having fun" attitude from the KSL interview.  Other screen shots that appear as replies will be similarly difficult to uncover using web archives.

This could be a real reply, but with web archives it is difficult to establish provenance of reply tweets.
The tweet below is more difficult to establish, since it does not appear to be a reply and the datetime that it was posted (2018-10-06T16:11:00Z) falls within the date range of the memento of the page in the Google cache, which has tweets from 2019-02-27 to 2018-10-06.  The use of "#MAGA" is in line with what we know Keisel has tweeted (at least 7 of the 20 tweets are clearly conservative / right-wing).  At first glance it appears that the memento covers tweets all the way back to 2018-10-04, since a retweet with that timestamp appears as the 20th and final tweet on the page, and thus a tweet from 2018-10-06 should appear before the one with a timestamp of 2018-10-04.  But retweeting a tweet does not reset its timestamp; for example, if I tweeted something yesterday and you retweet it today, your retweet will show my timestamp of yesterday.  So although the last timestamp shown on the page is 2018-10-04, the 19th tweet on the page is from Keisel and shows a timestamp of 2018-10-06.  So it's possible that the retweet occurred on 2018-10-06 and the tweet below just missed being included in the 20 most recent tweets (i.e., it was the 21st most recent tweet).  The screen shot shows a time of "11:11am", and in the HTML source of Google's cached page, for the 19th tweet it has:

title="8:11 AM - 6 Oct 2018"

This would suggest that the tweet in the screen shot was posted after the 19th tweet, but without time zone information we can't reliably sequence the tweets.  Depending on the GeoIP of Google's crawler, Twitter would set the "8:11 AM" value relative to that time zone.  It's tempting to think it's in California and thus Pacific time (PDT in October), but we can't be certain.  Regardless, there's no way to know the default time zone of the presumed client in the screen shot.
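To illustrate why the two clock times cannot be sequenced without knowing their time zones, here is a short Python sketch that maps each wall-clock time to the UTC instant it would denote in four US zones.  The list of candidate zones is purely an assumption for illustration (neither the crawler's nor the screen shot client's zone is known), and the helper name utc_candidates is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs system tz data

# Assumed candidates only; the actual zones are unknown.
CANDIDATE_ZONES = [
    "America/Los_Angeles",
    "America/Denver",
    "America/Chicago",
    "America/New_York",
]


def utc_candidates(year, month, day, hour, minute):
    """Map each candidate zone to the UTC instant the wall-clock time denotes there."""
    result = {}
    for name in CANDIDATE_ZONES:
        local = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(name))
        result[name] = local.astimezone(timezone.utc)
    return result


cached = utc_candidates(2018, 10, 6, 8, 11)   # "8:11 AM - 6 Oct 2018" in Google's cache
shot = utc_candidates(2018, 10, 6, 11, 11)    # "11:11am" in the screen shot

for cache_zone, cache_utc in cached.items():
    for shot_zone, shot_utc in shot.items():
        if shot_utc > cache_utc:
            order = "screen shot later"
        elif shot_utc < cache_utc:
            order = "screen shot earlier"
        else:
            order = "same minute"
        print(f"cache={cache_zone:20} shot={shot_zone:20} {order}")
```

Across these pairings the relative order is not fixed: for instance, a Pacific-time cache value paired with an Eastern-time screen shot would put both at the same minute in UTC, while other pairings put the screen shot hours later.  That is the ambiguity described above, not a resolution of it.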

We cannot definitively establish the provenance of this tweet.
Bing's cache also has a copy of Keisel's page, and it covers a period of 2018-09-14 to 2018-03-27.  Unfortunately, that leaves a coverage gap from 2018-10-06 to 2018-09-14, inclusive, and if the "#MAGA" tweet is real it could fall between the coverage provided by Google's cache and Bing's cache.
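The size of that gap can be computed directly from the dates above.  This is a trivial sketch with hypothetical variable names; the dates are the oldest tweet in Google's cached page, the newest tweet in Bing's cached page, and the date shown on the "#MAGA" screen shot.

```python
from datetime import date

google_oldest = date(2018, 10, 6)   # oldest tweet date on Google's cached page
bing_newest = date(2018, 9, 14)     # newest tweet date on Bing's cached page
screen_shot = date(2018, 10, 6)     # date of the "#MAGA" tweet in the screen shot

gap_days = (google_oldest - bing_newest).days
in_gap = bing_newest <= screen_shot <= google_oldest  # inclusive at both ends

print(f"gap between caches: {gap_days} days")
print(f"screen shot date inside the (inclusive) gap: {in_gap}")
```

The screen shot's date sits on the boundary of the gap, which is why the inclusive comparison matters here.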

This leaves three scenarios to account for the above "#MAGA" tweet and why we don't have a cached copy of it:
  1. Keisel deleted this tweet on or before March 6, 2019 in anticipation of the game on March 11, 2019.  While not impossible, it does not seem probable because it would require someone taking a screen shot of the tweet prior to the KSL interview.  Since the real @skeisel391 account was not popular (~200 tweets, < 50 followers), this seems like an unlikely scenario.
  2. Someone photoshopped or otherwise created a fake tweet.  Given the existence of the fake @skeiseI391 account (and other fake accounts), this cannot be ruled out.  If it is a fake, it does not appear to have the same origin as the fake @skeiseI391 account.  
  3. The screen shot is legitimate and we are simply unlucky that the tweet in question fell in the coverage gap between the Google cache and the Bing cache, just missing appearing on the page in Google's cache.
I should note that in the process of extending Justin's analysis we came across this thread from sports journalist @JonMHamm, where he uncovered the fake @skeiseI391 account and also looked at the page in Google's cache, although he was unaware that the earliest date it establishes is 2018-10-06 and not 2018-10-04.  He also vouches for a contact that claims to have seen the "#MAGA" tweet while it was still live, but that's not something I can independently verify.




In summary, of the three primary tweets offered as evidence, we can reach the following conclusions:
  1. "come at me _____ boy" -- this tweet is definitively fake.
  2. "#poorloser" -- this tweet is a reply, and in general reply tweets will not appear in public web archives, so web archives cannot help us evaluate this tweet.
  3. "#MAGA" -- this tweet is either faked, or it falls in the gap between what appears in the Google cache and what appears in the Bing cache; using web archives we cannot definitively determine which explanation is more likely.
We welcome any feedback, additional cache sources, deep links to individual tweets, evidence that these tweets were ever embedded in HTML pages, or any additional forensic evidence.  I thank Justin Whitlock for the initial analysis, but I take responsibility for any errors (including the persistent fear of incorrectly computing time zone offsets).

Finally, in the future please don't just take a screen shot; push the page to multiple web archives as well.

--Michael




Note: There are other fake Twitter accounts, for example: @skeisell391 (two lowercase L's),  @skeisel_ (trailing underscore), but they are not well-executed and I have omitted them from the discussion above.  
