2022-12-12: Disinformation Spread on Social Media through Screenshot Sharing: Dataset Description

Not long ago, Elon Musk, CEO of Tesla Motors and SpaceX, made an offer to buy Twitter in a tweet (Fig. 1). Soon afterwards, a number of fake tweets related to the offer went viral on social media (Figs. 2 and 3). The fact-checking website BOOM analyzed the screenshots and determined that the tweets in Figs. 2 and 3 were never actually posted. The screenshots had been edited to look like Elon Musk’s tweets and were then circulated on social media, in this case as humorous commentary on the situation. Even so, the example shows why it is important to verify whether a screenshot is real or fake before sharing it further.

Figure 1. Screenshot of Elon Musk’s tweet.

Figure 2. Fake tweet screenshot on Twitter.

(https://twitter.com/Kingsirluke/status/1519029687863099392)

Figure 3. Fake tweet screenshot on Facebook.

(https://www.facebook.com/justjoelscafe/posts/956811948330818)

There are a number of reasons why an overwhelming majority of users share content through screenshots on social media. Social media platforms support intra-platform interaction through likes, replies, comments, retweets, quote tweets, shares, etc. But inter-platform operability among the widely used platforms (Facebook, Twitter, Instagram) is quite difficult because each maintains lock-in on its own platform. As a result, screenshots are used for cross-platform sharing. For example, Fig. 4 shows that the author Robert Reich (@RBReich) posted a tweet and then shared a screenshot of it on his Facebook page 'Class in Session' to increase cross-platform engagement.

Figure 4. Example of cross-platform sharing of a screenshot.

Moreover, controversial posts may be deleted, so users often take screenshots to preserve evidence before the posts disappear from social media. A screenshot preserves the post as it appeared, and the original author cannot edit or delete the screenshot. For example, Fig. 5 shows a screenshot of a tweet that, according to the fact-checking website Politifact, was deleted soon after a staff error was detected, but the screenshot had gone viral in the meantime. When screenshots are shared across social media platforms, it becomes difficult to verify their authenticity because there are no tools for this purpose.

Figure 5. Screenshot example of a real and deleted tweet from @NikkiHaley.

Creating fake tweets is relatively easy, as there are a number of tools for generating fake tweets, such as Tweetgen, GenerateStatus, White Bird: Fake Tweet Generator, and Shashiirk.github.io. Hence, it is vital to be able to evaluate the validity of screenshots shared on social media.

There are several examples of tweet screenshots that were altered and shared across different social media platforms. For example, the screenshot of Fig. 6(a) spreads disinformation through an altered tweet that was neither posted by BBC News nor said by Macron. The screenshot of Fig. 6(b) also contains information that is fake and was not posted by the respective organization’s official account. The screenshot of Fig. 6(c) is fake too, but it was shared as satire rather than disinformation.

(a)

(b)

(c)

Figure 6. Examples of fake screenshot sharing.

Several methods can be used to establish the veracity of an alleged tweet in a screenshot, that is, whether the tweet was really posted by the author. Well-known fact-checking websites such as Politifact, FactCheck.org, and Snopes analyze the authenticity of posted content, and users can search them for the content of the tweet or the (alleged) author’s name. In addition, users can search for the tweet content on the live web using techniques such as Twitter advanced search, Google search, and Google reverse image search. Searching web archives is also useful when accounts or posts appear to have been deleted.

Fig. 7 shows an example of a tweet screenshot that FactCheck.org determined to be fabricated. Fig. 8 shows a screenshot of the headline from this fact-checking website. The fact-checking site searched through Rep. Marjorie Taylor Greene's official Twitter feed and Politwoops (a database of deleted tweets by politicians) to determine whether the screenshot was real or fake.

Figure 7. Example of a fake tweet screenshot; it is satire and @RepMTG did not tweet this.

Figure 8. Screenshot of the headline from FactCheck.org.

A URL specifies the location of a web resource on the internet. But when only a screenshot is available, we cannot be certain what the URL is or was, so it is quite difficult to reverse engineer the URL of a tweet from the screenshot alone. The simplest approach is to search for the tweet's text on the live web with a Google search. Fig. 9 shows an example of a Google search for the screenshot of @michellemalkin’s post shared by @hannahgais.

(https://twitter.com/hannahgais/status/1526674114995527680)

Figure 9. Google search example by using the tweet’s text and author of the shared screenshot.

Another good approach is Twitter advanced search, particularly for content that was posted a long time ago and for finding replies. Fig. 10 shows an example of a screenshot where the tweet was posted in 2015 and could be easily verified using Twitter advanced search.

(a) A screenshot of @ProfBrianCox shared by @BazzaCC in 2022. (https://twitter.com/ProfBrianCox/status/1540332887601446912)

(b) Twitter advanced search UI.

(https://twitter.com/search-advanced)

(https://twitter.com/ProfBrianCox/status/677637760228921344)

(https://twitter.com/ProfBrianCox/status/677634768893267969)

Figure 10. Validating screenshot content using Twitter advanced search.
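The advanced search form ultimately produces an ordinary search query built from operators. The following is a minimal sketch of composing such a query, assuming the commonly used operators from:, since:, and until: plus exact-phrase quoting. The function name build_twitter_search_url and the December 2015 date window are illustrative assumptions, not part of the workflow described above.

```python
from datetime import date
from urllib.parse import urlencode


def build_twitter_search_url(handle, phrase=None, since=None, until=None):
    """Build a Twitter search URL limited to one author and a date window."""
    terms = []
    if phrase:
        # Double quotes request an exact-phrase match.
        terms.append('"{}"'.format(phrase))
    # Strip a leading "@" so both "@name" and "name" are accepted.
    terms.append("from:" + handle.lstrip("@"))
    if since:
        terms.append("since:" + since.isoformat())
    if until:
        terms.append("until:" + until.isoformat())
    return "https://twitter.com/search?" + urlencode({"q": " ".join(terms)})


# Hypothetical window around the 2015 tweet; the dates are assumptions.
url = build_twitter_search_url(
    "@ProfBrianCox", since=date(2015, 12, 1), until=date(2015, 12, 31)
)
print(url)
```

Adding a short phrase transcribed from the screenshot narrows the results further, although a transcription error in the phrase will hide a real tweet, so it may be worth searching by author and date alone first.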

Another good way to check the authenticity of a post is to look in web archives, such as the Wayback Machine and archive.today. Deleted posts cannot be found on the live web, but if the alleged post exists in a web archive, we can establish it as real. For example, Fig. 11 shows a screenshot whose content is currently unavailable on the live web because the tweet was deleted, but which exists in the web archive.

Figure 11. Example screenshot of a deleted tweet and the archived version.
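When the URL of the alleged post is already known, the archive lookup can be scripted. The sketch below uses the Wayback Machine's availability endpoint (https://archive.org/wayback/available) and assumes its JSON response contains an archived_snapshots.closest object. The helper names are hypothetical, and archive.today is not covered.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(post_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD hint."""
    params = {"url": post_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the closest memento URL from the JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_archived_copy(post_url, timestamp=None):
    """Query the live service (needs network access)."""
    with urlopen(availability_url(post_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

A return value of None only means that this one archive has no capture of that exact URL; it does not by itself show that the post never existed.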

Another technique for discovering a tweet's URL is to use the Wayback Machine’s CDX API. Fig. 12 shows an example of finding a tweet in the archives using the CDX API. To perform the search, two components are required: the Twitter handle of the author and a possible date range. Both are easily identified from the shared screenshot of Fig. 12: the author's handle is @NickHanauer (https://twitter.com/NickHanauer), and a possible date range is 2022-05-25 – 2022-05-26, as the date on the shared screenshot is 2022-05-25. After receiving the list of mementos from the selected time period, each one must be checked to see whether it contains the matching tweet. The green highlighted text in the response section is the required URL found in the web archive.

Figure 12. Example of a tweet screenshot found in the archive using the CDX API.

Command using the CDX API:

curl -s "http://web.archive.org/cdx/search/cdx?url=https://twitter.com/NickHanauer/status&from=20220525&to=20220526&matchType=prefix" | sort -u -k 3 | awk '{print "https://web.archive.org/web/" $2 "/" $3};'

Response:

https://web.archive.org/web/20220525153810/https://twitter.com/NickHanauer/status/1305869227409027072

https://web.archive.org/web/20220526062353/https://twitter.com/NickHanauer/status/1305869227409027072

https://web.archive.org/web/20220526035516/https://twitter.com/NickHanauer/status/1305869227409027072

https://web.archive.org/web/20220525184648/https://twitter.com/NickHanauer/status/1305869227409027072

https://web.archive.org/web/20220525205256/https://twitter.com/NickHanauer/status/1374401501024583683

https://web.archive.org/web/20220525164026/https://twitter.com/NickHanauer/status/1529220873697124353

https://web.archive.org/web/20220525222707/https://twitter.com/NickHanauer/status/1529220873697124353

https://web.archive.org/web/20220525121648/https://twitter.com/NickHanauer/status/1529371442831413251

https://web.archive.org/web/20220525230530/https://twitter.com/NickHanauer/status/1529599421302054912
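Checking every memento by hand can be tedious, because the date range constrains when a page was captured, not when the tweet was posted. One possible shortcut, sketched below, is to decode the creation time that Twitter embeds in its snowflake status IDs and keep only tweets created near the date on the screenshot. This filter is an assumption layered on top of the command above, not part of the original workflow. The function names and the one-day tolerance (to absorb time zone differences) are illustrative, and the parser assumes the default space-separated CDX output in which fields 2 and 3 are the capture timestamp and the original URL.

```python
import re
from datetime import date, datetime, timezone
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"
TWITTER_EPOCH_MS = 1288834974657  # epoch offset of Twitter's snowflake IDs


def cdx_query_url(handle, date_from, date_to):
    """Build the same prefix query as the curl command (dates as YYYYMMDD)."""
    params = {
        "url": "https://twitter.com/{}/status".format(handle),
        "from": date_from,
        "to": date_to,
        "matchType": "prefix",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def tweet_date(status_id):
    """Decode the UTC creation date embedded in a snowflake status ID."""
    millis = (status_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).date()


def candidate_mementos(cdx_text, screenshot_date, tolerance_days=1):
    """Return memento URLs for tweets created near the screenshot's date."""
    results = []
    for line in cdx_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        timestamp, original = fields[1], fields[2]
        match = re.search(r"/status/(\d+)", original)
        if not match:
            continue  # not a status URL
        created = tweet_date(int(match.group(1)))
        if abs((created - screenshot_date).days) <= tolerance_days:
            results.append(
                "https://web.archive.org/web/{}/{}".format(timestamp, original)
            )
    return results
```

The remaining candidates still need to be opened and compared with the screenshot, since an author may post several tweets on the same day.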

There are many methods to detect whether a tweet screenshot is fake or real by utilizing the live web and web archives. An automated system linking these services would determine whether the content had been really posted by the alleged author. This is the aim of the project "Did They Really Tweet That? Detecting Misattribution Disinformation in Screenshots of Social Media Posts", funded by the US Department of Education. The outcomes of this research will help to reduce the spread of disinformation by making it easier to determine the veracity of screenshots of alleged tweets.

The first task of this research is to create a dataset of screenshots shared on social media, both real and fake. We have collected about 200 examples, which are shared in this GitHub repo.

Here, we briefly discuss how the collected dataset has been organized based on a number of fields: Shared post’s URL, Original post’s URL, Category, Reason, Content Category, Structural Features, Post Type, Social Media Platform, Search Strategy, Annotated Images, Screenshot, and Remarks.

Shared post’s URL: This is the URL where the screenshot of a post is shared. For example, Fig. 13 shows a screenshot of a tweet being shared by another Twitter user.

Figure 13. A screenshot of a tweet by @RepClayHiggins shared by @BostonJoan. (https://twitter.com/bostonjoan/status/1498062875172249601)

Original post’s URL: This is the URL where the post of the screenshot exists. If the URL cannot be found, then the screenshot could be fake or the post could have been deleted. For example, the URL of the original post of the tweet shared as the screenshot in Fig. 13 is https://twitter.com/RepClayHiggins/status/1498015748492599297.

Category: This indicates whether the screenshot is real, fake, or unknown. For example, we can determine that the screenshot in Fig. 13 is real because it was found on the live web. On the other hand, we can classify the screenshot in Fig. 14 as fake because, according to a fact-checking website, it was not posted by the author. The fact-checking website determined that the tweet was fake by going through the official Twitter feed of Nadine Dorries and Politwoops for any such deleted tweets.

Figure 14. Screenshot example of a fake tweet.

Reason: Here we provide the reason for categorizing an example as fake, real, or unknown. Fig. 15 shows an example of a screenshot categorized as real because, although it is not available on the live web (it was deleted by the author), it is available in a web archive. However, the example in Fig. 16 cannot be found on the live web, in a web archive, or on a fact-checking website. This increases the probability that the content of the shared screenshot is fake, but it cannot be confirmed, so such examples are categorized as unknown in the dataset.

Figure 15. Screenshot example of a deleted tweet and its archived version.

Figure 16. Screenshot example of a supposedly fake tweet.
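The Category and Reason rules described above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the argument names, and the order in which evidence is considered (direct evidence from the live web or an archive before a fact-checker's verdict) are illustrative and are not prescribed by the dataset.

```python
def categorize(on_live_web, in_web_archive, fact_check_verdict=None):
    """Return (category, reason) for one screenshot.

    fact_check_verdict is "real", "fake", or None when no fact-checking
    website has covered the screenshot.
    """
    if on_live_web:
        return "real", "found on the live web"
    if in_web_archive:
        return "real", "not on the live web, but found in a web archive"
    if fact_check_verdict in ("real", "fake"):
        return fact_check_verdict, "determined by a fact-checking website"
    # No evidence either way: possibly fake, but this cannot be confirmed.
    return "unknown", "not found on the live web, in an archive, or fact-checked"


print(categorize(True, False))           # like Fig. 13
print(categorize(False, False, "fake"))  # like Fig. 14
print(categorize(False, True))           # like Fig. 15
print(categorize(False, False))          # like Fig. 16
```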

Content Category: The content category is classified based on the topic of discussion of the shared screenshot. The examples collected until now are mostly political, followed by public safety, health, entertainment, and satire.

Structural Features: A screenshot could show a simple tweet posted by a single author, a thread with one or multiple authors, or even a concatenation of multiple tweets. For example, Fig. 17(a) shows a simple screenshot whose structural feature belongs to the post class ‘Single tweet, Single author’, whereas Fig. 17(b) shows a thread with multiple authors, which belongs to the post class ‘Multiple tweets, Multiple authors’. The definitions of these post classes were introduced by Nwala et al.

(a) Single tweet, Single author

(https://twitter.com/rvawonk/status/1503227687917305863)

(b) Multiple tweets, Multiple authors (https://twitter.com/arictoler/status/1498302396543422464)

Figure 17. Screenshot examples defining structural features.
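As an illustration of how the post class could be derived once the tweets in a screenshot have been identified, the sketch below builds the label from the list of authors, one entry per visible tweet. The two labels shown in Fig. 17 come from the text above; the intermediate label ‘Multiple tweets, Single author’ is an assumption that follows the same naming pattern, and the function name is hypothetical.

```python
def post_class(tweet_authors):
    """Derive a post class label from one author handle per visible tweet."""
    if not tweet_authors:
        raise ValueError("a screenshot must contain at least one tweet")
    # Normalize handles so "@User" and "user" count as the same author.
    authors = {a.lstrip("@").lower() for a in tweet_authors}
    tweets_part = "Single tweet" if len(tweet_authors) == 1 else "Multiple tweets"
    authors_part = "Single author" if len(authors) == 1 else "Multiple authors"
    return tweets_part + ", " + authors_part


print(post_class(["@rvawonk"]))
```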

Post Type: This field specifies whether the content of a screenshot is a status, reply, thread, or a cropped snapshot of any of these. For instance, the post type of Fig. 17(a) is status, while Fig. 17(b) is part of a thread.

Social Media Platform: This is the platform from which the alleged post originates. All of our collected examples belong to the Twitter social media platform.

Search Strategy: The search strategy is defined as the method used to establish the veracity of the post. Some of the strategies we adopted while collecting the data are text search on Twitter, Twitter advanced search, searching fact-checking websites, and searching through web archives.

Annotated Images: There are some examples where the shared screenshots contain different types of annotations, such as highlighted text, backgrounds having watermarks, or any other marks externally imposed on the image. These annotations are used for a variety of purposes. Highlighted text is often used to emphasize certain portions of a text, watermarks are used to preserve copyright, and cross marks are used to prevent sharing of any particular information. For instance, Fig. 18 shows two annotated screenshots, where one has a red cross mark and another one has some highlighted text in yellow.

(https://twitter.com/ddale8/status/1510706731470491653)

(https://twitter.com/David_Moscrop/status/1514235029613916175)

Screenshot: This field contains the Google Drive link to where we have saved the image of the shared screenshot.

Remarks: This field is reserved for any other relevant comments regarding the screenshot example, such as whether the content of the shared screenshot was deleted or the author’s account was deleted or suspended.

These fields have been chosen to organize the collected dataset and could evolve further as we gather more examples. We would appreciate the suggestion of any additional fields, examples, or feedback of any kind regarding this dataset.
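To make the field list concrete, one record of the dataset could be modeled as follows. This is only a sketch: the class name, the attribute names, and the choice of optional fields are assumptions, and the actual column names and file format in the repository may differ. The example values are the two URLs given for Fig. 13; fields not stated in the text are left empty rather than guessed.

```python
from dataclasses import asdict, dataclass
from typing import Optional

CATEGORIES = ("real", "fake", "unknown")


@dataclass
class ScreenshotRecord:
    shared_post_url: str
    original_post_url: Optional[str]  # None when the post cannot be found
    category: str
    reason: str
    content_category: Optional[str] = None
    structural_features: Optional[str] = None
    post_type: Optional[str] = None
    social_media_platform: str = "Twitter"
    search_strategy: Optional[str] = None
    annotated_images: Optional[str] = None
    screenshot: Optional[str] = None  # link to the saved image
    remarks: Optional[str] = None

    def __post_init__(self):
        # Reject anything other than the three categories used in the dataset.
        if self.category not in CATEGORIES:
            raise ValueError("category must be one of {}".format(CATEGORIES))


fig13 = ScreenshotRecord(
    shared_post_url="https://twitter.com/bostonjoan/status/1498062875172249601",
    original_post_url="https://twitter.com/RepClayHiggins/status/1498015748492599297",
    category="real",
    reason="found on the live web",
)
print(asdict(fig13))
```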

Screenshots are commonly used across social media platforms to share information and increase engagement. Checking the authenticity of a screenshot matters because fake screenshots spread disinformation, and no tool currently exists to establish the validity of a screenshot. The screenshot examples and the associated fields of the dataset will be used in a prototype web service to estimate the probability of a screenshot being fake or real.

--- Tarannum Zaki (@tarannum_zaki)


Web Science and Digital Libraries Research Group
