2022-12-12: Disinformation Spread on Social Media through Screenshot Sharing: Dataset Description
Not long ago, Elon Musk, CEO of Tesla Motors and SpaceX, offered to buy Twitter by posting a tweet (Fig. 1). Soon afterward, a number of fake tweets related to the offer went viral on social media (Figs. 2 and 3). The BOOM fact-checking website analyzed the Musk screenshots and determined that the tweets in Figs. 2 and 3 were never actually posted. The screenshots had been edited to look like Elon Musk’s tweets and were then circulated on social media, in this case as humorous commentary on the situation. Even when the intent is humor, it is important to verify whether a screenshot is fake or real before sharing it further.
Figure 2. Fake tweet screenshot on Twitter. | Figure 3. Fake tweet screenshot on Facebook. (https://www.facebook.com/justjoelscafe/posts/956811948330818) |
Creating fake tweets is relatively easy, as there are a number of tools for generating fake tweets, such as Tweetgen, GenerateStatus, White Bird: Fake Tweet Generator, and Shashiirk.github.io. Hence, it is vital to be able to evaluate the validity of screenshots shared on social media.
Screenshots of altered tweets are shared across different social media platforms. For example, the screenshot in Fig. 6(a) spreads disinformation through an altered tweet that was neither posted by BBC News nor said by Macron. The screenshot in Fig. 6(b) also contains fake information that was not posted by the respective organization’s official account. The screenshot in Fig. 6(c) is fake too, but it was shared as satire rather than disinformation.
(a) | (b) | (c) |
Several methods can be used to establish the veracity of an alleged tweet in a screenshot, that is, whether the tweet was really posted by the author. Well-known fact-checking websites such as Politifact, FactCheck.org, and Snopes analyze the authenticity of posted content, and users can search them for the content of the tweet or the (alleged) author’s name. Users can also search for the tweet content on the live web using Twitter advanced search, Google search, and Google reverse image search. Searching web archives is also useful when an account or a post appears to have been deleted.
Fig. 7 shows an example of a tweet screenshot that FactCheck.org determined to be fabricated. Fig. 8 shows a screenshot of the headline from that fact-checking website. The fact-checkers searched Rep. Marjorie Taylor Greene's official Twitter feed and Politwoops (a database of deleted tweets by politicians) to determine whether the screenshot was fake or real.
Another useful approach is Twitter advanced search, particularly for content that was posted a long time ago and for finding replies. Fig. 10 shows an example of a screenshot of a tweet that was posted in 2015 and could easily be verified using Twitter advanced search.
(https://twitter.com/ProfBrianCox/status/677637760228921344) | (https://twitter.com/ProfBrianCox/status/677634768893267969) |
(c) Original tweet posted in 2015.
Figure 10. Validating screenshot content using Twitter advanced search.
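As a rough illustration of the advanced-search step, the sketch below builds a Twitter search URL from the from:, since:, and until: search operators plus a quoted phrase. The function name build_search_url, the date range, and the phrase are placeholders invented for this example (they are not taken from the screenshot in Fig. 10), and the encoder only handles the few characters this kind of query uses.

```shell
# Minimal sketch: build a Twitter search URL for one account, a date range,
# and an exact phrase. All argument values used below are illustrative.
build_search_url() {
    handle=$1       # account name without the leading @
    start_date=$2   # YYYY-MM-DD, inclusive lower bound
    end_date=$3     # YYYY-MM-DD, upper bound
    phrase=$4       # text alleged to appear in the tweet

    query="from:$handle since:$start_date until:$end_date \"$phrase\""

    # Percent-encode only the characters this query can contain
    # (percent sign, space, double quote, colon); not a general URL encoder.
    encoded=$(printf '%s' "$query" |
        sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/"/%22/g' -e 's/:/%3A/g')

    printf 'https://twitter.com/search?q=%s\n' "$encoded"
}

# Hypothetical usage: placeholder dates and phrase for a 2015 tweet.
build_search_url ProfBrianCox 2015-12-01 2015-12-31 'example phrase'
```

Opening the printed URL in a browser shows any matching tweets from that account, which can then be compared with the text in the screenshot.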
Figure 11. Example screenshot of a deleted tweet and the archived version.
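The Wayback Machine’s CDX API can be queried for archived captures of an account’s tweets within a date range, for example:
curl -s "http://web.archive.org/cdx/search/cdx?url=https://twitter.com/NickHanauer/status&from=20220525&to=20220526&matchType=prefix" | sort -u -k 3 | awk '{print "https://web.archive.org/web/" $2 "/" $3};'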
Response:
https://web.archive.org/web/20220525153810/https://twitter.com/NickHanauer/status/1305869227409027072
https://web.archive.org/web/20220526062353/https://twitter.com/NickHanauer/status/1305869227409027072
https://web.archive.org/web/20220526035516/https://twitter.com/NickHanauer/status/1305869227409027072
https://web.archive.org/web/20220525184648/https://twitter.com/NickHanauer/status/1305869227409027072
https://web.archive.org/web/20220525205256/https://twitter.com/NickHanauer/status/1374401501024583683
https://web.archive.org/web/20220525164026/https://twitter.com/NickHanauer/status/1529220873697124353
https://web.archive.org/web/20220525222707/https://twitter.com/NickHanauer/status/1529220873697124353
https://web.archive.org/web/20220525121648/https://twitter.com/NickHanauer/status/1529371442831413251
https://web.archive.org/web/20220525230530/https://twitter.com/NickHanauer/status/1529599421302054912
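The response above lists several captures of the same tweet. One possible refinement, sketched below, keeps only the earliest capture of each tweet URL. The function names cdx_query_url and earliest_captures are hypothetical, and the sketch assumes the CDX API's default space-separated output, in which field 2 is the capture timestamp and field 3 is the original URL (the same fields the awk command above prints).

```shell
# Build the CDX query URL for all captures whose URL starts with the
# account's status prefix, between two dates given as YYYYMMDD.
cdx_query_url() {
    handle=$1
    from_date=$2
    to_date=$3
    printf 'http://web.archive.org/cdx/search/cdx?url=https://twitter.com/%s/status&from=%s&to=%s&matchType=prefix\n' \
        "$handle" "$from_date" "$to_date"
}

# Read CDX lines on stdin and print one replay URL per tweet: its earliest
# capture. Timestamps are 14 digits, so sorting them as text is chronological.
earliest_captures() {
    LC_ALL=C sort -k3,3 -k2,2 |
        awk '!seen[$3]++ { print "https://web.archive.org/web/" $2 "/" $3 }'
}

# Offline demonstration. Timestamps and URLs come from the response above;
# the url key, MIME type, status, digest, and length columns are made up.
printf '%s\n' \
    'com,twitter)/nickhanauer/status/1305869227409027072 20220526062353 https://twitter.com/NickHanauer/status/1305869227409027072 text/html 200 SAMPLEDIGEST 0' \
    'com,twitter)/nickhanauer/status/1305869227409027072 20220525153810 https://twitter.com/NickHanauer/status/1305869227409027072 text/html 200 SAMPLEDIGEST 0' \
    'com,twitter)/nickhanauer/status/1529220873697124353 20220525222707 https://twitter.com/NickHanauer/status/1529220873697124353 text/html 200 SAMPLEDIGEST 0' \
    'com,twitter)/nickhanauer/status/1529220873697124353 20220525164026 https://twitter.com/NickHanauer/status/1529220873697124353 text/html 200 SAMPLEDIGEST 0' |
    earliest_captures

# Against the live API (needs network access):
#   curl -s "$(cdx_query_url NickHanauer 20220525 20220526)" | earliest_captures
```

Keeping the earliest capture is only one reasonable choice. When a tweet was later deleted, the earliest capture is the one most likely to show it as originally posted.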
The live web and web archives thus offer many ways to detect whether a tweet screenshot is fake or real. An automated system linking these services could determine whether the content was really posted by the alleged author. This is the aim of the project "Did They Really Tweet That? Detecting Misattribution Disinformation in Screenshots of Social Media Posts", funded by the US Department of Education. The outcomes of this research will help to reduce the spread of disinformation by making it easier to determine the veracity of screenshots of alleged tweets.
The first task of this research is to create a dataset of screenshots shared on social media, both real and fake. We have collected about 200 examples, which are shared in this GitHub repo. Each example is described by the following fields:
- Shared post’s URL: This is the URL where the screenshot of a post is shared. For example, Fig. 13 shows a screenshot of a tweet being shared by another Twitter user.
- Original post’s URL: This is the URL where the post of the screenshot exists. If the URL cannot be found, then the screenshot could be fake or the post could have been deleted. For example, the URL of the original post of the tweet shared as the screenshot in Fig. 13 is https://twitter.com/RepClayHiggins/status/1498015748492599297.
- Category: This indicates whether the screenshot is real, fake, or unknown. For example, we can determine that the screenshot in Fig. 13 is real because the tweet was found on the live web. On the other hand, we can classify the screenshot in Fig. 14 as fake because, according to a fact-checking website, it was not posted by the author. The fact-checking website determined that the tweet was fake by going through the official Twitter feed of Nadine Dorries and searching Politwoops for any such deleted tweets.
- Reason: Here we provide the reason for categorizing an example as fake, real, or unknown. Fig. 15 shows an example of a screenshot categorized as real because, although it is not available on the live web (it was deleted by the author), it is available in a web archive. The example in Fig. 16, however, exists neither on the live web, in a web archive, nor on a fact-checking website. This increases the probability that the content of the shared screenshot is fake, but it cannot be confirmed, so such examples are categorized as unknown in the dataset.
- Content Category: The content category is assigned based on the topic of the shared screenshot. The examples collected so far are mostly political, followed by public safety, health, entertainment, and satire.
- Structural Features: A screenshot could show a single tweet posted by a single author, a thread with one or more authors, or even a concatenation of multiple tweets. For example, Fig. 17(a) shows a simple screenshot whose structural feature belongs to the post class ‘Single tweet, Single author’, whereas Fig. 17(b) shows a thread with multiple authors, which belongs to the post class ‘Multiple tweets, Multiple authors’. These post classes were introduced by Nwala et al.
(a) Single tweet, Single author | (b) Multiple tweets, Multiple authors (https://twitter.com/arictoler/status/1498302396543422464) |
- Post Type: This field specifies whether the content of the screenshot is a status, reply, thread, or a cropped snapshot of any of these. For instance, the post type of Fig. 17(a) is status, while Fig. 17(b) is part of a thread.
- Social Media Platform: This is the platform from which the alleged post originates. All of our collected examples belong to the Twitter social media platform.
- Search Strategy: The search strategy is defined as the method used to establish the veracity of the post. Some of the strategies we adopted while collecting the data are text search on Twitter, Twitter advanced search, searching fact-checking websites, and searching through web archives.
- Annotated Images: Some shared screenshots contain annotations, such as highlighted text, watermarked backgrounds, or other marks imposed on the image. These annotations serve a variety of purposes. Highlighting is often used to emphasize certain portions of a text, watermarks are used to preserve copyright, and cross marks are used to prevent sharing of particular information. For instance, Fig. 18 shows two annotated screenshots, where one has a red cross mark and the other has some text highlighted in yellow.
(https://twitter.com/ddale8/status/1510706731470491653) | (https://twitter.com/David_Moscrop/status/1514235029613916175) |
- Screenshot: This field contains the Google Drive link to the saved image of the shared screenshot.
- Remarks: This field is reserved for any other relevant comments regarding the screenshot example, such as whether the content of the shared screenshot was deleted or the author’s account was deleted or suspended.
These fields have been chosen to organize the collected dataset and could evolve further as we gather more examples. We would appreciate the suggestion of any additional fields, examples, or feedback of any kind regarding this dataset.
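The way the Category and Reason fields are described above can be read as a small decision rule. The sketch below is one possible reading, not the project's implementation: the function name classify_screenshot, the yes/no/fabricated/none argument values, and the order in which the checks are applied are all assumptions made for illustration.

```shell
# classify_screenshot LIVE ARCHIVE FACTCHECK
#   LIVE, ARCHIVE: "yes" if the alleged post was found there, otherwise "no"
#   FACTCHECK:     "fabricated" if a fact-checking website debunked the post,
#                  otherwise "none"
# Prints the category and a reason, separated by a tab.
classify_screenshot() {
    live=$1
    archive=$2
    factcheck=$3

    if [ "$live" = "yes" ]; then
        printf 'real\tfound on the live web\n'
    elif [ "$archive" = "yes" ]; then
        printf 'real\tdeleted from the live web but found in a web archive\n'
    elif [ "$factcheck" = "fabricated" ]; then
        printf 'fake\treported as fabricated by a fact-checking website\n'
    else
        printf 'unknown\tnot found on the live web, in a web archive, or on a fact-checking website\n'
    fi
}

# The four situations described for Figs. 13 to 16, in that order.
classify_screenshot yes no  none
classify_screenshot no  no  fabricated
classify_screenshot no  yes none
classify_screenshot no  no  none
```

A real system would likely replace this hard yes/no rule with an estimated probability, but the rule shows how the recorded evidence maps to the three categories.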
Screenshots are commonly used across social media platforms to share information and increase engagement. Checking the authenticity of a screenshot matters because fake screenshots spread disinformation, yet no tool currently exists to establish the validity of a screenshot. The screenshot examples and the associated fields of the dataset will be used in a prototype web service to estimate the probability of a screenshot being fake or real.
--- Tarannum Zaki (@tarannum_zaki)