2018-04-23: "Grampa, what's a deleted tweet?"
Took screen shot, just in case, but I fear #Breitbart is well beyond the point of decency and shame that they would delete this insane tweet. #INTL4335 #Islamophobia pic.twitter.com/ipo1MhDmNI— Cas Mudde 🌪️ (@CasMudde) February 5, 2018
In early February 2018, Breitbart News made a splash with an inflammatory tweet suggesting that Muslims would end the Super Bowl, which it deleted twelve hours later, stating that the tweet did not meet its editorial standards. The deleted tweet presented an imaginary conversation between a Muslim child and a grandparent about the Super Bowl and linked to one of its articles on the declining TV ratings of the National Football League (NFL) annual championship game. News articles from The Hill, Huffington Post, Politico, Independent, etc., discussed the deleted tweet controversy in detail.
We have deleted a tweet that did not meet our editorial standards.— Breitbart News (@BreitbartNews) February 5, 2018
Being web archiving researchers, we decided to look into the deleted tweet incident of Breitbart News to shed some light on their deleted tweets pattern over recent months.
Role of web archives in finding deleted tweets
Hany M. SalahEdeen and Michael L. Nelson in their paper, "Losing my revolution: How many resources shared on social media have been lost?", examine how many of the resources shared on social media are still live or present in public web archives. They concluded that nearly 11% of the shared resources are lost in the first year, and that after that the shared resources continue to be lost at a rate of 0.02% per day.
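As a rough worked example of those two rates, the sketch below estimates the fraction of shared resources lost after a given number of days. The function name `estimated_loss_fraction`, the linear growth within the first year, and the reading of 0.02% per day as an absolute daily increase are all assumptions made for illustration; the summary above gives only the first-year total and the later daily rate.

```python
def estimated_loss_fraction(days_since_shared):
    """Estimate the fraction of shared resources lost after some days.

    Assumptions (not from the paper): loss grows linearly to 11% over
    the first 365 days, then by 0.02 percentage points per day, and is
    capped at 100%.
    """
    first_year_loss = 0.11
    daily_loss_after = 0.0002
    if days_since_shared <= 365:
        return first_year_loss * days_since_shared / 365
    extra_days = days_since_shared - 365
    return min(1.0, first_year_loss + daily_loss_after * extra_days)


if __name__ == "__main__":
    for days in (365, 730, 1825):
        print(days, round(estimated_loss_fraction(days), 3))
```

Under these assumptions, two years after sharing the estimate is 0.11 + 365 × 0.0002, or roughly 18% of resources lost.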
Web archives such as the Internet Archive, Archive-It, UK Web Archive, etc., have an important role in the preservation of resources shared on social media. Using web archives, we can sometimes recover deleted tweets. For example, Miranda Smith in her blog post, "Twitter Follower Count History via Internet Archive", talks about using the Internet Archive to fetch historical Twitter data and graph follower counts over time. She also explains the advantages of using web archives over the Twitter API for finding historical data about users.
The only caveat in using web archives to uncover deleted tweets is their limited coverage of Twitter. But for popular Twitter accounts with a high number of mementos, such as RealDonaldTrump, Barack Obama, BreitbartNews, CNN, etc., we can often uncover deleted tweets. The question of "How Much of the Web Is Archived?" has been discussed by Ainsworth et al., but there has been no separate analysis of how much of Twitter is archived, which would help us estimate the accuracy of finding deleted tweets using web archives.
Web services like Politwoops track deleted tweets of public officials, including people currently in office and candidates for office in the USA and some EU nations. However, tweets deleted before a person becomes a candidate, or after a person has left office, are not covered. Although Politwoops tracks elected officials, it misses appointed government officials like Michael Flynn. For these Twitter accounts, web archives are the only way to find deleted tweets. The most important reason not to rely on these web services alone is that they can be banned by Twitter. This happened in June 2015, with Twitter citing a violation of the developer agreement, and it took Politwoops six months to resume its services in December 2015. Such bans suggest that we should explore web archives as a way to uncover deleted tweets in case services like Politwoops are banned again.
Why are deleted tweets important?
With the surge in the usage of social media sites like Twitter, Facebook, etc., researchers have been using social media sites to study patterns of online user behaviour. In the context of Twitter, deleted tweets play an important role in understanding users' behavioural patterns. In the paper, "An Examination of Regret in Bullying Tweets", Xu et al. built an SVM-based classifier to predict which bullying-related tweets their authors would later regret and delete. Petrovic et al., in their paper, "I Wish I Didn’t Say That! Analyzing and Predicting Deleted Messages in Twitter", discuss the reasons for deleting tweets and use a machine learning approach to predict deletions. They concluded that tweets with swear words have a higher probability of being deleted. Zhou et al. in their papers, "Tweet Properly: Analyzing Deleted Tweets to Understand and Identify Regrettable Ones" and "Identifying Regrettable Messages from Tweets", note that the impact of a published tweet cannot be undone by deletion, because other users may have noticed and cached the tweet before it was deleted.
How were deleted tweets found?
To begin our analysis, we used the Twitter API to fetch the most recent 3200 tweets from Breitbart News' Twitter timeline. The live tweets fetched from the Twitter API spanned from 2017-10-22 to 2018-02-18. Next, we retrieved the TimeMap for Breitbart's Twitter page using MemGator, the Memento aggregator service built by Sawood Alam. Using the URI-Ms from the fetched TimeMap, we collected the mementos of Breitbart's Twitter page that fall within the time range of the live tweets fetched using the Twitter API.
Code to fetch recent tweets using Python-Twitter API
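```python
import twitter

# Credentials are placeholders; substitute your own application keys.
api = twitter.Api(consumer_key='xxxxxx',
                  consumer_secret='xxxxxx',
                  access_token_key='xxxxxx',
                  access_token_secret='xxxxxx',
                  sleep_on_rate_limit=True)

# Twitter handle whose timeline is fetched
screen_name = 'BreitbartNews'
twitter_response = api.GetUserTimeline(screen_name=screen_name,
                                       count=200,
                                       include_rts=True)
```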
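A single timeline call returns at most 200 tweets, so reaching the roughly 3200 most recent tweets takes repeated calls that page backwards through the timeline. The following is a minimal sketch of that paging loop, not the code used for the analysis. `fetch_timeline` and `fetch_page` are hypothetical names; `fetch_page` stands in for a call to the API so that the loop can be exercised without credentials.

```python
def fetch_timeline(fetch_page, limit=3200):
    """Collect tweets by paging backwards through a timeline.

    fetch_page(max_id) is a hypothetical callable that returns a list
    of tweet objects with an integer `id` attribute, newest first,
    restricted to ids <= max_id (or the newest tweets when max_id is
    None).
    """
    tweets = []
    max_id = None
    while len(tweets) < limit:
        page = fetch_page(max_id)
        if not page:
            break  # no older tweets are available
        tweets.extend(page)
        # Next, ask for tweets strictly older than the oldest one seen.
        max_id = min(tweet.id for tweet in page) - 1
    return tweets[:limit]
```

With python-twitter, `fetch_page` could be something like `lambda max_id: api.GetUserTimeline(screen_name='BreitbartNews', count=200, include_rts=True, max_id=max_id)`.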
Shell command to run Memgator locally
```
$ memgator --contimeout=10s --agent=XXXXXX server
MemGator 1.0-rc7

TimeMap  : http://localhost:1208/timemap/{FORMAT}/{URI-R}
TimeGate : http://localhost:1208/timegate/{URI-R} [Accept-Datetime]
Memento  : http://localhost:1208/memento[/{FORMAT}|proxy]/{DATETIME}/{URI-R}

# FORMAT          => link|json|cdxj
# DATETIME        => YYYY[MM[DD[hh[mm[ss]]]]]
# Accept-Datetime => Header in RFC1123 format
```
Code to fetch TimeMap for any twitter handle
```python
import requests

url = "http://localhost:1208/timemap/"
data_format = "cdxj"
# Replace <screen-name> with the Twitter handle of interest
command = url + data_format + "/http://twitter.com/<screen-name>"
response = requests.get(command)
```
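Once the TimeMap has been fetched, the URI-Ms inside the time range of the live tweets need to be picked out. The sketch below shows one way to do this. It assumes that each data line of the CDXJ response is a 14-digit timestamp followed by a JSON object with a `uri` key, and that metadata lines start with `!`. The function name `mementos_in_range` is hypothetical.

```python
import json


def mementos_in_range(cdxj_text, start, end):
    """Return (timestamp, URI-M) pairs with start <= timestamp <= end.

    start and end are 14-digit YYYYMMDDhhmmss strings, so plain string
    comparison orders them chronologically.
    """
    selected = []
    for line in cdxj_text.splitlines():
        line = line.strip()
        # Skip blank lines and "!"-prefixed metadata lines.
        if not line or line.startswith("!"):
            continue
        timestamp, _, payload = line.partition(" ")
        if start <= timestamp <= end:
            selected.append((timestamp, json.loads(payload)["uri"]))
    return selected
```

For the range used here, a call might look like `mementos_in_range(response.text, "20171022000000", "20180218235959")`.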
Code to parse tweets, their timestamps and tweet ids from mementos
```python
import bs4

# Placeholder: path to the saved HTML representation of a memento
memento_path = "<HTML representation of Memento>"
with open(memento_path) as memento_file:
    soup = bs4.BeautifulSoup(memento_file, "html.parser")

# Each tweet in the archived timeline sits in a div.js-stream-tweet
match_tweet_div_tag = soup.select('div.js-stream-tweet')
for tag in match_tweet_div_tag:
    if tag.has_attr("data-tweet-id"):
        # Get Tweet id
        # ...........
        # Parse tweets
        match_timeline_tweets = tag.select('p.js-tweet-text.tweet-text')
        # ...........
        # Parse tweet timestamps
        match_tweet_timestamp = tag.find("span", {"class": "js-short-timestamp"})
        # ...........
```
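With tweet ids parsed from the mementos and the live tweets fetched from the API, candidate deleted tweets are those that appear in an archived page but are missing from the live timeline. The post does not show its comparison code, so the following is only a sketch of that step. The function name `candidate_deleted_ids` is hypothetical, and the cut-off at the oldest live id is an assumption that relies on tweet ids increasing over time.

```python
def candidate_deleted_ids(archived_ids, live_ids):
    """Return archived tweet ids that are missing from the live timeline.

    Only ids at or above the oldest live id are considered, because the
    API response does not reach further back and older archived tweets
    would otherwise be reported as deleted by mistake.
    """
    live = {int(tweet_id) for tweet_id in live_ids}
    if not live:
        return []
    oldest_live = min(live)
    archived = {int(tweet_id) for tweet_id in archived_ids}
    return sorted(
        tweet_id for tweet_id in archived
        if tweet_id >= oldest_live and tweet_id not in live
    )
```

Each hit is only a candidate and is worth checking against the live tweet URL before it is counted as deleted.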
Analysis of Deleted Tweets from Breitbart News
The most prominent of the 22 deleted tweets was the Super Bowl tweet mentioned above. For readers who are unaware of the role of web archives: taking a screenshot for fear that something might be lost in the future is smart, but it is even better to push the page to a web archive, where it will be preserved for longer than in someone's private archive. For further information, refer to Plinio Vargas's blog post "Links to Web Archives, not Search Engine Caches", where he discusses the difference between archived pages and search engine caches in terms of the decay period of web pages.
Fig 1 - Super Bowl tweet on Internet Archive (tweet memento at the Internet Archive)
Fig 2 - Archived version of unretweeted tweet by Breitbart News (tweet memento at the Internet Archive)
Fig 3 - Live version of unretweeted tweet by Breitbart News (live tweet status)
Analysis of deleted tweets from John Carney and NolteNC
We fetched live tweets for John Carney using the Twitter API, then fetched the TimeMap for John Carney's Twitter page using MemGator, along with the mementos within the time range of the live tweets. Because there were few mementos within that time range, the analysis showed no deleted tweets. We then fetched John Carney's live tweets from the Twitter API repeatedly over a week and looked for deleted tweets by comparing each response with all the previous responses. We discovered that tweets older than seven days are automatically deleted on Tuesdays and Saturdays. The precise pattern of deletion suggests the use of an automated tweet deletion service. There are a number of tweet deletion services, like Twitter Deleter, Tweet Eraser, etc., which delete tweets under certain conditions based on the lifespan of the tweet or the number of tweets to be present in the Twitter timeline at any given instance.
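The comparison of successive API responses can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis. The names `tweet_created_at`, `deletions_between` and `window_size` are hypothetical, and the creation time is decoded from the tweet id using Twitter's snowflake layout (a millisecond timestamp in the high bits, offset by 1288834974657).

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # offset used by snowflake tweet ids


def tweet_created_at(tweet_id):
    """Decode the creation time embedded in a snowflake tweet id."""
    ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def deletions_between(snapshots, window_size=3200):
    """Find tweet ids that vanish between consecutive snapshots.

    snapshots is a chronological list of (fetched_at, ids) pairs, where
    fetched_at is a timezone-aware datetime. When the later snapshot
    fills the API window, ids older than its oldest id are skipped,
    since they may have been pushed out rather than deleted.
    Returns (tweet_id, noticed_at, age_in_days) tuples.
    """
    found = []
    for (_, before), (noticed_at, after) in zip(snapshots, snapshots[1:]):
        after = set(after)
        window_full = len(after) >= window_size
        for tweet_id in sorted(set(before) - after):
            if window_full and tweet_id < min(after):
                continue
            age = (noticed_at - tweet_created_at(tweet_id)).days
            found.append((tweet_id, noticed_at, age))
    return found
```

Grouping the results by `noticed_at.strftime("%A")` and inspecting `age_in_days` would show on which weekdays tweets disappear and how old they were at the time.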
Fig 4 - John Carney's tweet deletion pattern shown with 50 tweet ids
Fig 5 - NolteNC's original tweet
Conclusions
- Taking screenshots of controversial tweets is not enough; web content that we fear losing should also be pushed to web archives, which retain it for longer than our personal archives.
- For finding deleted tweets, web archives work effectively for popular accounts because they are archived often, but for less popular accounts with fewer mementos this approach will not work.
- Although Breitbart News does not delete tweets often, some of its correspondents automatically delete their tweets, effectively deleting the corresponding retweets.