Friday, April 13, 2018

2018-04-13: Web Archives are Used for Link Stability, Censorship Avoidance, and Traffic Siphoning

ISIS members immolating captured Jordanian pilot
Web archives have been used for purposes other than digital preservation and browsing historical data. These purposes can be divided into three categories:

  1. Uploading content to web archives to ensure continuous availability of the data.
  2. Avoiding governments' censorship or websites' terms of service.
  3. Using URLs from web archives, instead of direct links, for news sites with opposing ideologies to avoid increasing their web traffic and deprive them of ad revenue.

1. Uploading content to web archives to ensure continuous availability of the data


Web archives, by design, are intended to solve the problem of digital data preservation so people can access data when it is no longer available on the live web. In the paper Who and What Links to the Internet Archive (Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, and Michael L. Nelson, 2013), the authors show that 65% of the requested archived pages do not currently exist on the live web. The paper also determines where users of the Internet Archive's Wayback Machine come from. The following table, from the paper, contains the top 10 referrers that link to IA’s Wayback Machine. The top 10 referrers represent 51.9% of all the referrers. en.wikipedia.org outnumbers all other sites, including search engines and the home page of the Internet Archive (archive.org).
The top 10 referrers that link to IA’s Wayback Machine
Who and What Links to the Internet Archive, (AlNoamany et al. 2013) Table 5

Sometimes the archived data is controversial, and the user wants to make sure that he or she can refer back to it later in case it is removed from the live web. A clear example is the set of deleted tweets from U.S. president Donald Trump.
Mr. Trump's deleted tweets on politwoops.eu


2. Avoiding governments' censorship or websites' terms of service


Using the Internet Archive to get around the terms of service of file sharing sites was addressed by Justin Littman in a blog post, Islamic State Extremists Are Using the Internet Archive to Deliver Propaganda. He stated that ISIS sympathizers are using the Internet Archive as a web delivery platform for extremist propaganda, posing a threat to the archival mission of the Internet Archive. Mr. Littman did not evaluate the content to determine if it is extremist in nature since much of it is in Arabic. This behavior is not new. It was observed in data uploaded by Al-Qaeda sympathizers long before ISIS was created. Al-Qaeda uploaded this file https://archive.org/details/osamatoobama to the Internet Archive on February 16, 2010 to circumvent file sharing sites' content removal policies. ISIS sympathizers upload clips documenting battles, executions, or even video announcements by ISIS leaders to the Internet Archive because video sharing sites like YouTube automatically remove that type of content to prevent the spread of extremist propaganda.

On February 4, 2015, ISIS uploaded a video to the Internet Archive featuring the execution by immolation of captured Jordanian pilot Muath Al-Kasasbeh; that's only one day after the execution! This video violates YouTube's terms of service and is no longer on YouTube.
https://archive.org/details/YouTube_201502
ISIS members immolating captured Jordanian pilot (graphic video)
In fact, YouTube's algorithm is so aggressive that it removed thousands of videos documenting the Syrian revolution. Activists argued that the removed videos had been uploaded to document atrocities committed during the Syrian government's crackdown, and that YouTube killed any possible hope for future war crimes prosecutions.

Hani Al-Sibai, a lawyer, Islamic scholar, Al-Qaeda sympathizer, and former member of The Egyptian Islamic Jihad Group who lives in London as a political refugee, uploads his content to the Internet Archive. He is anti-ISIS, his content more often than not does not encourage violence, and he has had only a few issues with YouTube. Even so, he pushes his content to multiple sites on the web, including web archiving sites, to ensure continuous availability of his data.

For example, this is an audio recording from Hani Al-Sibai condemning the immolation of the Jordanian pilot, Muath Al-Kasasbeh. Mr. Al-Sibai uploaded this recording to the Internet Archive a day after the execution.
https://archive.org/details/7arqTayyar
An audio recording by Hani Al-Sibai condemning the execution by burning (uploaded to IA a day after the execution)

These are some examples where the Internet Archive is used as a file sharing service. Clips are simultaneously uploaded to YouTube, Vimeo, and the Internet Archive for the purpose of sharing.
Screenshot from justpaste.it where links to videos uploaded to IA are used for sharing purposes
Both videos shown in the screenshot were removed from YouTube for violating terms of service, but they are not lost because they have been uploaded to the Internet Archive.

https://www.youtube.com/watch?v=Cznm0L5X9LE
Rebuttal from Hani Al-Sibai addressing ISIS spokesman's attack on Al-Qaeda leader Ayman Al-Zawaheri (removed from YouTube)

https://archive.org/details/Fajr3_201407
Rebuttal from Hani Al-Sibai addressing ISIS spokesman's attack on Al-Qaeda leader Ayman Al-Zawaheri (uploaded to IA)

https://www.youtube.com/watch?v=VuSgxhBtoic
Rebuttal from Hani Al-Sibai addressing ISIS leader's speech on the expansion of ISIS (removed from YouTube)

https://archive.org/details/Ta3liq_Hadi
Rebuttal from Hani Al-Sibai addressing ISIS leader's speech on the expansion of ISIS (uploaded to IA)
The same video was not removed from Vimeo.
https://vimeo.com/111975796
Rebuttal from Hani Al-Sibai addressing ISIS leader's speech on the expansion of ISIS (uploaded to Vimeo)
I am not sure if web archiving sites have content moderation policies, but even sharing sites that do have them enforce them inconsistently! YouTube is a perfect example; no one knows what YouTube's rules even are anymore.

A less popular use of the Internet Archive is browsing the live web through Internet Archive links to bypass government censorship. Sometimes, governments block sites with opposing ideologies, but their archived versions remain accessible. When these governments realize that their censorship is being evaded, they block the Internet Archive entirely to prevent access to the same content they blocked on the live web. In 2017, IA’s Wayback Machine was blocked in India, and in 2015, Russia blocked the Internet Archive over a single page!

3. Using URLs from web archives instead of direct links for news sites with opposing ideologies to deprive them of ad revenue

Even when the live web version is not blocked, there are situations where readers want to deny traffic, and the resulting ad revenue, to web sites with opposing ideologies. In a recent paper, Understanding Web Archiving Services and Their (Mis)Use on Social Media (Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, 2018), the authors presented a large-scale analysis of web archiving services and their use on social networks, the archived content, and how it is shared and used. They found that contentious news and social media posts are the most common types of content archived. Also, URLs from web archiving sites are widely posted in “fringe” groups on Reddit and 4chan to preserve controversial data that might disappear; this case also falls under the first category. Furthermore, the authors found evidence of group admins forcing members to use URLs from web archives instead of direct links when referring to sites with opposing ideologies, so as not to increase their traffic and to deprive them of ad revenue. For instance, The_Donald subreddit systematically targets the ad revenue of news sources with adverse ideologies using moderation bots that block URLs from those sites and prompt users to post archive URLs instead.
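
The bot behavior described above can be sketched roughly as follows. This is only an illustration in Python, not the code of any actual subreddit bot: the blocklist, the function names, and the reply wording are all hypothetical, and the sketch assumes the Wayback Machine convention of prefixing the original URL with https://web.archive.org/web/ to reach the most recent capture.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; the paper does not publish any bot's real configuration.
BLOCKED_DOMAINS = {"example-news.com"}


def is_blocked(url: str) -> bool:
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def wayback_url(url: str) -> str:
    """Build an archive link (assumed pattern: prefix + original URL)."""
    return "https://web.archive.org/web/" + url


def moderate(url: str):
    """Return a reply asking for an archived link, or None if the URL is allowed."""
    if is_blocked(url):
        return (
            "Direct links to this site are not allowed here. "
            "Please post an archived copy instead, e.g. " + wayback_url(url)
        )
    return None
```

A submission linking to https://www.example-news.com/story would get the reply, while links to any other domain pass through untouched. The source site receives no click from the thread either way, which is the effect the paper describes.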

The authors also found that web archives are used to evade censorship policies in some communities: for example, /pol/ users post archive.is URLs to share content from 8chan and Facebook, which are banned on the platform, or to dodge word-filters (e.g., ‘smh’ becomes ‘baka’, so links to smh.com.au point to baka.com.au instead).
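
To see why a word-filter breaks direct links but not archive links, consider the small Python sketch below. It assumes the filter is a naive, case-insensitive substring replacement applied to the whole post, which is a guess at the mechanism rather than 4chan's actual implementation; the single rule comes from the example above, and the archive.is short ID is made up.

```python
import re

# One rule taken from the example in the text; a real filter list would be longer.
WORD_FILTER = {"smh": "baka"}


def apply_word_filter(text: str) -> str:
    """Replace every filtered word wherever it occurs, including inside URLs."""
    for word, replacement in WORD_FILTER.items():
        text = re.sub(re.escape(word), replacement, text, flags=re.IGNORECASE)
    return text


direct_link = "http://www.smh.com.au/story"
archive_link = "http://archive.is/AbCdE"  # hypothetical short ID

print(apply_word_filter(direct_link))   # the domain is rewritten, so the link breaks
print(apply_word_filter(archive_link))  # no filtered word in the URL, so it survives
```

Because the archive URL does not contain the filtered string, it passes through unchanged and still leads to the original content.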

According to the authors, Reddit bots are responsible for posting a huge portion of the archive URLs on Reddit because moderators are trying to ensure the availability of the data, but this practice reduces the amount of traffic that the source sites would otherwise have received from Reddit.

I went on 4chan to find a few examples similar to those examined in the paper. Despite not knowing what 4chan was prior to reading the paper, I was able to find a couple of examples of archived links being shared on 4chan in just under 2 minutes. I took screenshots of both examples; the threads have since been deleted because 4chan removes threads after they reach page 10.

Pages are archived on archive.is then shared on 4chan
Sharing links to archive.org in a comment on 4chan

The takeaway message is that web archives have been used for purposes other than digital preservation and browsing historical data. These purposes include:
  1. Uploading content to web archives to mitigate the risk of data loss.
  2. Avoiding governments' censorship or websites' terms of service.
  3. Using URLs from web archives, instead of original source links, for news sites with opposing ideologies to deprive them of ad revenue.
--
Hussam Hallak
