2018-04-09: Trip Report for the National Forum on Ethics and Archiving the Web (EAW)

On March 22-23, 2018 I attended the National Forum on Ethics and Archiving the Web (EAW), hosted at the New Museum and organized by Rhizome and the members of the Documenting the Now project.  The nor'easter "Toby" frustrated the travel plans of many: my friend Martin Klein had to cancel completely, and I did not arrive at the New Museum until after the start of the second session at 2pm on Thursday.  Fortunately, all the sessions were recorded and I link to them below.

Day 1 -- March 22, 2018

Session 1 (recording) began with a welcome, and then a keynote by Marisa Parham, entitled "The Internet of Affects: Haunting Down Data".  I did have the privilege of seeing her keynote at the last DocNow meeting in December, and judging from the tweets ("#eaw18"), she addressed some of the same themes here, including the process of archiving social media (e.g., tweets) and the resulting decontextualization, including "Twitter as dataset vs. Twitter as experience", and "how do we reproduce the feeling of community and enhance our understanding of how to read sources and how people in the past and present are engaged with each other?"  She also made reference to the Twitter heat map for showing interaction with the Ferguson grand jury verdict ("How a nation exploded over grand jury verdict: Twitter heat map shows how 3.5 million #Ferguson tweets were sent as news broke that Darren Wilson would not face trial").

After Marisa's keynote was the panel on "Archiving Trauma", with Michael Connor (moderator), Chido Muchemwa, Nick Ruest (slides), Coral Salomón, Tonia Sutherland, and Lauren Work.  There are too many important topics here and I did not experience the presentations directly, so I will refer you to the recording for further information and a handful of selected tweets below. 

The next session after lunch was "Documenting Hate" (recording), with Aria Dean (moderator), Patrick Davison, Joan Donovan, Renee Saucier, and Caroline Sinders.  I arrived at the New Museum about 10 minutes into this panel.  Caroline spoke about the Pepe the Frog meme, its appropriation by Neo-Nazis, and the attempt by its creator to wrest it back -- "How do you balance the creator’s intentions with how culture has remixed their work?"

Joan spoke about a range of topics, including archiving the Daily Stormer forum, archiving the disinformation regarding the attacks in Charlottesville in August 2017 (including false information originating on 4chan about who drove the car), and an algorithmic image collection technique for visualizing trending images in the collection.

Renee Saucier talked about experiences collecting sites for the "Canadian Political Parties and Political Interest Groups" (Archive-It collection 227), which includes Neo-Nazi and affiliated political parties.

The next panel was "Web Archiving as Civic Duty", with Amelia Acker (co-moderator), Natalie Baur, Adam Kriesberg (co-moderator) (transcript), Muira McCammon, and Hanna E. Morris.  My own notes on this session are sparse (in part because most of the presenters did not use slides), so I'll include a handful of tweets I noted that I feel succinctly capture the essence of the presentations.  I did find a link to Muira's MS thesis "Reimagining and Rewriting the Guantánamo Bay Detainee Library: Translation, Ideology, and Power", but it is currently under embargo.  I also found an interview with her that is available and relevant.  Relevant to Muira's work with deleted US Govt accounts is Justin Littman's recent description of a disinformation attack that works by re-registering deleted accounts ("Vulnerabilities in the U.S. Digital Registry, Twitter, and the Internet Archive"). 2018-04-17 update: Muira just published two related articles about deleted tweets: "Trouble @JTFGTMO" and "Can They Really Delete That?".

The third session, "Curation and Power" (recording) began with a panel with Jess Ogden (moderator), Morehshin Allahyari, Anisa Hawes, Margaret Hedstrom, and Lozana Rossenova.  Again, I'll borrow heavily from tweets. 

The final session for Thursday was the keynote by Safiya Noble, based on her recent book "Algorithms of Oppression" (recording).  I really enjoyed Safiya's keynote; I had heard of some of the buzz and controversy (see my thread (1, 2, 3) about archiving some of the controversy) around the book but I had not yet given it a careful review (if you're not familiar with it, read this five minute summary Safiya wrote for Time).  I include several insightful tweets from others below, but I'll also summarize some of the points that I took away from her presentation (and they should be read as such and not as a faithful or complete transcription of her talk).

First, as a computer scientist I understand and am sympathetic to the idea that ranking algorithms that Google et al. use should be neutral.  It's an ultimately naive and untenable position, but I'd be lying if I said I did not understand the appeal.  The algorithms that help us differentiate quality pages from spam pages about everyday topics like artists, restaurants, and cat pictures do what they do well.  In one of the examples I use in my lecture (slides 55-58), it's the reason why for the query "DJ Shadow", the and links appear on Google's page 1, and appears on page 15: in this case the ranking of the sites based on their popularity in terms of links, searches, clicks, and other user-oriented metrics makes sense.  But what happens when the query is, as Safiya provides in her first example, "black girls"?  The result (ca. 2011) is almost entirely porn (cf. the in-conference result for "asian girls"), and the algorithms that served us so well in finding quality DJ Shadow pages in this case produce a socially undesirable result.  Sure, this undesirable result is from having indexed the global corpus (and our interactions with it) and is thus a mirror of the society that created those pages, but given the centrality in our lives that Google enjoys and the fact that people consider it an oracle rather than just a tool that gives undesirable results when indexing undesirable content, it is irresponsible for Google to ignore the feedback loop that they provide; they no longer just reflect the bias, they hegemonically reinforce the bias, as well as give attack vectors for those who would defend the bias.

Furthermore, there is already precedent for adjusting search results to eliminate bias in other dimensions: for example, PageRank by itself is biased against late-arriving pages/sites (e.g., "Impact of Web Search Engines on Page Popularity"), so search engines (SEs) adjust the rankings to accommodate these pages.  Similarly, Google has a history of intervening to remove "Google Bombs" (e.g., "miserable failure"), punish attempts to modify ranking, and even replace results pages with jokes -- if these modifications are possible, then Google can no longer pretend the algorithm results are inviolable.
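To make the late-arrival bias concrete, here is a minimal sketch of textbook PageRank computed by power iteration over a tiny, made-up link graph.  The page names, the graph, and the damping factor of 0.85 are all assumptions for illustration; this is not how Google or any other search engine actually ranks pages.  The "new" page links out to established pages, but nothing links to it yet, so it is stuck at the minimum possible score no matter what it contains.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outlink and that every
    link target is also a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same baseline share from the "random jump".
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # Each page then splits its damped rank evenly among its outlinks.
        for src, outs in links.items():
            share = damping * rank[src] / len(outs)
            for dst in outs:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical graph: three established pages that cite each other,
# plus a late-arriving page that nobody links to yet.
links = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "new": ["a", "b"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the late arrival can only climb once established pages start linking to it, which is the kind of bias that search engines already correct for.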

She did not confine her criticism to Google; she also examined query results in digital libraries like ArtStor.  The metadata describing the contents in the DL originate from a point-of-view, and queries with a different POV will not return the expected results.  I use similar examples in my DL lecture on metadata (my favorite is reminding the students that the Vietnamese refer to the Vietnam War as the "American War"), stressing that even actions as seemingly basic as assigning DNS country codes (e.g., ".ps") are fraught with geopolitics, and that neutrality is an illusion even in a discipline like computer science.

There's a lot more to her talk than I have presented, and I encourage you to take the time to view it.  We can no longer pretend Google is just the "backrub" crawler and interface; it is a lens that both shows and shapes who we are.  That's an awesome responsibility and has to be treated as such.

Day 2 -- March 23, 2018

The second day began with the panel "Web as Witness - Archiving & Human Rights" (recording), with Pamela Graham (moderator), Anna Banchik, Jeff Deutch, Natalia Krapiva, and Dalila Mujagic. Anna and Natalia presented the activities of the UC Berkeley Human Rights Investigations Lab, where they do open-source investigations (discovering, verifying, geo-locating, more) of publicly available data about human rights violations.  Next was Jeff talking about the Syrian Archive, and the challenges they faced when YouTube algorithmically removed videos it classified as "extremist content".  He also had a nice demo about how they used image analysis to identify munitions videos uploaded by Syrians.  Dalila presented the work of WITNESS, an organization promoting the use of video to document human rights violations and showing how such video can be used as evidence.  The final presentation was about the (a documentation project about civilian casualties in air strikes), but I missed a good part of this presentation as I focused on my upcoming panel.

My session, "Fidelity, Integrity, & Compromise", was Ada Lerner (moderator) (site), Ashley Blewer (slides, transcript), Michael L. Nelson (me) (slides), and Shawn Walker (slides).  I had the luxury of going last, but that meant that I was so focused on reviewing my own material that I could not closely follow their presentations.  My students and I have read Ada's paper and it is definitely worth reviewing.  The paper reviews a series of attacks (and fixes) that all center around "abandoned" live web resources (what we called "zombies") that can be (re-)registered and then included in historical pages.  That sounds like a far-fetched attack vector until you remember that modern pages include 100s of resources from many different sites via JavaScript, and there is a good chance that any given page includes a zombie whose live web domain is available for purchase.  Shawn's presentation dealt with research issues surrounding using social media, and Ashley's talk dealt with the role of fixity information (e.g., "There's a lot "oh I should be doing that" or "I do that" but without being integrated holistically into preservation systems in a way that brings value or a clear understand as to the "why"").  As for my talk, I asserted that Brian Williams first performed "Gin and Juice" in 1992, a full year before Snoop Dogg, and I have a video of a page in the Internet Archive to "prove" it.  The actual URI in which it is indexed in the Internet Archive is obfuscated, but this video is 1) of an actual page in the IA, that 2) pulls live web content into the archive, despite the fixes that Ada provided, and 3) the page rewrites the URL in the address bar to pretend to be at a different URL and time (in this case,, and 19920531014618 (May 31, 1992)).
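For readers unfamiliar with the 14-digit value above: it is a YYYYMMDDhhmmss timestamp of the kind that appears in Internet Archive URLs.  A minimal sketch of decoding it with Python's standard library follows; the helper name is hypothetical, and treating the value as UTC is my assumption rather than something stated in the talk.

```python
from datetime import datetime, timezone


def parse_archive_timestamp(ts: str) -> datetime:
    """Decode a 14-digit YYYYMMDDhhmmss archive timestamp.

    Hypothetical helper for illustration; the result is labeled
    as UTC, which is an assumption.
    """
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


claimed = parse_archive_timestamp("19920531014618")
print(claimed.isoformat())  # 1992-05-31T01:46:18+00:00
```

Decoding the string only tells you what the address bar claims.  As the demo shows, the claimed datetime by itself says nothing about when the content was actually captured.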

The last panel before lunch was "Archives for Change", with Hannah Mandel (moderator), Lara Baladi, Natalie Cadranel, Lae’l Hughes-Watkins, and Mehdi Yahyanejad.  My notes for this session are sparse, so again I'll just highlight a handful of useful tweets.

After lunch, the next session (recording) was a conversation between Jarrett Drake and Stacie Williams on their experiences developing the People's Archive of Police Violence in Cleveland, which "collects, preserves, and shares the stories, memories, and accounts of police violence as experienced or observed by Cleveland citizens."  This was the only panel with the format of two people having a conversation (effectively interviewing each other) about their personal transformation and lessons learned.

The next session was "Stewardship & Usage", with Jefferson Bailey, Monique Lassere, Justin Littman, Allan Martell, and Anthony Sanchez.  Jefferson's excellent talk was entitled "Lets put our money where our ethics are", and was an eye-opening discussion about the state of funding (or lack thereof) for web archiving. The tweets below capture the essence of the presentation, but this is definitely one you should take the time to watch.  Allan's presentation addressed the issues involved in building "community archives" and being aware of tensions that exist between different marginalized groups. Justin's presentation was excellent, detailing both GWU's collection activities and the associated ethical challenges (including who and what to collect) and the gap between collecting via APIs and archiving web representations.  I believe Anthony and Monique jointly gave their presentation about how ethical web archiving requires proper representation from marginalized communities.

The next panel, "The Right to be Forgotten", was in Session 7 (recording), and featured Joyce Gabiola (moderator), Dorothy Howard, and Katrina Windon.  The right to be forgotten is a significant issue facing search engines in the EU, but has yet to arrive as a legal issue in the US.  Again, my notes on this session are sparse, so I'm relying on tweets.

The final regular panel was "The Ethics of Digital Folklore", and featured Dragan Espenschied (moderator) (notes), Frances Corry, Ruth Gebreyesus, Ian Milligan (slides), and Ari Spool.  At this point my laptop's battery died so I have absolutely no notes on this session. 

The final session was with Elizabeth Castle, Marcella Gilbert, Madonna Thunder Hawk, with an approximately 10 minute rough cut preview of "Warrior Women", a documentary about Madonna Thunder Hawk, her daughter Marcella Gilbert, Standing Rock, and the DAPL protests.

Day 3 -- March 24, 2018

Unfortunately, I had to leave on Saturday and was unable to attend any of the nine workshop sessions: "Ethical Collecting with Webrecorder", "Distributed Web of Care", "Open Source Forensics", "Ethically Designing Social Media from Scratch", "Monitoring Government Websites with EDGI", "Community-Based Participatory Research", "Data Sharing", "Webrecorder - Sneak Preview", "Artists’ Studio Archives", and unconference slots.   There are three additional recorded sessions corresponding to the workshops that I'll link here (session 8, session 9, session 10) because they'll eventually scroll off the main page.

This was a great event and the enthusiasm with which it was greeted is an indication of the importance of the topic.  There were so many great presentations that I'm left with the unenviable task of writing a trip report that's simultaneously too long and does not do justice to any of the presentations.  I'd like to thank the other members of my panel (Ada, Shawn, and Ashley), all those who live tweeted the event, the organizers at Rhizome (esp. Michael Connor), Documenting the Now (esp. Bergis Jules), the New Museum, and the funders: IMLS and the Knight Foundation.   I hope they will find a way to do this again soon.


See also: Ashley Blewer wrote a short summary of EAW, with a focus on the keynotes and three different presentations.  Please let me know if there are other summaries / trip reports to add.

Also, please feel free to contact me with additions / corrections for the information and links above.  

