This year's International Internet Preservation Consortium (IIPC) Web Archiving Conference (WAC) took place in Hilversum, The Netherlands at the Netherlands Institute for Sound & Vision. It was the first in-person event since 2019 and the 20th anniversary of IIPC! The program offered between two and three tracks for attendees to choose from, so this trip report will give a summary of the sessions I was able to attend. For more information on the other sessions, check out the full conference schedule and the official hashtag (#IIPCWAC23).

Day One

To kick off Day One, Eppo van Nispen (@eppovannispen) from The Netherlands Institute for Sound & Vision gave the opening remarks.

✨ Super excited to follow the #IIPCWAC23 @NetPreserve organised by our colleagues @BeeldenGeluid @benglabs @KB_Nederland!👏🎉

💡Starting now! 🎉 Our director, Eppo asking AI how to start his introductory speech! Don't forget to quote @johanoomen 🎉 pic.twitter.com/wufmNjrWUb
— Camille Françoise @CMFrancoise@mastodon.social (@CMFrancoise) May 11, 2023

Keynote

Eliot Higgins (@EliotHiggins), the founder of Bellingcat (@Bellingcat), gave the opening keynote. He presented the work Bellingcat is doing to fight disinformation on social media with open-source investigation and the help of over 6,000 trained volunteers.

KEYNOTE: Eliot Higgins, Bellingcat - 'Open Source Investigation Comes Of Age', Introduced and chaired by Johan Oomen, Sound & Vision.https://t.co/iHCyos1acl #IIPCWAC23 #IIPC20Years #WebArchiving pic.twitter.com/gdsWcv2ajx
— UK Web Archive (@UKWebArchive) May 11, 2023

Session #1: Research & Access

Samantha Fritz (@SamVFritz) from Archives Unleashed (@unleasharchives) presented "Through the ARCHWay: Opportunities to Support Access, Exploration, and Engagement with Web Archives". Archives are a largely untapped resource due to the complexity of archival data and the lack of tools available, so the Archives Unleashed Project is working to bridge the gap between researchers and the data available in the archives.

“If you can’t access data, you’re not going to use it”

To kick off Session 1, @SamVFritz is presenting @unleasharchives work to make archival data accessible to researchers pic.twitter.com/jd12zBfFbi
— Emily Escamilla (@EmilyEscamilla_) May 11, 2023

Good morning #Hilversum! Presenting @ #IIPCWAC23 this morning to talk about @unleasharchives Cohort Program & opportunities to support #acess #use #engagment w/ #webarchives

Slides: https://t.co/zdraeELc0E
C1 Projects: https://t.co/ns8VijWFQW
C2 Projects: https://t.co/3bMM3g9Vki pic.twitter.com/vxOEVXghAT
— Samantha Fritz (@SamVFritz) May 11, 2023

Leontien Talboom (@makethecatwise) and Mark Simon Haydn presented "Research-Ready Collections: Challenges and Opportunities in Making Web Archive Material Accessible", their work with the Archive of Tomorrow Project. The project worked to curate a collection of 10k targets relating to health in the UK Web Archive (@UKWebArchive) and to explore ethical collection from the Web and responsible republishing. Legal limitations remain a significant barrier, but the project was able to achieve an increase from 1% to 8% of archived sites being publicly accessible!

Leontien Talboom (@theUL) & Mark Haydn (@natlibscot) gave an overview of the Archive of Tomorrow project - 'Research-Ready' Collections: Challenges & Opportunities in Making Web Archive Material Accessible' at #IIPCWAC23 https://t.co/nDyB3OC8oc #IIPC20YEARS #UKLegalDeposit pic.twitter.com/FJRCwwtHCB
— UK Web Archive (@UKWebArchive) May 11, 2023

Jennifer Morival (University of Lille), Sara Aubry (@saraaubry, BnF), and Dorothée Benhamou-Suesser (BnF) presented "Developing New Academic Uses of Web Archives Collections: Challenges and Lessons Learned from the Experimental Service Deployed at the University of Lille During the ResPaDon Project". BnF has worked with the University of Lille to allow full access to BnF holdings from the university. They also shared their experiences and lessons learned in helping researchers leverage the BnF's holdings through tools, datasets, and trained mediators.

How can archived enable researchers to access and use their holdings through tools and documentation?

Dorothée Benhamou-Suesser, Jennifer Morival, and @saraaubry from @DLwebBnF closed Session 1 with a discussion of their work on @Respadon_Projet to do just that#IIPCWAC23 pic.twitter.com/s8lf90JmDD
— Emily Escamilla (@EmilyEscamilla_) May 11, 2023

Session #3: Panel: Supporting Digital Scholarship

For Session 3, Sarah Potvin (Texas A&M), Talya Cooper (@talya_cooper, New York University), and Emily Escamilla (@EmilyEscamilla_, Old Dominion University) presented "Institutional Web Archiving Initiatives to Support Digital Scholarship", a panel moderated by one of their collaborators Martin Klein (@mart1nkle1n, Los Alamos National Lab). They talked about the need for archiving scholarly software hosted on the Web and what Texas A&M and NYU are doing to address the problem in their institutions. With the help of the CoSAI project, NYU is developing a workflow to archive scholarly source code developed by NYU scholars. Texas A&M is teaching graduate students about Web archives and how they can ensure the URIs in their thesis or dissertation are archived and the content is preserved.

In a few minutes, @sp_meta @talya_cooper @mart1nkle1n and I are presenting a panel on Institutional Web Archiving Initiatives
to Support Digital Scholarship #IIPCWAC2023

Slides: https://t.co/nHjjIWYgRe
— Emily Escamilla (@EmilyEscamilla_) May 11, 2023

Honored to moderate the panel on institutional web archiving initiatives to support digital scholarship w/ @EmilyEscamilla_ @talya_cooper, and @sp_meta. Missing @VickyRampin #iipcwac23 pic.twitter.com/toz18PrpXa
— Martin Klein (@mart1nkle1n) May 11, 2023

Session #6: Social Media & Playback: Collaborative Approaches

Ellen Van Keer from meemoo (Flemish Institute for Archives) and Katrien Weyns (@KatrienWeyns) from KADOC kicked off Session 6 with their presentation "Archiving Social Media in Flemish Cultural or Private Archives, (How) Is It Possible".

And here it starts! 🎉@LN_ist from @meemoo_be and @KatrienWeyns #KADOC are introducing their work around archiving social media in Flemish Cultural or Private Archives 👏✨ pic.twitter.com/9kClKhICv0
— Camille Françoise @CMFrancoise@mastodon.social (@CMFrancoise) May 11, 2023

It is no secret that archiving social media presents unique and complex challenges. Zefi Kavvadia (@ZKavvadia), Katrien Weyns (@KatrienWeyns), Mirjam Schaap (@mrjmschaap), and Sophie Ham (@Sophies_posts) presented "Searching for a Little Help from My Friends: Reporting on the Efforts to Create an (Inter)national Distributed Collaborative Social Media Archiving Structure". They called for better collaboration between archives, institutions, and nations to tackle the complex challenges of archiving social media and the need for an improved legal policy to facilitate archiving social media as cultural heritage. They presented the results of a survey they conducted to gauge interest and challenges from potential collaborators.

@ZKavvadia @KatrienWeyns Mirjam Schaap & Sophie Ham on Searching for a little Help from My Friends : reporting on the efforts to create an (Inter)national Distributed collaborative Social media Archiving Structure. #iipcwzc23 pic.twitter.com/N9gLPV9GER
— Camille Françoise @CMFrancoise@mastodon.social (@CMFrancoise) May 11, 2023

Clare Stanton (@clare__stanton) from Harvard's Library Innovation Lab (@harvardlil) and Perma.cc (@permacc) presented "Collaborating on the Cutting Edge: Client Side Playback". They created WACZ-Exhibitor, a wrapper for Webrecorder's replay tool that shifts the burden of playback to the visitor's browser and away from the institution's servers. Clare presented the process of creating a working prototype for the #MeToo Project with Schlesinger Library and creating tools to make the process easy for others to replicate.

In Session 6, @clare__stanton from @permacc presented their work creating a Client-Side Playback. This tool enables non-software developers to integrate replay from WACZ directly into your website!

Check out their working prototype https://t.co/nXey7DmpZ9 pic.twitter.com/DX4uS60ggU
— Emily Escamilla (@EmilyEscamilla_) May 11, 2023

Session #7: Collaborations & Outreach

Ricardo Basílio (@ricardobasilio_) from ROSSIO presented "Linking Web Archiving with Arts and Humanities: The Collaboration Between ROSSIO and Arquivo.pt". Together, they created an arts and humanities archive that's available on the live Web.

Inge Rudomino (@IngeRudomino) from the Croatian Web Archive presented "Building Collaborative Collections: Experience of the Croatian Web Archive". They are working with other libraries, researchers, and the public to curate archives of local online history. They hosted a "HAWathon" to promote the crowdsourcing project and citizen science.

The "word of the day" award goes to @IngeRudomino for "HAWathon" - an effort to engage high schoolers in #webarchiving with HAW:https://t.co/2G7iOHPlOQ #IIPCWAC2023 pic.twitter.com/CPsmarZs4b
— Martin Klein (@mart1nkle1n) May 11, 2023

Youssef Eldakar from Bibliotheca Alexandrina (@bibalexOfficial) presented "Your Software Development Internship in Web Archiving". He discussed their internship program and the ingredients that make it successful: intern, mentor, mini-project. Internships give interns real-world experience, and host institutions are able to make extra progress.

Session #10: Lightning & Drop-In Talks

We closed out Day One with six lightning and drop-in talks. For more information, check out the thread below:

“How do we preserve the past in a violent present for an uncertain future?”

We had lightning talks to close Day 1 of #IIPCWAC23. @helveticade and Benjamin Royer from @ndc_org talked about Memory in Uncertainty

Report: https://t.co/Rbbdf31Knv
Slides: https://t.co/dOcu8pyGu1 pic.twitter.com/wGRTx5HmVK
— Emily Escamilla (@EmilyEscamilla_) May 11, 2023

Day Two

Workshop #4

To start Day Two, I attended the "Browser-Based Crawling for All: Getting Started with Browsertrix Cloud" workshop hosted by Andy Jackson (@anjacks0n), Anders Klindt Myrvoll (@AndersKlindt), and Ilya Kreymer (@IlyaKreymer). They introduced Browsertrix Cloud, an integrated Web archiving system, and demoed the process of setting up and running a crawl. The UI allows users to create, watch, and manage crawls in real time. One of the coolest features was the ability to dynamically add exclusions. The user could enter a regular expression describing the URIs they wanted to exclude from the crawl, and the URIs currently in the queue that matched it were highlighted. This allows users to fix crawler traps in real time without having to stop or cancel the crawl. Additionally, Browsertrix Cloud can use credentials, which allows it to crawl behind paywalls and login pages.
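To make the exclusion preview concrete, here is a minimal sketch in plain Python of the idea described above: test an exclusion regular expression against the URIs waiting in a queue and see which ones it would drop. This is not Browsertrix Cloud's code or API. The function name `preview_exclusion`, the example queue, and the choice to match the pattern anywhere in the URI are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def preview_exclusion(pattern: str, queued_uris: List[str]) -> Tuple[List[str], List[str]]:
    """Split a queue into (excluded, kept) URIs for a candidate exclusion regex.

    Assumption: the pattern may match anywhere in the URI (re.search),
    which is one plausible reading of how such exclusions behave.
    """
    regex = re.compile(pattern)
    excluded = [uri for uri in queued_uris if regex.search(uri)]
    kept = [uri for uri in queued_uris if not regex.search(uri)]
    return excluded, kept


# Hypothetical queue containing a calendar-style crawler trap.
queue = [
    "https://example.org/about",
    "https://example.org/calendar?month=2023-05",
    "https://example.org/calendar?month=2023-06",
    "https://example.org/news/iipc-wac",
]

excluded, kept = preview_exclusion(r"/calendar\?month=", queue)
print("Would be excluded:", excluded)
print("Would stay queued:", kept)
```

Previewing the matches before applying a pattern is what makes this kind of feature safe to use mid-crawl: an overly broad expression shows up as an unexpectedly long "excluded" list before any URIs are actually dropped.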

To kick off Day 2 of #iipcWAC23, @IlyaKreymer @AndersKlindt and @anjacks0n ran a workshop on Browsertrix Based Crawling. Participants were able to start their own crawls and ask questions. It can even run behind login pages

Collectively we ran 30 crawls. Really cool workshop! pic.twitter.com/oJeNh9yEfK
— Emily Escamilla (@EmilyEscamilla_) May 12, 2023

Session #12: Domain Crawls

Martin Klein (@mart1nkle1n) from Los Alamos National Lab (and ODU WSDL alum) presented "Laboratory Not Found? Analyzing LANL's Web Domain Crawl". This presentation was related to their previous work with LANL's institutional Web domain.

Thoroughly investigating -good & bad- link rot @LosAlamosNatLab by @mart1nkle1n #IIPCWAC23 #netpreserve pic.twitter.com/Gp2aQUyd6j
— KB NL research (@KBNLresearch) May 12, 2023

Session #13: Crawling, Playback, Sustainability

Ilya Kreymer (@IlyaKreymer) and Tessa Walsh (@bitarchivist) from Webrecorder (@webrecorder_io) had two presentations in Session 13. First, they presented "Developer Update for Browsertrix Crawler and Browsertrix Cloud". For Browsertrix Crawler, a Docker image to run a single browser-based crawl, they have implemented more consistent logging in addition to more robust status codes that reflect page completeness within the logs. For Browsertrix Cloud, an integrated crawl management service that uses Browsertrix Crawler, they are working to support collection curation and replay.

Second, they presented "Sustaining pywb through Community Engagement and Renewal: Recent Roadmapping and Development as a Case Study in Open Source Web Archiving Tool Sustainability". With limitations on time and resources, they have been roadmapping and evaluating future directions for pywb and inviting input from users. What features do users use most? What features are users looking for? Are others willing to contribute and in what ways? They shared the results of their survey and invited additional input via their online form.

Matteo Cargnelutti (@macargnelutti) from Perma.cc (@permacc) presented "Opportunities and Challenges of Client-Side Playback", a more technical overview of the project described by colleague Clare Stanton in Session 6. Client-side replay does not simplify the complexity of replay, but it moves the complexity from one end of the Web (server) to the other end (browser). He explained the security challenges of using iframes and the patches they have implemented in WACZ-Exhibitor, a tool that allows safe 2-way communication between the embedded archive and the embedding page. Matteo also described some of the other tools in the toolkit Perma.cc has been developing.

Lastly, Ayush Goel (@goelayu_sh) from the University of Michigan presented "Addressing the Adverse Impacts of JavaScript on Web Archives". JavaScript execution results in different renderings of the same Web page through various sources of non-determinism including browser, OS, screen dimensions, and current time. He argued that it does not make sense to remove all non-determinism and presented JavaScript Aware Web Archiving (JAWA) as a solution. JAWA removes non-determinism selectively, eliminating it only where it influences which resources are fetched.

Session #15: Data Considerations

To start Session 15, Emily Escamilla (@EmilyEscamilla_) from Old Dominion University's Web Science and Digital Libraries research group (@WebSciDL) presented "What if GitHub Disappeared Tomorrow?". Access to the original software used in a research experiment is crucial to reproducibility, a cornerstone of scientific research. Archived copies of software can be found in Zenodo, Software Heritage, and the Internet Archive. She presented different ways to access software repositories archived in each of these digital libraries. However, if GitHub disappeared tomorrow, at least 15,000 scholarly repositories would be lost forever.
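As one illustration of checking whether a repository page has an archived copy, the sketch below queries the Internet Archive's Wayback Machine Availability API for a repository URL. It is a minimal sketch and not the method from the presentation: it covers only the Internet Archive (Zenodo and Software Heritage have their own interfaces, which are not shown), and the function names and the repository URL are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine Availability API endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(repo_url: str) -> str:
    """Build the Availability API query URL for a given page URL."""
    return f"{AVAILABILITY_API}?{urlencode({'url': repo_url})}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest archived snapshot URL from a parsed API response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(repo_url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and return a snapshot URL or None."""
    with urlopen(availability_url(repo_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    # Hypothetical repository URL; replace with the repository of interest.
    print(lookup("https://github.com/example/repo"))
```

Note that a snapshot of a repository's landing page is not the same as a complete copy of its source code and history, so a hit here is only a first signal that some content has been preserved.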

What if GitHub disappeared tomorrow? How can we use existing digital libraries to find repositories? What percentage of scholarly code repositories would disappear forever?

I presented the answers to these questions at #IIPCWAC23: https://t.co/Sbk6VTuTd0 #WebArchiveWednesday
— Emily Escamilla (@EmilyEscamilla_) May 17, 2023

Eld Zierau (@EldZierau) from the Royal Danish Library presented "Web Archives and FAIR Data: Exploring the Challenges for Research Data Management (RDM)", an overview of the WARCnet project. They presented the results of their semi-structured interviews on the Research Data Management (RDM) practices of those who engage in the Web Archiving Lifecycle (WAL). They specifically focused on FAIR principles (findable, accessible, interoperable, and reusable).

In Session 15, @EldZierau from the Royal Danish Library presented their work with WARCnet and research data management for Web archive studies

She referenced lots of great work from @WebSciDL @maturban1 @shawnmjones !#iipcWAC23 pic.twitter.com/7lzWxmMbin
— Emily Escamilla (@EmilyEscamilla_) May 12, 2023

Mark Phillips (@vphill) from the University of North Texas presented "Lessons Learned in Hosting the End of Term Web Archive in the Cloud". The End of Term Web Archive (@eotarchive) aims to document the transition in the Executive Branch of the United States by archiving federal government Web pages before and after each election cycle. They have captured the 2008, 2012, 2016, and 2020 transitions with the help of multiple institutions, including the University of North Texas and the Internet Archive. They recently moved the collections to Amazon S3 to allow for greater access and computational consumption of the collection.

To wrap up the session, @vphill from @UNT_Libraries presented their work with @ibnesayeed from @internetarchive and the @eotarchive

They recently moved the dataset to AWS3 to allow for greater access to the EOT datasets for reuse and research #iipcWAC23 pic.twitter.com/wqesUJ46Dr
— Emily Escamilla (@EmilyEscamilla_) May 12, 2023

Session #16: Preservation and Complex Digital Publications

Michael Kurzmeier (@mkrzmr) from University College Cork presented "Preservability and Preservation of Digital Scholarly Editions". He found that there is no universal solution to archiving Digital Scholarly Editions (DSEs), but existing approaches like Web archiving can be used for some purposes. Web archives are de facto important providers of DSE preservation.

Ian Cooke (@IanCooke13) and Giulia Carla Rossi (@giuliacrossi) from the British Library presented "Collecting and Presenting Complex Digital Publications". Complex digital publications are publications that are born-digital and are typically multi-modal with hardware, software, and/or operating system dependencies. In the collection, they are working to represent the diversity of publishing within the UK. They presented some of the challenges associated with such an undertaking including access to non-browser-based material, developing a rights and re-use framework for contextual information, and discovering and linking related materials.

Ian Cooke & Giulia Carla Rossi (@britishlibrary) gave an overview of 'Collecting & Presenting Complex Digital Publications'.

You can view some of the collection at the forthcoming #DigitalStorytelling exhibition https://t.co/EzUVK8kRWU #IIPCWAC23 #IIPC20Years #UKLegalDeposit pic.twitter.com/r51F3rg4QH
— UK Web Archive (@UKWebArchive) May 12, 2023

Next, Daniel Steinmeier and Susanne van den Eijkel (@SvandenEijkel) from KB Nationale Bibliotheek presented "What Can Web Archiving History Tell Us about Preservation Risks?" File format obsolescence is a problem for Web archiving. In migrating to a new format, archivists typically agree on significant properties with the producer. However, it can be difficult to identify significant properties when there is no clear producer and no way to know the original intent. They concluded that, while obsolescence is a problem, completeness should be a more urgent preservation priority than solving obsolescence.

And again my colleagues are presenting. @SvandenEijkel and Daniel Steinmeier told about preservatieve risks #iipcWAC23 pic.twitter.com/Em5wKVmlAt
— Trienka Rohrbach (@trienka) May 12, 2023

Keynote

To close out IIPC WAC 2023, Marleen Stikker (@marleenstikker) from Waag Futurelab presented her keynote "Public Values in the Digital Domain". She talked about the history of the Internet and the impact of capitalism and large companies on the Internet as a public commons. She left the audience with lots to ponder regarding how we interact with the Internet and how it is or is not governed.

@marleenstikker offering the #iipcwac23 closing keynote titled "Public values in the digital domain" pic.twitter.com/0hcyTLas53
— Martin Klein (@mart1nkle1n) May 12, 2023

“If you can’t open it, you don’t own it”

“The internet is free, therefore, the platform owns you”

Some thought provoking quotes from Marleen Stikker’s keynote “Public values in the digital domain” to close out #IIPCWAC23 pic.twitter.com/p3KBdDY0jg
— Emily Escamilla (@EmilyEscamilla_) May 12, 2023

Conclusion

IIPC WAC 2023 was the first time I was able to attend IIPC WAC in person, and I had the opportunity to present both as part of a panel and in an individual presentation. I came away with a better understanding of the tools being developed in the Web archiving community and with some ideas on how to leverage them in the research I am doing. I also learned more about the limitations and challenges faced by various cultural heritage and archival institutions as well as the solutions they have implemented. Last year's conference, IIPC WAC 2022, deepened my appreciation for the need for Web archiving, and this year's conference, IIPC WAC 2023, grew my understanding of the innovative solutions our community is developing and left me excited to investigate them further on my own.

We have trip reports for some of the prior IIPC Web Archiving Conferences and IIPC General Assemblies: 2022, 2021, 2017, 2016, 2015, 2014, 2012, 2011.

Web Science and Digital Libraries Research Group

2023-05-24: IIPC Web Archiving Conference (WAC) Trip Report