2019-06-20: Web Archiving and Digital Libraries Workshop 2019 Trip Report
A subset of JCDL 2019 attendees assembled on June 6, 2019, at the Illini Union for the Web Archiving and Digital Libraries workshop (WADL 2019). As in previous years, this year's workshop was organized by Dr. Martin Klein, Dr. Zhiwu Xie, and Dr. Edward A. Fox. Martin inaugurated the session by welcoming everyone and introducing the schedule for the day. He observed that WADL 2019 had equal representation of men and women, not only among attendees but also among presenters. The Web Science and Digital Libraries Research Group (WS-DL) from Old Dominion University was represented by Dr. Michele C. Weigle, Alexander C. Nwala, and Sawood Alam (me), with two accepted talks.
#WADL2019 begins with @mart1nkle1n introducing the workshop and the schedule for the day (and posing for the camera). pic.twitter.com/LNqFmY1Cp2— Sawood Alam (@ibnesayeed) June 6, 2019
Cathy Marshall from Texas A&M University presented her keynote talk entitled "In The Reading Room: what we can learn about web archiving from historical research". She told many fascinating stories and described the process she went through to collect the bits and pieces of those stories. Her talk shed light on many problems similar to those we see in web archiving. It reminded me of her presentation at the IIPC General Assembly 2015 entitled "Should we archive Facebook? Why the users are wrong and the NSA is right".
@ccmarshall gives the first talk of WADL (Web Archiving And Digital Libraries): In the Reading Room #WADL2019 #JCDL2019 pic.twitter.com/4ZfYQQHuM1— Alexander C. Nwala (@acnwala) June 6, 2019
What’s really valuable here from Marshall is documenting the research process in a digital age – something that historians are really bad at doing. We just like take photos, use digital documents, etc. and spit out our final papers w/o enough reflection. #wadl2019 pic.twitter.com/p7DaRSO3zt— Ian Milligan (@ianmilligan1) June 6, 2019
Some closing thoughts from Marshall on copyright, ownership, and access. Sees some lessons for #webarchiving here. #wadl2019 pic.twitter.com/VxCKrxgw9c— Ian Milligan (@ianmilligan1) June 6, 2019
Such an interesting talk — @ccmarshall is a great storyteller. https://t.co/dN2uoMQmcy— Michele Weigle (@weiglemc) June 6, 2019
Corinna Breitinger from the University of Konstanz (who has since moved to the University of Wuppertal) presented her team's work entitled "Securing the integrity of time series data in open science projects using blockchain-based trusted timestamping". She discussed a service called OriginStamp that allows people to create a tamper-proof record of ownership of some digital data at the current time by creating a record in a blockchain. She mentioned the Blockchain_Pi project, which allows connecting a Raspberry Pi to a blockchain for timestamping various sensor data. A remarkable achievement of their project was being cited in a German Supreme Court ruling on a dashcam recording; the dashcam was configured to trigger a timestamping call on a short clip whenever something unusual happened on the road.
@BreitingerC shows a hardware prototype for timestamping sensor readings (e.g., temperature, vibration) with @OriginStamp to create a trusted timestamp. Also Dashcam video timestamping. Github: https://t.co/VdehBKpSE0 #wadl2019 #jcdl2019 pic.twitter.com/d3styuEKAX— Alexander C. Nwala (@acnwala) June 6, 2019
Learning from @BreitingerC’s “Securing the integrity of time series data in open science projects using blockchain-based trusted timestamping.” Interesting way to generate sensor data using Raspberry Pi, IPFS, and Blockchain. Check it out at https://t.co/LJuIC5Gx8q #WADL2019— Ian Milligan (@ianmilligan1) June 6, 2019
#OriginStamp generated data should be stored in various distributed places, proposed by @BreitingerC #WADL2019 pic.twitter.com/VjtPPIiM7h— Martin Klein (@mart1nkle1n) June 6, 2019
And @BreitingerC closes off her presentation by noting that we can try this all out ourselves with the OriginStamp documentation at https://t.co/Pps63T7OAz. Cool! #wadl2019— Ian Milligan (@ianmilligan1) June 6, 2019
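To make the idea of trusted timestamping a bit more concrete, here is a minimal sketch in Python. It is not OriginStamp's API: the `ToyLedger` class is a hypothetical in-memory stand-in for a blockchain, and the function names are assumptions made for illustration. The point it shows is that only a hash of the data needs to be recorded, and that any later change to the data no longer matches the stamped hash.

```python
import hashlib
import json
import time


def fingerprint(readings):
    """Return the SHA-256 hex digest of a canonical JSON encoding of the readings."""
    canonical = json.dumps(readings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToyLedger:
    """Hypothetical stand-in for a blockchain: an append-only list of (digest, time) records."""

    def __init__(self):
        self._records = []

    def stamp(self, digest, now=None):
        """Append a record for the digest; `now` can be fixed for reproducible examples."""
        record = (digest, time.time() if now is None else now)
        self._records.append(record)
        return record

    def first_seen(self, digest):
        """Return the earliest recorded time for a digest, or None if it was never stamped."""
        times = [t for d, t in self._records if d == digest]
        return min(times) if times else None


def verify(readings, ledger):
    """Return the time at which these exact readings were stamped, or None if unknown."""
    return ledger.first_seen(fingerprint(readings))


if __name__ == "__main__":
    ledger = ToyLedger()
    readings = [{"sensor": "temperature", "value": 21.5}]
    ledger.stamp(fingerprint(readings))
    print(verify(readings, ledger) is not None)  # the untouched data still matches
    readings[0]["value"] = 22.5
    print(verify(readings, ledger) is not None)  # the altered data does not
```

In a real deployment the ledger would be a public blockchain, so the record could not be rewritten by whoever holds the data.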
I, Sawood Alam, presented "Impact of HTTP Cookie Violations in Web Archives". This was a summary of two of our blog posts entitled "Cookies Are Why Your Archived Twitter Page Is Not in English" and "Cookie Violations Cause Archived Twitter Pages to Simultaneously Replay in Multiple Languages", in which we performed a detailed investigation of two HTTP cookie-related issues in web archives. We found that long-lasting cookies in web archives have undesired consequences in both crawling and replay.
For more on this Twitter cookie issue, see https://t.co/yr6BkUXHyb and https://t.co/pa7BBYUU35 #WADL2019— Michele Weigle (@weiglemc) June 6, 2019
.@ibnesayeed presenting on the “Impact of HTTP Cookie Violations in Web Archives,” noting some of the fun stuff they’ve run into around archived Twitter pages that you’d think should be in English but are actually in languages like Urdu. More at https://t.co/hUtersiO2d. #WADL2019— Ian Milligan (@ianmilligan1) June 6, 2019
@ibnesayeed presenting his research on the impact of HTTP cookie violations in #WebArchives at #WADL2019 pic.twitter.com/5DZdYC1Jot— Corinna Breitinger (@BreitingerC) June 6, 2019
Add cookies to the list of bad guys (along with JavaScript) for their dastardly antics in web archives. paraphrasing @ibnesayeed at #WADL2019— Jasmine Mulliken (@jasminemulliken) June 6, 2019
How many languages can we get into a Twitter page? Fun experiment by @ibnesayeed #WADL2019 pic.twitter.com/oCd1sbGLAB— Martin Klein (@mart1nkle1n) June 6, 2019
What can be done about this cookie problem for web archives? @ibnesayeed has some suggestions. #WADL2019 pic.twitter.com/TSt9d1Gbbf— Michele Weigle (@weiglemc) June 6, 2019
Ed Fox from Virginia Tech presented his team's work entitled "Users, User Roles, and Topics in School Shooting Collections of Tweets". They attempted to identify patterns in user engagement on Twitter regarding school shootings. They also created a tool called TwiRole (source code) that classifies a Twitter handle as "Male", "Female", or "Brand" using multiple techniques.
I am classified as "male" but mostly due to the image classifier #JCDL2019 #WADL2019 pic.twitter.com/E5S8BtKqsr— Martin Klein (@mart1nkle1n) June 6, 2019
Edward Fox presents the TwiRole tool at #WADL2019 Try it for yourself at: https://t.co/ho9JJf1p6F #jcdl2019 And find the research paper on ArXiv: https://t.co/5AFOCDQj9u pic.twitter.com/AiGGEztmLa— Corinna Breitinger (@BreitingerC) June 6, 2019
Ian Milligan from the University of Waterloo presented his talk entitled "From Archiving to Analysis: Current Trends and Future Developments in Web Archive Use". He emphasized that future historians writing the history of the post-1996 era will need to understand the Web. Web archives will play a big role in writing the history of today. It is important that historians have tools beyond the Wayback Machine that they can use to interact with web archives and understand their holdings. He mentioned the Archives Unleashed Cloud as a step in that direction.
@ianmilligan1 is wrapping up the morning session with his insights on "From Archiving to Analysis: Current Trends and Future Developments in Web Archive Use" #WADL2019 pic.twitter.com/J5sTGwpAAz— Martin Klein (@mart1nkle1n) June 6, 2019
Can't wait to see what researchers will do with Archives Unleashed Cloud and Notebook tools! Analyzing web archives is getting much more accessible! @ianmilligan1 shows us the interface and discusses the challenges of meeting audiences' needs and literacies. #WADL2019— Jasmine Mulliken (@jasminemulliken) June 6, 2019
@ianmilligan1 discussing the significant challenge of catering to traditional #historians and #librarians while enabling advanced web archive analysis at #WADL2019 #tldr: historians of the future need to learn the web! pic.twitter.com/HmWFD7XgnA— Corinna Breitinger (@BreitingerC) June 6, 2019
Jasmine Mulliken from the Stanford University Press (SUP) presented her talk entitled "Web Archive as Scholarly Communication". She described various SUP projects and related stories. She spent a fair amount of time describing the use of Webrecorder at SUP in projects like Enchanting the Desert. She also explained that SUP is in peril and mentioned the Save SUP site, which documents the timeline of recent events threatening the existence of SUP. While talking about this, she played a clip from the finale of Game of Thrones in which the dragon burns the Iron Throne.
@jasminemulliken kicking off the afternoon session with her thoughts on "Web Archive as Scholarly Communication" #WADL2019 pic.twitter.com/0eciBaVSQJ— Martin Klein (@mart1nkle1n) June 6, 2019
Why https://t.co/p7Hfkndraz? One major point that @jasminemulliken stresses is the “mutual benefit of experimentation through shared grant values.” Great for #webarchiving community. 😀 #wadl2019 pic.twitter.com/OcDjmRnHql— Ian Milligan (@ianmilligan1) June 6, 2019
@jasminemulliken: Web Archive as Scholarly Comm. @stanfordpress projects:
-The Chinese Deathscape https://t.co/QeS06lyn6v
-Filming Revolution https://t.co/yXkU9X7ai0
-When Melodies Gather https://t.co/5IlvGqPtTv
-Enchanting the Desert https://t.co/GMOWvt5kHu #wadl2019 #jcdl2019 pic.twitter.com/6mZpnQSOML— Alexander C. Nwala (@acnwala) June 6, 2019
@jasminemulliken spent about a week to archive the https://t.co/ooCDypVUBC project with @webrecorder_io tool #WADL2019 pic.twitter.com/PhWAqOTWRe— Martin Klein (@mart1nkle1n) June 6, 2019
Brenda Reyes Ayala from the University of Alberta presented her talk entitled "Using Image Similarity Metrics to Measure Visual Quality in Web Archives" (slides). Automated quality assurance of archival collections is a topic of interest for many IIPC members. Brenda shared her team's initial findings from using image similarities of captures with and without archival banners. She concluded that their results showed significant success in distinguishing poor-quality from high-quality captures, but that a lot more needs to be done to improve quality assurance automation.
@CamtheWicked on "Using Image Similarity Metrics to Measure Visual Quality in Web Archives" #WADL2019 pic.twitter.com/7H94Fx8IvF— Martin Klein (@mart1nkle1n) June 6, 2019
My paper and presentation at #WADL2019 #jcdl2019 "Using Image Similarity Metrics to Measure Visual Quality in Web Archives" is now available here: https://t.co/PAQfBy8agU #webarchiving— Brenda Reyes (@CamtheWicked) June 11, 2019
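As a rough illustration of what scoring a capture by image similarity can look like, here is a small Python sketch. This report does not say which metrics were used in the talk, so the metric below (one minus the normalized mean absolute pixel difference) and the function name `similarity` are assumptions chosen only for simplicity. Images are modeled as equally sized grids of grayscale values from 0 to 255.

```python
def similarity(image_a, image_b):
    """Return a score in [0, 1]; 1.0 means the two grayscale grids are identical.

    Assumed metric for illustration: 1 - (mean absolute pixel difference / 255).
    """
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same dimensions")

    total_difference = 0
    pixel_count = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total_difference += abs(pixel_a - pixel_b)
            pixel_count += 1

    if pixel_count == 0:
        raise ValueError("images must contain at least one pixel")
    return 1 - total_difference / (255 * pixel_count)


if __name__ == "__main__":
    reference = [[0, 0], [0, 0]]
    capture = [[255, 0], [0, 0]]  # one of four pixels differs completely
    print(similarity(reference, capture))
```

A quality assurance pipeline could flag captures whose score against a reference screenshot falls below some threshold, though choosing that threshold is exactly the kind of open question the talk pointed to.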
Sergej Wildemann from the L3S Research Center presented his talk entitled "A Collaborative Named Entity Focused URI Collection to Explore Web Archives". He started by noting that the temporal aspect of named entities is often neglected when indexing the live web. Temporal changes associated with an entity become more important when exploring an archival collection related to that entity. He mentioned the beta version of the Internet Archive's new Wayback Machine prototype, released in 2016, which provided text search over an index built from the anchor text pointing to sites. Towards the end of his talk he showcased his tool called Tempurion, which supports searching for archived named entities with an attached temporal dimension, so that search results can be filtered by date range.
Sergej Wildemann Up next on "A Collaborative Named Entity Focused URI Collection to Explore Web Archives" #WADL2019 pic.twitter.com/8jtv8r4l77— Martin Klein (@mart1nkle1n) June 6, 2019
I, Sawood Alam, presented my second talk (and the last talk of the day) entitled, "MementoMap: An Archive Profile Dissemination Framework". This talk was primarily based on our JCDL submission that was nominated for the best paper award, but in the WADL presentation we focused more on technical details, use cases, and possible extensions, instead of experimental results. We also talked about the Universal Key Value Store (UKVS) format with some examples.
@ibnesayeed: "Broadcasting [querying all Web Archives] is evil" MementoMap summarizes the holdings of Web Archives, so a client may intelligently route queries to a subset of Web Archives Pre-print: https://t.co/Dc7EnGq5I7 Slides: https://t.co/OdM0TEtInb #wadl2019 #jcdl2019 pic.twitter.com/8oqIRDhqHb— Alexander C. Nwala (@acnwala) June 6, 2019
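The routing idea behind summarizing archive holdings can be sketched in a few lines of Python. This is not the MementoMap or UKVS format itself: the `uri_key` function is a simplified, SURT-like key (reversed host labels followed by the path), and the `summaries` structure and archive names are hypothetical. The sketch only shows how a client could query a subset of archives instead of broadcasting to all of them.

```python
from urllib.parse import urlsplit


def uri_key(url):
    """Build a simplified SURT-like key: reversed host labels, then the path.

    This is an assumption for illustration and skips most real canonicalization.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    reversed_host = ",".join(reversed(host.split(".")))
    return f"{reversed_host}){parts.path or '/'}"


def route(url, summaries):
    """Return the sorted names of archives whose summary has a prefix matching the URL key."""
    key = uri_key(url)
    return sorted(
        name
        for name, prefixes in summaries.items()
        if any(key.startswith(prefix) for prefix in prefixes)
    )


if __name__ == "__main__":
    # Hypothetical holdings summaries, one list of key prefixes per archive.
    summaries = {
        "archive-a": ["com,example)/", "org,"],
        "archive-b": ["uk,"],
        "archive-c": ["com,example)/news"],
    }
    print(route("https://www.example.com/news/today", summaries))
    print(route("http://example.net/", summaries))
```

Shorter prefixes keep a summary small at the cost of more false positives, which is the kind of trade-off an archive has to weigh when it decides how much detail to publish about its holdings.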
Once all the formal presentations were over, we started to discuss post-workshop plans. The first matter we discussed was making the proceedings available online or as a special issue of a journal. In previous years (except last year) WADL proceedings were published in the IEEE-TCDL Bulletin, which is now discontinued. Potential fallback options include: 1) compiling all submissions with an introduction and publishing them as a single document on arXiv, although citing individual works would then be an issue, 2) publishing on OSF Preprints, and 3) using GitHub Pages, with the added advantage of providing supplementary material such as slides. To enable more effective communication, a proposal was made to create a mailing list (e.g., using Google Groups) for the WADL community. It was proposed that posters should not be included in the call for papers, because the number of submissions is usually small enough to give a presentation slot to everyone. Fun fact: only Corinna brought a poster this time. We discussed the possibility of holding more than one WADL event per year, which may or may not be associated with a conference. Since the next JCDL would be in China, people had some interest in having an independent WADL workshop in the US. Finally, we discussed the possibility of adding a day ahead of JCDL for a hackathon and a day after for the workshop, where hackathon results could be discussed in addition to the usual talks.
It was indeed a fun week of #JCDL2019 and #WADL2019, where we got to meet many familiar and new faces and explored the spacious campus of the University of Illinois. You may also want to read our detailed trip report of JCDL 2019. We would like to thank the organizers and sponsors of both JCDL and WADL for making it happen. We would like to extend special thanks to Dr. Stephen Downie, without whom this event would not have been as organized and fun as it was. We would also like to thank NSF, AMF, and ODU for funding our travel expenses. Last but not least, I would personally like to thank the "WADL DongleNet", which made it possible for me to connect my laptop to the projector twice.
The famous #WADL2019 dongle spider, aka donglenet. We connect EVERYTHING! pic.twitter.com/y6FGE13dzj— Jasmine Mulliken (@jasminemulliken) June 6, 2019
--
Sawood Alam