2016-06-27: Symposium on Saving The Web at the Library of Congress

On June 16, 2016, the Library of Congress hosted a one-day symposium entitled Saving the Web: The Ethics and Challenges of Preserving What's on the Internet. The symposium came at the end of the Archives Unleashed 2.0 Web Archive Datathon. The datathon itself is covered in an earlier report. In addition to presenting the results of datathon projects, the wide variety of speakers at the symposium defended the need for preservation, discussed the special issues associated with preserving data, and finally highlighted the importance of, and the concepts surrounding, the preservation of multimedia.

Keynote by Vint Cerf

The symposium opened with Dame Wendy Hall, Professor of Computer Science at the University of Southampton and Kluge Chair in Science and Technology at the Library of Congress. She mentioned that the Internet has only been available for a few decades and we need to preserve its openness and freedom, because that openness and freedom are always in peril. She used those points to introduce one of the inventors of TCP/IP, Vint Cerf.

Vint Cerf talked about the instability of the current Internet. He introduced the idea of "Digital Vellum", referring to vellum, a form of parchment created from animal skin that was once used to create fine quality documents. The goal is to not only capture the documents and data that make up the Internet, but also be able to recreate them in the distant future.

“Unfortunately I think preserving all of the digital information on vellum would require a lot of goats.” -Vint Cerf #SaveTheWeb
— Kate Zwaard (@kzwa) June 16, 2016

He highlighted a number of problems with the current Internet. URLs associated with domain names are not stable; a change in ownership or solvency of a company or organization can cause a domain name to stop responding.

Challenges to preserving the digital world: @vgcerf at #SaveTheWeb pic.twitter.com/5K6ql4LsVo
— (((jahendler))) (@jahendler) June 16, 2016

To understand the context surrounding them, digital objects also require a lot of metadata to be captured in addition to their original content. Users need enough information to correctly interpret the content that has been preserved because its context may be lost to time.

At @KlugeCtr #SaveTheWeb conference, @VGCerf: In 22nd Century we may know more about Lincoln Admin. than Obama Admin pic.twitter.com/a4buzzivJl
— Michael Nelson (@MikeNelson) June 16, 2016

Copyright law needs to be amended to give rights and protections to archival organizations, like the Library of Congress. Serious questions still exist about legal protections for archiving and replaying archived content.

Copyright laws should be amended to give rights to preservation agencies like Library of Congress - per @vgcerf #SaveTheWeb
— Lee Rainie (@lrainie) June 16, 2016

He explained the importance of multiple web archives, noting historical issues with libraries and archives being lost to natural disasters or war. He thinks there is something "delicious" about the Library of Alexandria being a backup site for the Internet Archive. He said that we still have many of the artifacts of the ancient world purely by luck or accident, and that we can do better.

"If we build web archives, we should be building more than one" @vgcerf #savetheweb #DistributedWeb
— Ramine Tinati (@raminetinati) June 16, 2016

Vint Cerf: "We don't want preservation by accident" #SavetheWeb #hackarchives pic.twitter.com/HWThUavbQE
— Shawn M. Jones (@shawnmjones) June 16, 2016

Talking about other digital objects, Vint Cerf then discussed efforts to preserve software. He mentioned the OLIVE preservation project for archiving and replaying executable content. They are currently looking into streaming virtual machines in order to replay old executables on modern systems. He did confess that we have a long way to go before we're able to reproduce old results in some cases.

OLIVE project at Carnegie Mellon a good model for archiving digital content - https://t.co/1cMSRwCohe - Touted by @vgcerf #savetheweb
— Lee Rainie (@lrainie) June 16, 2016

Now Cerf on the OLIVE project, streaming VMs rather than downloading them – https://t.co/V87LTwvEOh. Discussing tech issues. #SaveTheWeb
— Ian Milligan (@ianmilligan1) June 16, 2016

Vint Cerf mentioned the idea of the "self-archiving Web", indicating that we need collaboration, open design, and new business models. He said that, due to its success, the Internet could serve as a good source of lessons for how one would go about designing the self-archiving Web. Participating in the current Internet is done by just following the agreed-upon protocols. It works largely because of its modularity and its capacity for layered evolution. The self-archiving Web should also try to embrace these strengths.

We need a "self-archiving Web" -- @vgcerf backs idea from @timberners_lee. Need collaboration, open design, new biz models #savetheweb
— Lee Rainie (@lrainie) June 16, 2016

He listed outstanding questions with the approach of archiving the Internet. He said that he had some issues with contemplating the idea of the Internet containing itself. Is there a better replacement for hyperlinks, given their deterioration? Should we be using something like DOIs instead? When should a snapshot be taken? How do we know when a change has occurred in a resource? How do we ensure that old formats, like old versions of HTML, will render well for future users? Do we store malware or encryption keys? How do we handle access control for resources?
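One of these questions, how to know when a resource has changed, has a simple baseline answer: compare a digest of the current content with the digest recorded at the last capture, and take a new snapshot only when they differ. The sketch below illustrates that idea; it was not presented at the symposium, and `content_digest`, `should_snapshot`, and the sample page bodies are hypothetical names chosen for this example.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of the raw response body."""
    return hashlib.sha256(body).hexdigest()


def should_snapshot(previous_digest, body: bytes) -> bool:
    """Decide whether a new capture is needed.

    A capture is needed when nothing has been captured yet
    (previous_digest is None) or when the content digest differs
    from the one stored at the last capture.
    """
    return previous_digest is None or content_digest(body) != previous_digest


# Hypothetical page bodies standing in for three successive fetches.
first_fetch = b"<html><body>Hello</body></html>"
second_fetch = b"<html><body>Hello</body></html>"
third_fetch = b"<html><body>Hello, world</body></html>"

stored = None
decisions = []
for body in (first_fetch, second_fetch, third_fetch):
    take = should_snapshot(stored, body)
    decisions.append(take)
    if take:
        # Remember the digest of the version we just captured.
        stored = content_digest(body)
```

In practice a byte-level comparison like this will over-trigger, because many pages embed timestamps or rotating content; that is part of why the question of what is "worth snapshotting" remains open.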

#SaveTheWeb it's not an easy task - vint cerf --> web is temporal, so how do you make it permanent? pic.twitter.com/jQdos1ggma
— Matthew Weber (@docmattweber) June 16, 2016

Vint Cerf: At what rate should we snapshot the WWW? Do we archive malware for historical purposes? #SaveTheWeb pic.twitter.com/dlwp5HRi9g
— Shawn M. Jones (@shawnmjones) June 16, 2016

@vgcerf: "I'd like to have a clue that something is worth snapshotting rather than taking a million pictures" #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

@vgcerf: What about encrypted content? Do we store the keys? #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

The other inventor of TCP/IP, Bob Kahn of the Corporation for National Research Initiatives (CNRI), was unable to attend. He was scheduled to present Digital Object Architecture (DOA), so Vint Cerf continued by presenting that work as well. He talked about the existing handle systems, such as DOI, where identifiers, rather than locations, are used for digital objects. These identifiers are then submitted to a resolution system that locates the object and then delivers it to the user. Of course, this resolution system is an additional layer of infrastructure that must be managed. There are quite a few handle system implementations, including those supported by the Library of Congress, CrossRef, and mEDRA.
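The indirection described here, where a persistent identifier is resolved to a current location at access time, can be illustrated with a toy sketch. This is not the API of the Handle System or of DOA; the `ToyResolver` class, its methods, and the example identifier and URLs are all hypothetical and exist only to show why a reference survives when an object moves.

```python
class ToyResolver:
    """A toy identifier-to-location registry (hypothetical, for illustration)."""

    def __init__(self):
        # Maps a persistent identifier to its current location.
        self._locations = {}

    def register(self, identifier: str, url: str) -> None:
        # Registering an existing identifier again moves it to a new location.
        self._locations[identifier] = url

    def resolve(self, identifier: str) -> str:
        # Return the current location; raises KeyError for unknown identifiers.
        return self._locations[identifier]


resolver = ToyResolver()

# The object is first published at one host (made-up identifier and URLs).
resolver.register("10.9999/example.1", "https://old-host.example/paper.pdf")

# Later the object moves; only the registry entry changes.
resolver.register("10.9999/example.1", "https://new-host.example/paper.pdf")

# Anyone who cited the identifier still reaches the object.
current_location = resolver.resolve("10.9999/example.1")
```

The trade-off noted above is visible even in the toy: the registry itself becomes infrastructure that someone must keep running and up to date.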

At @KlugeCtr #SaveTheWeb conf., @VGCerf pitch hits for Bob Kahn @cnri and explains Digital Object Architecture, DOA pic.twitter.com/mhFWqKHdJK
— Michael Nelson (@MikeNelson) June 16, 2016

He finished up by accepting a few questions from the audience. From these exchanges came additional insight - and funny moments - shown in the tweets below.

“I never let anyone read my code, because then they can see into my brain.” -@vgcerf #saveTheWeb
— Kate Zwaard (@kzwa) June 16, 2016

Web archiving needs to be "bottom up," distributed, and motivated in new ways - @vgcerf #SaveTheWeb
— (((jahendler))) (@jahendler) June 16, 2016

.@vgcerf at #SaveTheWeb: Once you’ve published on the world wide web, you’ve committed yourself to history.
— Justin Littman (@justin_littman) June 16, 2016

Vint Cerf: build preservation into the norm rather than as the action of a few parties #SaveTheWeb pic.twitter.com/ux5BW4qMb4
— Shawn M. Jones (@shawnmjones) June 16, 2016

Archives Unleashed Presentation

Ian Milligan is an Assistant Professor in the Department of History at the University of Waterloo. He discussed the use of web archives in studying history, highlighting how he used warcbase in a study of 186 million archived pages from geocities.com. He spoke about the importance of studying online communities in order to understand a period of history.

. @ianmilligan1 is discussing his study of 186 million URIs on https://t.co/bafMuJDF9u #SaveTheWeb #hackarchives pic.twitter.com/3qvOh4BRC1
— Shawn M. Jones (@shawnmjones) June 16, 2016

@ianmilligan1 presenting Geocities as a use case for the usefulness of web archiving for historians. Thanks @textfiles #SavetheWeb
— Leslie Johnston (@lljohnston) June 16, 2016

@ianmilligan1 Telling about the value of geocities web archiving. Thanks to the efforts of Archiveteam #SaveTheWeb
— Todd Suomela (@tsuomela) June 16, 2016

@ianmilligan1: https://t.co/vZGBbBNLOl had 7 million users by the end #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

Then Ian and some team representatives presented our hard work from the Archives Unleashed Datathon. We had worked on a variety of projects with different datasets. A lot of natural language processing combined with temporal metadata and modeling allowed our groups to study sentiment in elections, uncover differences in media reporting based on country, discover documents related to terrorism, and more.

Team The Mojitos worked on understanding how Obama's visit to Cuba was reported in Cuban media #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

Hid: studied the UK Elections 2015 on twitter, limited to MPs tweeting #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

Thanks for sharing the insights from #hackarchives projects at #savetheweb - here are some summaries pic.twitter.com/0b0kArH4eH
— Katrin Weller (@kwelle) June 16, 2016

And more #hackarchives presentations #savetheweb pic.twitter.com/JEWfni7iQU
— Katrin Weller (@kwelle) June 16, 2016

CounterTerrorism: used an ideology classifyer to classify text from web radio transcripts #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

So glad these will be online! Three minutes is really just a teaser! Really interesting projects! #SaveTheWeb https://t.co/YVF8KV6lkm
— Stephanie A Kingsley (@KingsleySteph) June 16, 2016

“Keep calm and hack and yack” -@ianmilligan1 #SaveTheWeb pic.twitter.com/FVUSh13521
— Kate Zwaard (@kzwa) June 16, 2016

VintCerf is impressed with what we have done at #hackarchives: Is this a completely new way of understanding our own knowledge? #SaveTheWeb
— Shawn M. Jones (@shawnmjones) June 16, 2016

The Need for Preservation

Next was a series of presentations and a panel discussion moderated by David Lazer (Northeastern University) with Abbie Grotke (Library of Congress), Jefferson Bailey (Internet Archive), Richard Marciano (University of Maryland), and Richard Price (British Library). Their topic was "The Need for Preservation". David Lazer started by discussing the curation of archives and posed the open question of how to determine which archived pages are valuable.

"The need for preservation" #SaveTheWeb now starting @librarycongress @KlugeCtr - webcast at https://t.co/65RliQfuMU pic.twitter.com/VOT9DLJHC3
— NEH Pres Access (@NEH_PresAccess) June 16, 2016

David Lazer

David Lazer is a Professor in Political Science and Computer and Information Science at Northeastern University. He started the session by discussing the quality of what has been archived. He mentioned that digital media allows us to think of documents and data in a different way. He discussed the issues with finding useful information in Twitter data, due to the presence of bots and other sources of noise.

With digital media we can see differently and imagine differently - @davidlazer @librarycongress #archivesunleashed pic.twitter.com/lBxd7Z3ZPe
— Ericka (@erickaakcire) June 16, 2016

.@davidlazer: points to the twitter-bots, who have definitely found the following hashtag: #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

.@davidlazer at #SaveTheWeb: As political scientist, inconceivable that will be able to write a history of election of 2016 w/out the web.
— Justin Littman (@justin_littman) June 16, 2016

Twitter mostly garbage & bots. How do we build tools that separate wheat/chaff? Possible today, maybe not in future. @davidlazer #SaveTheWeb
— Justin Littman (@justin_littman) June 16, 2016

Jefferson Bailey

Jefferson Bailey is the Director of Web Archiving Programs for the Archive-It Team at the Internet Archive. He highlighted some statistics about the Internet Archive's current holdings and talked about its multimodal crawling strategy, including its work with libraries. At the moment, researchers must develop problem-specific tools to work with the Internet Archive. The team is currently gathering information on research interests in an attempt to create a set of general-purpose tools for research.

Jefferson Bailey of the Internet Archive says their Web Archive is about to hit 50 billion Web captures #SaveTheWeb
— John W. Kluge Center (@KlugeCtr) June 16, 2016

20yr anniversary of Internet Archive this yr, Jefferson Bailey #SaveTheWeb
— Lizzy Williamson (@earlymodernpost) June 16, 2016

The web: unprecedented confluence of ease-of-publication & ease-of-archival acquisition @jefferson_bail #SaveTheWeb pic.twitter.com/yxrWQgjZGW
— Shawn M. Jones (@shawnmjones) June 16, 2016

.@jefferson_bail "we're in a unique moment where you can both make a blog and archive it within seconds" #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

.@jefferson_bail really underscoring all the different capture methods that IA uses. #SaveTheWeb pic.twitter.com/Z4mSNnUrvb
— Ian Milligan (@ianmilligan1) June 16, 2016

Richard Marciano

Richard Marciano leads the Digital Curation Innovation Center (DCIC) at the University of Maryland. He spoke about DCIC's work with big data and how it was related to digital archives. He finished up with some thought-provoking questions shown below.

Not a casual question, @marcianoRichard asks "Who is going to pay for this preservation?" #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

Dr. Richard Marciano questions for us to think about, including “How do we leverage the preserved web” #saveTheWeb pic.twitter.com/fiRAqtPIRc
— Kate Zwaard (@kzwa) June 16, 2016

Richard Marciano's questions to the audience #savetheweb pic.twitter.com/6FrAuDSMX5
— Katrin Weller (@kwelle) June 16, 2016

Richard Price

Richard Price is the Head of Contemporary British Collections at the British Library. He discussed the mission of saving the web, and stressed that advocacy has always been important for libraries and archives. He mentioned that users are often the best advocates and that choosing the right language matters when advocating for web archiving; he prefers the term "time travel" because it seems to engender more interest from the public.

Now @InfoPrice is up, providing an overview of legal deposit - and now the mission of saving the Web. #saveTheWeb pic.twitter.com/2h0FfOmvVb
— Ian Milligan (@ianmilligan1) June 16, 2016

Price: One of the fantastic myths about the digital is that it is somehow free, and we internalize this #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

"Really really important that we behave non-virtually" -- meet in REAL places, says UK Librarian Richard Price #SaveTheWeb
— Lee Rainie (@lrainie) June 16, 2016

@InfoPrice: "Be poetic" in advocacy for digital pres., phrases like "time travel" resonate with the public #SaveTheWeb
— Jill Reilly James (@jillreillyjames) June 16, 2016

Abbie Grotke

Abbie Grotke is the web archiving team lead for the Library of Congress. She discussed the curated web archive maintained by the Library of Congress. They perform regular crawls of specific websites and use RSS feeds to inform their crawling. Currently the team is focused on acquiring web content, but they do not yet have the resources to make it all accessible. She said that there are challenges to archiving the web in the United States, because most sites do not sit under a country-specific top level domain.

“LC has a curated, selected approach to collecting web archives.” -@agrotke #saveTheWeb pic.twitter.com/jKkVO22GiH
— Kate Zwaard (@kzwa) June 16, 2016

.@agrotke discusses web archiving @librarycongress at #SaveTheWeb https://t.co/1rGAxAaRPj
— Jesse Johnston (@jesseajohnston) June 16, 2016

LOC web archivists are very envious of Iceland's well-defined (and smaller) preservation mandate #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

.@agrotke "The great news is I no longer have to explain what Web Archiving is to my colleagues" #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

Need to balance technical and human resources to manage the massive data from web archives #SaveTheWeb https://t.co/CUwrFExbQu
— Todd Suomela (@tsuomela) June 16, 2016

The question and answer session afterwards brought up a number of good thoughts. What are the ethics of archiving? Many archives have a national focus, but many topics are international; how do we curate topics so that they are available across archives? Do people have a right to be forgotten?

Putting Data to Work

The next session was moderated by Dame Wendy Hall. The speakers for this session were Lee Rainie (Pew Research Center), Katy Börner (Indiana University Bloomington), James Hendler (Rensselaer Polytechnic Institute), and Philip E. Schreur (Stanford University).

Lee Rainie

Lee Rainie is the director of Internet, Science and Technology for the Pew Research Center. He stated that he was happy that so many large-scale projects involving Internet data, and especially archived data, have a civic focus. He bemoaned the decline of civic news provided by newspapers, but said that librarians and archivists can play an important role in ensuring that civic information gets archived in web archives. He did warn that, though so many research projects acquire data from Twitter, only 20% of Americans use Twitter, meaning that many perspectives are lost.

.@lrainie: 2. There's an information market that's degrading, the civic information market is crumbling #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

.@lrainie: 3. There's a movement that's poised to help you in the archive process. It's the Open Data movement. #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

@lrainie The Internet has blown up the bundle of local news - what is often missing now is comprehensive local civic coverage. #SavetheWeb
— Leslie Johnston (@lljohnston) June 16, 2016

.@lrainie public librarians are poised to help and deserve support. Librarians can watch the watchers #SavetheWeb
— Karen O'D (@websitemgmt) June 16, 2016

vital point from @lrainie that #socialmedia doesn't represent all voices, citing @pewinternet research https://t.co/kZU5IFthEI #SaveTheWeb
— Nicholas Taylor (@nullhandle) June 16, 2016

Katy Börner

Katy Börner is a Distinguished Professor of Information Science at the School of Informatics and Computing at Indiana University Bloomington. She discussed the exciting world of visualizing (web) science. She featured some of the work at scimaps.org, a site dedicated to visualizations of scientific data. I was surprised to see her highlight the "Clickstream Map of Science" that was "near and dear" to her, with which I was very familiar because it was created by Johan Bollen, Herbert Van de Sompel, and others as part of "Clickstream Data Yields High-Resolution Maps of Science". She mentioned the need to not only create tools for visualizing web data, but also the importance of pursuing information literacy so that many can use these tools as well.

. @katycns presents Visualizing (Web) Science https://t.co/Vla2fIaata #SaveTheWeb pic.twitter.com/KiUXj2AcdK
— Shawn M. Jones (@shawnmjones) June 16, 2016

@katycns showing lots of impressive maps #savetheweb pic.twitter.com/pBWGYAm4Ua
— Katrin Weller (@kwelle) June 16, 2016

.@katycns creating web tools to teach data visualization, tools and tasks #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

.@katycns Information literacy includes a broad range of skills, tools, workflows #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

James Hendler

James Hendler is Director of the Institute for Data Exploration and Applications and the Tetherless World Professor of Computer, Web and Cognitive Sciences at Rensselaer Polytechnic Institute (RPI). He is one of the originators of the Semantic Web. He discussed data and how important it is to ensure that the data we use for research is suitable for others to consume as well. He mentioned the importance of metadata for making sense of data in context, echoing earlier points made by Vint Cerf. He talked about the temporal nature of data and how accessing datasets at different points in time is in itself useful. I spoke to him during one of the breaks about work the LANL Prototyping Team has been doing on temporal access to semantic web data.

Putting Data to Work panel @jahendler talking about semantics and data @katycns @lrainie @PhilipSchreur #savetheweb pic.twitter.com/kopAODMVVn
— Wendy Hall (@DameWendyDBE) June 16, 2016

Hendler: For the web itself, the URL system provides some kind of organizational structure #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

Hendler: Anything we're going to do about web data can't be in a taxonomy, it can't be a tree #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

.@jahendler keys to data description
1.Web over trees
2.Metadata
3.Linking across ways of thinking
4.Temporal and changing model#SaveTheWeb
— Joe Carrano (@joecar25) June 16, 2016

.@jahendler - If we don’t use human concepts to organize the data, then humans won’t be able to understand the data. #SaveTheWeb
— Stephanie A Kingsley (@KingsleySteph) June 16, 2016

Philip Schreur

Philip Schreur is the Assistant University Librarian for Technical and Access Services at Stanford University Library. He discussed the issues of metadata and how libraries are engaged in a migration to linked data. He mentioned the importance of metadata in understanding historical context. He said that shifting from MARC and other metadata formats will be difficult, but necessary for the future of libraries. He sees a future where libraries will be creating metadata for the purposes of sharing it with the web. He also agrees that libraries will continue to curate data, but acquisition of content will be automated.

Philip E. Schreur mentioned the importance of transitioning from MARC to linked data #SaveTheWeb
— Shawn M. Jones (@shawnmjones) June 16, 2016

"Metadata is data with an ulterior motive." - Philip Schruer #SaveTheWeb
— Jaimie Murdock (@JaimieMurdock) June 16, 2016

.@PhilipSchreur “The key to putting the data to work is discovering the data in the first place.” #SaveTheWeb
— Stephanie A Kingsley (@KingsleySteph) June 16, 2016

.@PhilipSchreur: More efficient to put library data on the web than to bring the web's data into the library. Linked data key. #SaveTheWeb
— Justin Littman (@justin_littman) June 16, 2016

Dame Wendy Hall

Dame Wendy Hall then began talking about where she would take libraries, emphasizing that it is data that patrons are looking for. That data may take the form of documents, datasets, etc., but is more than just articles. She mentioned that librarians need to become more data-savvy and that discovery will become more and more important.

.@DameWendyDBE: Invest in data science training for librarians. In future, libraries will be data warehouses. #SaveTheWeb
— Justin Littman (@justin_littman) June 16, 2016

.@DameWendyDBE in the future, libraries will be data warehouses. Train your staff in data science now! #SaveTheWeb
— Lizzy Williamson (@earlymodernpost) June 16, 2016

.@DameWendyDBE discovery and navigation is going to be the problem, not storage capacity. Libraries at front line. #SaveTheWeb
— Lizzy Williamson (@earlymodernpost) June 16, 2016

.@DameWendyDBE "Making the haystack bigger will only complicate finding the needle" #SaveTheWeb
— Meaghan Brown (@EpistolaryBrown) June 16, 2016

Saving Media

The last session was moderated by Matthew Weber (Rutgers University). This session included Philip Napoli (Rutgers University), Ramesh Jain (University of California, Irvine), and Katrin Weller (GESIS Leibniz Institute for the Social Sciences and former Kluge Fellow in Digital Studies).

Matthew Weber

Matthew Weber is an Assistant Professor in the School of Communication and Information at Rutgers University. He began the session by talking about how web content changes and how, because of these changes, it is possible to view the perceptions of a group at a specific point in time.

. @docmattweber discusses changes in web content and how archives help us accurately understand events #SaveTheWeb pic.twitter.com/Z2WGcVIjUJ
— Shawn M. Jones (@shawnmjones) June 16, 2016

Also shows value of textual studies & adds complexity to the problem of how often to snapshot content. #SaveTheWeb https://t.co/2PoZhAlOAX
— Stephanie A Kingsley (@KingsleySteph) June 16, 2016

.@docmattweber kicks off panel on media: recounting story of incredulous child around print newspapers! #SaveTheWeb pic.twitter.com/vapTRA5QsJ
— Ian Milligan (@ianmilligan1) June 16, 2016

Philip Napoli

Philip Napoli is a Professor of Journalism and Media Studies in the Rutgers School of Communication and Information. He began by echoing one of Vint Cerf's points: there is so much diverse content that it is more difficult to study media from the early 2000s onward than it is to study media from the more distant past. He mentioned that there needs to be a focus on archiving local news because it is getting lost. It is also an area that local libraries can participate in.

. @pmnapoli: it's easier to do a study of news coverage from 1940 than it is today; due to the complexity of the media ecosystem #SaveTheWeb
— Shawn M. Jones (@shawnmjones) June 16, 2016

Philip Napoli: “It’s still easier to do a study of news coverage from literally 1940 than it is from a year ago.” 😫 Seriously. #SaveTheWeb
— Ian Milligan (@ianmilligan1) June 16, 2016

Great rundown by @pmnapoli of frustrations of studying local media ecosystems #SaveTheWeb https://t.co/x6HpeQ3UQu
— Lee Rainie (@lrainie) June 16, 2016

Ramesh Jain

Ramesh Jain is a Professor at the Bren School of Information and Computer Sciences at the University of California, Irvine. His area of research includes multimedia information systems. He spoke about multimedia and how the growth in the number of cameras has created an unprecedented capability for capturing events. He mentioned how a change is occurring, in part thanks to social media, whereby now we are producing "visual documents" that contain text rather than textual documents that begrudgingly contain photos. He emphasized that we have begun not just creating a web of documents, but a "web of events".

. @jain49: this century is very different from the last; showing the growth of camera use #SaveTheWeb pic.twitter.com/SWJfXRoNPd
— Shawn M. Jones (@shawnmjones) June 16, 2016

Killer slide from Ramesh Jain. Very true.. #SaveTheWeb pic.twitter.com/JxyTlQdKOe
— Ian Milligan (@ianmilligan1) June 16, 2016

. @jain49: creating not just a web of documents, but a web of events #SaveTheWeb pic.twitter.com/blizAu6opU
— Shawn M. Jones (@shawnmjones) June 16, 2016

A digital photo is a micro-report - a picture, datestamp, geolocation - almost enough to fully contextualize. #SavetheWeb
— Leslie Johnston (@lljohnston) June 16, 2016

Katrin Weller

Katrin Weller is an information scientist working at the GESIS Leibniz Institute for the Social Sciences. She discussed the issue of context in social media. Will present hashtags have any meaning in the future? She mentioned that future historians may use past instructional texts, like "Twitter for Dummies", to understand how our current tools are used. In some cases, it is important to understand that people change social media accounts over time.

Agree with @kwelle’s suggestion that people might use “Twitter for Dummies” .. I’ve used “Yahoo! for Dummies” w/r/t GeoCities. #SaveTheWeb
— Ian Milligan (@ianmilligan1) June 16, 2016

.@kwelle: Will we remember what hashtags are? What a particular hashtag meant? At different points in time? #SaveTheWeb
— Justin Littman (@justin_littman) June 16, 2016

More about @kwelle research at the Kluge Center can be found on @TIMEHistory: https://t.co/6NkTmwNTIt #SaveTheWeb
— John W. Kluge Center (@KlugeCtr) June 16, 2016

@bmhirsch @kwelle I’m struck how often my nieces change instagram & snapchat handles - future nightmare content creators! #savetheweb
— Abbie Grotke (@agrotke) June 16, 2016

Conclusion by Dame Wendy Hall

Dame Wendy Hall concluded the symposium by discussing the growth of the Internet and how it has changed the world. Her group at the University of Southampton hosts the Web Science Trust, with the goal of facilitating the development of Web Science. She explained that while libraries will be maintaining physical collections, data has also become important to researchers, requiring librarians to learn new data science skills. This led her to introduce the Web Observatory, a place to share and link datasets so that researchers can answer questions about the web. The goal is to have metadata in a standard format that will support discovery, but also allow libraries to share each other's data rather than having to collect all of it themselves.

.@DameWendyDBE begins concluding remarks showing growth to show how novel #SaveTheWeb is. https://t.co/1gAo2GM9Ve pic.twitter.com/E1bZ6D9b7T
— Jaimie Murdock (@JaimieMurdock) June 16, 2016

. @DameWendyDBE mentioned @websciencetrust: https://t.co/aWuO08IMz2 #SaveTheWeb
— Shawn M. Jones (@shawnmjones) June 16, 2016

.@DameWendyDBE How much of the library is just the work on the left? How do we transition to the right? #SaveTheWeb pic.twitter.com/Aw3sp5DXVG
— Shawn M. Jones (@shawnmjones) June 16, 2016

@DameWendyDBE: Never expected libraries to collect every letter ever written in America. Why do we expect to collect entire web #SavetheWeb
— Leslie Johnston (@lljohnston) June 16, 2016

@DameWendyDBE introducing web observatory search #savetheweb pic.twitter.com/4dTL15ogVQ
— Katrin Weller (@kwelle) June 16, 2016

Thoughts and Thanks

All in all, this was an excellent experience and I am glad I attended. I was able to make contact with some of the best minds from a variety of fields while learning about their really fascinating work.

Thanks to Vint Cerf, Ian Milligan, David Lazer, Abbie Grotke, Jefferson Bailey, Richard Marciano, Richard Price, Lee Rainie, Katy Börner, James Hendler, Philip E. Schreur, Dame Wendy Hall, Matthew Weber, Philip Napoli, Ramesh Jain, and Katrin Weller for the excellent thought-provoking presentations.

Thanks to Matthew Weber, Ian Milligan, Jimmy Lin, Noshir Contractor, David Lazer, Wendy Hall, Nicholas Taylor, and Jefferson Bailey for making Archives Unleashed a reality and connecting it to the Save the Web Symposium. Also, thanks to all of the Archives Unleashed attendees who made the experience quite rewarding.

And final thanks go to Dame Wendy Hall and the John W. Kluge Center at the Library of Congress for hosting the event.

Thanks much for tweets from @DameWendyDBE, @EpistolaryBrown, @joecar25, @kwelle, @nullhandle, @websitemgmt, @lljohnston, @tsuomela, @jesseajohnston, @kzwa, @jillreillyjames, @lrainie, @ianmilligan1, @KlugeCtr, @NEH_PresAccess, @KingsleySteph, @justin_littman, @jahendler, @MikeNelson, @docmattweber, @raminetinati, @earlymodernpost

Many others have also written articles about this event, including:

-- Shawn M. Jones

Web Science and Digital Libraries Research Group