Sunday, December 31, 2017

2017-12-31: Digital Blackness in the Archive - DocNow Symposium Trip Report

From December 11-12, 2017, I attended the second Documenting the Now Symposium in St. Louis, MO.  The meeting presentations were recorded and are available along with an annotated agenda; for further background about the Documenting the Now project and my involvement via the advisory board, I suggest my 2016 trip report, as well as DocNow activity on github, slack, and Twitter.  In addition, the meeting itself was extensively live-tweeted with #BlackDigArchive (see also the data set of Tweet ids collected by Bergis Jules).

The symposium began at the Ferguson Public Library, first with a welcome from Vernon Mitchell of DocNow and Scott Bonner of the Ferguson Public Library.  This venue was chosen for its role in the events of Ferguson 2014 (ALA interview, CNN story).  The engaging opening keynote was by Marisa Parham of Amherst College, entitled "Sample, Signal, Strobe", and I urge you to take the time to watch it rather than rely on my inevitably incomplete and inaccurate summary.  With those caveats, what I took away from Parham's talk can be summarized as addressing "the confluence of social media and the agency it gives some people", "Twitter as a dataset vs. Twitter as an experience", and how losing the context of a tweet removes the "performance" part.  She spoke about watching hashtags emerge, watching the repetition of RTs, and the sense of contemporary community and shared experience (which she called "the chorus of again").  I can't remember if she made this analogy directly or if it is just what I put in my notes, but a movie in a theater is a different experience than at home, even though home theaters can be quite high-fidelity, in part because of the shared, real-time experience.  To this point I also tweeted a link to our Katrina web archive slides, because we find that replay of contemporary web pages makes a more powerful argument for web archives than, say, Wikipedia or other summary pages.

Parham had a presentation online that provided some of the quotes she used, but I did not catch the URI.  I was able to track down some of the resources she referenced while she talked, though I'm sure I missed several.

Next up was the panel "The Ferguson Effect on Local Activism and Community Memory".  Two of the panelists, Alexis Templeton and Kayla Reed, were repeat panelists from the 2016 meeting, which brought up a point they made during their presentations: while archives document specific points in time, the people involved should be allowed to evolve and live their lives without the expectations and weight of those moments.  A lot was conveyed by the panelists, and I feel I would be doing them a disservice to further summarize their life experiences. Instead, at the risk of interrupting the flow of the post, I will include more tweets from others than I normally would and redirect you to the video for the full presentations and the pointed discussion that followed.

After this panel, we adjourned to the local institution of Drake's Place for lunch, and in the evening saw a screening of "Whose Streets?" at WUSTL.

The next morning we resumed the meeting on the campus of WUSTL and began with tool/technology overviews, then breakout demos from Ed Summers, Alexandra Dolan-Mescal, Justin Littman, and Francis Kayiwa.

I'm not sure how much longer the demo app will be up, but I highly recommend that you interact with the service while you can and provide feedback (sample screen shots above).  The top screen shot shows trending hashtags for your geographic area, and the bottom screen shot shows the multi-panel display for a hashtag: tweets, users, co-occurring hashtags, and embedded media.

The second panel, "Supporting Research: Digital Black Culture Archives for the Humanities and Social Sciences", began after the tool demo sessions.

Meredith Clark began with an observation about the day of Ferguson: "some of my colleagues will see this just as data."  Unfortunately, this panel does not appear to have been recorded.  Catherine Knight Steele made the point that while social media are "public spaces", like a church they still require respect.

Clark also solicited feedback from the panel about what tools and functionality they would like to see.  Melissa Brown talked about Instagram (with which our group has done almost nothing to date) and Picodash (which has extended features like geographic bounding of searches).  Someone (it's not clear from my notes who) also discussed the need to maintain not just, for example, the text of a blog, but the entire contemporary UI as well (this is clearly an application for web archiving, but social media is often not easy to archive).  Clark also discussed the need for more advanced visualization tools, and the panel ended with a discussion about IRBs and social media.

Unfortunately, I had to leave for the airport right after lunch and missed the third panel, "Digital Blackness in the Archive: Collecting for the Culture".  Fortunately, that panel was recorded and is linked from the symposium page.

Another successful meeting, and I'm grateful to the organizers (Vernon Mitchell, Bergis Jules, Tim Cole).  The DocNow project is coming to an end in early 2018, and although I'm not sure what happens next I hope to continue my relationship with this team.


2017-12-31: ACM Workshop on Reproducibility in Publication

On December 7 and 8 I attended the ACM Workshop on Reproducibility in Publication in NYC as part of my role as a member of the ACM Publications Board and co-chair (with Alex Wade) of the Digital Library Committee.  The purpose of this workshop was to gather input from the various ACM SIGs about the approach to reproducibility and "artifacts", objects supplementary to the conventional publication process.  The workshop was attended by 50+ people, mostly from the ACM SIGs, but also included representatives from other professional societies, repositories, and hosting services.  A collection of the slides presented at the workshop and a summary report are being assembled now, and as such this trip report is mostly my personal perspective on the workshop; I'll update it with slides, the summary, and other materials as they become available.

This was the third such workshop that had been held, but it was the first for me since I joined the Publications Board in September of 2017.  I have a copy of a draft report, entitled "Best Practices Guidelines for Data, Software, and Reproducibility in Publication" from the second workshop, but I don't believe that report is public so I won't share it here.

I believe it was from these earlier workshops that the ACM adopted its policy of including "artifacts" (i.e., software, data, videos, and other supporting materials) in the digital library.  At the time of this meeting the ACM DL had 600-700 artifacts.  To illustrate the ACM's approach to reproducibility and artifacts in the DL, below I show an example from ACM SIGMOD (each ACM SIG is implementing different approaches to reproducibility as appropriate within their community).

The first image below is a paper from ACM SIGMOD 2016, "ROLL: Fast In-Memory Generation of Gigantic Scale-free Networks", which has its own DOI URI.  This page also links to the SIGMOD guidelines for reproducibility.

Included under the "Source Materials" tab is a link to a zip file of the software and a separate README file in unprocessed markdown format.  What this page doesn't link to is the software page in the ACM DL, which has a separate DOI of its own.  The software DOI does link back to the SIGMOD paper, but the SIGMOD paper does not appear to explicitly link to the software DOI (again, it links to just the zip and README).

On that page I've also clicked on the "artifacts" button to produce a pop-up that explains the various "badges" that the ACM provides; a full description is also available on a separate page.  More tellingly, on this page there is a link to the software as it exists in GitHub.

In slight contrast to the SIGMOD example, the Graphics Replicability Stamp Initiative (GRSI) embraces GitHub completely, with a combination of linking both to the repositories of the individuals (or groups) that wrote the code as well as to forks of the code within the GRSI account.  Of course, existing in GitHub is not the same as being archived (reminder: the fading of SourceForge and the closing of Google Code), and a DL has a long-term responsibility to host bits and not just link to them (though to be fair, Git is bigger than GitHub, and ACM could commit to hosting Git repositories itself).  On the other hand, as GRSI implicitly acknowledges, decontextualizing the code from the community and functions that the hosting service (in this case, GitHub) provides is not a realistic short- or mid-term approach either.  Resolving the tension between memory organizations (like ACM) and non-archival hosting services (like GitHub) is one of the goals of the ODU/LANL AMF-funded project ("To the Rescue of the Orphans of Scholarly Communication": slides, video, DSHR summary), and I hope to apply the lessons learned from that research project to the ACM DL.

One of the common themes was "who evaluates the artifacts?"  Initially, most artifacts are considered only for publications that have otherwise already been accepted, and in most cases the evaluation is done non-anonymously by a different set of reviewers.  That adapts best to the current publishing process, but it is unresolved whether or not this is the ideal process -- if artifacts are to become true first-class citizens in scholarly discourse (and thus the DL), perhaps they should be reviewed simultaneously with the paper submission.  Of course, the workload would be immense, and anonymity (in both directions) would be difficult if not impossible.  Setting aside the issue of whether or not that is desirable, it would still represent a significant change to how most conferences and journals are administered.  Furthermore, while some SIGs have successfully implemented initial approaches to artifact evaluation with grad students and post-docs, it is not clear to me that this is scalable, nor am I sure it sends the right message about the importance of the artifacts.

Some other resources of note were also shared at the workshop.
The discussion of identifiers, and especially DOIs, is of interest to me because one of the points I made in the meeting, and continued on Twitter, can roughly be described as "DOIs have no magical properties".  No one actually claimed this, of course, but I did feel the discussion edging toward "just give it a DOI" (cf. getting DOIs for GitHub repositories).  I'm not against DOIs; rather, the short version of my caution is that there is currently a correlation between "archival properties" and "things we give DOIs to", but DOIs do not cause archival properties.

There was a fair amount of back channel discussion on Twitter with "#acmrepro"; I've captured the tweets during and immediately after the workshop in the Twitter moment embedded below.

I'll update this post as slides and the summary report become available.


Tuesday, December 19, 2017

2017-12-19: CNI Fall 2017 Trip Report

The Coalition for Networked Information (CNI) Fall 2017 Membership Meeting was held in Washington, DC on December 11-12, 2017. University Librarian George Fowler and I represented ODU, which was recognized as a new member this year.

CNI runs several parallel sessions of project briefings, so I will focus on those sessions that I was able to attend. The attendees were active on Twitter, using the hashtag #cni17f, and I'll embed some of the tweets below.  CNI has the full schedule (pdf) available and will have some of the talks on the CNI YouTube channel. (I'll note if any sessions I attended were scheduled to be recorded and add the link when published.) The project briefings page has additional information on each briefing and links to presentations that have been submitted.

Dale Askey (McMaster University) has published his CNI Fall 2017 Membership Meeting notes, which covers several of the sessions that I was unable to attend.

DAY 1 - December 11

Plenary "Resilience and Engagement in an Era of Uncertainty" - video

CNI Executive Director (and newly-named ACM Fellow) Clifford Lynch opened the Fall meeting with a plenary talk.

Cliff gave a wide-ranging talk that touched on several timely issues including the DataRefuge movement, net neutrality, generative adversarial networks, provenance, Memento, the Digital Preservation Statement of Shared Values, annotation, and blockchain.

Our recent work investigating the challenges of timestamping archived webpages (available as a tech report at arXiv) is relevant here, given Cliff's comments about DataRefuge, provenance, Memento, and blockchain.

Archival Collections, Open Linked Data, and Multi-modal Storytelling
Andrew White (Rensselaer Polytechnic Institute)

The focus was on taking campus historical archives and telling a story, with links between students, faculty, buildings, and other historical relationships on campus. They developed a system using the Unity game engine to power visualizations and the interactive environment. The system is currently displayed on 3 side-by-side monitors:
  1. Google map of the campus with building nodes overlaid
  2. Location / Character / Event timeline
  3. Images from the archives for the selected node
The goal was to take the photos and relationships from their archives and build a narrative that could be explored in this interactive environment.

Always Already Computational: Collections as Data - slides
Thomas Padilla (UNLV), Hannah Frost (Stanford), Laurie Allen (Univ of Pennsylvania)

Always Already Computational is an IMLS-funded project with the following goals:
  1. creation of a collections as data framework to support collection transformation
  2. development of computationally amenable collection use cases and personas
  3. functional requirements that support development of technological solutions
Much of their current work is focused on talking with libraries and researchers to determine what the needs are and how data can be distributed to researchers. The bottom line is how to make the university collections more useful. There was a lot of interest and interaction with the audience about how to use library collections and make them available for researchers.

Web Archiving Systems APIs (WASAPI) for Systems Interoperability and Collaborative Technical Development - slides
Jefferson Bailey (Internet Archive), Nicholas Taylor (Stanford)
Jefferson and Nicholas reported on WASAPI, an IMLS-funded project to facilitate the transfer of web archive data (WARCs) or derivative data from WARCs.

One of the motivations for the work was a survey finding that local web archive preservation is still uncommon. Only about 20% of the institutions surveyed download their web archive data for local preservation.

WASAPI's goal is to help foster and facilitate greater local data preservation and data transfer. There's currently an Archive-It Data Transfer API that allows Archive-It partners to download WARCs and derivative data (WAT, CDX, etc.) from their Archive-It collections.
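For those who want to script against it, a rough sketch of what a WASAPI client might look like in Python follows. The endpoint path, query parameters, and response fields (files, locations, next) are assumptions based on my reading of the WASAPI specification, and the credentials and collection id are placeholders; check the current Archive-It documentation before relying on any of them.

import requests

# Hypothetical example: list WARC files for one Archive-It collection.
# Endpoint, parameters, and response fields are assumptions based on the
# WASAPI specification; verify against the current Archive-It documentation.
WASAPI_ENDPOINT = "https://partner.archive-it.org/wasapi/v1/webdata"
auth = ("my-archive-it-user", "my-password")   # placeholder credentials
params = {"collection": 2823}                  # placeholder collection id

url = WASAPI_ENDPOINT
while url:
    response = requests.get(url, params=params, auth=auth)
    response.raise_for_status()
    data = response.json()
    for webdata_file in data.get("files", []):
        # Each file record lists one or more download locations.
        print(webdata_file["filename"], webdata_file["locations"][0])
    url = data.get("next")   # follow pagination until exhausted
    params = None            # subsequent pages already encode the query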

Creating Topical Collections: Web Archives vs. the Live Web
Martin Klein (Los Alamos National Laboratory)

Martin and colleagues compared creating topical collections from live web resources (URIs, Twitter hashtags, etc.) with creating topical collections from web archives. The work was inspired by Gossen et al.'s "Extracting Event-Centric Document Collections from Large-Scale Web Archives" (published at TPDL 2017, preprint available at arXiv) and uses WS-DL's Carbondate tool to help with extracting datetimes from webpages.

Through this investigation, they found:
  • Collections about recent events benefit more from the live web resources
  • Collections about events from the distant past benefit more from archived resources
  • Collections about less recent events can still benefit from the live web and from the archived web 

Creating Topical Collections: Web Archives vs. Live Web from Martin Klein

DAY 2 - December 12

From First Seeds to Now: Researching, Building, and Piloting a Harvesting Tool
Ann Connolly, bepress

bepress has developed a harvesting tool for faculty publications in its Expert Gallery Suite and ran a pilot study to gain feedback from potential users. The tool harvests data from MS Academic, which has been shown to have journal coverage on par with Web of Science and Scopus. In addition, MS Academic pulls in working papers, conference proceedings, patents, books, and book chapters. The harvesting tool allows university libraries to harvest metadata from the published works of their faculty, including works published while the faculty member was at another institution.

Being unfamiliar with bepress, I didn't realize at first that this was essentially a product pitch. But I learned that this is the company behind Digital Commons, which powers ODU's Digital Commons, so I was at least a little familiar with the technology that was being discussed. 

bepress was recently acquired by Elsevier, and this was the topic of much discussion during CNI. The acquisition was addressed in a briefing, "bepress and Elsevier: Let’s Go There", given on Day 1 by Jean-Gabriel Bankier, the Managing Director of bepress.

Value of Preserving and Disseminating Student Research Through Institutional Repositories - slides
Adriana Popescu and Radu Popescu (Cal Poly)

This study investigated the impact of hosting student research in an institutional repository (IR) on faculty research impact (citations). They looked at faculty publications indexed in the Web of Science from six departments at Cal Poly and undergraduate senior projects from those same departments deposited in the university's Digital Commons. For their dataset, they found that the citation impact increased as the student project downloads increased. One surprising finding was that the correlation between faculty repository activity and research impact was weaker than the correlation between student repository activity and research impact. The work will be published in Evidence-Based Library and Information Practice.

Annotation and Publishing Standards Work at the W3C - recorded
Timothy Cole (Illinois - Urbana-Champaign)

Tim presented an overview of the W3C Recommendations for Web Annotation and highlighted a few implementations.
Tim also talked about web publications and the challenges in how they can be accommodated on the web.  "A web publication needs to operate on the web as a single resource, even as its components are also web resources."
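For readers unfamiliar with the Web Annotation Data Model, here is a minimal sketch (in Python, to keep one language throughout this post) of what a single annotation looks like; the target URI and selected text are invented for illustration.

import json

# A minimal annotation following the W3C Web Annotation Data Model:
# a textual body commenting on a quoted passage of a (hypothetical) page.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "This claim needs a citation.",
        "format": "text/plain"
    },
    "target": {
        "source": "https://example.org/article.html",   # hypothetical target page
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "a single resource"
        }
    }
}

print(json.dumps(annotation, indent=2))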

Tim also gave a pitch for those interested to join a W3C Community Group and noted that membership in W3C is not required for participation there.

Beprexit: Rethinking Repository Services in a Changing Scholarly Communication Landscape - slides
Sarah Wipperman, Laurie Allen, Kenny Whitebloom (UPenn Libraries)

Since I had learned a bit about bepress earlier in the day, I decided to attend this session to hear thoughts from those using Digital Commons and other bepress tools.

The University of Pennsylvania has been using bepress since 2004, but with its acquisition by Elsevier, they are now exploring open source options for hosting Penn's IR, ScholarlyCommons.  Penn released a public statement on their decision to leave bepress.

The presenters gave an overview of researcher services provided by the library and an outline of how they are carefully considering their role and future options.  As they said, Penn is "leaving, but not rushing." They are publicly documenting their exploration of open repository systems.

There was much interest from those representing other university libraries in the audience regarding joining Penn in this effort.

Paul Evan Peters Award & Lecture  - video

Scholarly Communication: Deconstruct and Decentralize?
Herbert Van de Sompel, Los Alamos National Laboratory

The final talk at the Fall 2017 CNI Meeting was the Paul Evan Peters Award Lecture.  This year's honoree was our friend and colleague, Herbert Van de Sompel. Herbert's slides and the video of the talk are embedded below.

Herbert discussed applying the principles of the decentralized web to scholarly communication. He proposed a Personal Scholarly Web Observatory that would automatically track the researcher's web activities, including created artifacts, in a variety of portals.

Herbert referenced several interesting projects that have inspired his thinking:
  • MIT's Solid Architecture - proposed set of conventions and tools for building decentralized social applications based on Linked Data principles
  • Sarven Capadisli's dokieli - a decentralised article authoring, annotation, and social notification tool
  • Amy Guy's "Personal Web Observatory" - tracks daily activities, categorized and arranged visually with icons
These ideas could be used to develop a "Researcher Pod", which could combine an artifact tracker, an Event Store, and a communication platform that could be run on an institutional hosting platform along with an institutional archiving process.  These pods could be mobile and persistent so that researchers moving from one institution to another could take their pods with them.

Paul Evan Peters Lecture from Herbert Van de Sompel

Final Thoughts 

I greatly enjoyed attending my first CNI membership meeting. The talks were all high-quality, and I learned a great deal about some of the issues facing libraries and other institutional repositories.  Once the videos are posted, I encourage everyone to watch Cliff Lynch's plenary and Herbert Van de Sompel's closing talk. Both were excellent.

Because of the parallel sessions, I wasn't able to attend all of the briefings that I was interested in. After seeing some of the discussion on Twitter, I was particularly disappointed to have missed "Facing Slavery, Memory, and Reconciliation: The Research Library’s Role and Georgetown University’s Experience" presented by K. Matthew Dames (Georgetown) and Melissa Levine (Michigan).

Finally, I want to thank and acknowledge our funders, NEH, IMLS, and the Mellon Foundation. Program officers from these organizations gave talks at CNI.

2017-12-22 edit: Embedded and added link to Cliff's plenary talk.
2018-01-03 edit: Embedded and added link to Herbert's award lecture.

Thursday, December 14, 2017

2017-12-14: Storify Will Be Gone Soon, So How Do We Preserve The Stories?

The popular storytelling service Storify will be shut down on May 16, 2018. Storify has been used by journalists and researchers to create stories about events and topics of interest. It has a wonderful interface, shown below, that allows one to insert text and to add social cards and other content from a variety of services, including Twitter, Instagram, Facebook, YouTube, Getty Images, and of course regular HTTP URIs.
This screenshot displays the Storify editing Interface.
Storify is also used by news sources to build and publish stories about unfolding events, as shown below for the Boston NPR station WBUR.
Storify is used by WBUR in Boston to convey news stories.
It is also the visualization platform used for summarizing Archive-It collections in the Dark and Stormy Archives (DSA) Framework, developed by WS-DL members Yasmin AlNoamany, Michele Weigle, and Michael Nelson. In a previous blog post, I covered why this visualization technique works and why many other tools fail to deliver it effectively. An example story produced by the DSA is shown below.
This Storify story summarizes Archive-It Collection 2823 about a Russian plane crash on September 7, 2011.

Ian Milligan provides an excellent overview of the importance of Storify and the issues surrounding its use. Storify stories have been painstakingly curated and the aggregation of content is valuable in and of itself, so before Storify disappears, how do we save these stories?

Saving the Content from Storify


Storify does allow a user to save their own content, one story at a time. Once you've logged in, you can perform the following steps:
1. Click on My Stories
2. Select the story you wish to save
3. Choose the ellipsis menu from the upper right corner
4. Select Export
5. Choose the output format: HTML, XML, or JSON

Depending on your browser and its settings, the resulting content may display in your browser or a download dialog may appear. The URIs for each file format follow a pattern. In our example story above, the slug for the story is 2823spst0s and our account name is ait_stories. The different formats for our example story reside at the following URIs.
  • JSON file format:
  • XML file format:
  • Static HTML file format:
If one already has the slug and the account name, one can save any public story. Private stories, however, can only be saved by the owner of the story. What if we do not know the slugs of all of our stories? What if we want to save someone else's stories?
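If you want to script the construction of these export URIs, a minimal sketch follows. Since I am not reproducing the exact URI pattern here, the template below is a hypothetical placeholder; substitute the real pattern from the URIs listed above.

# Build export URIs for a public Storify story from its account and slug.
# EXPORT_TEMPLATE is a hypothetical placeholder, not the real pattern;
# replace it with the actual JSON, XML, and static HTML export pattern.
EXPORT_TEMPLATE = "https://storify.com/{account}/{slug}.{ext}"

def export_uris(account, slug, formats=("json", "xml", "html")):
    return {ext: EXPORT_TEMPLATE.format(account=account, slug=slug, ext=ext)
            for ext in formats}

print(export_uris("ait_stories", "2823spst0s"))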

Using Storified From DocNow

For saving the HTML, XML, and JSON formats of Storify stories, Ed Summers, creator of twarc, has created the storified utility as part of the DocNow project. Using this utility, one can save public stories from any Storify account in the three available formats. I used the utility to save the stories from the DSA's own ait_stories account. After ensuring I had installed Python and pip, I was able to install and use the utility as follows:
  1. git clone
  2. pip install requests
  3. cd storified
  4. python ./ ait_stories # replace ait_stories with the name of the account you wish to save
Update: Ed Summers mentions that one can now run pip install storified, replacing these steps. One then only needs to run the installed script with the account name as its argument (ait_stories in our example, replaced with the account you wish to save).

Storified creates a directory with the given account name containing sub-directories named after each story's slug. For our Russian plane crash example, I have the following:
~/storified/ait_stories/2823spst0s % ls -al
total 416
drwxr-xr-x   5 smj  staff    160 Dec 13 16:46 .
drwxr-xr-x  48 smj  staff   1536 Dec 13 16:47 ..
-rw-r--r--   1 smj  staff  58107 Dec 13 16:46 index.html
-rw-r--r--   1 smj  staff  48440 Dec 13 16:46 index.json
-rw-r--r--   1 smj  staff  98756 Dec 13 16:46 index.xml
I compared the content produced by the manual process above with the output from storified, and there are slight differences in metadata between the authenticated manual export and the anonymous export generated by storified. Last seen dates and view counts are different in the JSON export, but there are no other differences there. The XML and HTML exports of each process have small differences, such as <canEdit>false</canEdit> in the storified version versus <canEdit>true</canEdit> in the manual export. These small differences are likely due to the fact that I had to authenticate to manually export the story content, whereas storified works anonymously. The content of the actual stories, however, is the same. I have created a GitHub gist showing the different exported content.
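The comparison is easy to repeat; below is a rough sketch of the kind of script I mean, which diffs two JSON exports after dropping volatile fields. The file paths and the field names in IGNORED are illustrative guesses, not the exact keys Storify uses.

import json

# Compare two JSON exports of the same story, ignoring volatile fields.
# The keys in IGNORED and the file paths are illustrative; adjust them
# to match the actual exports on your disk.
IGNORED = {"views", "lastSeen"}

def strip_volatile(obj):
    if isinstance(obj, dict):
        return {k: strip_volatile(v) for k, v in obj.items() if k not in IGNORED}
    if isinstance(obj, list):
        return [strip_volatile(v) for v in obj]
    return obj

with open("manual/index.json") as f1, \
     open("storified/ait_stories/2823spst0s/index.json") as f2:
    manual = strip_volatile(json.load(f1))
    anonymous = strip_volatile(json.load(f2))

print("identical after ignoring volatile fields:", manual == anonymous)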

Update: Nick B pointed out that the JSON files — and only the JSON files — generated either by manual export or via the storified tool are incomplete. I have tested his assertion with our example story (2823spst0s) and can confirm that the JSON files only contain the first 19 social cards. To acquire the rest of the metadata about a story collection in JSON format, one must use the Storify API. The XML and static HTML outputs do contain data for all social cards and it is just the JSON export that appears to lack completeness. Good catch!
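A sketch of paging through a story via the API is below. The endpoint, parameters (page, per_page), and response structure are assumptions based on my reading of the Storify API documentation, and of course the API will disappear along with the rest of the service.

import requests

# Page through a story's elements via the Storify API. The endpoint,
# parameters, and response structure below are assumptions; verify them
# against the Storify API documentation while it is still available.
API_TEMPLATE = "https://api.storify.com/v1/stories/{account}/{slug}"

def fetch_all_elements(account, slug, per_page=50):
    elements, page = [], 1
    while True:
        r = requests.get(API_TEMPLATE.format(account=account, slug=slug),
                         params={"page": page, "per_page": per_page})
        r.raise_for_status()
        batch = r.json().get("content", {}).get("elements", [])
        if not batch:
            break
        elements.extend(batch)
        page += 1
    return elements

print(len(fetch_all_elements("ait_stories", "2823spst0s")), "social cards retrieved")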

Using storified, I was able to extract and save our DSA content to Figshare for posterity. Figshare provides persistence as part of its work with the Digital Preservation Network, and used CLOCKSS prior to March 2015.

That covers extracting the base story text and structured data, but what about the images and the rest of the experience? Can we use web archives instead?

Using Web Archiving on Storify Stories

Storify stories are web resources, so how well can they be archived by web archives? Using our example Russian plane crash story, with a screenshot shown below, I submitted its URI to several web archiving services and then used the WS-DL memento damage application to compute the memento damage of the resulting mementos.
A screenshot of our example Storify story, served from the live web at storify.com.

A screenshot of our Storify story served from the Internet Archive, after submission via the Save Page Now Utility.
A screenshot of our Storify story served from archive.is.

A screenshot of our Storify story served from webrecorder.io.

A screenshot of our Storify story served via WAIL version 1.2.0-beta3.
Memento damage scores and visual inspection comments for each platform:

Original Page at Storify: 0.002
  • All social cards complete
  • Views Widget works
  • Embed Widget works
  • Livefyre Comments widget is present
  • Interactive Share Widget contains all images
  • No visible pagination animation

Internet Archive with Save Page Now: 0.053
  • Missing the last 5 social cards
  • Views Widget does not work
  • Embed Widget works
  • Livefyre Comments widget is missing
  • Interactive Share Widget contains all images
  • Pagination animation runs on click and terminates with errors

archive.is: 0.000
  • Missing the last 5 social cards
  • Views Widget does not work
  • Embed Widget does not work
  • Livefyre Comments widget is missing
  • Interactive Share Widget is missing
  • Pagination animation is replaced by "Next Page" which goes nowhere

webrecorder.io: 0.051*
  • Missing the last 5 social cards, but can capture all with user interaction while recording
  • Views Widget works
  • Embed Widget works
  • Livefyre Comments widget is missing
  • Interactive Share Widget contains all images
  • No visible pagination animation

WAIL: 0.025
  • All social cards complete
  • Views Widget works, but is missing downward arrow
  • Embed Widget is missing images, but otherwise works
  • Livefyre Comments widget is missing
  • Interactive Share Widget is missing images
  • Pagination animation runs and does not terminate

* Score extracted from the memento damage tool's logging information (see below).

Out of these platforms, archive.is has the lowest memento damage score, but in this case the memento damage tool has been misled by how archive.is produces its content. Because archive.is takes a snapshot of the DOM at the time of capture and does not preserve the JavaScript on the page, it may score low on Memento Damage, but it also has no functional interactive widgets and is missing 5 social cards at the end of the page. The memento damage tool crashed while trying to provide a damage score for webrecorder.io; its score has been extracted from logging information.

I visually evaluated each platform for the authenticity of its reproduction of the interactivity of the original page. I did not expect functions that relied on external resources to work, but I did expect menus to appear and images to be present when interacting with widgets. In this case, webrecorder.io produces the most authentic reproduction, missing only the Livefyre comments widget. Storify stories, however, do not completely display the entire story at load time: once a user scrolls down, JavaScript retrieves the additional content. Webrecorder.io will not acquire this additional paged content unless the user scrolls the page manually while recording.

WAIL, on the other hand, retrieved all of the social cards. Even though it failed to capture some of the interactive widgets, it did capture all social cards and, unlike webrecorder.io, does not require any user interaction once seeds are inserted. On playback, however, it still displays the animated pagination widget, as seen below, misleading the user into believing that more content is loading.
A zoomed in screenshot from WAIL's playback engine with the pagination animation outlined in a red box.

WAIL also has the capability of crawling the web resources linked from the social cards themselves, making it a suitable choice if linked content is more important than complete authentic reproduction.

The most value comes from the social cards and the text of the story, not from the interactive widgets. Rather than archiving the story URIs themselves, one can avoid the page-load pagination problems by archiving the static HTML version of the story mentioned above: use the static HTML URI rather than the regular story URI. I have tested the static HTML URIs in all of these tools and found that all social cards were preserved.
The static HTML page version of the same story, missing interactive widgets, but containing all story content.

Unfortunately, other archived content probably did not link to the static HTML version. Because of this, someone browsing a web archive's collection who followed a link intended to reach a Storify story would not see it, even though the static HTML version may have been archived. In other words, web archives would not know to canonicalize the regular story URI and its static HTML counterpart.


As with most preservation, the goal of the archivist needs to be clear before attempting to preserve Storify stories. Using the manual method or DocNow's storified, we can save the information needed to reconstruct the text of the social cards and the other text of the story, but the images and interactive content are missing. Aiming web archiving platforms at the Storify URIs, we can archive some of the interactive functionality of Storify with some degree of success, but also with some loss of story content due to automated pagination.

For the purposes of preserving the visualization that is the story, I recommend using a web archiving tool to archive the static HTML version, which will preserve the images and text as well as the visual flow of the story so necessary for successful storytelling. I also recommend performing a crawl to preserve not only the story, but the items linked from the social cards. Keep in mind that web pages likely link to the Storify story URI and not its static HTML URI, hampering discovery within large web archives.
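As a concrete example of that recommendation, the sketch below submits a single URI to the Internet Archive's Save Page Now endpoint; you would pass it the static HTML URI of each story. Other tools have to be driven through their own interfaces.

import requests

def save_page_now(uri):
    # Submit a URI to the Internet Archive's "Save Page Now" endpoint.
    response = requests.get("https://web.archive.org/save/" + uri)
    response.raise_for_status()
    # The new memento's path is typically reported in the Content-Location
    # header; fall back to the final URL otherwise.
    return response.headers.get("Content-Location", response.url)

# Replace the argument with the static HTML URI of the story to preserve;
# the story URI below is only a placeholder.
print(save_page_now("https://storify.com/ait_stories/2823spst0s"))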

Even though we can't save Storify the organization, we can save the content of Storify the web site.

-- Shawn M. Jones

Updated on 2017/12/14 at 3:30 PM EST with note about pip install storified thanks to Ed Summers' feedback.

Updated on 2017/12/15 at 11:20 PM EST with note about the JSON formatted export missing stories thanks to Nick B's feedback.