Posts

Showing posts with the label Web Archiving

2018-03-15: Paywalls in the Internet Archive

Paywall page from The Advertiser

Paywalls have become increasingly notable in the Internet Archive over the past few years. In our recent investigation into news similarity for U.S. news outlets, we chose from a list of websites and then pulled the top stories. We did not initially include subscriber-based sites, such as The Financial Times or Wall Street Journal, because these sites only provided snippets of an article, and then users would be confronted with a "Subscribe Now" sign to view the remaining content. The New York Times, as well as other news sites, also has subscriber-based content, but access is only limited once a user has exceeded a set number of stories seen. In our study of 30 days of news sites, we found 24 URIs that were deemed to be paywalls, and these are listed below:

Memento Responses

All of these URIs point to the Internet Archive but result in an HTTP status code of 404. We took all of these URI-Ms from the homepage of their respect

2018-03-14: Twitter Follower Count History via the Internet Archive

The USA Gymnastics team shows significant growth during the years the Olympics are held.

Because of limitations in Twitter's API, we have limited ability to collect historical data for a user's followers. The information for when one account starts following another is unavailable, and without it the popularity of an account and how it grows cannot be tracked. Another pitfall is that when an account is deleted, Twitter does not provide data about the account after the deletion date. It is as if the account never existed. However, this information can be gathered from the Internet Archive. If the account is popular enough to be archived, then a follower count for a specific date can be collected. The previous method to determine followers over time is to plot the users in the order the API returns them against their join dates. This works on the assumption that the Twitter API returns followers in the order they started following the account being observed. The creation

2017-12-31: Digital Blackness in the Archive - DocNow Symposium Trip Report

Digital Blackness in the Archive was such a beautiful event. Thank you. #BlackDigArchive pic.twitter.com/SpoWYWDOhL - DocumentingTheNow (@documentnow) December 15, 2017

From December 11-12, 2017, I attended the second Documenting the Now Symposium in St. Louis, MO. The meeting presentations were recorded and are available along with an annotated agenda; for further background about the Documenting the Now project and my involvement via the advisory board, I suggest my 2016 trip report, as well as DocNow activity on GitHub, Slack, and Twitter. In addition, the meeting itself was extensively live-tweeted with #BlackDigArchive (see also the data set of tweet IDs collected by Bergis Jules).

kicking off @documentnow #BlackDigArchive at @fergusonlibrary! Livestream: https://t.co/K2XbN2HJdi pic.twitter.com/mVEBXSALTm - bibliotekah (@tttkay) December 11, 2017

Awesome keynote by @amplify285! So glad she could be here. Just found out she’s also a @WUSTL alumna. #B

2017-12-14: Storify Will Be Gone Soon, So How Do We Preserve The Stories?

The popular storytelling service Storify will be shut down on May 16, 2018. Storify has been used by journalists and researchers to create stories about events and topics of interest. It has a wonderful interface, shown below, that allows one to insert text and also add social cards and other content from a variety of services, including Twitter, Instagram, Facebook, YouTube, Getty Images, and of course regular HTTP URIs.

This screenshot displays the Storify editing interface.

Storify is used by news sources to build and publish stories about unfolding events, as seen below for the Boston NPR station WBUR.

Storify is used by WBUR in Boston to convey news stories.

It is also the visualization platform used for summarizing Archive-It collections in the Dark and Stormy Archives (DSA) Framework, developed by WS-DL members Yasmin AlNoamany, Michele Weigle, and Michael Nelson. In a previous blog post, I covered why this visualization technique works and why m

2017-11-20: Dodging the Memory Hole 2017 Trip Report

It was rainy in San Francisco, but that did not deter those of us attending Dodging the Memory Hole 2017 at the Internet Archive. We engaged in discussions about a very important topic: the preservation of online news content.

An attendee listens to a presentation at the @RJI Dodging the Memory Hole conference in San Francisco. #DTMH2017 pic.twitter.com/2ajM1k06Ru - RJI Futures Lab (@RJIFuturesLab) November 16, 2017

Keynote: Brewster Kahle, founder and digital librarian for the Internet Archive

"Let's become a library. Let's be useful to society to understand ourselves." - @brewster_kahle #DTMH2017 pic.twitter.com/rg7hMqWBdI - JDNA (@RJIJDNA) November 15, 2017

Brewster Kahle is well known in digital preservation and especially web archiving circles. He founded the Internet Archive in May 1996. The WS-DL and LANL's Prototyping Team collaborate heavily with those from the Internet Archive, so hearing his talk was quite inspirational.