Wednesday, January 21, 2015

2015-02-05: What Did It Look Like?

Having often wondered why many popular videos on the web are time lapse videos (that is videos which capture the change of a subject over time), I came to the conclusion that impermanence gives value to the process of preserving ourselves or other subjects in photography. As though a means to defy the compulsory fundamental law of change. Just like our lives, one of the greatest products of human endeavor, the World Wide Web, was once small, but has continued to grow. So it is only fitting for us to capture the transitions.
What Did It Look Like? is a Tumblr blog which uses the Memento framework to poll various public web archives, take the earliest archived version from each calendar year, and then create an animated image that shows the progression of the site through the years.

To seed the service we randomly chose some web sites and processed them (see also the archives). In addition, everyone is free to nominate web sites to What Did It Look Like? by tweeting: "#whatdiditlooklike URL". 

In order to see how this system is achieved, consider the architecture diagram below. 

The system is implemented in Python and utilizes Tweepy and PyTumblr to access the Twitter and Python APIs respectively, and consists of the following programs:
  1. This application fetches tweets (with "#whatdiditlooklike URL" signature) by using the tweet ID of the last tweet visited as reference to know where to begin retrieving tweets. For example, if the application initially visited tweet IDs 0, 1, 2. It keeps track of the ID 2, so as to begin retrieving tweets with IDs greater than 2 in a subsequent tweet retrieval operation. Also, since Twitter rate limits the number of search operations (180 requests per 15 minute window), the application sleeps in between search operations. The snippet below outlines the basic operations of fetching tweets after the last tweet visited:
  2. This is a simple application with invokes for each nomination tweet (that a tweet with the "#whatdiditlooklike URL" signature).
  3. Given an input URL, this application utilizes PhantomJS, (a headless browser) to take screenshots and utilizes ImageMagick to create an animated GIF. It should also be noted that the GIFs created are optimized due to the snippet below in order to reduce their respective sizes to under 1MB. This ensures the animation is not deactivated by Tumblr.
  4. this application executes two primary operations:
    1. Publication of the animated GIFs of nominated URLs to Tumblr: This is done through the PyTumblr API create_photo() method as outlined by the snippet below:
    2. Notifying the referrer and making status updates on Twitter: This is achieved through Tweepy's api.update_status() method as outlined by the snippet below which tweets the status update message:However, a simple periodic Twitter status update message could result in the message to be flagged eventually as spam by Twitter. This comes in form of a 226 error code. In order to avoid this, does not post the same status update tweet message or notification tweet. Instead the application randomly selects from a suite of messages and injects a variety of attributes which ensure status update tweets are different. The randomness in execution is due to a custom cron application which randomly executes the entire stack beginning from down to

How to nominate sites onto What Did It Look Like?

If you are interested in seeing what a web site looked like through the years:
  1. Search to see if the web site already exists by using the search service in the archives page; this can be done by typing the URL of the web site and hitting submit. If the URL does not exist on the site, go ahead to step 2.
  2. Tweet "#whatdiditlooklike URL" to nominate a web site or tweet "#whatdiditlooklike URL1, URL2, ..., URLn" to nominate multiple URLs.
Tweet "#whatdiditlooklike URL" to nominate a web site or tweet "#whatdiditlooklike URL1, URL2, ..., URLn" to nominate multiple URLs.

How to explore historical posts

To explore historical posts, visit the archives page:

What Did Look Like?

What Did Look Like?
What Did Look Like?

"What Did It Look Like?" is inspired by two sources: 1) the "One Terabyte of Kilobyte Age Photo Op" Tumblr that Dragan Espenschied presented at DP 2014 (which basically demonstrates digital preservation as performance art; see also the commentary blog by Olia Lialina & Dragan), and 2) the Digital Public Library of America (DPLA) "#dplafinds" hashtag that surfaces interesting holdings that one would otherwise likely not discover.  Both sources have the idea of "randomly" highlighting resources that you would otherwise not find given the intimidatingly large collection in which they reside.

We hope you'll enjoy this service as a fun way to see how web sites -- and web site design! -- have changed through the years.


Thursday, January 15, 2015

2015-01-15: The Winter 2015 Federal Cloud Computing Summit

On January 14th-15th, I attended the Federal Cloud Computing Summit in Washington, D.C., a recurring event in which I have participated in the past. In my continuing role as the MITRE-ATARC Collaboration Session lead, I assisted the host organization, the Advanced Technology And Research Center (ATARC) in organizing and run the MITRE-ATARC Collaboration Sessions. The summit is designed to allow Government representatives to meeting and collaborate with industry, academic, and other Government cloud computing practitioners on the current challenges in cloud computing.

The collaboration sessions continue to be highly valued within the government and industry. The Winter 2015 Summit had over 400 government or academic registrants and more than 100 industry registrants. The whitepaper summarizing the Summer 2014 collaboration sessions is now available.

A discussion of FedRAMP and the future of the policies was held in a Government-only session at 11:00 before the collaboration sessions began.
At its conclusion, the collaboration sessions began, with four sessions focusing on the following topics.
  • Challenge Area 1: When to choose Public, Private, Government, or Hybrid clouds?
  • Challenge Area 2: The umbrella of acquisition: Contracting pain points and best practices
  • Challenge Area 3: Tiered architecture: Mitigating concerns of geography, access management, and other cloud security constraints
  • Challenge Area 4: The role of cloud computing in emerging technologies
Because participants are protected by Chathan House Rule, I cannot elaborate on the Government representation or discussions in the collaboration sessions. MITRE will continue its practice of releasing a summary document after the Summit (for reference, see the Summer 2014 and Winter 2013 summit whitepapers).

On January 15th, I attended the Summit which is a conference-style series of panels and speakers with an industry trade-show held before the event and during lunch. At 3:25-4:10, I moderated a panel of Government representatives from each of the collaboration sessions in a question-and-answer session about the outcomes from the previous day's collaboration sessions.

To follow along on Twitter, you can refer to the Federal Cloud Computing Summit Handle (@cloudfeds), the ATARC Handle (@atarclabs), and the #cloudfeds hashtag.

This was the fourth Federal Summit event in which I have participated, including the Winter 2013 and Summer 2014 Cloud Summits and the 2013 Big Data Summit. They are great events that the Government participants have consistently identified as high-value. The events also garner a decent amount of press in the federal news outlets and at MITRE. Please refer to the list of press for the most recent articles about the summit.

We are continuing to expand and improve the summits, particularly with respect to the impact on academia. Stay tuned for news from future summits!

--Justin F. Brunelle

Saturday, January 3, 2015

2015-01-03: Review of WS-DL's 2014

The Web Science and Digital Libraries Research Group's 2014 was even better than our 2013.  First, we graduated two PhD students and had many other students advance their status:
In April we introduced our now famous "PhD Crush" board that allows us to track students' progress through the various hoops they must jump through.  Although it started as sort of a joke, it's quite handy and popular -- I now wish we had instituted it long ago. 

We had 15 publications in 2014, including:
JCDL was especially successful, with Justin's paper "Not all mementos are created equal: Measuring the impact of missing resources" winning "best student paper" (Daniel Hasan from UFMG also won a separate "best student paper" award), and Chuck's paper "When should I make preservation copies of myself?" winning the Vannevar Bush Best Paper award.  It is truly a great honor to have won both best paper awards at JCDL this year (pictures: Justin accepting his award, and me accepting on behalf of Chuck).  In the last two years at JCDL & TPDL, that's three best paper awards and one nomination.  The bar is being raised for future students.

In addition to the conference paper presentations, we traveled to and presented at a number of conferences that do not have formal proceedings:
We were also fortunate enough to visit and host visitors in 2014:
We also released (or updated) a number of software packages for public use, including:
Our coverage in the popular press continued, with highlights including:
  • I appeared on the video podcast "This Week in Law" #279 to discuss web archiving.
  • I was interviewed for the German radio program "DRadio Wissen". 
We were more successful on the funding front this year, winning the following grants:
All of this adds up to a very busy and successful 2014.  Looking ahead to 2015, as well as continued publication and funding success, we are expecting to graduate both one MS & one Ph.D. student and host another visiting researcher (Michael Herzog, Magdeburg-Stendal University). 

Thanks to everyone that made 2014 such a great success, and here's to a great start to 2015!