Tuesday, October 22, 2013

2013-10-23: Preserve Me! (... if you can, using Unsupervised Small-World graphs.)

Every day we create more and more digital files that record our lives.  We take selfies (with and without our loved ones).  We record our baby's first step.  We take pictures of things that we have or would like to have.  The number of digital files and artifacts we create grows and grows, and the places where we can store them seem to have almost infinite capacity.  A smartphone with 64 gigabytes of storage could hold almost 20,000 MP3 files (roughly 1,000 hours of listening time, or about 125 days of listening 8 hours a day).  Amateur cameras can have the same amount of storage and, depending on image size and frames per second, can store days of continuous recordings or about 500,000 still images.  We can create, and are creating, more digital artifacts than we can manage.  Being able to create so much means we don't care about what we create.  We create because it is easy.  We create because it is fun.  We create because we have a new toy.  We create because we can.

There is a significant downside to this creation craze.  How can we preserve our selfies for our children?  How can we share our baby's first step with their babies?  How can we show what we had when we were young, now that our hair is silver?  How can we show unknown others in the future those things that were important to us in our youth?  How do we preserve our selves?

We could foist the preservation responsibility for all that we create onto our children (seems sort of unkind).  We could preserve our selves using a commercial or governmental institution, but that may not be much better.  Another way to attack the problem is to rephrase the question.  Instead of asking how we preserve digital artifacts and objects, ask: how can digital objects preserve themselves?  If we can imbue digital objects with directions to preserve themselves and provide a benign environment where they can survive, then they should be able to continue to be available long after we are gone.  Long after our children are gone.  Long after those that loved and cared for us are forgotten.  Imbuing digital objects with preservation directives and providing a benign environment are at the heart of Unsupervised Small-World (USW) graphs.

At Old Dominion University, we created a demonstration USW environment composed of representative sample webpages, faux domains with supporting RESTful methods, and a robot to represent users as they wandered through the Internet and viewed the representative webpages.  We scraped parts of four domains (flickr.com, arXiv.org, RadioLab.org and Gutenberg.com) to collect representative pages with different types of digital files.

As the human-facing portion of the USW graph, we mocked up a Preserve Me! button so that the webpage viewer could add the webpage to the USW graph.
Mock up of an ArXiv page with the Preserve Me! button.
Putting the Preserve Me! capability into the hands of everyone is in keeping with the idea that everyone should be a curator (Frank McCown, "Everyone is a Curator: Human-Assisted Preservation for ORE Aggregations").  After the Preserve Me! button is pressed, a second screen appears and an Object Reuse and Exchange (ORE) REsource Map (REM) serialization of the original webpage is created.  The REM representation of the original webpage will be preserved by the USW process.
Mock up of Preserve Me! REM messages.
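
To make the idea of a REM more concrete, here is a minimal sketch of what a Resource Map for one webpage might contain.  It uses the ORE vocabulary terms (ResourceMap, describes, Aggregation, aggregates), but the JSON layout, the build_rem helper, and the example.org style URIs are assumptions for illustration only; they are not the serialization the demonstration actually produced.

```python
import json

# Namespace of the ORE vocabulary terms used below.
ORE = "http://www.openarchives.org/ore/terms/"

def build_rem(rem_uri, page_uri, aggregated_uris):
    """Return a minimal, JSON-serializable Resource Map for one webpage.

    The dictionary layout is a hypothetical illustration, not the
    format used by the USW demonstration.
    """
    return {
        "@id": rem_uri,
        "@type": ORE + "ResourceMap",
        ORE + "describes": {
            "@id": rem_uri + "#aggregation",
            "@type": ORE + "Aggregation",
            # The page itself plus the files it embeds.
            ORE + "aggregates": [{"@id": u} for u in [page_uri, *aggregated_uris]],
        },
    }

# Hypothetical URIs standing in for a scraped flickr page and its image.
rem = build_rem(
    "http://flickr.example/rem/josie",
    "http://flickr.example/photos/josie",
    ["http://flickr.example/photos/josie.jpg"],
)
print(json.dumps(rem, indent=2))
```

The point of the sketch is only that the REM, not the webpage itself, is the object that gets copied and linked in the USW graph.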

There are two major parts of the benign infrastructure.  The first is a set of servers that support two USW RESTful methods called "copy" and "edit."  The "copy" method creates a copy of a foreign REM in the local domain, and "edit" updates selected REMs on the local domain.  The second is an HTTP message server (Sawood Alam's Master's thesis), which provides a communication mechanism for exchanging actionable HTTP directives between the USW-imbued digital objects.
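
As a rough illustration of the "copy" and "edit" methods, the sketch below models each domain as an in-memory object.  The UswDomain class, its method signatures, and the REM fields are all hypothetical; the real services are RESTful methods on web servers, and their actual interfaces are not described in this post.

```python
import copy as _copy

class UswDomain:
    """Hypothetical in-memory stand-in for one domain's USW server."""

    def __init__(self, name):
        self.name = name
        self.rems = {}  # REM id -> REM contents (a plain dict here)

    def copy(self, foreign_domain, rem_id):
        """'copy': create a local copy of a REM hosted on a foreign domain."""
        local_id = f"{self.name}/copy-of/{rem_id}"
        self.rems[local_id] = _copy.deepcopy(foreign_domain.rems[rem_id])
        return local_id

    def edit(self, rem_id, changes):
        """'edit': update selected fields of a REM on the local domain."""
        self.rems[rem_id].update(changes)

flickr = UswDomain("flickr")
gutenberg = UswDomain("gutenberg")
flickr.rems["josie"] = {"title": "Josie McClure, 1907", "links": []}

# Gutenberg makes a preservation copy; flickr then records where it lives.
copy_id = gutenberg.copy(flickr, "josie")
flickr.edit("josie", {"links": [copy_id]})
print(copy_id, flickr.rems["josie"]["links"])
```

Note that "copy" only ever writes to the domain it is called on, and "edit" only ever touches local REMs, which matches the division of labor described above.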

As an example, we are going to talk about preserving a scanned image from the 1900s.
Josie McClure, 1907, 15 years old.
The image was uploaded to flickr.com and was scraped to become part of the ODU benign USW demonstration environment.

A robot was written to act as a human visiting the different pages in the ODU USW demonstration environment.  The robot was written so that a human would not have to repeatedly press the Preserve Me! button on different pages.  It is possible to watch the USW graph grow using the Preserve Me! Visualizer, and to watch this specific example.
Preserve Me! Visualizer
Preserve Me! Visualizer with data

Things to look for in the Visualizer:

1.  The "copy," "edit," and HTTP mailbox infrastructure components are represented by the three cyan-colored icons in the center of the display.

2.  Original USW REMs are in a concentric circle close to the infrastructure icons and are color-coded.  REMs from flickr have a magenta frame, those from RadioLab have a blue frame, and REMs from Gutenberg have a yellow frame.

3.  Copy USW REMs are much further out from the center and have the same color as the domain they are hosted on, but the contents of the REMs are from their original domain.

4.  Permanent connections between REMs (edges in the USW graph) are directional and colored white.

5.  Activity between a REM and any of the infrastructure components is directional, red, and transient.

6.  If a REM is removed from the system, a red slash is drawn through its icon.

7.  Below the plotting area are VCR-like controls, including speed controls, toggling the background between black and white, capturing an image, and maximizing the display.

8.  Placing the pointer over any of the icons will cause almost all other icons and edges to become hidden.  The only things that will be visible are the icon under the pointer, permanent edges originating at that icon, and icons that are pointed to by those permanent edges.

9.  Clicking on an icon will show explanatory information about the icon.

10.  A REM will try to make preservation copies on the domains that it knows about, other than its own.

Preserve Me! Viz replays a prerecorded JSON log of events.  These events came from a scenario that the robot executed.  Between the time the robot created the JSON log file and when you replay the visualization of the robot's actions, the USW graph created by the robot may no longer be in existence (caveat emptor).
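
A replay of this kind can be sketched in a few lines.  The event numbers and times in the sample below are taken from the event list later in this post, but the JSON field names ("event", "time", "action", "rem", "domain") and the replay_schedule function are assumptions; the actual log format used by Preserve Me! Viz is not shown here.

```python
import json

# Hypothetical log excerpt; only the event numbers and times come from the post.
SAMPLE_LOG = """
[
  {"event": 2, "time": 1.825, "action": "add_rem", "rem": "Josie"},
  {"event": 5, "time": 6.175, "action": "infrastructure_ready"},
  {"event": 526, "time": 1135.699, "action": "copy", "rem": "Josie", "domain": "Gutenberg"}
]
"""

def replay_schedule(log_text, speed=1.0):
    """Return (delay_in_seconds, event) pairs for replaying a recorded log.

    Each delay is the gap since the previous event, divided by the
    playback speed, so speed=2.0 replays twice as fast as real time.
    """
    events = sorted(json.loads(log_text), key=lambda e: e["time"])
    schedule = []
    previous = 0.0
    for event in events:
        schedule.append(((event["time"] - previous) / speed, event))
        previous = event["time"]
    return schedule

for delay, event in replay_schedule(SAMPLE_LOG, speed=100.0):
    print(f"wait {delay:.3f}s, then draw event {event['event']}: {event['action']}")
```

Because the replay works only from the recorded times, it needs no contact with the live USW graph, which is why the visualization still plays after the graph itself is gone.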

The general events are:

1.  REM #1 retrieves messages from its mailbox. (As indicated by the flashing red line from the REM to the HTTP mailbox icon.)

2.  Based on the messages, REM #1 might execute HTTP patch directives (as indicated by the flashing red line from the REM to the edit icon), might create preservation copies of another REM (as indicated by the flashing red line from the REM to the copy icon and the creation of a preservation REM), or might take other actions.

3.  REM #1 might inspect REM #2 to retrieve data from REM #2.

4.  Based on that information, REM #1 might send HTTP patch directives or copy requests to REM #2.

A REM will never directly affect another REM.  A REM will send requests and directives via the HTTP mailbox.
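
The rule that REMs only communicate through the mailbox can be sketched as follows.  The Mailbox and Rem classes, the "add_edge" directive, and the message fields are hypothetical stand-ins; the real system exchanges HTTP directives through the HTTP message server.

```python
from collections import defaultdict, deque

class Mailbox:
    """Hypothetical in-memory stand-in for the HTTP message server."""

    def __init__(self):
        self._boxes = defaultdict(deque)

    def send(self, recipient, message):
        self._boxes[recipient].append(message)

    def retrieve(self, recipient):
        """Return and remove all pending messages for one recipient."""
        messages = list(self._boxes[recipient])
        self._boxes[recipient].clear()
        return messages

class Rem:
    def __init__(self, name, mailbox):
        self.name = name
        self.mailbox = mailbox
        self.edges = set()  # names of REMs this REM points to

    def request_edge(self, other_name):
        """Link to another REM and ask it, via the mailbox, to link back."""
        self.edges.add(other_name)
        self.mailbox.send(other_name, {"directive": "add_edge", "target": self.name})

    def process_mailbox(self):
        """Retrieve pending messages and apply each directive to this REM only."""
        for message in self.mailbox.retrieve(self.name):
            if message["directive"] == "add_edge":
                self.edges.add(message["target"])

mailbox = Mailbox()
josie = Rem("Josie", mailbox)
pride = Rem("Pride and Prejudice", mailbox)

josie.request_edge(pride.name)
print(pride.edges)   # still empty: nothing changes until the recipient reads its mail
pride.process_mailbox()
print(pride.edges)   # now contains the reciprocal edge back to Josie
```

In this sketch a REM changes only its own state, and only when it chooses to process its mailbox, so a REM that is offline or lost simply leaves its messages unread.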

The replay file shows 17 webpages across 4 domains creating preservation copies of themselves on domains different from the one where they were created.  Josie originated on the flickr domain (at the 6 o'clock position and framed in magenta), preserved a copy on the Gutenberg domain (at the 1:30 position and framed in yellow), and made USW connections to a REM originating on the Gutenberg domain (at the 12 o'clock position and framed in yellow) and to preservation copies on the flickr and RadioLab domains (framed in magenta and green respectively).

Things to watch for during a replay of the example (or you can watch a video):

Event number (real time in seconds):
2 (1.825) Josie exists in the USW realm.

5 (6.175) USW infrastructure is complete and available.

6 - 9 (8.575 - 14.290) The first USW REM connection is made from flickr's Kittens to Gutenberg's Pride and Prejudice.

10 - 277 (663.476) Additional REMs are added to the system and make connections to Gutenberg's Pride and Prejudice.

278 - 326 (664.884 - 770.372) Gutenberg's Pride and Prejudice begins to read messages from its mailbox.

327 - 525 (771.436 - 1134.216) Gutenberg's Pride and Prejudice creates reciprocal REM connections to other REMs, creates preservation copies on the Gutenberg domain and sends messages back to requesting REMs.

526 (1135.699) A preservation copy of Josie is created on the Gutenberg domain.

527 - 1056 (2921.045) REMs continue to make preservation copies and permanent edges as directed by messages from the HTTP mailbox.

1057 (2922.324) The first REM on the RadioLab domain is lost.  The next few events will show all REMs on the RadioLab domain as lost.  These few events simulate the total loss of the domain either through closing the domain, terminating the domain's participation in the USW process, or disconnection of the domain from the Internet.

1093 - 1735 (2999.098 - 5398.655) The remaining REMs continue to process messages from their respective mailboxes until all messages have been processed and no more communications are needed.  In effect, the USW system has reached a point of stability and does not have any growth or change opportunities.

Now Josie (my grandmother's sister) exists on two domains and, given a larger benign environment, could spread to more places, thereby increasing the likelihood of being around long after those that knew her have been forgotten.
On the picture's back: Josie McClure's Picture taken Feb. 30, 1907 at Poteau I. T. Fifteen years of age.  When this was taken weighed 140 lbs.


Chuck Cartledge
