Wednesday, February 29, 2012

2012-02-24: Personal Digital Archiving 2012

For the third consecutive year, the Personal Digital Archiving conference took place at the Internet Archive in San Francisco, CA. Ahmed and I attended a diverse range of fascinating sessions on how people think about creating and preserving personal digital archives. The environment was very nice and friendly (there were a baby and a dog on the second day ^_^).
The conference was held on Feb. 24 and 25, 2012. The first day started at 9:00 am with a welcome and keynote by Brewster Kahle about the Internet Archive and personal archives. Brewster gave a quick introduction to the Internet Archive's history and asked an important question: “what would we want out of the Internet Archive in terms of preserving stuff that individuals are creating?” He suggested the answer lies in learning from the people at this conference how to collect such materials and make them useful.
Mike Ashenfelder from the Library of Congress gave a talk entitled “Personal Digital Archive Advice for the General Public” (video). He briefly described the LoC's main role in archiving, in addition to its efforts in personal digital archiving. At the end, he gave advice for the general public: identify what you want to save, decide what is most important to you, organize the content, and save copies in different places. The take-home message can be summarized in three words: outreach, educate, simplify.
Stan James gave a talk entitled “How my Family Archives Affected Others” (video). James gave an update on the past year: he and his father started a project together to collect and archive all their family photos, documents, letters, postcards and more, and they have archived more than 20,000 files over the past year. He realized that his family is ahead of the curve. He began the story with a photo of his grandmother, who burned all their letters after she had let her children read them. He encouraged his family to collaborate in archiving their stuff. He mentioned that his dad is still scanning photos and that the project brings the family together. James tried many tools for uploading photos: Picasa does not let you enter dates before 1970 (the Unix timestamp limit), and Google Plus does not let you edit the date at all. In the end, he used Facebook Timeline to upload and organize his dad’s 25,000 photos.
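The 1970 limit James ran into matches the Unix epoch: a tool that stores a photo's date as seconds since 1970-01-01 gets a negative number for anything older. Here is a minimal Python sketch of that arithmetic. It assumes the date is kept as an epoch offset, which is what the parenthetical above suggests but is not something confirmed about Picasa's internals, and the helper name `epoch_seconds` is made up for illustration:

```python
from datetime import datetime, timezone

# The Unix epoch: timestamps count seconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds(year: int, month: int, day: int) -> int:
    """Return seconds between the Unix epoch and midnight UTC on the given date.

    Hypothetical helper for illustration; not taken from any photo tool.
    """
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int((moment - EPOCH).total_seconds())


# A scan of a 1935 family photo lands before the epoch, so the value is negative.
old_photo = epoch_seconds(1935, 6, 1)
# A photo taken during the conference is comfortably positive.
new_photo = epoch_seconds(2012, 2, 24)

print(old_photo < 0, new_photo > 0)
```

A tool that keeps this value in an unsigned field, or simply rejects negative input, cannot represent the 1935 date, which is one plausible reason for a pre-1970 cutoff.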
Jerry Michalski gave a talk entitled “What I’ve learned from gardening my Brain” (video). He presented Brain, mind-mapping software that can be used in place of bookmarking. He has been using it for 15 years and shared his own brain as a case study to explain the benefits of the tool. He mentioned that Brain helps in organizing stream-of-consciousness thinking.
Jo An Morfin-Guerrero from the University of Bristol presented her work “Unstable Archives: Performing the Franko B Archive” (video). She started her session with a case study of an artist (Franko B) who began archiving his pictures, media and all his work a long time ago. It was part of her Ph.D. research on the preservation of media and on different artistic practices for organizing and archiving Franko B’s records.
In the Media Types session, I learned of useful tools for preserving emails, bookmarks, and photos. Peter Chan and Sudheendra Hangal from Stanford University gave two talks entitled “Processing and Delivering Email Archives in Special Collections using Muse” (video) and “Putting Personal Archives to Work: Reminiscence, Search and Browsing” (video), respectively, about Muse, a project for archiving emails. It does sentiment analysis and builds a slideshow of the images in the attachments. Muse gives good insight into trends in emails over time, such as the topics of the month and so on. They mentioned that there are many challenges in email archiving, such as copyright and privacy, sensitive information, description (for creating metadata), and delivery. I found a publication list that has more information about Muse.
Aaron Straup Cope presented his personal work “Parallel-flickr”. He started his talk with a question: “What would happen if flickr went away tomorrow?” He developed Parallel-flickr, software that uses the Flickr API to pull out the photos and photo information.
Maciej Ceglowski from Pinboard gave a talk entitled “Remember the Web? Practical challenges of Bookmarking for Keeps” (video). Ceglowski presented Pinboard, a paid bookmarking site founded in 2009 that now holds 9 million archived bookmarks and 4 TB of stored web content. Pinboard downloads the full content of the bookmark and stores it on a Pinboard server. Ceglowski said that “The search engine does not replace the need for your own bookmarks.”
Personally, I liked the idea of keeping the content of each bookmark; I used to save the content of each file I found useful on my local machine.
Next was lunch, followed by the Social Network Data session. Marc A. Smith from the Social Media Research Foundation gave a talk entitled “Arc-chiving: saving social links for study” about NodeXL, an open tool for visualizing the connections in social media data and converting them into graphs (video). NodeXL works with Excel 2007 and 2010.
I tried it myself in my Information Visualization class, created some very cool graphs, and gained good insight into my relationships on Twitter and Facebook. The graph on the right shows that most of my friends on Facebook don't assign a relationship status. This is a Group-in-a-Box (GIB) layout of the Facebook network using the “Circular layout”. Clustering was done based on the relationship status of friends. The gray cluster represents the friends with no assigned relationship status.
Megan Alicia Winget from the University of Texas at Austin gave a theoretical talk entitled “Personal Interaction Archiving: Saving our Attitudes, Beliefs, and Interests” (video), contrasting the thing-based behavior of archiving objects with interaction-based behavior (Yelp, Facebook, etc.). She raised a question: what are people saving when they upload their annotations? She is studying the annotations and highlights that people make with ebook tools and then share. Winget said that “The bookmarking tools can be thought of as a new form of commonplace books”.
Jonathan Harris from Cowbird gave a talk entitled “Cowbird: A public library of human experience” (video). He presented Cowbird, a site where people can upload stories. He mentioned that his long-term goal is to build a public library of human experience.
Before the last keynote, there were many interesting Lightning Talks. Denim Smith presented milifemap, in which people can upload photos and personal videos, create private diaries, and share their thoughts (video). It has a timeline visualization for organizing and presenting people’s personal content.
Christopher Prom from UIUC gave a lightning talk entitled “iKive: Towards a Trusted Personal Archives Service” (video). He presented iKive, a research project that lets people easily save their personal digital files to a trusted location.
Carly Strasser from the California Digital Library gave a talk entitled “Digital Curation for Excel (DCXL)” (video). She presented the DCXL project, which aims to facilitate publishing, sharing, and organizing data so that it can benefit others. The main result of the project will be an open source add-in for Microsoft Excel that will assist scientists in preparing their Excel data for sharing. Initial ideas include generating metadata, incorporating links to scientific data repositories and their requirements, and using controlled domain-specific vocabularies.
At the end of the day, Brewster Kahle gave a closing keynote entitled “A Data Archiving Service” (video). He gave several examples of archiving initiatives. He estimated the cost of archiving a book page at 25 cents and the cost of storing 1 TB at $2000, which is 40 times the raw cost of a 1 TB hard disk.
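Kahle's two storage figures imply a raw disk price that this summary does not state outright. Here is a small Python check of that arithmetic, taking the $2000 and 40x figures above at face value; the variable names are illustrative only:

```python
# Figures as reported from the keynote.
archived_cost_per_tb = 2000.0   # dollars to store 1 TB in the archive
markup_over_raw_disk = 40.0     # archive cost as a multiple of raw disk cost

# Implied price of a bare 1 TB hard disk.
raw_disk_cost_per_tb = archived_cost_per_tb / markup_over_raw_disk

print(f"Implied raw disk cost: ${raw_disk_cost_per_tb:.2f} per TB")
```

Under those reported numbers, the bare disk works out to about $50 per TB, with the rest of the $2000 presumably covering the other costs of keeping the data.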
Day 2 started with a keynote by Cathy Marshall from Microsoft Research entitled “Ownership, aggregation and re-use of Personal Data” (video). She started with an interesting question: “Whose Content is it anyway?” She began her talk with an example of how pictures on the public web and social media get reused. She did a study of user behaviors around using and reusing images and found that everyone believes you can keep anything you find online: “It’s yours.” About preserving social media, she said that “people can’t make a go of it on their own. Therefore, we need institutional archives to help with preserving social media”.
The User Studies session started with Sarah Kim from the University of Texas at Austin, who gave a talk entitled “What is your plan for your personal digital archives after your lifetime? Learning from individuals” (video). As part of her Ph.D. dissertation, Kim presented the results of case studies of 20 people. She asked them, “what is your plan for your personal digital archives after your lifetime?” Most of them had never thought about it before, let alone planned for their personal digital archives. She found that there are reasons to leave something behind. She identified a few basic categories into which people may fall: Delete all, Create condensed version, Sort and distribute to designated entities (e.g. kids, colleagues), Write in a will what to do including disposal and access methods, Allow caretaker or others to manage, Expect materials will be lost or deleted.
Debbie Weissmann gave a talk entitled “Personal Archiving in Not Personal Spaces” (video). She gave some examples of people who were fired after tweeting or posting personal opinions or problems about work. She raised an interesting question: “What are the laws concerning that kind of thing?” So far, there are no specific rules about this issue.
Lori Kendall from UIUC gave a talk entitled “Use of Personal Archives: Family History Works” (video). She argues that genealogy is becoming more of a thing due to’s popularity and that there are a lot of TV shows and news shows dedicated to it. She argues that genealogy is cool because it puts you in the middle as opposed to being “just another node” and the individualistic society of Americans can foster some love for the idea of a self-contained ecosystem based on ancestry.
The Academics session started with interesting talks, followed by a panel. John Butler from the University of Minnesota gave a talk entitled “Practices in digital scholarship and personal archiving” (video). He conducted a study of research behaviors (a survey of 50 grad students and faculty). He spoke about managing data to ensure that they stay current and can be reused. He presented a data management plan and best practices for preserving the data. The key findings: there is a strong diversity of resources and media used; methods learned in traditional contexts are not easily transferred to the digital context; and researchers have unique collections to share, but they want to do so under personally specified conditions.
Ellysa Stern Cahoy from Penn State University Libraries gave an interesting presentation entitled "Faculty Member as Microlibrarian: Critical Literacies for Personal Scholarly Archiving" (video). Her talk focused more on students (or perhaps the scholars of tomorrow) than on faculty, and on how they locate and use information. She talked about the ACRL Information Literacy Competency Standards for Higher Education (2000), including the inclusion of effective competencies and a greater connection with K-12 learning standards.
The panel was entitled “What's being Lost, What's being Saved: Practices in digital scholarship and personal archiving” (video). It was shared among Smiljana Antonijevic from the Royal Netherlands Academy of Arts and Sciences, John Butler from the University of Minnesota, Laura Gurak from the University of Minnesota, and Ellysa Stern Cahoy from Penn State University Libraries. Each of them covered a specific area of personal archiving relevant to academia. One of the interesting questions was about archiving emails, because every time we switch to another system, it is hard to preserve them. Brewster said “IA has moved away from email to Skype. It is dead here, except for official purposes”.
After the panel, we had lunch, and then the Post Lunch session started. Jason Scott from the Internet Archive gave a very nice talk about his work at IA entitled “Archive Team and the Case of the Widespread Recognition” (video). He mentioned some of his team's achievements during the last year. He said that “Google is a library or archive like a supermarket is a food museum.”
The Commercial Services session started with Maciej Ceglowski's second talk, entitled “The Business of Web Archiving”. It was again about Pinboard, but this time he presented the business models around social bookmarking: charge money (Zootool, Instapaper, Diigo, Pinboard, etc.), burn money by “finding a sponsor” (Delicious, Yahoo Bookmarks, Google Bookmarks), or offer a free service and fail (Magnolia 3 years, MyWeb 6 years, Xmarks). He gave a summary of Pinboard's 2011 financials and the risks Pinboard faces.

The graph on the right shows the HTTP requests per minute for two different periods: Dec. 9-11 in blue and Dec. 16-18 in green. The blue bars show web traffic one week before the news that Yahoo would sunset Delicious; the green bars show how traffic spiked immediately after the plans to ‘sunset’ Delicious became public. The image shows how important it is for people to save their bookmarks; at the first sign of danger, people stampeded away to save them.
Jed Lau from the company Memoir Tree gave a talk entitled “Digital Archive for the Elderly: Facilitating Old-Fashioned Storytelling” (video). He started his talk with a story about his grandmother, who told him many stories and whom he missed after she died. That served as an introduction to the company. Memoir Tree is an iPhone app that makes it easy to tell, show, and share history through recorded audio and photos. He ran a live experiment with the audience by asking them “what is your favorite ice cream flavor?”, then recorded the answers and showed the result instantly.
Stacy Colleen Kozakavich gave a talk entitled “Every House has a History”. She said, “you can research the history of your house yourself”.
The Economics session started around 3:30 with a great talk by David S. H. Rosenthal from Stanford University entitled “Modeling the economics of long-term storage” (video). One of the interesting questions he tried to answer was “Why would we believe that in the future storage costs will drop at varying rates?”.
The Lightning Talks of the second day presented many interesting tools. Matt Zimmerman gave a talk entitled “An Open Source personal data platform” (video). Zimmerman presented Singly, which allows users to put all of their data (personal photos, places, links, contacts) into a single structured place. He also talked about the Locker Project, an open source personal data store, still under active development, where you can collect and store all of your personal digital information.
Eric C. Cook from the University of Michigan focused on digital photography in his lightning talk, entitled “Personal Digital Photography and the Implications of Selective Positive Representation” (video).
Jerome McDonough from UIUC gave a lightning talk entitled “Deep Personal Significance: Computer Gaming & the Notion of Significant Properties” (video). He introduced an ongoing research project called Preserving Virtual Worlds 2, funded by the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP). The project deals with the digital preservation of complex media. It focuses on finding significant properties for a variety of educational games and game series in order to provide a set of best practices for preserving the materials through virtualization technologies and migration, as well as an analysis of how the preservation process is documented.
In summary, the presentations were thought-provoking and creative. I was happy to meet that mix of archivists, programmers, and researchers, each with a different approach to personal archiving. Many presentations were about tools for preserving personal media. A few presentations were about case studies in which the researchers worked directly with people and reported their findings. I heard about many interesting tools, such as Muse, Pinboard, etc. The conference next year will take place at the University of Maryland.
Quotes from the speakers
  • “Google is a library or archive like a supermarket is a food museum” - Jason Scott
  • “Everyone takes pictures of the wedding, never of the divorce.”
  • “You can research the history of your house yourself.” - Stacy Colleen Kozakavich
  • “Software engineers are now social engineers.” - Jonathan Harris
  • “Everything on the web feels so disposable.” - Jonathan Harris
  • “Sometimes we make things more complicated and we need to simplify.”
  • “The process of figuring out where to put thoughts—has to be something you enjoy.” - Jerry Michalski
  • “The search engine does not replace the need for your own bookmarks.” - Maciej Ceglowski
For more about the conference from another perspective, please check out the Litbrarian blog, Chris Prom's blog, Ellysa Cahoy's blog, The Wiki Librarian's day1 and day2, the Personal Digital Archives' blog, Mike Ashenfelder's blog, the DCXL's blog post, and #PDA12 on Twitter.
I'll add the videos for the talks later.

(2012-04-08 Update:) I have added links to the video recording of each session.
