Monday, October 27, 2014

2014-10-27: 404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent

Herbert and I attended the "404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent" workshop at the Georgetown Law Library on Friday, October 24, 2014.  Although the origins of this workshop are many, catalysts for it probably include the recent Liebler & Liebert study about link rot in Supreme Court opinions, the paper by Zittrain, Albert, and Lessig about the problem of link rot in the scholarly and legal record, and the resulting popular media coverage (e.g., NPR and the NYT).

The speakers were naturally drawn from the legal community at large, but some notable exceptions included David Walls from the GPO, Jefferson Bailey from the Internet Archive, and Herbert Van de Sompel from LANL. The event was streamed and recorded, and videos + slides will be available from the Georgetown site soon so I will only hit the highlights below. 

After a welcome from Michelle Wu, the director of the Georgetown Law Library, the workshop started with an excellent keynote from the always entertaining Jonathan Zittrain, called "Cites and Sites: A Call To Arms".  The theme of the talk centered around "Core Purpose of .edu", which he broke down into:
  1. Cultivation of Scholarly Skills
  2. Access to the world's information
  3. Freely disseminating what we know
  4. Contributing actively and fiercely to the development of free information platforms

For each bullet he gave numerous anecdotes and examples; some innovative, and some humorous and/or sad.  For the last point he mentioned, among others, Memento and timed release crypto.

Next up was a panel with David Walls (GPO), Karen Eltis (University of Ottawa), and Ed Walters (Fastcase).  David mentioned the Federal Depository Library Program Web Archive, Karen talked about the web giving us "Permanence where we don't want it and transience where we require longevity" (I tweeted about our TPDL 2011 paper that showed for music videos on Youtube, individual URIs die all the time but the content just shows up elsewhere), and Ed generated a buzz in the audience when he announced that in rendering their pages they ignore the links because of the problem of link rot.  (Panel notes from Aaron Kirschenfeld.)

The next panel had Raizel Liebler (Yale), author of another legal link rot study mentioned above and an author of one of the useful handouts about links in the 2013-2014 Supreme Court documents.  Rod Wittenberg (Reed Tech) talked about the findings of the Chesapeake Digital Preservation Group and gave a data dump about link rot in Lexis-Nexis and the resulting commercial impact (wait for the slides).  (Panel notes from Aaron Kirschenfeld.)

After lunch, Roger Skalbeck (Georgetown) gave a web master's take on the problem, talking about best practices, URL rewriting, and other topics -- as well as coining the wonderful phrase "link rot deniers".  During this talk I also tweeted TimBL's classic 1998 resource "Cool URIs Don't Change". 

Next was Jefferson Bailey (IA) and Herbert.  Jefferson talked about web archiving, the IA, and won approval from the audience for his references to Lionel Hutz and HTTP status dogs.  Herbert's talk was entitled "Creating Pockets of Persistence", and covered a variety of topics, obviously including Memento and Hiberlink.

The point is to examine web archiving activities with an eye to the goal of making access to the past web:
  1. Persistent
  2. Precise
  3. Seamless
Even though this was a gathering of legal scholars, the point was to focus on technologies and approaches that are useful across all interested communities.  He also gave examples from our "Thoughts on Referencing, Linking, Reference Rot" (aka "missing link") document, which was also included in the list of handouts.  The point of this effort is to enhance existing links (with archived versions, mirror versions, etc.), but not at the expense of removing the link to the original URI and the datetime of the intended link.  See our previous blog post on this paper and a similar one for Wikipedia.
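
To make the idea of enhancing rather than replacing a link concrete, here is a minimal Python sketch. It keeps the original URI in `href` and adds the archived copy and the intended link date as extra attributes. The function name, the example URIs, and the attribute names `data-versionurl` and `data-versiondate` are assumptions used for illustration; consult the document itself for the exact markup it recommends.

```python
from html import escape


def robust_link(original_uri: str, archived_uri: str, link_date: str, text: str) -> str:
    """Return an HTML anchor that still points at the original URI.

    The archived copy and the date the link was made travel along as
    data attributes (attribute names are assumed for this sketch).
    """
    return (
        f'<a href="{escape(original_uri, quote=True)}" '
        f'data-versionurl="{escape(archived_uri, quote=True)}" '
        f'data-versiondate="{escape(link_date, quote=True)}">'
        f"{escape(text)}</a>"
    )


# Hypothetical URIs, used only to show the shape of the output.
print(robust_link(
    "http://example.com/opinion?id=1&page=2",
    "http://archive.example.org/20141024/http://example.com/opinion",
    "2014-10-24",
    "the cited opinion",
))
```

The original link survives untouched, so a reader (or a browser extension) can choose between the live page and the archived version from around the link date.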

The closing session was Leah Prescott (Georgetown; subbing for Carolyn Cox),  Kim Dulin (Harvard), and E. Dana Neacşu (Columbia).   Leah talked some more about the Chesapeake Digital Preservation Group and how their model of placing materials in a repository doesn't completely map to the model of web archiving (note: this actually has fascinating implications for Memento that are beyond the scope of this post).  Kim gave an overview of Harvard's archive, and Dana gave an overview of a prior archiving project at Columbia.  Note that recently received a Mellon Foundation grant (via Columbia) to add Memento capability.

Thanks to Leah Prescott and everyone else that organized this event.  It was an engaging, relevant, and timely workshop.  Herbert and I met several possible collaborators that we will be following up with. 


-- Michael

Thursday, October 16, 2014

2014-10-16: Grace Hopper Celebration of Women in Computing (GHC) 2014

Photo credit to my friend Mona El Mahdy
I was thrilled and humbled to attend, for the second time, the Grace Hopper Celebration of Women in Computing (GHC) 2014, the world’s largest gathering of women technologists. GHC is presented by the Anita Borg Institute for Women and Technology, which was founded by Dr. Anita Borg and Dr. Telle Whitney in 1994 to bring together the research and career interests of women in computing and to encourage their participation in the field. The twentieth anniversary of GHC was held in Phoenix, Arizona on October 8-10, 2014. This year, attendance almost doubled from last year, reaching 8,000 women with research and business interests from about 67 countries and about 900 organizations, who came to get inspired, gain expertise, get connected, and have fun.

Aida Ghazizadeh from the Department of Computer Science at Old Dominion University was also awarded a travel scholarship to attend this year's GHC. I hope ODU will have more participation in the upcoming years.

The conference theme this year was "Everywhere. Everyone." Computer technologies are everywhere, and everyone should be included in driving innovation. There were multiple technical tracks featuring the latest technologies in many fields, such as cloud computing, data science, security, and the Swift programming language and Playgrounds by Apple. Conference presenters represented many different fields, such as academia, industry, and government. The non-profit organization Computing Research Association Committee on Women in Computing (CRA-W) also offered sessions targeted towards academics and business. I had a chance to attend the Graduate Cohort Workshop in 2013, which was held in Boston, MA, and wrote a blog post about it.

The first day started with a welcome to the 8,000 conference attendees from Dr. Telle Whitney, the president and CEO of the Anita Borg Institute. She described how GHC was first held in 1994 in Washington, DC, to bring together the research and career interests of women in computing and to encourage the participation of women in computing. "Join in, connect with one another, be inspired by our speakers, be inspired by our award winners, develop your own skill and knowledge at the leadership workshops and at the technical sessions, let's all inspire and increase the ratio, and make technology for everyone everywhere," Whitney said. Then she introduced Alex Wolf, the President of the Association for Computing Machinery (ACM) and a professor in Computing at Imperial College London, UK, for opening remarks.

Ruthe Farmer
Barbara Birungi and Durdana Habib
After the opening keynote, the ABIE Awards for Social Impact and Change Agent were presented by the awards' sponsors. The recognitions went to Ruthe Farmer, Barbara Birungi, and Durdana Habib, who gave nice and motivating talks. Some highlights from Farmer's talk were:
  • "The next time you witness a technical woman doing something great, please tell her, or better tell others about her."
  • "The core of aspiration in computing is a powerful formula: recognition plus community."
  • "Technical Women are not outliers."
  • "Heads up to all of you employers out there. There is a legion of young women heading your way that will be negotiating their salaries ... so budget accordingly!"

The first day's keynote was given by Shafi Goldwasser, RSA Professor of Electrical Engineering and Computer Science at MIT and 2012 recipient of the Turing Award, who spoke about the history and benefits of cryptography and about her own work in the field. She discussed the challenges in encryption and cloud computing. Here are some highlights from Goldwasser's talk:
  • "With the magic of cryptography, we can get the benefits of technology without the risks."
  • "Cryptography is not just about finding the bad guys, it is really about correctness, and privacy of computation"
  • "I believe that a lot of the challenges for the future of computer science are to think about new representations of data. And these new representations of data will enable us to solve the challenges of the future."

Picture taken from My Ramblings blog
After the opening keynote, we attended the Scholarship Recipients Lunch, which was sponsored this year by Apple. Engineers from Apple sat at each table to talk with us during the lunch.

The sessions started after the lunch break. I attended the CRA-W track "Finding Your Dream Job Presentations", which had presentations by Jaeyeon Jung from Microsoft Research and Lana Yarosh from the University of Minnesota. The session was aimed at late-stage graduate students, helping them decide how to apply for jobs, how to prepare for interviews, and how to negotiate a job offer. The presenters left a large time slot for questions after their presentations. For more information about the "Finding Your Dream Job Presentations" session and its highlights, here is an informative blog post:
GHC14 - Finding your Dream Job Presentations

A global community of women leaders panel
The next session I attended was the "A Global Community of Women Leaders" panel in the career track, moderated by Jody Mahoney (Anita Borg Institute). The panelists were Sana Odeh (New York University), Judith Owigar (Akirachix), Sheila Campbell (United States Peace Corps), and Andrea Villanes (North Carolina State University).  Drawing on their experience, they explained their roles in increasing the number of women in computing and the best ways to identify global technology leaders. At the end, they took questions from the audience. "In the middle east, the women in technology represents a big ratio of the people in computing," said Sana Odeh.

There were many interesting sessions, such as "Building Your Professional Persona Presentations" and "Building Your Professional Network Presentations", which covered how to build your professional image, how to promote yourself, and how to present your ideas to people in a concise and appealing way. These are two blog posts that cover the two sessions in detail:
Facebook booth in the career fair #GHC14
Meanwhile, the career fair opened on the first day, Wednesday, October 8, from 4:30 to 6:30 p.m., and continued through the second day and part of the third day. The career fair is a great forum for facilitating open conversations about career positions in industry and academia. Many famous companies, such as Google, Facebook, Microsoft, IBM, Yahoo, and Thomson Reuters; many universities, such as Stanford University, Carnegie Mellon University, The George Washington University, and Virginia Tech University; and non-profit organizations such as CRA-W were represented. Each company had many representatives to discuss the different opportunities they have for women. The poster session was held in the evening.

Cotton candy in the career fair #GHC14
Like last year, Thomson Reuters attracted many women's attention with a great promotion: bringing in caricature artists. Other companies used nice ideas to promote themselves, such as cotton candy. There were many representatives promoting each organization and also interviewing. I enjoyed being among all of these women at the career fair, which inspired me to think about how to direct my future in a way that contributes to computing and encourages many other women to enter the field. My advice to anyone who will go to GHC next year: print many copies of your resume to be prepared for the career fair.

Day 2 started with a welcome to the audience from Barb Gee, the vice president of programs for the Anita Borg Institute. Gee presented the GirlRising video clip "I'm not a number".

After the clip, Dr. Whitney introduced the special guest, the amazing Megan Smith, the new Chief Technology Officer of the United States and previously a vice president of Google[x]. Smith was a keynote speaker last year, when she gave a very inspiring talk entitled "Passion, Adventure and Heroic Engineering". Smith welcomed the audience and talked about her new position as the CTO of the United States. She expressed her happiness to serve the president of the USA and to serve her country. "Let’s work together to bring everyone along and to bring technology that we know how to solve the problems with," Smith said at the end of her short inspiring talk.

Dr. Whitney talked about the Building Recruiting And Inclusion for Diversity (BRAID) initiative between the Anita Borg Institute and Harvey Mudd College to increase diversity among computer science undergraduates. The BRAID initiative is funded by Facebook, Google, Intel, and Microsoft.

The 2014 GHC technical leadership ABIE award went to Anne Condon, a professor and the head of the Department of Computer Science at University of British Columbia. Condon donated her award to Grace Hopper India and Programs of the Computing Research Association (CRA).

Maria Klawe on the right, Satya Nadella on the left
The second keynote of GHC 2014 was an interesting conversation between Satya Nadella, the Chief Executive Officer (CEO) of Microsoft, and Maria Klawe, the president of Harvey Mudd College. Nadella is the first male speaker at GHC. Nadella was asked many interesting questions. One of them was "Microsoft has competitors like Apple, Google, Facebook, Amazon. What can Microsoft uniquely do in this new world?" Nadella answered that the two things he believes Microsoft contributes to the world are productivity and the platform. Maria continued, "it is not a competition, it is a partnership".

In answer to a tough question, "Why does Microsoft hire fewer female engineers than male?", Nadella said that they all now have the numbers out there. Microsoft's number is about 17%, which is almost the same as Google and Facebook and a little below Apple. He said, "the real issue in our company how to make sure that we are getting women who are very capable into company and well represented".

In response to a question about how to ask for a raise in salary, Nadella said: "It’s not really about asking for a raise, but knowing and having faith that the system will give you the right raise." His answer drew a torrent of criticism and irate reactions on Twitter.

Nadella later apologized for his "inarticulate" remarks in a tweet, followed by a statement to Microsoft employees, which was published on the company's website.

"I answered that question completely wrong," said Nadella. "I believe men and women should get equal pay for equal work. And when it comes to career advice on getting a raise when you think it’s deserved, Maria’s advice was the right advice. If you think you deserve a raise, you should just ask."

Day 3 started with some announcements from the ABI board, followed by the best posters announcement and the awards presentation. The last keynote was by Dr. Arati Prabhakar, the Director of the Defense Advanced Research Projects Agency (DARPA). Dr. Prabhakar talked about "how do we shape our times with the technology that we work on and we are passionate about?". Dr. Prabhakar shared neat technologies with us in her keynote. She started with a video of a quadriplegic woman using her thoughts to control a robotic arm by plugging her brain into the computer. She talked about building technologies at DARPA. At the end, she answered many questions related to her work at DARPA. It is amazing to see a successful woman who creates technology that serves her country. The keynote ended with a nice video promoting GHC 2015.

Latest trends and technical challenges of big data panel
After the keynote, I attended the "Latest Trends and Technical Challenges of Big Data Analytics Panel", which was moderated by Amina Eladdadi (College of Saint Rose). The panelists were Dr. Bouchra Bouqata from GE, Dr. Kaoutar El Maghraoui from IBM, Dr. Francine Berman from RPI, and Dr. Deborah Agarwal from LBNL. This panel focused on new data-driven technologies, infrastructure, and challenges in big data analytics. The panelists introduced use cases from industry and academia. There are many challenges facing big data: storage, security (specifically for cloud computing), the scale of the data, and bringing everything together to solve the problem.

ArabWIC lunch table
After the panel, I attended the career fair and then the Arab Women in Computing (ArabWIC) meeting during lunch. I had my first real experience with the ArabWIC organization at GHC 2013. ArabWIC had more participation this year. I also attended the ArabWIC reception, sponsored by Qatar Computing Research Institute (QCRI), on Wednesday night and got a chance to connect with many Arab women in computing in business and academia.

After that I attended the "Data Science in Social Media Analysis Presentations", which included three presentations that talk about data analysis. The three useful presentations were:
"How to be a data scientist?" by Christina Zou
The presenters talked about real-life projects. The highlights of the presentations were:

  • "Improve the accuracy is what we strove for."
  • "It’s important to understand the problem."
  • "Divide the problem into pieces."

After the presentations, I talked to Christina about my research, and she gave me some ideas that I'll apply.

The picture taken from GHC Facebook page
At the end of the day, the Friday celebration, which was sponsored by Google, Microsoft, and GoDaddy, began at 7:30. The dance floor was full of amazing ladies celebrating and dancing with glow sticks!

It was fantastic meeting a large number of like-minded peers and future employers. I'm pleased to have had this great opportunity, which allowed me to network and communicate with many great women in computing. GHC allowed me to discuss my research ideas with many senior women and get positive feedback about them. I came back with multiple ideas that will help me shape the next phase of my research and my next career path.


Tuesday, October 7, 2014

2014-10-07: FluNet Visualization

(Note: This wraps up the current series of posts about visualizations created either by students in our research group or in our classes. I'll post more after the Spring 2015 offering of the course.)

I've been teaching the graduate Information Visualization course since Fall 2011.  In this series of posts, I'm highlighting a few of the projects from each course offering.  (Previous posts: Fall 2011, Fall 2012, 2013)

The final visualization in this series is an interactive visualization of the World Health Organization's global influenza data, created by Ayush Khandelwal and Reid Rankin in the Fall 2013 InfoVis course. The visualization is currently available at and is best viewed in Chrome.

The Global Influenza Surveillance and Response System (GISRS) has been in operation since 1995 and aggregates data weekly from laboratories and flu centers around the world. The FluNet website was constructed to provide access to this data, but does not include interactive visualizations.

This project presents an interactive visualization of all of the GISRS data available through FluNet as of October 2013. The main visualization is an animated 3D choropleth globe where hue corresponds to virus lineage (influenza type A or type B) and color intensity corresponds to infection level. This shows the propagation of influenza across the globe over time.  The globe is also semi-transparent, so that the user can see how influenza infection rates change on the opposite hemisphere. The user may pick a specific time period or press the play button and watch the yearly cycle of infection play itself out on the globe's surface.
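
The hue-plus-intensity encoding can be sketched in a few lines of Python. This is only an illustration of the idea, not the students' code: the specific hues (red for type A, blue for type B), the use of saturation as "intensity", and the function name are all assumptions.

```python
import colorsys

# Hypothetical hue assignments, in degrees on the color wheel.
LINEAGE_HUE = {"A": 0.0, "B": 240.0}


def choropleth_color(lineage: str, level: float, max_level: float) -> str:
    """Map virus lineage to hue and infection level to color intensity."""
    if max_level <= 0:
        raise ValueError("max_level must be positive")
    # Clamp the normalized infection level into [0, 1].
    intensity = min(max(level / max_level, 0.0), 1.0)
    hue = LINEAGE_HUE[lineage] / 360.0
    # Saturation grows with the infection level, so a country with
    # no reported infections renders as white.
    r, g, b = colorsys.hsv_to_rgb(hue, intensity, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
```

With this mapping, two countries with the same infection level but different dominant lineages differ only in hue, which keeps the two channels independent for the viewer.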

The visualization also includes the option to show a 2D version of the globe, using the Natural Earth projection.

There is a stacked area slider located under the globe for navigating through time (example of a "scented widget").  The stacked area chart provides a view of the progression of infection levels over time and is shown on a cubic-root scale to compensate for the peaks during the 2009 flu pandemic.
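
To see why a cubic-root scale tames the pandemic peaks, here is a small Python sketch of such a scale. The function name and the clamping behavior are assumptions for illustration; the project itself was built with D3.js, and this is not its code.

```python
def cube_root_scale(value: float, domain_max: float, range_max: float) -> float:
    """Map a value in [0, domain_max] to [0, range_max] on a cube-root scale."""
    if domain_max <= 0:
        raise ValueError("domain_max must be positive")
    # Clamp out-of-range inputs so the result always fits the chart.
    clamped = min(max(value, 0.0), domain_max)
    return range_max * (clamped / domain_max) ** (1.0 / 3.0)


# A week with one eighth of the peak count still gets half the chart height.
print(cube_root_scale(125, 1000, 100))
```

On a linear scale the same week would get only 12.5% of the height, so ordinary flu seasons would be flattened next to the 2009 peak.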

If the user clicks on a country, a popout chart will be displayed, showing a single year of data for that country, centered on the current point in time.  The default view is a stacked area chart, but there are options to show either a streamgraph or an expanded 100% stacked area chart.  The popout chart animates with the choropleth.
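
Selecting "a single year of data centered on the current point in time" from a weekly series might look like the following Python sketch. The 52-week width and the choice to shift the window (rather than shrink it) near the ends of the data are assumptions; the post does not say how the edges are handled.

```python
def centered_window(series: list, center: int, width: int = 52) -> list:
    """Return up to `width` consecutive items of `series` centered on index `center`."""
    half = width // 2
    start = max(center - half, 0)
    end = min(start + width, len(series))
    # Pull the start back when the window would run past the end of the data.
    start = max(end - width, 0)
    return series[start:end]


weeks = list(range(200))                 # stand-in for 200 weekly data points
print(centered_window(weeks, 100)[:3])   # first weeks of the window around week 100
```

As the animation advances `center` by one week, the window slides with it, which is what lets the popout chart animate in step with the globe.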

The video below shows a demo:

Although the data was freely available from the GISRS website, there was still a significant amount of data cleaning involved.  Both OpenRefine and Mr. Data Converter were used to clean and format the data into JSON.  The D3.js, NVD3, and TopoJSON libraries were used to create the visualization.

Our future work on this project involves turning this into an extensible framework that can be used to show other global datasets over time.


Friday, October 3, 2014

2014-10-03: Integrating the Live and Archived Web Viewing Experience with Mink

The goal of the Memento project is to provide a tighter integration between the past and current web.  There are a number of clients now that provide this functionality, but they remain silent about the archived page until the user remembers to invoke them (e.g., by right-clicking on a link).

We have created another approach based on persistently reminding the user just how well archived (or not) are the pages they visit.  The Chrome extension Mink (short for Minkowski Space) queries all the public web archives (via the Memento aggregator) in the background and will display the number of mementos (that is, the number of captures of the web page) available at the bottom right of the page.  Selecting the indicator allows quick access to the mementos through a dropdown.  Once in the archives, returning to the live web is as simple as clicking the "Back to Live Web" button.
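
Counting mementos comes down to reading a TimeMap, which in the Memento protocol is commonly serialized as a list of links with `rel` values. The Python sketch below counts the entries whose `rel` contains `memento` in a TimeMap that has already been fetched. It is an illustration only, not Mink's code: the sample TimeMap and its URIs are made up, and the regular expressions assume a well-formed response.

```python
import re

# Each entry looks like: <URI>;rel="...";datetime="..."
LINK_RE = re.compile(r'<([^>]+)>([^<]*)')
REL_RE = re.compile(r'rel="([^"]*)"')

# Made-up TimeMap for illustration; a real one would come from an aggregator.
SAMPLE_TIMEMAP = (
    '<http://example.com/>;rel="original",\n'
    '<http://archive.example.org/timemap/http://example.com/>'
    ';rel="self";type="application/link-format",\n'
    '<http://archive.example.org/20100101000000/http://example.com/>'
    ';rel="first memento";datetime="Fri, 01 Jan 2010 00:00:00 GMT",\n'
    '<http://archive.example.org/20120615120000/http://example.com/>'
    ';rel="memento";datetime="Fri, 15 Jun 2012 12:00:00 GMT",\n'
    '<http://archive.example.org/20141003000000/http://example.com/>'
    ';rel="last memento";datetime="Fri, 03 Oct 2014 00:00:00 GMT"\n'
)


def memento_uris(timemap: str) -> list[str]:
    """Return the URIs of all entries whose rel value includes 'memento'."""
    uris = []
    for uri, params in LINK_RE.findall(timemap):
        rel = REL_RE.search(params)
        # rel may hold several space-separated values, e.g. "first memento".
        if rel and "memento" in rel.group(1).split():
            uris.append(uri)
    return uris


print(len(memento_uris(SAMPLE_TIMEMAP)))  # the number shown in the indicator
```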

For the case where there are too many mementos to make navigating an extensive list useable (think captures), we have provided a "Miller Columns" interface that allows hierarchical navigation and is common in many operating systems (though most don't know it by name).
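
A Miller Columns view needs the flat list of captures arranged as a hierarchy, one level per column. The Python sketch below groups capture datetimes by year, then month, then day. The choice of those three levels and the function name are assumptions for illustration; the post does not spell out which levels Mink uses.

```python
from collections import defaultdict
from datetime import datetime


def miller_columns(captures: list[datetime]) -> dict:
    """Group capture datetimes into nested year -> month -> day -> [times]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for dt in sorted(captures):
        tree[dt.year][dt.month][dt.day].append(dt.strftime("%H:%M:%S"))
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {
        year: {month: dict(days) for month, days in months.items()}
        for year, months in tree.items()
    }


captures = [
    datetime(2014, 10, 3, 12, 0, 0),
    datetime(2014, 10, 3, 8, 30, 0),
    datetime(2013, 1, 5, 0, 0, 0),
]
columns = miller_columns(captures)
print(sorted(columns))       # first column: years
print(columns[2014][10][3])  # last column: capture times on 2014-10-03
```

Each click narrows the next column to the children of the selected item, so even thousands of captures stay navigable a few dozen entries at a time.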

For the opposite case where there are no mementos for a page, Mink provides a one-click interface to submit the page to Internet Archive or for immediate preservation and provides just-as-quick access to the archived page.

Mink can be used concurrently with Memento for Chrome, which provides a different modality of letting the user specify desired Memento-Datetime as well as reading cues provided by the HTML pages themselves.  For those familiar with Memento terminology, Memento for Chrome operates on TimeGates and Mink operates on TimeMaps.  We also presented a poster about Mink at JCDL 2014 in London (proceedings, poster, video).
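
For readers less familiar with the terminology: a TimeGate picks one memento near a requested datetime, which the client passes in the `Accept-Datetime` request header, while a TimeMap lists all of them. The Python sketch below builds that header. It only constructs the header and makes no request; the function name is hypothetical, and the input is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def timegate_headers(desired: datetime) -> dict[str, str]:
    """Build request headers asking a TimeGate for a memento near `desired`.

    `desired` is assumed to be timezone-aware; it is converted to UTC
    because the header value is expressed in GMT.
    """
    utc = desired.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


print(timegate_headers(datetime(2014, 10, 3, 12, 0, 0, tzinfo=timezone.utc)))
```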

Mink is for Chrome, free, publicly available (go ahead and try it now!), and open source (so you know there's no funny business going on).

-- Mat (@machawk1)