Thursday, December 24, 2015

2015-12-24: CNI Fall 2015 Membership Meeting Trip Report

The CNI Fall 2015 Membership Meeting was held in Washington, D.C., December 14-15, 2015.  Like all CNI meetings, the Fall 2015 meeting was excellent and contained many high-quality presentations.  Unfortunately, the members' project briefings ran simultaneously, with 7 or 8 different presentations overlapping at any given time.  As a result, I missed a great deal.

Cliff Lynch kicked off the meeting with reflections about public access to federally funded research (e.g., CRS R42983), interoperability (e.g., OAI-ORE, ORCIDs, IIIF), linked data (e.g., Wikipedia notability guidelines for biographies),  privacy & surveillance (e.g., eavesdropping Barbies, Ashley Madison data breach, RFC 7624), and understanding the personalization algorithms that go into presenting (and thus archiving) the view of the web that you experience (e.g., our 2013 D-Lib Magazine article about mobile vs. desktop & GeoIP), and much more.  I'm hesitant to try to further summarize his talk -- watching the video of his talk, as always, is time well spent.

In the next session Herbert and I presented "Achieving Meaningful Interoperability for Web-based Scholarship", which is basically a summary of our recent D-Lib Magazine paper "Reminiscing About 15 Years of Interoperability Efforts". 

2016-01-07 Edit: CNI has now posted the video of our presentation:

See also the excellent summary and commentary from David Rosenthal about the "signposting" proposal.

The next session I split between "Linked Data for Libraries and Archives: LD4L and Europeana" (see the "Linked Data for Libraries" site) and "Is Gold Open Access Sustainable? Update from the UC Pay-It-Forward Project" (slides, video).  The final session of the day included several presentations I would have liked to have seen but didn't.  I understand "Documenting Ferguson: Building A Community Digital Repository" (slides) was good & standing room only.

I missed the opening session on the second day (including the "Update on Funding Opportunities" presentation), but made it to the presentation from David Rosenthal about emulation.  See the transcript of his talk, as well as his 2015 Emulation and Virtualization as Preservation Strategies report for the AMF.

Unfortunately, David's talk collided with that of Martin & his UCLA colleagues.  Fortunately, CNI has posted the video of their talk, his slides are online, and he has a great interactive site to explore the data.

After lunch I attended Rob's talk "The Future of Linked Data in Libraries: Assessing BibFrame Against Best Practices" (slides).  Rob even referenced my "no free kittens" slogan (tirade?) from our time developing OAI-ORE:

The closing plenary was an excellent talk from Julie Brill, a Commissioner of the Federal Trade Commission, entitled "Transparency, Trust, and Consumer Protection in a Complex World".  The transcript is worth reading, but the essence of the talk explores the role the FTC would (should?) play in making sure that consumers can be aware of the data that companies track about them and how that data is used to make decisions about the consumers. (2016-01-07 edit: the video of her presentation is now online.)

A mostly complete list of slides is available via the OSF.  CNI recorded many of the presentations and has begun uploading the videos to the CNI YouTube channel.  The CNI Spring 2016 Membership Meeting will be held in San Antonio, TX, April 4-5, 2016.

Given all the simultaneous sessions, your CNI experience was probably different from mine.  Check out these other CNI Fall 2015 trip reports: Dale Askey, Jaap Geraerts, and Tim Pyatt.


Tuesday, December 8, 2015

2015-12-08: Evaluating the Temporal Coherence of Composite Mementos

When an archived web page is viewed using the Wayback Machine, the archival datetime is easy to determine from the URI and the Wayback Machine's display.  The archival datetime of embedded resources (images, CSS, etc.) is another story.  And what stories their archival datetimes can tell.  These stories are the topic of my recent research and Hypertext 2015 publication.  This post introduces composite mementos, describes the evaluation of their temporal (in-)coherence, and provides an overview of my research results.


What is a composite memento?


A Memento is an archived copy of a web resource (RFC 7089).  The datetime when the copy was archived is called its Memento-Datetime.  A composite memento is a root resource such as an HTML web page and all of the embedded resources (images, CSS, etc.) required for a complete presentation.  Composite mementos can be thought of as a tree structure.  The root resource embeds other resources, which may themselves embed resources, etc.  The figure below shows this tree structure and a composite memento of the ODU Computer Science home page as archived by the Internet Archive on 2005-05-14 01:36:08 GMT.  Or does it?


Hints of Temporal Incoherence


Consider the following weather report that was captured 2004-12-09 19:09:26 GMT.  The Memento-Datetime can be found in the URI and the December 9, 2004 capture date is clearly visible near the upper right.  Look closely at the description of Current Conditions and the radar image.  How can there be no clouds on the radar when the current conditions are light drizzle?  Something is wrong here.  We have encountered temporal incoherence.  This particular incoherence is caused by inherent delays of the capture process used by Heritrix and other crawler-based web archives.  In this case, the radar image was captured much later (9 months!) than the web page itself.  However, there is no indication of this condition.


A Framework for Evaluating Temporal Coherence

In order to study temporal coherence of composite mementos, a framework was needed.  The framework details a series of patterns describing the relationships between root and embedded mementos and four coherence states.  The four states and sample patterns are described below.  The technical report describing the framework is available on arXiv.


Prima Facie Coherent

An embedded memento is prima facie coherent when evidence shows that it existed in its archived state at the time the root was captured.  The figure below illustrates the most common case.  Here the embedded memento was captured after the root but modified before the root.  The importance of Last-Modified is discussed in my previous post on the importance of header replay.


Possibly Coherent

An embedded memento is possibly coherent when evidence shows that it might have existed in its archived state at the time the root was captured.  The figure below illustrates this case.  Here the embedded memento was captured before the root.


Probably Violative

An embedded memento is probably violative when evidence shows that it might not have existed in its archived state at the time the root was captured.  The figure below illustrates this case.  Here the embedded memento was captured after the root, but its Last-Modified datetime is unknown.

Prima Facie Violative

An embedded memento is prima facie violative when evidence shows that it did not exist in its archived state at the time the root was captured.  The figure below illustrates this case.  Here the embedded memento was captured after the root and was also modified after the root.
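
The four states can be summarized as a small decision procedure.  The following Python is a minimal sketch covering only the sample patterns described above, not the full set of patterns in the technical report.  The function name `coherence_state` and its parameters are hypothetical, and treating equal datetimes as coherent is an assumption made for illustration.

```python
from datetime import datetime
from typing import Optional


def coherence_state(
    root_captured: datetime,
    embedded_captured: datetime,
    embedded_last_modified: Optional[datetime] = None,
) -> str:
    """Classify one embedded memento relative to its root (sample patterns only)."""
    # Captured before the root: it might still have been current at root time.
    if embedded_captured < root_captured:
        return "possibly coherent"
    # Assumption: a capture at exactly the root's datetime counts as coherent.
    if embedded_captured == root_captured:
        return "prima facie coherent"
    # Captured after the root with no Last-Modified evidence.
    if embedded_last_modified is None:
        return "probably violative"
    # Captured after the root but last modified at or before it (bracketed).
    if embedded_last_modified <= root_captured:
        return "prima facie coherent"
    # Captured after the root and also modified after the root.
    return "prima facie violative"
```

For example, with a hypothetical root captured on 2005-05-14, an embedded image captured a week later whose Last-Modified is a month earlier would be classified as prima facie coherent.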




Only One in Five Archived Web Pages Existed as Presented

Using the framework, we evaluated the temporal coherence of 82,425 composite mementos. These contained 1,623,127 embedded URIs, of which 1,332,993 were available in a web archive.  Composite mementos were recomposed using single and multiple archives and two heuristics: minimum distance and bracket.

Single and multiple archives: Composite mementos were recomposed from single and multiple archives. For single archives, all embedded mementos were selected from the same archive as the root. For multiple archives, embedded mementos were selected from any of the 15 archives included in the study.

Heuristics:  The minimum distance (or nearest) heuristic selects between multiple captures for the same URI by choosing the memento with the Memento-Datetime nearest to the root's Memento-Datetime, and can be either before or after the root's. The bracket heuristic also takes Last-Modified datetime into account. When a memento's Last-Modified datetime and Memento-Datetime "bracket" the root's Memento-Datetime (as in Prima Facie Coherent above), it is selected even if it is not the closest.
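
As a rough illustration of the two heuristics, here is a minimal Python sketch.  The `Capture` type and the function names are hypothetical, and the fallback behavior of `bracket` (use the nearest capture when nothing brackets the root, and the nearest bracketing capture when several do) is an assumption rather than something taken from the paper.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class Capture(NamedTuple):
    memento_datetime: datetime
    last_modified: Optional[datetime] = None


def minimum_distance(root: datetime, captures: Sequence[Capture]) -> Capture:
    """Pick the capture whose Memento-Datetime is nearest the root's, before or after."""
    return min(captures, key=lambda c: abs(c.memento_datetime - root))


def bracket(root: datetime, captures: Sequence[Capture]) -> Capture:
    """Prefer a capture whose Last-Modified and Memento-Datetime bracket the root."""
    bracketing = [
        c for c in captures
        if c.last_modified is not None
        and c.last_modified <= root <= c.memento_datetime
    ]
    # Assumption: fall back to the nearest capture when none brackets the root.
    return minimum_distance(root, bracketing or captures)
```

With a root at 2005-05-14, a capture from 2005-05-10 with no Last-Modified header, and a capture from 2005-06-01 last modified on 2005-04-01, `minimum_distance` picks the May capture while `bracket` picks the June capture.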

We found that only 38.7% of web pages are temporally coherent and that only 17.9% (roughly 1 in 5) of web pages are temporally coherent and can be fully recomposed (i.e., they have no missing resources).

The paper can be downloaded from the ACM Digital Library or from my local copy.  The slides from the Hypertext'15 talk follow.

One last thing: I would like to thank Ted Goranson for presenting the slides at Hypertext 2015 when we could not attend.

-- Scott G. Ainsworth

Friday, November 27, 2015

2015-11-28: Two WS-DL Classes Offered for Spring 2016

Two WS-DL classes are offered for Spring 2016:

Information Visualization is being offered both online (CRNs 29183 (HR), 29184 (VA), 29185 (US)) and on-campus (CRN 25511).  Web Science is being offered for the first time with the 432/532 numbers (CRNs 27556 and 27557, respectively), but the class will be similar to the Fall 2014 offering as 495/595.


Tuesday, November 24, 2015

2015-11-24: Twitter Follower Analysis of Virginia University Alumni Associations

The primary goal of any alumni association is to maintain and strengthen the ties between its alumni, the community, and the mission of the university. With social media, it's easier than ever to connect with current students and graduates on Facebook, Instagram or Twitter with a simple invitation to "like us" or "follow me." Considering just one of these social platforms, we recently analyzed the Twitter networks of twenty-three (23) Virginia colleges and universities to determine what, if any, social characteristics were shared among the institutions and whether we could gain any insight by examining the public profiles of their respective followers. The colleges of interest, ranked by number of followers in Table 1, vary in size, mission, type of institution, admissions selectivity and perceived prestige. On average, the alumni associations have maintained a Twitter presence for six (6) years. The oldest Twitter account belongs to Roanoke College (@roanokecollege) which is approaching the eight (8) year mark. The newest Twitter account was registered by Randolph-Macon College (@RMCalums) nearly two years ago.

University                      Followers  Joined Twitter
University of Virginia             12,100  11/1/2008
Roanoke College*                    9,588  3/1/2008
Regent University*                  7,966  11/1/2008
James Madison University            7,865  8/1/2008
Virginia Tech                       6,418  4/1/2009
College of William & Mary           4,448  1/1/2009
University of Mary Washington       3,847  10/1/2009
Liberty University                  3,699  11/6/2009
University of Richmond              3,299  5/1/2009
Sweet Briar College*                2,523  8/1/2010
George Mason University             2,375  2/1/2011
Hampton University                  2,372  2/15/2012
Christopher Newport University      2,191  8/1/2010
Old Dominion University             1,996  7/1/2009
Randolph College*                   1,857  8/1/2008
Washington and Lee University       1,842  8/1/2011
Radford University                  1,758  3/11/2011
Hampden-Sydney College              1,086  7/1/2009
Longwood University                 1,035  2/28/2013
Hollins University                    923  4/1/2009
Virginia Military Institute           836  3/1/2009
Norfolk State University              629  8/15/2011
Randolph-Macon College                172  3/7/2014
Table 1 - Alumni Associations Ranked by Followers

* Institution does not have an official alumni Twitter account.
The university Twitter account was used instead.

Social Graph Analysis

NodeXL is a template for Microsoft Excel which makes network analysis easy and rather intuitive. We used this tool for data collection to import the Twitter networks and to analyze the various social media interactions. There are limitations established in the Twitter API which regulate the amount of data collected per hour by any one user. Therefore, due to rate limiting, NodeXL will inherently only import the 2,000 most recent friends and followers for any Twitter account. To improve the response time of the API, we further restricted our data collection to the 200 most recent tweets for both the university and each of its follower accounts.

For our first look at the alumni associations, we clustered the data based on an algorithm in NodeXL which looks at how the vertices are connected to one another. The clusters, as shown in Figure 1, are indicated by the color of the nodes. The clusters themselves revealed some interesting patterns.  The high level of inter-association connectivity, as measured in follows, tweets and mentions, was unexpected. We would have thought that each association operated within the confines of its own Twitter space or that of its parent organization. As we examine the groupings in this network, it is not unreasonable that we would observe connections between Old Dominion University (@ODUAlumni), Norfolk State University (@nsu_alumni_1935) and Hampton University (@HamptonU_Alumni) as all three are located within close proximity of one another in the Hampton Roads area. But, then we must take notice of Hollins University (@HollinsAlum), a small, private women's college in Roanoke, VA, which has a connection with ten (10) other alumni associations; more connections than any other school. Hollins is one of the smallest universities in our group with an enrollment of fewer than 800 students. Since Twitter is primarily about influence, in this instance, we can probably assume the follows serve as a means to observe best practices and current engagement trends employed by larger institutions. While Hollins University is well connected, as are many of the other schools, at the opposite end of the spectrum we find Liberty University (@LibertyUAlum), a large school with more than 77,000 students. Liberty University remains totally isolated with no follower connections to the other alumni associations. You might minimally expect some type of connection with either Regent University (@RegentU) since both share a similar mission as private, Christian institutions or other universities within close physical proximity such as Randolph College (@randolphcollege).

Figure 1 - Connectivity of Alumni Associations

Twitter Followers, Enrollment, and Selectivity

We normally measure the popularity of a Twitter account based on the number of followers. Instead of simply quantifying the follower counts of each alumni association, we sought to understand if certain factors, actions or inherent qualities about the institution might influence the relative number of followers.  First, we considered whether more active tweeters would attract more alumni followers. As shown in Figure 2, the College of William and Mary (@wmalumni) has generated the most tweets over its lifetime, approximately 6,200 or 2.5 tweets per day. But we also observe that the University of Mary Washington (@UMaryWash), which has approximately half the student enrollment, a similar Twitter life span, and 50% fewer tweets (2,800, or 1.3 per day), shows only a slight difference in the number of followers: 4,400 for William and Mary versus 3,800 for Mary Washington. While the graph shows that schools such as Virginia Tech (@vt_alumni) and the University of Virginia (@UVA_Alumni) have more followers with fewer lifetime tweets, the caveat is that these public institutions have the benefit of considerably larger student populations which inherently increases the pool of potential alumni.

Figure 2 - Lifetime Tweets Versus Followers

Next, we considered whether a higher graduation rate, or alumni production, would result in more followers. We obtained the most recent, 2014 overall graduation rates for each institution from the National Center for Education Statistics, with reported overall six-year graduation rates ranging from 34% to 94%. A 2015 Pew Research Center study of the Demographics of Social Media Users indicates that among all internet users, 32% in the 18 to 29 age range use Twitter. This is a key demographic as we would expect our alumni associations to be primarily focused on attracting recent undergraduates. We also factored in selectivity, a comparative scoring of the admissions process, using the categories defined in the 2016 U.S. News Best Colleges Directory. In this directory, colleges are designated as most selective, more selective, selective, less selective or least selective based on a formula.

As we look at Figure 3, we observe a positive correlation between admissions selectivity and the institution's overall graduation rate. Schools which were least selective during the admissions phase also produced the lowest graduation rates (less than 40%) while schools which were most selective experienced the highest graduation rates (around 90%).  It isn't surprising that improved graduation rates positively affect the expected number of alumni Twitter followers. We'll leave it as an exercise for the reader to extrapolate how closely each institution's annual undergraduate enrollment, graduation rate and expected level of engagement on Twitter corresponds to the actual number of followers when all three factors are considered.

Figure 3 - Followers Versus Graduation Rate

Potential Reach of Verified Followers

Users on Twitter want to be followed so we looked carefully at who, besides alumni and students, was following each of the alumni associations. Specifically, we noted the number of Twitter verified followers; accounts which are usually associated with high-profile users in "music, acting, fashion, government, politics, religion, journalism, media, sports, business and other key interest areas." In addition to an abundance of local news reporters and sports anchors, regional politicians and career sites, other notable followers included: restaurant review site Zagat (@Zagat), automaker Toyota USA (@toyota), musician and rapper DJ King Assassin (@DjKingAssassin), the Nelson Mandela Foundation (@NelsonMandela), the President of the United States Barack Obama (@BarackObama), Virginia Governor Terry McAuliffe (@GovernorVA) and artist and singer Yoko Ono (@yokoono). It's a safe assumption that some of the follower relationships with verified users were probably established prior to 2013. This is the year in which Twitter instituted new rules to kill the "auto follow" which was a programmatic way of following another user back after they follow you. Either way, the open question would remain as to why these particular users would follow an alumni association when there are no readily apparent educational ties.

Twitter doesn't take follower count into consideration when verifying an account, but it's not unusual for a verified account to have a considerable following. Since the mission of an alumni association is essentially about networking and information dissemination, we also measured the potential reach or level of influence across the followers' extended network obtained from the verified accounts. No single university had more than 70 verified accounts among its followers. However, when we look at their contribution, in Figure 4, as a percentage of the combined reach achieved by all followers of each alumni association, these select users accounted for as little as 1.6% for Virginia Military Institute (@vmialumni) to as much as 95.8% for Longwood University (@acaptainforlife) of the institution's total potential reach (i.e., followers of my followers).
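
To make the percentage concrete, here is a minimal Python sketch of one way it could be computed.  It assumes that potential reach is simply the sum of each follower's own follower count, with no de-duplication of shared followers; the `Follower` type and `verified_reach_percentage` function are hypothetical names, not part of NodeXL or the Twitter API.

```python
from typing import Iterable, NamedTuple


class Follower(NamedTuple):
    screen_name: str
    followers_count: int  # how many accounts follow this follower
    verified: bool


def verified_reach_percentage(followers: Iterable[Follower]) -> float:
    """Share of total potential reach contributed by verified followers."""
    followers = list(followers)
    # Assumption: reach is the plain sum of follower counts (overlap ignored).
    total_reach = sum(f.followers_count for f in followers)
    if total_reach == 0:
        return 0.0
    verified_reach = sum(f.followers_count for f in followers if f.verified)
    return 100.0 * verified_reach / total_reach
```

Under this definition a single verified follower with a very large audience can dominate the total, which would be consistent with a handful of verified accounts contributing most of an association's potential reach.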

Figure 4 - Potential Reach Percentage of Verified Accounts

Alumni Sentiment

Finally, we examined how each follower described themselves in the description (i.e., bio) portion of their Twitter profile by extracting the 200 most frequently occurring terms for each alumni association. A word cloud for the alumni of each university is shown in Figure 5. If we further isolated the descriptions to the top ten most frequently occurring words, we observed a common pattern among all alumni followers. In addition to the official or some derivative of the institution name (e.g., JMU, NSU, Tech), we find the terms love, life, and some intimate description of the follower as a mom, husband, student, father or alumni.  If the university has an athletic department, we also found mention of sports and, in the case of our two Christian universities, Liberty and Regent, the terms God, Jesus, and Christ were prevalent. In 22 of 23 institutions, the alumni primarily described themselves using these personal terms. Conversely, the alumni followers at only one institution, the University of Richmond (@urspidernetwork), described themselves in a more business-like or academic manner with more frequent mention of the words PhD, career, and job.
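
The term extraction can be sketched in a few lines of Python.  This is an illustrative sketch only: the tokenization rule, the small stopword list, and the `top_terms` function name are assumptions, and the analysis described above may have been done differently (for example, inside NodeXL or a word cloud tool).

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "i", "my", "at", "for", "is"}
)


def top_terms(bios: Iterable[str], n: int = 200) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword terms across follower bios."""
    counts: Counter = Counter()
    for bio in bios:
        # Lowercase and split on anything that is not a letter, digit, or apostrophe.
        for token in re.findall(r"[a-z0-9']+", bio.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(n)
```

Calling `top_terms(bios, 10)` on the bios of one association's followers would give the ten-word summary discussed above.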

Figure 5 - Word Clouds of Twitter Follower Descriptions

-- Corren McCoy

Thursday, November 5, 2015

2015-11-06: iPRES2015 Trip Report

From November 2nd through November 5th, Dr. Nelson, Dr. Weigle, and I attended the iPRES2015 conference at the University of North Carolina Chapel Hill. This served as a return visit for Drs. Nelson and Weigle; Dr. Nelson worked at UNC through a NASA fellowship and Dr. Weigle received her PhD from UNC. We also met with Martin Klein, a WS-DL alumnus now at the UCLA Library. While the last ODU contingent to visit UNC was not so lucky, we returned to Norfolk relatively unscathed.

Cal Lee and Helen Tibbo opened the conference with a welcome on November 3rd, followed by Nancy McGovern's keynote address delivered with Leo Konstantelos and Maureen Pennock. This was not a traditional keynote, but instead an interactive dialogue in which several challenge areas were presented to the audience, and the audience responded -- live and on Twitter -- with significant achievements or advances in those challenge areas from #lastyear. For example, Dr. Nelson identified the #iCanHazMemento utility. The responses are available on Google Docs.

I attended the Institutional Opportunities and Challenges session to open the conference. Kresimir Duretec presented "Benchmarks for Digital Preservation Tools." His presentation touched on how we can get digital preservation tools that "Just Work", including benchmarks for evaluating tools on test beds and measuring them for quality. Related to this is Mat Kelly's work on the Archival Acid Test.

Alex Thirifays presented "Towards a Common Approach for Access to Digital Archival Records in Europe." This paper touched on user access: user needs, best practices for identifying requirements for access, and a capability gap analysis of current tools versus user needs.

"Developing a Highly Automated Web Archive System Based on IIPC Open Source Software" was presented by Zhenxin Wu. Her paper outlined a framework of open source tools to archive the web using Heritrix and a Solr index of WARCs with an enhanced interface.

Barbara Sierman closed the session with her presentation "Best Until ... A National Infrastructure for Digital Preservation in the Netherlands" focusing on user accessibility and organizational challenges as part of a national strategy for preserving digital and cultural Dutch heritage.

After lunch, I led off the Infrastructure Opportunities and Challenges session with my paper on Archiving Deferred Representations Using a Two-Tiered Crawling Approach. We defined deferred representations as those that rely on JavaScript to load embedded resources on the client. We show that archives can use PhantomJS to create a 1.5 times larger crawl frontier than Heritrix alone, but PhantomJS crawls 10.5 times slower. We recommend using a classifier to recognize deferred representations and using PhantomJS only to crawl those representations, mitigating the crawl slow-down while still reaping the benefits of the headless crawler.
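
The routing idea behind the two tiers can be sketched as follows.  This Python is only an illustration of the dispatch step under stated assumptions: `two_tier_crawl`, `is_deferred`, `crawl_fast`, and `crawl_headless` are hypothetical names, the classifier is passed in as an opaque predicate, and nothing here reflects the actual Heritrix or PhantomJS interfaces.

```python
from typing import Callable, Iterable, List, Tuple


def two_tier_crawl(
    uris: Iterable[str],
    is_deferred: Callable[[str], bool],
    crawl_fast: Callable[[str], str],
    crawl_headless: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Send each URI to the slow headless tier only when the classifier flags it."""
    results = []
    for uri in uris:
        if is_deferred(uri):
            # Deferred representation: pay the slower headless-browser cost.
            results.append((uri, "headless", crawl_headless(uri)))
        else:
            # Everything else stays on the fast conventional crawler.
            results.append((uri, "fast", crawl_fast(uri)))
    return results
```

The design choice is that the expensive tier is only paid for where it adds to the crawl frontier, so the overall slow-down depends on how many representations the classifier flags as deferred.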

Douglas Thain followed with his presentation on "Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?" Similar to our work with deferred representations, his work focuses on scientific replay of simulations and software experiments. He presents several tools as part of a framework for preserving the context of simulations and simulation software, including dependencies and build information.

Hao Xu presented "A Method for the Systematic Generation of Audit Logs in a Digital Preservation Environment and Its Experimental Implementation In a Production Ready System". His presentation focused on the construction of a finite state machine to understand whether a repository is following compliance policies for auditing purposes.

Jessica Trelogan and Lauren Jackson presented their paper "Preserving an Evolving Collection: “On-The-Fly” Solutions for the Chora of Metaponto Publication Series." They discussed the storage of complex artifacts of ongoing research projects in archeology with the intent of improving sharability of the collections.

To wrap up Day 1, we attended a panel on Preserving Born-Digital News consisting of Edward McCain, Hannah Sommers, Christie Moffatt, Abigail Potter (moderator), Stéphane Reecht, and Martin Klein. Christie Moffatt identified the challenges with archiving born-digital news material, including the challenges with scoping a corpus. She presented their case study on the Ebola response. Stéphane Reecht presented the BnF's work to perform massive, once-a-year crawls as well as selective, targeted daily crawls. Hannah Sommers provided insight into the culture of a news producer (NPR) regarding digital preservation. Martin Klein presented SoLoGlo (social, local, and global) news preservation, including citing statistics about the preservation of links shortened by the LA Times. Finally, Edward McCain discussed the ephemeral nature of born-digital news media, and provided examples of the sparse number of mementos of news pages in the Wayback Machine.

To kick off Day 2, Lisa Nakamura gave her opening keynote The Digital Afterlives of This Bridge Called My Back: Public Feminism and Open Access. Her talk focused on the role of Tumblr in curating and sharing a book no longer in print as a way to open the dialogue on the role of piracy and curation in the "wild" to support open access and preservation.

I attended the Dimensions of Digital Preservation session, which began with Liz Lyon's presentation on "Applying Translational Principles to Data Science Curriculum Development." Her paper outlines a study to help revise the University of Pittsburgh's data science curriculum. Nora Mattern took over the presentation to discuss the expectations of the job market to identify the skills required to be a professional data scientist.

Elizabeth Yakel presented "Educational Records of Practice: Preservation and Access Concerns." Her presentation outlined the unique challenges with preserving, curating, and making available educational data. Education researchers or educators can use these resources to further their education, reuse materials, and teach the next generation of teachers.

Emily Maemura presented "A Survey of Organizational Assessment Frameworks in Digital Preservation." She presented the results of a survey focusing on frameworks for assessment models, drawing conclusions like software maturity models do for computer scientists. Further, her paper identifies trends, gaps, and models for assessment.

Matt Schultz, Katherine Skinner, and Aaron Trehub presented "Getting to the Bottom Line: 20 Digital Preservation Cost Questions." Their questions help institutions evaluate cost, including questions about storage fees, support, business plans, etc. to help institutions assess their approach to taking on digital preservation.

After lunch, I attended the panel on Long Term Preservation Strategies & Architecture: Views from Implementers consisting of Mary Molinaro (moderator), Katherine Skinner, Sibyl Schaefer, Dave Pcolar, and Sam Meister. Sibyl Schaefer led off with a presentation of details on Chronopolis and the ACE audit manager. Dave Pcolar followed by presenting the Digital Preservation Network (DPN) and their data replication policies for dark archives. Sam Meister discussed the BitCurator Consortium which helps with the acquisition, appraisal, arrangement and descriptions, and access of archived material. Finally, Katherine Skinner presented the MetaArchive Cooperative and their activities teaching institutions to perform their own archiving, along with other statistics (e.g., the minimum number of copies to keep stuff safe is 5).

Day 2 concluded with the poster session (including a poster by Martin Klein) and reception.

Pam Samuelson opened Day 3 with her keynote Mass Digitization of Cultural Heritage: Can Copyright Obstacles Be Overcome? Her keynote touched on the challenges with preserving cultural heritage introduced by copyright, along with some of the emerging techniques to overcome the challenges. She identified duration of copyright as a major contributor to the challenges of cultural preservation. She notes that most countries have exceptions for libraries and archives for preservation purposes, and explains recent U.S. evolutions in fair use through the Google Books rulings.

After Samuelson's keynote, I concluded my iPRES2015 visit and explored Chapel Hill, including a visit to the Old Well (at the top of this post) and an impromptu demo of the pit simulation. It was very scary.

Several themes emerged from iPRES2015, including an increased emphasis on web archiving and a need for improved context, provenance, and access for digitally preserved resources. I look forward to monitoring the progress in these areas.

--Justin F. Brunelle

Tuesday, October 27, 2015

2015-10-21: Grace Hopper Celebration of Women in Computing (GHC) 2015

On October 13-17, the atmosphere at the George R. Brown Convention Center in Houston, Texas was electric with 12,000 women in tech from all around the world attending the Grace Hopper Celebration of Women in Computing (GHC), the world's largest gathering for women in computing. GHC is presented by the Anita Borg Institute (ABI) for Women and Technology, which was founded by Dr. Anita Borg and Dr. Telle Whitney in 1994 to bring together research and career interests of women in computing and encourage the participation of women in computing. GHC has grown from 500 women in technology in 1994 to 12,000 women this year.

I was humbled to receive a scholarship from the ABI to attend GHC 2015. I was also thrilled to have attended GHC twice before: GHC 2013 in Minnesota and GHC 2014 in Phoenix. This year, I represented the Computer Science department at Old Dominion University; the ArabWIC organization, as a member of the leadership committee and as a mentor in the academic mentoring sessions; and the ABI organization, for which I volunteered to blog and take notes at GHC. You can visit the Grace Hopper Celebration 2015 wiki page to read the session notes.

The conference was filled with an exciting lineup of inspiring speakers, panels, sessions and workshops. There were multiple technical tracks: career, emerging tech, general sessions, open source, organizational transformation, and technology (e.g., data science, artificial intelligence, HCI, security, software engineering). Conference presenters represented many different fields, such as academia, industry, and government. The non-profit organization "Computing Research Association Committee on Women in Computing (CRA-W)" also offered sessions targeted towards academics and business. I had a chance to attend the Graduate Cohort Workshop in 2013, which was held in Boston, MA, and created a blog post about it.

The first day was kicked off by the amazing and inspiring Telle Whitney, the president and CEO of the ABI, welcoming the audience. Whitney gave the audience a piece of advice: "talk to almost anyone you pass by in the conference and introduce yourself. It is your time to learn, to join new communities, to reach out people, and offer advice. It is our time to lead". She introduced the featured keynote speakers of the three days of the conference: Susan Wojcicki (the CEO of YouTube), Megan Smith (the first female CTO of the United States), Sheryl Sandberg (the COO of Facebook), Manuela M. Veloso (Professor in the Computer Science Department at Carnegie Mellon University), Clara Shih (CEO and Founder of Hearsay Social), and Hilary Mason (the Founder of Fast Forward Labs). At the end, Whitney introduced Alex Wolf, the President of the Association for Computing Machinery (ACM) and a professor in the Department of Computing at Imperial College London, UK, for opening remarks.

As the day progressed, the Open Source Day sessions and presentations were taking place. Open Source Day: Code-a-thon for Humanity gives women from around the world the chance to learn how to contribute to the open source community, regardless of their skill or experience level, by developing a variety of humanitarian projects. The Open Source Day 2015 page contains more details about the projects.

The Wednesday Keynote by Hilary Mason: "This is the best room in the world!!" is how Mason started her keynote, which was about machine intelligence research. Mason introduced herself as a data scientist, CEO, and software engineer, and followed up with "I look like all of those". She talked about the importance of data and mentioned that data products are everywhere. She gave many examples of apps that use machine intelligence research: Foursquare, an app from a New York City company that collects data and, based on this data, recommends places to go around a user's current location, and Dark Sky, an app that predicts when it will rain or snow. Dark Sky was built on top of government weather data. It may not be interesting for a Californian, but it is interesting for everyone else :-).

Mason talked about how she became passionate about data science. She defined a data scientist as a professional role that combines multiple capabilities: math, statistics, the coding ability to build infrastructure, communication, and domain knowledge, that is, everything they need to know to go talk to someone who has a problem. A data scientist works on analytical problems. She said technology is changing rapidly, and people's adoption of technology is growing faster. One of the interesting parts of her talk was about predicting the future. She said, "predicting the future is hard", then showed a picture of people from the past imagining the future.

In the second part of the talk, she talked about her company, Fast Forward Labs, which started in 2014 to introduce a new method for applied research. They focus on innovation opportunities through data and algorithms. Fast Forward Labs sits in the middle of three communities: established companies, startups, and academic research. What makes a machine intelligence technology interesting?
  1. A theoretical breakthrough 
  2. A change of economics 
  3. A capability becomes a commodity (ex: Hadoop) 
  4. New data becomes available or existing data is made useful (ex: Wikipedia) 
Mason ended her talk by thanking everyone who helped her, then she gave the audience a piece of advice: "If you are at the beginning of your career and you are thinking of where you might end up, you need to know that my first GHC was in 2002, and I was a shy, quiet student who mostly sat in the back in every talk and was too shy to ask a question. But it is amazing to be in this room today with so many people who have affected my career".

At the end of the keynote, the 2015 Technical Leadership ABIE Award was given to Lydia E. Kavraki, the Noah Harding Professor of Computer Science and professor of Bioengineering at Rice University.

After the keynote, I attended the "CRA-W Early Career: The Tenure Process" session by Julia Hirschberg from Columbia University and Joan Francioni from Winona State University. The session covered the tenure process, i.e., research, teaching, service, expectations of the department, annual reviews, letter writers, and the typical process. The speakers gave advice and tips on understanding the requirements and expectations of your institution, such as having an overall teaching plan and goals, and not being too hard or too easy. They also gave tips regarding collaboration: a successful collaboration is a multiplier, since you can achieve more than you can on your own, while an unsuccessful collaboration can be a negative multiplier that wastes time, is stressful, and creates hard feelings.

The panel of Global Women Technical Leaders Program 
Next, I attended the "Global Women Technical Leaders Program: After the Grace Hopper Celebration: Building and Sustaining Community" panel in the career track. The panelists were Josephine Ndambuki of Safaricom Ltd, Rosario Robinson of the Anita Borg Institute, Alaa Fatayer of Jawwal, and Sana Odeh of NYU and ArabWIC, and the panel was moderated by Arezoo Miot of TechWomen. Sana introduced the panel and thanked ABI and the panelists for their support and for growing the women in tech communities. Rosario talked about her journey. She said she was the only woman in mathematics. The panel discussed the essentials of building a community to support women in technology. Alaa talked about her experience, from starting with TechWomen to building a community in Palestine. Some of the questions addressed were: How and where do you start in creating a community? What programs are out there to support technical women? How do you overcome obstacles in creating local communities? How can we develop allies in our communities? At the end, Rosario gave the following advice: "be clear about what you are and what you do".

A panel by directors from Apple in the scholars lunch 
The scholars lunch: Suzanne Mathew, an Assistant Professor of Computer Science at the United States Military Academy, introduced the panel of three amazing ladies from Apple. The panelists were Esther Hare, a Director in the Worldwide Developer Marketing team, and Maryam Najafi, a Director of UX, and the panel was moderated by Karen Sipprell, the VP Marcom at Portal Software. The scholars lunch was sponsored by Apple; scholars got together for discussions at tables, and each table had one woman who has a role at Apple. There were 500 scholars at GHC15 out of 2,000 applicants, as Professor Nancy Amato from Texas A&M University announced. The three senior ladies from Apple shared many interesting experiences. Here is some advice from the panelists:
  • Be around as much as you can; the more you get around, the more opportunities you will find. 
  • Find your passion, so you can solve problems. 
  • Go out and solve problems that freak you out. 
The lunch ended with the fun part, seven Apple watches for seven lucky women who found animal stickers on the bottom of their chairs!

After lunch I had to work on some stuff for the ABI blogging and social media activities. I also communicated with many amazing women during the conference.

The Wednesday Afternoon Plenary: We had three TED-style talks on "Transforming the Culture of Tech" by Clara Shih, the CEO and Founder of Hearsay Social, Blake Irving, the CEO and Board Director of GoDaddy, and the amazing Megan Smith, the Chief Technology Officer (CTO) of the United States of America.

The afternoon plenary speakers
Clara Shih mentioned that she attended GHC for the first time in 2004, when there were 800 attendees. She told the audience about her journey over the past decade, from student to software engineer, then project manager, to being CEO and Founder of Hearsay Social. Shih shared with the audience the lessons she learned through her journey: 1) Listen carefully. 2) Be OK with being different. 3) Cherish relationships above all and help other women. 4) There is no failure, only learning. 5) The future is on us, because if not us, then who? If not now, when? "If people just sat back 11 years ago, GHC would not be 12,000 today!" Every time we decide to lift a woman up, we lift all women up.

Blake Irving talked about how he has worked to close the gender gap at the company since he took over as CEO two years ago and mentioned the solid progress in the ratio of women at GoDaddy. Since last year's GHC, GoDaddy has more than doubled the number of women interns and graduate hires. Blake talked about pay equality and showed many graphs based on GoDaddy's data. "If you are a leader of a tech company, be vulnerable again and again. Do not hide your problems. Go public with your diversity statistics, publish your salary. Seek change from the top and bottom. Do the research, find your issues. Surround yourself with people that will challenge you," Blake said, "bad things live in the dark, bad things die in the light."

Megan Smith with the President's tech team showing the Declaration of Sentiments
"It is great to be back to my people!" is how the amazing Megan Smith started her talk. Before mentioning the highlights of Megan Smith's talk, I would like to highlight the amazing job she did during the conference to encourage and inspire the attendees by talking to them herself. This lovely, inspiring woman passed by the community booths at the career fair and allowed people to talk to her personally and take pictures with her. She was also creative in showing some of the current federal tech projects and in bringing many ladies in tech from the President's team. At the beginning, Smith talked about her new role as the CTO of the USA, in which she serves as an assistant to the President, advising him and his team on how technology policy, data, and innovation can advance our future as a nation. She described the people in the federal government as so passionate, mission driven, and extraordinary.

GHC archive that was found in the previous Thanksgiving 
Smith mentioned that they found GHC archives the previous Thanksgiving. She talked about many projects they are working on, such as Innovation Nation, Active STEM Learning, and the Police Data Initiative. She described the President as "an incredible leader, so smart, so technical, science tech president, and he opens the doors for us to innovate". Smith introduced many amazing young ladies from the President's tech team, who talked about their different roles in serving the nation.

At the end, Smith talked about the Declaration of Sentiments, a document signed by 100 people in 1848 (68 women and 32 men) at the first women's rights convention to be organized by women, in Seneca Falls, New York. The document is missing, and they are looking for it with many archivists using the hashtag #FindTheSentiments.

There was a short discussion at the end with the three speakers about why change is hard and what strategies are working for them.

In the meantime, the career fair and the community fair were taking place. Many famous companies, such as Google, Thomson Reuters, Facebook, Microsoft, and IBM, were at the career fair to hire as many talented women in tech as they could. The community fair is a dedicated space within the Expo for attendees to interact with GHC communities, such as BlackWIC and ArabWIC. The ABI booth was at the center of the community fair, where I met the amazing Telle Whitney and talked to her many times. The career fair was the place for anyone who wanted to apply for job opportunities at all levels across industry and academia. Each company in the career fair had many representatives to discuss the different opportunities they have for women. A few men also attended the conference. The companies were very creative in advertising themselves.

Megan Smith at ArabWIC booth in the community fair 

The amazing Megan Smith passed by the ABI community booths and stopped by the ArabWIC booth. We had a great chance to talk to her personally and take a close look at the Declaration of Sentiments. She left us with encouragement and inspiration for leading communities and attracting more women to tech!

At the end of the first day, I attended the ArabWIC reception, which was sponsored by the Qatar Computing Research Institute (QCRI). We had many new Arab ladies in computing and non-Arab women as well. We exchanged our bios and shared how each one of us is contributing to serve women in technology.

The Thursday Keynote had two speakers: Susan Wojcicki, the CEO of YouTube, and Hadi Partovi, the CEO and Cofounder of Code.org. "I'm feeling that I'm really the talking guy in the room," Hadi Partovi said at the beginning of his talk. He shared with the audience the personal story that changed his life: his dad brought home a computer that did not have any games on it, and a book for Hadi to learn from so he could write his own games. He talked about Hour of Code, a non-profit bootstrapped project that started in 2013 to expand access to computer science in schools. Code.org has support from both Democrats and Republicans, and from many celebrities (e.g., President Obama, Bill Gates, Mark Zuckerberg). Code.org has trained 15,000 teachers to teach computer science this year, reaching 600K students (43% female).

The Hour of Code
Partovi insisted that his main goal for Code.org is not teaching kids how to code; it is teaching kids computer science. He claimed that CS education is recovering after many years of decline, but that there is still a problem in CS. He also mentioned that about 9 out of 10 parents want their children to learn CS. I have already started with my 7-year-old, and he was so excited to write his first code :-). Partovi claimed that the gender gap starts in K-12: "Almost 70% of the high school kids do not have access to the computer science field. When kids go to school every kid learns about how electricity works or the basic math equations. In the 21st century, it is equally foundational to learn how algorithms work or how the internet works". Partovi continued that "the school system can evolve to teach kids computer science field. Over 70 schools have embraced CS, including NY, Houston, Chicago, etc". Regarding diversity, Partovi asked if we can change the stereotype without changing the facts on the ground. He commented that the way to change the stereotype is the Hour of Code, which now has 300 partners from 196 countries and 150,000 teachers. At the end, Partovi asked the audience to help get more volunteers. To encourage people to get involved, Microsoft and Amazon will give away gift cards to any teacher who organizes an Hour of Code.

After Partovi's talk, the 2015 Grace Hopper Celebration Change Agent ABIE Award Winners, Maria Celeste Medina from Kenya and Mai Abualkas Temraz from Palestine, were announced. The award winners gave short, inspiring talks about their journeys to lead women in technology and how they started.

Susan Wojcicki described the conference as a lifeline where women come together, learn, feel supported, be computer scientists, and be ourselves. She started her speech with a story about her daughter, who told her she hated computers, although she had been going to Google since she was born. Susan talked about the serious impact of leaving girls out of the conversation when it comes to technology. "Girls think that technology is insular and anti-social. By 2020, jobs in computer science are expected to grow nearly two times faster than the national average, totaling nearly 5 million jobs. Technology is revolutionizing almost every part in our lives. Every car today has more computing technology than Apollo 11 that first landed on the moon. Yet, today women hold only 26% of all tech jobs. The fact that women represent small portion of tech work force is not just a wake up call, it is a 'Sputnik' moment. It risks future competitiveness," Susan said. "If women don't participate in tech, with its massive prominence in our lives and society, we risk losing many of the economic, political and social gains we have made over decades." Susan continued that female representation in tech is a problem and it is getting worse. Women's representation in tech was better in the 80s. Susan Wojcicki shared an exclusive teaser of the Codegirl movie, directed by Lesley Chilcott, the Oscar-winning film producer.

She talked about the balance between family and work. She had her baby 5 months after she joined Google. The constraints of family (for example, how tough it is for kids to be the last ones picked up from day care) enabled her to develop a work style that focuses on efficiency, productivity, and prioritization, and to do that within office hours. She mentioned a Harvard study showing that employees who take breaks from work have a higher level of focus compared to those who do not. Furthermore, employees who feel encouraged by their bosses to take breaks are 100% more loyal to their employers.

Susan Wojcicki was the first person to take maternity leave at Google, and she is the only person to have taken five maternity leaves at Google. Interestingly, each leave enriched her life, left her with peace of mind, and gave her a chance to reflect on her career. A generous maternity leave increases retention. When women are given short maternity leave and they are under the pressure of having a call, they quit. When Google increased its paid family leave from 12 to 18 weeks, the rate at which mothers quit fell by 50%. 88% of women in the USA are not given family leave. Susan said, "men don't get asked how they balance it all". Susan's daughter now loves computer science. Susan enrolled her in a computer camp for girls; afterward, her daughter sketched a computer watch that holds her friends' contacts and info, before Samsung and Apple came up with their watches.

At the end, Susan insisted that we have to make it our personal responsibility to show the next generation of girls that they belong to the world of computer science.

Advice from Susan:
  • We need to give everyone a chance to understand computer science. 
  • Make computer science available to everyone in the USA by making it mandatory. 
  • Focus on working smart. Work smart, work hard. Do a great job, but then GO home.
  • Keep asking, look out for yourself, be an advocate and do not feel guilty about it! 
  • For tech companies, you need to help employees to find balance between work and family.
  • Tech companies need to pay generous maternity leave. 
  • A step back helps sometimes.
  • If you work for a company and you feel you cannot work a balanced day and the maternity leave is bad, I recommend that you leave and search for a supportive company and by the way, we are hiring! 

The Thursday Afternoon Plenary: The Thursday Afternoon Plenary was a conversation between Sheryl Sandberg, Facebook COO and author of the best-selling book Lean In, and Nora Denzel, Board Director of Ericsson, AMD, and Outerwall (makers of Redbox, Coinstar and ecoATM), about "What it means to be an effective leader and why it is so important to have women at the table to create technology". Sheryl shared her story about being a keynote speaker at GHC. The conversation covered gender diversity in technology and the pay gap. Sandberg asked the audience to negotiate for pay equality. She talked about the Lean In book and Lean In circles and how important mentoring is. She advised the audience to join Lean In circles. Sandberg said, "Starting a Lean In circle is a great leadership opportunity". To read more about the conversation, here is a nice article:
Sandberg: Tech offers the best jobs, needs more women voices, and women need to stick with it

I attended the "Change Agent and Social Impact Awards" session by the ABI award winners: Michal Segalov of Mind the Gap, Maria Celeste Medina of Ada IT, Daniel Raijman of Mind the Gap, and Mai Abualkas Temraz of Gaza Sky Geeks.

The moderator had a conversation with the ABI award winners to draw out their stories. The winners talked about the challenges they faced, the turning points in their lives, and what continues to motivate them to make a difference.

Daniel said they started Mind The Gap 8 years ago to expose many girls to computer science. Mind The Gap has expanded globally and is now in its 8th year, with more than 10,000 participants to date.

Michal said that they cared the most about making Mind The Gap scalable. Mind The Gap lets people choose how to give or volunteer. For example, some people can provide tech classes, others can give talks, etc. They had about 100 volunteers, and each volunteer gives only one hour of their time per month, which makes it easy for people and encourages them to volunteer. Michal's advice was to be open to changing things, yourself, and your passion.

Maria mentioned that her mom encouraged and supported her the most. In one year, Maria has worked with the Programá Tu Futuro team and has introduced more than 6,000 people to coding: kids, adults, teenagers, and senior citizens (of which 30% are women). She said that there are also a lot of studies on how to empower women.

Mai from Gaza talked over Skype because she could not attend for political reasons. Mai was asked for some fun facts, but she said that she was not in a good mood because she could not make it to the conference, which made it hard to mention fun facts. In 2014, she became a TechWomen Emerging Leader. She encouraged everyone to help and support them, and to keep inviting them, so maybe one day they will be able to attend. Mai said they face a lot of challenges in Gaza, but she likes to call them opportunities to learn and become more powerful in solving the problems they face. At the end, Mai said, "I'm kept motivated by events like this where I'm exposed to the global women's tech community. My goal here is to bring back as much of your energy as I can to Gaza. You can come mentor in Gaza." She mentioned many examples of people who have gone to Gaza to mentor: Angie Chang, the founder of Women 2.0, Dave McClure, the Founder of 500 Startups, and many others. "Don't worry, it's safe," Mai said, "or you can mentor women in Gaza remotely." Mai is a member of ArabWIC as well.

Speed mentoring sessions took place at the lunch tables on Thursday and Friday. I joined mentoring discussions around academic careers. It was useful to hear from many senior women in academia about their career journeys and also to hear some questions about applying for academic positions.

At the career fair, I was lucky to meet Sinead Borgersen, a Principal HR Business Partner at CA Technologies and Dr. Michele Weigle's friend. We had a quick discussion about careers at CA Technologies and how they would fit with my interests. Sinead is an amazing lady who is full of enthusiasm.

The Friday Keynote: Friday morning started with a cool technical keynote on "Robotics as a Part of Society" by Manuela Veloso, Herbert A. Simon University Professor, Computer Science Department, Carnegie Mellon University. Manuela has become well-known in the AI community for being the guiding force behind robot soccer. In her keynote, Manuela highlighted different perspectives on robots in a collaborative network of robots and humans. Manuela talked about CoBots, the robots she and her students created to help them with simple tasks in their offices and labs. These robots can use the internet or send emails to ask for help. She showed that autonomous robots learn from interacting with humans. "Technology is about diversity," Manuela said. "You don't have to do everything, but some do things that others can't."

At the end of the keynote, there were announcements about GHC 2016, which will take place in Houston, Texas. The general program co-chairs for GHC 2016 will be Kaoutar El Maghraoui, from IBM Research and ArabWIC, and Maria Gini from the University of Minnesota. I spent most of the time on Friday at the career fair, then I attended the mentoring session at the ArabWIC lunch table and met many women in computing from different fields.

The Friday Afternoon Plenary: The day wrapped up with an afternoon plenary session focused on the importance of diversity in technology by Janet George, Chief Data Scientist for Big Data/Data Science and Cognitive Computing at SanDisk, Isis Anchalee, Platform Engineer at OneLogin, and Miral Kotb, Director, Producer, Choreographer and Playwright for iLuminate.

I couldn't attend the afternoon plenary, but I heard from many friends about iLuminate, a wearable lighting system that enables novel dance performances, which the audience was treated to at the end of the conference.

GHC 2015 ended with busting a move on the dance floor in a night to remember at Minute Maid Park. There were many photo booths, t-shirts, glow sticks, and dessert. It is a Grace Hopper Celebration, after all!

It was fascinating to be at GHC 2015, to hear from the most talented and inspiring women in technology and get advice from them. Furthermore, I had the best time with many awesome ladies and reconnected with many friends who support each other. I was also glad to be involved in many activities this year for the ABI community and ArabWIC.