Thursday, November 16, 2017

2017-11-16: Paper Summary for Routing Memento Requests Using Binary Classifiers

While researching my dissertation topic, I re-encountered the paper, "Routing Memento Requests Using Binary Classifiers" by Bornand, Balakireva, and Van de Sompel from JCDL 2016 (arXiv:1606.09136v1). The high-level gist of this paper is that by using two corpora of URI-Rs consisting of requests to their Memento aggregator (one for training, the other for training evaluation), the authors were able to significantly mitigate wasted requests to archives that contained no mementos for a requested URI-R.

For each of the 17 Web archives included in the experiment, with the exception of the Internet Archive on the assumption that a positive result would always be returned, a classifier was generated. The classifiers informed the decision of, given a URI-R, whether the respective Web archive should be queried.
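To make the routing decision concrete, here is a minimal sketch of how per-archive binary classifiers could gate requests. It is not the authors' code: the archive labels, the `archives_to_query` function, the toy classifiers, and the 0.5 thresholds are all assumptions made for illustration. The only detail taken from the paper summary is that the Internet Archive is always queried.

```python
from typing import Callable, Dict, List

# Hypothetical type: a classifier maps a URI-R to an estimated probability
# that the archive holds at least one memento for it.
Classifier = Callable[[str], float]

# Assumed label for the Internet Archive, which is queried unconditionally.
ALWAYS_QUERY = {"ia"}


def archives_to_query(uri_r: str,
                      classifiers: Dict[str, Classifier],
                      thresholds: Dict[str, float]) -> List[str]:
    """Return the archives worth querying for uri_r."""
    selected = sorted(ALWAYS_QUERY)
    for archive, predict in classifiers.items():
        # Query the archive only if its classifier is confident enough.
        if predict(uri_r) >= thresholds[archive]:
            selected.append(archive)
    return selected


# Toy stand-ins for trained classifiers (illustrative only).
toy_classifiers = {
    "archive_a": lambda uri: 0.9 if ".uk/" in uri or uri.endswith(".uk") else 0.1,
    "archive_b": lambda uri: 0.4,
}
toy_thresholds = {"archive_a": 0.5, "archive_b": 0.5}

print(archives_to_query("http://example.co.uk/page", toy_classifiers, toy_thresholds))
```

With these toy values only the Internet Archive and `archive_a` are queried for the `.uk` URI-R, so one of the three possible requests is skipped.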

Optimization of this sort has been performed before. For example, AlSum et al. from TPDL 2013 (trip report, IJDL 2014, and arXiv) created profiles for 12 Web archives based on TLD and showed that it is possible to obtain a complete TimeMap for 84% of the URI-Rs requested using only the top 3 archives. In two separate papers from TPDL 2015 (trip report) then TPDL 2016 (trip report), Alam et al. (2015, 2016) described making routing decisions when you have the archive's CDX information and when you have to use the archive's query interface to expose its holdings (respectively) to optimize queries.

The training data set was based on the LANL Memento Aggregator cache from September 2015, containing over 1.2 million URI-Rs. The authors used Receiver Operating Characteristic (ROC) curves comparing the rate of false positives (URI-R should not have been included but was) to the rate of true positives (URI-R was rightfully included in the classification). When requesting a prediction from the classifier once it is trained, a pair of these rates is chosen corresponding to the most acceptable compromise for the application.
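As a rough illustration of how a point on an ROC curve translates into a decision threshold, the sketch below computes (threshold, false positive rate, true positive rate) triples from classifier scores and picks the highest threshold that reaches a target true positive rate. The function names, scores, and labels are invented for the example, and it assumes both classes appear in the data.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) triples, one per distinct score.

    A URI-R is predicted positive when its score >= threshold.
    Assumes labels contain at least one 1 and at least one 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def pick_threshold(points, min_tpr):
    """Pick the highest threshold whose true positive rate reaches min_tpr."""
    # Points are ordered by descending threshold, so TPR never decreases.
    for threshold, fpr, tpr in points:
        if tpr >= min_tpr:
            return threshold, fpr, tpr
    return points[-1]


# Made-up scores and ground truth (1 = archive holds a memento).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(pick_threshold(roc_points(scores, labels), min_tpr=0.75))  # (0.6, 0.25, 0.75)
```

Here a target true positive rate of 0.75 costs a false positive rate of 0.25. Asking for a higher true positive rate would mean accepting more wasted requests.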

A sample ROC curve (from the paper) to visualize memento requests to an archive.

Classification of this sort required feature selection. The authors used the character length of the URI-R, the count of special characters, and the Public Suffix List (PSL) domain as features (cf. AlSum et al.'s use of TLD as a primary feature). The rationale for choosing PSL over TLD was that most archives cover the same popular TLDs. An additional token feature was derived by parsing the URI-R, removing delimiters to form tokens, and transforming the tokens to lowercase.
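The features described above might be extracted along these lines. This is a sketch under several assumptions that the summary does not settle: "special characters" is taken to mean any non-alphanumeric character, the "PSL domain" is taken to mean the registrable domain (public suffix plus one label), and `TOY_PUBLIC_SUFFIXES` is a tiny stand-in for the real Public Suffix List.

```python
import re
from urllib.parse import urlsplit

# Tiny stand-in for the Public Suffix List; the real list is far larger.
TOY_PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}


def psl_domain(host):
    """Return the longest matching public suffix plus one preceding label."""
    labels = host.lower().split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in TOY_PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host.lower()


def uri_features(uri_r):
    """Extract length, special-character count, PSL domain, and tokens."""
    host = urlsplit(uri_r).hostname or ""
    # Split on runs of delimiters, drop empty pieces, and lowercase.
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", uri_r) if t]
    return {
        "length": len(uri_r),
        "special_chars": sum(1 for c in uri_r if not c.isalnum()),
        "psl_domain": psl_domain(host),
        "tokens": tokens,
    }


features = uri_features("http://www.BBC.co.uk/news/world-europe")
print(features["psl_domain"], features["length"], features["special_chars"])
```

For this URI-R the PSL domain is `bbc.co.uk`, whereas a TLD-only feature would yield just `uk`. That illustrates why the PSL domain separates archives better than a TLD shared by most of them.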

The authors used four different methods to rank the features being explored for the classifiers: frequency over the training set, sum of the differences between feature frequencies for a URI-R and the aforementioned method, entropy as defined by Hastie et al. (2009), and the Gini impurity (see Breiman et al. 1984). Each metric was evaluated to determine how it affected the prediction by training a binary classifier using the logistic regression algorithm.
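For readers unfamiliar with the last two metrics, the sketch below scores a candidate binary feature (for example, "the URI-R contains token X") by the weighted impurity of the labels after splitting on it, where lower is better. It uses the standard two-class forms of entropy and Gini impurity. The paper's exact scoring procedure may differ, and the toy data is invented.

```python
from math import log2


def entropy(p):
    """Two-class entropy in bits for a positive-class proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def gini(p):
    """Two-class Gini impurity for a positive-class proportion p."""
    return 2 * p * (1 - p)


def split_impurity(has_feature, labels, impurity):
    """Weighted impurity of labels after splitting on feature presence."""
    total = len(labels)
    score = 0.0
    for value in (True, False):
        group = [y for f, y in zip(has_feature, labels) if f == value]
        if group:
            score += len(group) / total * impurity(sum(group) / len(group))
    return score


# Toy data: label 1 means the archive holds a memento for the URI-R.
labels = [1, 1, 1, 0, 0, 0]
informative = [True, True, True, False, False, False]
noisy = [True, False, True, False, True, False]

print(split_impurity(informative, labels, gini))  # 0.0
print(split_impurity(noisy, labels, gini))        # about 0.444
```

A feature that perfectly separates held from not-held URI-Rs scores 0 under both metrics, so ranking by ascending impurity surfaces the most discriminating features first.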

The paper includes such plots for each of the four feature selection strategies. Following the training, they evaluated the performance of each algorithm, with a preference toward low computation load and memory usage, for classification using the corresponding sets of selected features. The algorithms evaluated were logistic regression (as used before), Multinomial Bayes, Random Forest, and SVM. Apart from Random Forest, the algorithms had similar prediction runtimes, so those three were evaluated further.

A classifier was trained for each combination of the three remaining algorithms and each archive. To determine the true positive threshold, they brought in the second data set consisting of 100,000 unrelated URI-Rs from the Internet Archive's log files from early 2012. Of the three algorithms, they found that logistic regression performed the best for 10 archives and Multinomial Bayes for 6 others (per above, IA was excluded).

The authors then evaluated the trained classifiers using yet another data set of URI-Rs from 200,000 randomly selected requests (cleaned to just over 187,000). Given that the data set was based on inter-archive requests, it was more representative of an aggregator's requests than the IA data set. They computed recall, computational cost, and response time using a simulated method to avoid the need for thousands of requests. These results confirmed that the currently used heuristic of querying all archives has the best recall (results are comprehensive) but that response time could be drastically reduced using a classifier. With a reduction in recall of 0.153, issuing fewer than 4 requests instead of 17 would reduce the response time from just over 3.7 seconds to about 2.2 seconds. Additional details of the optimization obtained through evaluation of the true positive rate can be found in the paper.
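The trade-off between recall and request count can be simulated offline once the true holdings are known. In the sketch below, recall is taken to be the fraction of (URI-R, archive) holdings that the routed requests would have found. That definition, the `evaluate_routing` function, and the toy data are assumptions for illustration rather than the paper's exact method. The final lines simply redo the arithmetic on the response times quoted above.

```python
def evaluate_routing(holdings, routed):
    """Return (recall, average requests per URI-R).

    holdings: URI-R -> set of archives that really hold mementos for it.
    routed:   URI-R -> set of archives the router chose to query.
    """
    found = sum(len(holdings[u] & routed[u]) for u in holdings)
    total = sum(len(holdings[u]) for u in holdings)
    requests = sum(len(routed[u]) for u in holdings)
    return found / total, requests / len(holdings)


# Toy ground truth and routing decisions for three archives.
holdings = {"u1": {"ia", "a"}, "u2": {"ia"}, "u3": {"ia", "b"}, "u4": {"a", "b"}}
routed = {"u1": {"ia", "a"}, "u2": {"ia", "a"}, "u3": {"ia"}, "u4": {"ia", "b"}}

recall, avg_requests = evaluate_routing(holdings, routed)
print(round(recall, 3), avg_requests)  # 0.714 1.75

# Relative response-time saving using the figures quoted in the summary.
baseline_seconds, routed_seconds = 3.7, 2.2
relative_saving = (baseline_seconds - routed_seconds) / baseline_seconds
print(f"{relative_saving:.0%}")
```

In the toy data, querying all three archives for every URI-R would reach a recall of 1.0 at 3 requests each. The routed version gives up some recall for 1.75 requests on average. On the figures quoted above, the same trade-off cuts response time by roughly 40%.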

Take Away

I found this paper to be an interesting and informative read on a very niche topic that is hyper-relevant to my dissertation topic. I foresee a potential chance to optimize archival queries from other Memento aggregators like MemGator and look forward to further studies in this realm on both optimization and caching.

Mat (@machawk1)

Nicolas J. Bornand, Lyudmila Balakireva, and Herbert Van de Sompel. "Routing Memento Requests Using Binary Classifiers," In Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries (JCDL), pp. 63-72, 2016. (Also at arXiv:1606.09136.)

Monday, November 6, 2017

2017-11-06: Association for Information Science and Technology (ASIS&T) Annual Meeting 2017

The crowds descended upon Arlington, Virginia for the 80th annual meeting of the Association for Information Science and Technology. I attended this meeting to learn more about ASIS&T, including its special interest groups. Also attending with me was former ODU Computer Science student and current Los Alamos National Laboratory librarian Valentina Neblitt-Jones.
The ASIS&T team had organized a wonderful collection of panels, papers, and other activities for us to engage in.

Plenary Speakers

Richard Marks: Head of the PlayStation Magic Lab at Sony Interactive Entertainment

Richard Marks talked about the importance of play to the human experience. He covered innovations at the PlayStation Magic Lab in an effort to highlight possible futures of human-computer interaction. The goal of the laboratory is "experience engineering" whereby the developers focus on improving the experience of game play rather than on more traditional software development. Play is about interaction and the Magic Lab focuses on amplifying that interaction.

One of the new frontiers of gaming is virtual reality, whereby users are immersed in a virtual world. Marks talked about how using an avatar in a game initiates a "virtual transfer of identity". Consider the example of pouring water: seeing oneself pour water on a screen while using a controller provides one level of immersion, but seeing the virtual glass of water in your hands makes the action far more natural. He mentioned that VR players confronted with a virtual tightrope suspended above New York City had difficulty stepping onto the tightrope, even though they knew it was just a game.

He talked about thresholds of technology change, detailing the changes in calculating machines throughout the 20th Century and how "when you can get it into your shirt pocket, now everything changes". Though this calculator example seems an obvious direction of technology, it was not entirely obvious when calculating machines were first being developed. The same parallel can be made for user interfaces. Marks also mentioned that games allow their researchers to explore many different techniques without having to worry about the potential for loss of life or other challenges that confront interface researchers in other industries.

William Powers: Laboratory for Social Machines at the MIT Media Lab

William Powers, author of "Hamlet's Blackberry" and reporter at the Washington Post, gave a thoughtful talk on the effects of information overload on society. To him, tech revolutions are all about depth and meaning. Depth is about focus, reflection, and when "the human mind takes its most valuable and most important journeys". Meaning is our ability to develop "theories about what exists is all about".

He talked about the current social changes people are experiencing in the online (and offline) world. He personally found that he was not able to give attention to things he cared about. The more time he spent online, the harder it became to read longer pieces of work, like books. A number of media stories exist about diminishing attention spans correlated to an increase in online use.

While at a fellowship at Harvard's Shorenstein Center, Powers began work on what print on paper had done for civilization. He covered different "Philosophers of Screens" from history. Socrates believed that the alphabet would destroy our minds, fearing that people would not think outside of the words on the page. Socrates felt that people needed distance to truly digest the world around them. Seneca lived in a world of many new technologies, such as postal systems and paved roads, but he feared the "restless energy" that haunted him, developing mental exercises to focus the mind. By inventing the printing press, Gutenberg helped mass produce the written word, leading some of his era to fear the end of civilization as misinformation was being printed. In Shakespeare's time, people complained that the print revolution had given them too much to read and that they would not be able to keep up with it. Benjamin Franklin worked to overcome his own addictions through the use of ritual. Henry David Thoreau bemoaned the distracted nature of his compatriots in the 19th Century, noting that "when our life ceases to be inward and private, conversation degenerates to gossip." Marshall McLuhan also believed that we could rise above information overload by developing our own strategies.

The output of this journey became the paper "Hamlet's Blackberry: Why Paper Is Eternal", which then led to the book "Hamlet's Blackberry". The common thread was that each age has had new technical advances and concerns that people were becoming less focused and more out of touch. Each age also had visionaries who found that they could rise above this information fray by developing their own techniques for focus and introspection. Every technical revolution starts with the idea that the technology will consume everything, but this is hardly the case. Says Powers, "If all goes well with the digital revolution, then tech will allow us to have the depth that paper has given us." Powers even mentioned that he had been discussing with Tim Berners-Lee how to build a "better virtual society in the virtual world" that would in turn improve our real world.

Sample of Papers Presented

As usual, I cannot cover all papers presented, and, due to overlaps, was not present at all sessions. I will discuss a subset of the presentations that I attended.

Top Ranked Papers

Eric Forcier presented something near to one of my topics of interest in "Re(a)d Wedding: A Case Study Exploring Everyday Information Behaviors of the Transmedia Fan". In the paper he talks about the phenomenon of transmedia fandom: fans who explore a fictional world through many different media types. The paper specifically focuses on an event in the Game of Thrones media franchise: The Red Wedding. Game of Thrones is an HBO television show based on a series of books named A Song of Ice and Fire. This story event is of interest because book fans were aware of the events of the Red Wedding before television fans experienced them, leading to a variety of different experiences for both. Forcier details the different types of fans and how they interact. Forcier's work has some connection to my work on spoilers and using web archives to avoid them.

In "Before Information Literacy [Or, Who Am I, As a Subject-Of-(Information)-Need?]", Ronald Day of the Department of Information and Library Science at Indiana University discusses the current issue of fake news. In his paper he considers the current solutions to misinformation exposure to be incomplete. Even though we are focusing on developing better algorithms for detecting fake news and also attempting to improve information literacy, there is also the possibility of improving a person's ability to determine what they want out of an information source. Day's paper provides an interesting history of information retrieval from an information science perspective. Over the years, I have heard that "data becomes information, but not all data is information"; Day extends this further by stating that "knowledge may result in information, but information doesn't necessarily have to come from or result in knowledge".

In "Affordances and Constraints in the Online Identity Work of LGBTQ+ Individuals", Vanessa Kitzie discusses the concepts of online identity in the LGBTQ+ community. Using interviews with thirty LGBTQ+ individuals, she asks about the experiences of the LGBTQ+ community in both social media and search engines. She finds that search engines are often used by members of the community to find the language necessary to explore their identity, but this is problematic because labels are dependent on clicks rather than on identity. Some members of the community create false social profiles so that they can "escape the norms confining" their "physical body" and choose the identity they want others to see. Many use social media to connect to other members of the community. The suggestions of further people to follow often introduce the user to more terms that help them with their identity. Her work is an important exploration of the concept of self, both on and offline.

Other Selected Papers
Sarah Bratt presented "Big Data, Big Metadata, and Quantitative Study of Science: A Workflow Model for Big Scientometrics". In this paper, she and her co-authors demonstrate a repeatable workflow used to process bibliometric data for the GenBank project. She maps the workflow that they developed for this project to the standard areas detailed in Jeffrey M. Stanton's Data Science. It is their hope that the framework can be applied to other areas of big data analytics and they intend to pursue a workflow that will work in these areas. I wondered if their workflow would be applicable to projects like the Clickstream Map of Science. I was also happy to see that her group was trying to tackle disambiguation, something I've blogged about before.

Yu Chi presented "Understanding and Modeling Behavior Patterns in Cross-Device Web Search." She and her co-authors conducted a user study to explore the behaviors surrounding beginning a web search on one device and then continuing it on another compared with just searching on a single device. They make the point that "strategies found on the single device, single-session search might not be applicable to the cross-device search". Users switching devices have a new behavior, re-finding, that might be necessary due to the interruption. They discovered that there are differences in user behavior in the two instances and that Hidden Markov Models could be used to model and uncover some user behavior. This work has implications for search engines and information retrieval.

"Toward A Characterization of Digital Humanities Research Collections: A Contrastive Analysis of Technical Designs" is the work of Katrina Fenlon. She talks about thematic research collections, which are collected by scholars who are trying to "support research on a theme". She focuses on the technical designs of thematic research collections and explores how collections with different uses have different designs. In the paper, she reviews three very different collections and categorizes them based on need: providing advanced access to value-added sources, providing context and interrelationships to sources, and also providing a platform for "new kinds of analysis and interpretation". I was particularly interested in Dr. Fenlon's research because of my own work on collections.

I was glad to once again see Leslie Johnston from the United States National Archives and Records Administration. She presented her work on "ERA 2.0: The National Archives New Framework for Electronic Records Preservation." This paper discusses the issues of developing the second version of Electronic Records Archives (ERA), the system that receives and processes US government records from many agencies before permanently archiving them for posterity. It is complex because records consist not only of different file formats, but many have different regulations surrounding their handling. ERA 2.0 now uses an Agile software methodology for development as well as cloud computing in order to effectively adapt to changing needs and requirements.

Unique to my experience at the conference was Kolina Koltai's presentation of "Questioning Science with Science: The Evolution of the Vaccine Safety Movement." In this work, the authors interviewed those who sought more research on vaccine safety, often called "anti-vaxxers". Most participants cited concern for children, and not just their own, as one of their values. They often read scientific journals and are concerned about financial conflicts of interest between government agencies and the corporations that they regulate, especially in light of prior issues involving research into the safety of tobacco and sugar. The Deficit Model, the idea that the group just lacks sufficient information, does not apply to this group. The authors also discovered that the Global Belief Model has not been effective in understanding members of this movement. It is the hope of the authors that this work will be helpful in developing campaigns and addressing concerns about vaccine safety. In a larger sense, it supports other work on "how people develop belief systems based upon their values" and also provides information for those attempting to study fake news.

Manasa Rath presented "Identifying the Reasons Contributing to Question Deletion in Educational Q&A." They looked at "bad" questions asked on the Q&A site Brainly. I was particularly interested in this work because the authors identified what features of a question caused moderators to delete it and then discovered that a J48-Decision Tree classifier is best at predicting if a given question would be deleted.

"Tweets May Be Archived: Civic Engagement, Digital Preservation, and Obama White House Social Media Data" was presented by Adam Kriesberg. Using data from the Obama White House Social Media Archive stored at the Internet Archive, the authors discussed the archiving -- not just web archiving -- of Barack Obama's social media content on Twitter, Vine, and Facebook. Problems exist on some platforms, such as Facebook, where data can be downloaded by users, but is not necessarily structured in a way useful to those outside of Facebook. Facebook data is only browseable by year and photographs included in the data store lack metadata. Obama changed Vine accounts during his presidency, making it difficult for archivists to determine if they have a complete collection from even a single social media platform. An archived Twitter account is temporal, meaning that counts for likes and retweets are only from a snapshot in time. On this note, Kriesberg says that values for likes and retweets are "incorrect", but I object to the terminology of "incorrect". Content drift is something I and others of WS-DL have studied and any observation from the web needs to be studied with the knowledge that it is a snapshot in time. He notes that even though we have Obama's content, we do not have the content of those he engaged with, making some conversations one-sided. He finally mentions that social media platforms provide a moving target for archivists and researchers, as APIs and HTML change quickly, making tool development difficult. I recommend this work for anyone attempting to archive or work with social media archives.


As with other conferences, ASIS&T provided multiple opportunities to connect with researchers in the community. I appreciated the interesting conversations with Christina Pikas, Hamid Alhoori, and others during breaks. I also liked the lively conversations with Elaine Toms and Timothy Bowman. I want to thank Lorri Mon for inviting me to the Florida State University alumni lunch with Kathleen Burnett, Adam Worrall, Gary Burnett, Lynette Hammond Gerido, and others where we discussed each others' work as well as developments at FSU.

I apologize to anyone else I have left off.


ASIS&T is a neat organization focusing on the intersections of information science and technology. As always, I am looking forward to possibly attending future conferences, like Vancouver in 2018.

-- Shawn M. Jones

Tuesday, October 24, 2017

2017-10-24: Grace Hopper Celebration of Women in Computing (GHC) 2017

This year’s Grace Hopper Celebration (@ghc, #GHC17), the world's largest gathering of women technologists, took place in Orlando, Florida at the Orange County Convention Center on October 4-6. The events occurred in two locations: the Orange County Convention Center West (OCCC) and the Hyatt Regency (Hyatt), which are directly connected by a skybridge.

GHC is presented by the Anita Borg Institute for Women and Technology, which was founded by Dr. Anita Borg and Dr. Telle Whitney in 1994. Grace Hopper attendees grew from 500 in 1994 to over 18,000 in 2017 with around 700 speakers.

This was my first time attending the conference and I was fortunate to receive a travel scholarship to attend GHC. The scholarships cover registration, travel, hotel, and meal expenses. Only 657 recipients were selected from almost 15,000 applicants. In addition, three other graduate students from the Department of Computer Science at Old Dominion University attended the conference. Aida Ghazizadeh presented a poster, Maha Abdelaal got a Google scholarship, and Wessam Elhefnawy got a Hopper scholarship. I was also lucky to have my sister Lamia Alkwai, a grad student at King Saud University, Riyadh, Saudi Arabia, attending the conference; she also received a travel scholarship. Previously, Yasmin AlNoamany, a PhD graduate from @ODUCS @WebSciDL, attended GHC in 2015, 2014, and 2013.

Before the conference, a phone application named GHC 17 was introduced that includes a news feed, schedule, sponsors, top companies, resources, and more. I really liked this application because it makes it easy to create your own selection of talks you want to attend, with the location, time, and a brief description of each talk. Another good resource is the GHC Scholars Facebook Group, where questions are answered promptly and you can connect with other GHC attendees.

The conference schedule had inspiring talks and events which included keynote speakers, presentations, panels, workshops, a career fair, and a poster session. This year there were 20 different tracks which are: career, community, CRA-W, student opportunity, ACM research competition, general poster session, general session, lunches and receptions, special sessions, IOT/Wearable tech, products A to Z, artificial intelligence, computer system engineering, data science, human computer interaction, interactive/privacy, software engineering, open source, and organization transformation.

Tuesday, Oct 3
First Timers Orientation
On Tuesday, there was the first-timers orientation, where five speakers gathered to talk about the keynotes, featured speakers, types of sessions, student and faculty highlights, networking opportunities, planning your GHC survival skills, and staying engaged. The presenters in this session were Kathryn Magno, the Sr. Program Manager of GHC Operations, Rhonda Leiva, the Senior Program Manager of Student Programs, Beth Roberts, the Manager at GHC Content Operations, Stuti Badoni, the Director at GHC Content, and Radha Jhatakia, the Program Manager at GHC Content. Some of the tips mentioned were to not miss out on the keynotes, connect with speakers and students, check out the poster session, and attend the career fair. Other notes included wearing comfortable shoes, drinking plenty of water, and pacing yourself. They mentioned the “I AM” Movement, in which you write on a sign to define who you are in your own words. They concluded the talk by answering some of the audience's questions.

Wednesday, Oct 4
Wednesday's Keynote
Wednesday’s keynote started with Aicha Evans, the Chief Strategy Officer at Intel Corporation and a Board Member. She welcomed everyone and mentioned that today’s women technologists are shaping the future of technology. She introduced Dr. Fei-Fei Li, a Professor and Director at Stanford University AI Lab and the Chief Scientist at Google Cloud AI/ML. Dr. Fei-Fei Li talked about her experience as an immigrant when she moved to the United States at age 16, where she had to learn the language and understand a new culture. She also discussed the research she has been doing on Artificial Intelligence.

After that, Vicki Mealer-Burke introduced the winner of the Technical Leadership ABIE Award. This year’s winner was Diane Greene, the CEO of Google Cloud and a Board Member at Alphabet. People described her as a problem solver who understands technology and understands how to bring these technologies into the marketplace at a foundational level. She believes that one of the biggest contributions to her success was her father, who gave her confidence at an early age, which shaped her life journey. Her father would hand her the steering wheel and let her navigate the sailboat at age three. She also discussed the projects she worked on early on, as well as her work and the challenges she faced at VMware.

Next, Megan Smith, a Former U.S. Chief Technology Officer, started her talk by showing her “I AM” Movement sign - "I am so ready to upgrade tech culture with all of you". She talked about diversity and how to find jobs for people. She talked about the challenges of tech diversity and said that it is time to make real progress in this matter. She showed that in 2016 women made up only 21.7% of the US technical workforce, and that in 2017 this had risen to 22.95%. This rate is considered low, and we must work hard to make it a higher priority for leadership. To that end, four companies that made the most progress were celebrated at GHC.
1- The first company, which made the most progress with senior executives, technical executives, and others, was IBM.
2- The second company that got an award was Accenture, in the category of a technical workforce of over 10,000.
3- The third company was GEICO, in the category of a technical workforce of 1,000-10,000.
4- The fourth company was ThoughtWorks, in the category of a technical workforce of under 1,000.

Next, Monique Chenier, the Director of Employment at Hackbright Academy, presented the ABIE award winner for social impact. The Social Impact ABIE Award was presented to Dr. Sue Black OBE. Dr. Sue Black talked about the struggles she faced early in her life that led her to found #TechMums, which empowers mothers and their families through technology. TechMums is currently working with 500 mothers in the UK to give them skills, knowledge, and confidence to build successful lives. Dr. Black said, “If you help a million mothers, you are helping at least two million people because every single mother is a caretaker for at least one other person.” Dr. Black is also the founder of BCSWomen, the UK’s first online network for women in tech.

Next, Melinda Gates, the Co-Chair of the Bill & Melinda Gates Foundation, talked about her early relationship with computers in her high school years. Afterwards, she talked about her projects at Microsoft and the big risks that you must take to move forward. Then, she spoke about the importance of diverse teams. She pointed out that in 2015 women made up only 25% of the tech workforce and held only 15% of technical roles. She also mentioned that tech is becoming part of our lives and is going to grow, that AI should be taught the best of what humanity has to offer, and that AI should be built for all genders and all ethnicities. She discussed the importance of making many pathways for people who have interest and talent in tech.

The next ABIE award was presented by Cindy Finkelman, the CIO of FactSet Research Systems. The winner of the Students of Vision ABIE Award was Mehul Raje, a Master of Engineering degree program student at Harvard University. She talked about the steps she is taking to increase the number of women in computing.

Dr. Vicki Hanson, the President of the Association for Computing Machinery (ACM), presented Dr. Telle Whitney, who co-founded GHC along with Dr. Anita Borg and served as president for 15 years. She talked about the work that GHC does to create cultures where women thrive. She also talked about the goals that she hopes will be accomplished. Dr. Fran Berman, the Hamilton Distinguished Professor of Computer Science at the Rensselaer Polytechnic Institute and the Chair of Board of Trustees, presented Dr. Brenda Darden Wilkerson, the President and CEO. Dr. Wilkerson was excited to continue the journey that Dr. Telle and Dr. Anita started.

Wednesday Keynote Speakers GHC17: Dr. Fei-Fei Li, Professor and Director, Stanford University AI Lab; Chief Scientist, Google Cloud AI/ML, Diane Greene, ABIE Award Winner, Megan Smith, Former U.S. Chief Technology Officer, Dr. Sue Black OBE, ABIE Award Winner, Melinda Gates, Co-chair, Bill & Melinda Gates Foundation, Mehul Raje, ABIE Award Winner

Presentation Session: Artificial Intelligence Track
In the Artificial Intelligence track I attended the session AI for Social Goods, which had two talks. The first was Can Machine Learning, IoT, Drones, and Networking Solve World Hunger?, presented by Jennifer Marsman, a Principal Software Development Engineer at Microsoft. She talked about fighting world hunger using machine learning, drones, IoT, and networking research for precision agriculture. Recent studies show that food production must double by 2050 to meet demand from the world's growing population. This work uses the idea of precision agriculture, a farming management concept based on observing, measuring, and responding to inter- and intra-field variability in crops. They use drones to help accomplish that, along with sensors placed to measure the amount of water. They had several challenges, such as sensor maintenance, power consumption, and where to place the sensors. This approach was tested on 100 acres of farm in Carnation, Washington and on 2,000 acres in upstate New York. The demo of this work can be found at Microsoft Cognitive Services.

The second presentation was Bias In Artificial Intelligence, presented by Neelima Kumar, a Software Manager at Oracle. In this talk the presenter discussed the shortcomings of Artificial Intelligence (AI). She showed real-life examples of AI that is racist or sexist. She first asked what we picture in our minds when the word nurse is used: is it a female nurse or a male nurse? Another example was the Google photo app that categorized a picture of a 22-year-old and his friend as gorillas. She discussed where bias is introduced in AI. In AI the steps usually include collecting and annotating training data, then training the model, and finally getting the output. Bias can appear in every step. To address the bias we need to build awareness of possible biases, design for inclusion and diversity, and work with the communities affected most. The presentation concluded with the phrase “AI will change the world but who will change AI”.

Career Fair
The Career Fair was huge; it had most of the major companies, such as Google, Microsoft, IBM, Facebook, Snapchat, and many more. Each company discussed the different opportunities they have for women. I would recommend having your CV printed out if you are looking for an internship or a job, as there are also on-the-spot interviews.

Poster Session
They also had two poster sessions, the ACM student research competition where Aida Ghazizadeh presented her poster, and the GHC general poster session. It was interesting to see all the different research people were doing. There was a poster session competition and the winners were announced at Friday's Keynote.

Thursday, Oct 5
Thursday Keynote
Thursday's keynote was started by Ana Pinczuk, the Senior Vice President at Hewlett Packard Enterprise and a board member, who presented three more ABIE award winners. She introduced Mary Spio, the Founder and CEO of CEEK VR. Mary started by showing her "I AM" Movement sign, on which she wrote "I am changing the face of innovation and I need your help". She mentioned the struggles of her early years. When she came to the US at age 16, she worked at McDonald's, and when she got her first paycheck she thought it was more than she should have received and asked her boss if it was a mistake. Her boss thought she was complaining and instantly offered her a higher-paying job. This was her first lesson: do not be afraid to ask for more. After that, she joined the army, then got a scholarship to go to college. Next, her career bloomed and she obtained some patents for technologies she developed. Later on she did some tours on behalf of the US to spread the goodwill of America. She then worked on CEEK VR, the virtual reality eyewear experience. She concluded by discussing the importance of diversity and why it is a necessity for innovation.

The next ABIE award winner was presented by Astrid Atkinson, the Engineering Director at Google Product Infrastructure. She presented the Change Agent ABIE Award to Marie Claire Murekatete, a Software Manager at the Rwanda Information Society Authority (RISA) and the Founder of Refugee Girls Need You. Marie Claire discussed the challenges she faced, such as not owning a computer at college level. "Not giving up and having endless curiosity, my wish is for all of you is to enjoy life the fullest and overcome any challenges that come your way, find a tribe of wonderful people who can help you say yes to big opportunity" says Marie Claire Murekatete on what she learned from her experience in life.

Cathy Scerbo, the VP of IT Business Operations at Liberty Mutual Insurance, recognized the next ABIE award winner. The ABIE Leadership Award was given to Mercedes Soria, Vice President of Software Engineering at Knightscope, Inc. She discussed the challenges she faced coming from Ecuador and not knowing how to speak English, but she overcame the challenges and always focused on her studies. She concluded her talk by showing her "I AM" Movement sign "I am a crime fighter and I develop technology that saves peoples lives".

Next was Debbie Sterling, the Founder and CEO of GoldieBlox. Debbie started by showing her "I AM" Movement sign, which says, "I am a pink aisle disruptor". She talked about her journey in finding her passion. She studied mechanical engineering and product design at Stanford University. She had an idea with her friend to create construction toys for girls. Bit by bit she came up with toys that girls would enjoy and that let their imagination take off. She ended up creating the company GoldieBlox, which is based on a girl engineer character who solves problems. The company was founded in 2012, has been recognized as a leader in children’s entertainment, and has reached billions of consumers.

After that, Kim Garner, the Vice President of Advisory Services at Neustar, Inc., introduced the winner of the Technology Entrepreneurship ABIE Award. The award was presented to Dr. Laura Mather, the CEO and Founder of Talent Sonar. Dr. Mather's startup work was in the cybersecurity space, involving companies such as Bank of America and Apple. After that, she tried to use technology to address unconscious bias. She explained that the main way to address this bias is to change the process of doing things such as hiring, mission making, and permissions.

Thursday Keynote Speakers GHC17: Mary Spio, Founder and CEO, CEEK VR, Marie Claire Murekatete, ABIE Award Winner, Mercedes Soria, ABIE Award Winner, Debbie Sterling, Founder and CEO, GoldieBlox, Dr. Laura Mather, ABIE Award Winner

Featured Speaker: Yasmin Mustafa
In the special session track I attended the presentation by the featured speaker Yasmin Mustafa, the Founder at ROAR, Coded by Kids, Temple University. She is a refugee of the Persian Gulf War. She talked about her journey in coming to the US and her struggles in life. After that, she became an American citizen, and then she started running a blog, built her audience, and fell in love with marketing. She then started the Philadelphia chapter of Girl Develop It, an organization that aims to get more women to learn programming in supportive environments. This enabled her to travel around South America for six months, where she encountered many women who had been attacked or harassed. This inspired her to create a self-defense wearable technology company aimed at diminishing attacks against women and addressing the underlying causes of violence. She created a company called ROAR for Good to build the wearable devices, and she became its Founder and CEO. The take-away of her talk was, “You have to be a little naïve and crazy to start a company, don’t listen to naysayers, and you can start a tech company as a non-techie”. You can find more in Yasmin's TEDx Talk, “The Birth Lottery Does Not Define You”.

Presentation Session: Artificial Intelligence Track
The next session I attended was in the AI track. The session's title was “Recommendation Systems”. The first talk was “Buy It Again: Repeat Purchase Recommendations for Consumables”, presented by Aditi Bhattacharyya, Technical Lead in the Amazon Personalization organization. She started her talk by asking the audience if they had ever used Amazon, and almost all of the audience had. She explained that Amazon uses personalized recommendations for each customer. She gave an overview of the types of recommendations that exist. The first was collaborative filtering, which collects information from other users who previously bought similar items, and item-to-item collaborative filtering, which recommends items in the same category as the customer’s purchase history. They recommend consumables such as grocery items, house supplies, office supplies, and health care items, so the model has to predict when the user will need to buy an item again. Next, she discussed how the model is built. The first method depends on the repeat purchase count; however, the challenge is that an item may no longer be relevant to the customer, such as baby formula. A time decay model assigns weights to purchases that decay over time; this could also be a challenge because after a certain amount of time an item's weight could be very low although the item is still relevant. These challenges led to a consumption rate model, which is based on the rate of repeat purchases and the timestamps of the last and first purchases. To obtain a repeat purchase score even when the user has purchased an item only once, the item’s score from other customers is used: the repeat purchase score of the item is based on the repeat purchase rate and the aggregated purchase signal of the item across all customers. This means that both the repeat signal of the customer alone and that of all customers are considered in the recommendation.
The decision to create a recommendation is based on the repeat customer count, total customer count, and repeat customer probability. An item is repeat purchasable if both the repeat customer count and the repeat customer probability are higher than a certain threshold, which is calculated during the offline and online processes. Finally, the items are ranked in descending order of repeat purchase score. To make the calculations feasible across all the items, they divided the task into online and offline processes. The offline process includes expensive computations, the build process, and offline filtering. The online process includes real-time service lookup, online filtering, and front-end rendering. Next, the evaluation used both offline and online metrics. Precision and recall are used as offline metrics. The online evaluation uses an A/B testing framework that shows the current recommendations and the newly introduced recommendations and compares metrics such as purchases, views, click rates, and engagement. They found that this method of recommendations improved the purchase rate.
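To make the time decay and threshold ideas from this talk more concrete, here is a minimal Python sketch. It is only an illustration and not Amazon's implementation: the function names, the exponential half-life form of the decay, and the threshold values are all assumptions made for this example.

```python
def time_decay_score(purchase_days_ago, half_life_days=30.0):
    """Sum of decayed weights, one per past purchase.

    Assumption: each purchase's weight halves every `half_life_days` days.
    """
    return sum(0.5 ** (days / half_life_days) for days in purchase_days_ago)


def repeat_purchasable(repeat_customers, total_customers,
                       min_repeat_customers=2, min_repeat_probability=0.3):
    """An item qualifies only if both signals exceed their (hypothetical) thresholds."""
    if total_customers == 0:
        return False
    repeat_probability = repeat_customers / total_customers
    return (repeat_customers > min_repeat_customers
            and repeat_probability > min_repeat_probability)


def rank_items(items):
    """items maps a name to (repeat_customers, total_customers, purchase_days_ago).

    Items that are not repeat purchasable are dropped; the rest are ranked
    in descending order of their decayed purchase score.
    """
    scores = {
        name: time_decay_score(days_ago)
        for name, (repeat, total, days_ago) in items.items()
        if repeat_purchasable(repeat, total)
    }
    return sorted(scores, key=scores.get, reverse=True)


catalog = {
    "paper towels": (50, 100, [5, 40]),
    "baby formula": (40, 100, [400]),   # bought long ago, so its weight has decayed
    "television": (1, 100, [3]),        # almost nobody buys it twice
}
print(rank_items(catalog))
```

The sketch also shows the weakness mentioned in the talk: an item bought long ago ends up with a weight near zero even if it is still relevant to the customer.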

The second talk was “Learning to be Relevant, Course Recommender Systems for Online Education”, presented by Shivani Rao, LinkedIn. The goal of the talk was to learn about the algorithms behind course recommendations for the online education platform. In general, because skills are always coming and going and because the job market is dynamic, learning does not stop after school and there is a need for lifelong learning. Creating member-to-course recommendations involves both offline and online processing. The offline process turns the member data and the course data into member-to-course recommendations, which are stored in a Voldemort key-value store for highly scalable storage. After that, the online process starts, using the Rest.li service to serve the front end. Usually the offline process is used for email flows, and the online process can be used for viewing online feeds and other online activities. To solve the cold start problem they used the information provided by the user in their LinkedIn profile. However, since only 30% of users fill in the profile, there is an option to use a member’s inferred title and the most distinct skills associated with it based on the member’s cohort. Another challenge was that only 2% of the skills are covered by manual tagging. The solution was to learn a supervised model using the manual tags as labels. Selecting courses for members is done by scoring and then ranking the courses. An additional challenge is scaling, because each model has recommendations for 500M+ members and 200+ courses. A solution is that a model is not served to all the members and that the same model may be served in different channels with different treatments. The last topic discussed was micro-content and its challenges, such as how to identify the videos that are useful, which is solved using key features. More information on this topic can be found here.

The final talk in this session was “Related Pins at Pinterest: Evolution of a Real-World Recommender System”, presented by Jenny Liu, Software Engineer at Pinterest. The talk started with an overview of Pinterest. Then she noted that on Pinterest 40% of the views and saves come from related pins recommendations. The main principles for building such systems are: start simple, keep it simple, and optimize for iteration. The major pieces of related pins were candidate generation, memorization, and learning to rank. Candidates could be generated from user boards and board co-occurrence. Since calculating the co-occurrence of every pin is computationally expensive, they started off with random sampling and a heuristic score, which worked really well. To find candidate recommendations for pins that are connected to only one board, they look for pins that could be more than one hop away. They use the Pixie system to place all the pins in memory and perform random walks with over 100,000 steps. However, in this system new pins are not visited during random walks. To solve this problem, they added fresh candidates, which are the recently added pins. The next major piece of related pins was memorization. When checking what people are clicking on, we need to know the reason for the click: is it because of relevance or because of the position? To solve this they added normalization by position, where pins that are worth more are moved to a non-clicked position. The final major piece of related pins was learning to rank. Here, in the feature extraction for the candidates, the feature vector is converted to numerical values and passed to a scoring model, which returns a score by multiplying the values by the feature weights. This method increased pin saves. However, it had some challenges: the linear model was not enough, there were feedback loops, and it was hard to experiment with models. The limitation of the linear model was caused by the feature weights being the same for multiple candidates.
This is solved by creating a gradient-boosted decision tree. The second challenge was feedback loops, in which the same candidates are shown over and over again. This is solved by reserving 1% of the traffic for low-score candidates. The final challenge was that it was hard to experiment with models offline because of the huge amount of data, the time it consumed, and the fact that personalization was not possible with offline results. The result was investing more in online serving.
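The random-walk candidate generation described in this talk can be sketched in a few lines of Python. This is a toy illustration and not the Pixie system itself: the graph layout, the function name, the restart probability, and the step count are assumptions chosen for the example.

```python
import random
from collections import Counter


def random_walk_candidates(pin_to_boards, board_to_pins, query_pin,
                           steps=1000, restart_prob=0.5, seed=0):
    """Count how often each pin is visited by a walk that starts at query_pin.

    Each step hops pin -> random board -> random pin on that board.
    Assumption: after every step the walk restarts at the query pin with
    probability `restart_prob`, which keeps the visits close to the query.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = query_pin
    for _ in range(steps):
        boards = pin_to_boards.get(current)
        if not boards:               # isolated pin: go back to the start
            current = query_pin
            continue
        board = rng.choice(boards)
        current = rng.choice(board_to_pins[board])
        if current != query_pin:
            visits[current] += 1
        if rng.random() < restart_prob:
            current = query_pin
    return visits


# Pin "a" sits on a single board, so pin "c" is only reachable in two hops.
pin_to_boards = {"a": ["b1"], "b": ["b1", "b2"], "c": ["b2"]}
board_to_pins = {"b1": ["a", "b"], "b2": ["b", "c"]}

visits = random_walk_candidates(pin_to_boards, board_to_pins, "a")
print(visits.most_common())
```

Pins with higher visit counts are treated as stronger candidates. A pin that was only just added to the graph would receive few or no visits, which is the gap that the fresh candidates mentioned above are meant to fill.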

Friday, Oct 6
Presentation Session: Artificial Intelligence Track
I attended the presentation session Real World Application of AI. The first presentation in this session was Bring Intelligence to Resource Utilization, presented by Xiaoqian Liu, Data Engineer, Salesforce. Salesforce Einstein is a layer within the Salesforce platform that infuses artificial intelligence features and capabilities across all Salesforce clouds; it builds a specific model for each customer per app. Successful resource assignment improves job efficiency and costs. A case study was performed by applying reinforcement learning (RL) to Spark job parameter tuning in real time. The workflow includes job generation, the model, and the Hadoop server. The next step was the system implementation, followed by the experiment phase. The results of this project show a slight downtrend in the output graph, which indicates that feature improvements are needed.

The next presentation was Recommending Dream Jobs in a Biased Real World, presented by Nadia Fawaz, Staff Software Engineer, LinkedIn. Recommendation systems trained on biased data may reflect that bias, which is critical when it affects job finding. The technical challenges lie in training, evaluating, and deploying a large-scale recommendation system based on biased real-world data. This talk presented an overview of Jobs You May Be Interested In (JYMBII), which is a personalized list of jobs for the user. This list is generated by a machine learning model that is trained to predict a relevance score measuring how relevant each job is for a specific member on LinkedIn. The model is trained by feeding it data such as member information, job postings, and previous interactions between members and jobs such as clicks, saves, and dismisses. This process is performed both online and offline. Next, she addressed where bias comes from. Bias comes from different stages. There is bias in the data itself, such as the gender gap in positions, with more women than men hired in educational roles and fewer in tech roles. There is bias in how data is collected, such as low-ranked pages from which data is never collected, or position bias when the user only clicks on the top jobs instead of looking at all the options. Another bias is model bias, where the model is too simple and has too few parameters or insufficient features to fully represent the outcome. Reducing bias matters because some biases are considered unacceptable or are illegal. In business, members care about the quality of recommendations, and removing the bias could lift the business. Finally, if bias is in the technical model, it can impact the performance metrics in both online and offline evaluations.
Reducing the online bias in data can be done using two methods: a fully random bucket for models that may not have been chosen, and session-based top-k randomization, which randomizes the positions of the first k results after ranking. This can be done using explore/exploit parsimonious randomization. The online bias in the training dataset can be reduced by augmenting the dataset with random negatives, inferred positives, and high-quality manual tagging. This can be evaluated with the replay method, which takes the random bucket dataset as input, reranks it with the algorithm being evaluated, and computes the top-k reward on the matched input.
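As a rough illustration of top-k randomization and replay evaluation, here is a small Python sketch. It is my own simplification and not LinkedIn's code: the log format (candidate jobs, the job that was shown, the observed reward), the function names, and the scores are all hypothetical.

```python
import random


def randomize_top_k(ranked_jobs, k, rng):
    """Shuffle only the first k positions of an already ranked list."""
    head = ranked_jobs[:k]
    rng.shuffle(head)
    return head + ranked_jobs[k:]


def replay_top_k_reward(random_bucket_log, score_fn, k):
    """Replay-style offline estimate of a ranker's top-k reward.

    random_bucket_log holds (candidates, shown_job, reward) tuples logged
    from the fully random bucket. An impression only counts ("matches")
    when the ranker being evaluated would also place the shown job in
    its top k.
    """
    matched = 0
    total_reward = 0.0
    for candidates, shown_job, reward in random_bucket_log:
        reranked = sorted(candidates, key=score_fn, reverse=True)
        if shown_job in reranked[:k]:
            matched += 1
            total_reward += reward
    return total_reward / matched if matched else 0.0


rng = random.Random(0)
shuffled = randomize_top_k(["a", "b", "c", "d"], 2, rng)

model_scores = {"a": 3.0, "b": 2.0, "c": 1.0}   # hypothetical relevance scores
log = [
    (["a", "b", "c"], "a", 1.0),   # shown job is in the model's top 2, clicked
    (["a", "b", "c"], "c", 1.0),   # shown job is outside the top 2, ignored
    (["a", "b", "c"], "b", 0.0),   # shown job is in the top 2, not clicked
]
print(replay_top_k_reward(log, model_scores.get, k=2))
```

Because the logged impressions come from a random bucket, the jobs that were shown do not depend on the old ranker's position bias, which is what makes the matched impressions usable for comparing models offline.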

The final presentation in this session was Where DNA Meets AI - and How You Can Help!, presented by Amanda Fernandez, an Assistant Professor in Practice, University of Texas San Antonio. This talk was a general overview of connecting DNA with machine learning and AI solutions. To get started in this field, first learn a programming language and familiarize yourself with the research problems. Two recommended resources if you are bridging CS and biology are Deep Learning for Health Informatics (Ravi et al., 2016) and Deep Learning for Computational Biology (Angermueller et al., 2016). Some helpful open source resources that were recommended are Google’s PAIR Initiative, Kaggle, and Big Data Genomics.

Social Celebrating Arab Women in Technical Roles
I then attended the Social Celebrating Arab Women In Technical Roles, sponsored by Visa. I met many Arab and non-Arab women in computing (ArabWIC) who came from all over the world. The talk was presented by Dr. Sana Odeh, a Clinical Professor in the Computer Science Department at New York University, and Dr. Kaoutar El Maghraoui, a Research Scientist at IBM. They talked about the goal of the group and our role in supporting it. After that, we socialized, exchanged our bios, and discussed the different ways to contribute to this group.

Friday Keynote
Friday’s keynote was opened by Nora M. Denze of the Board of Trustees. She presented Dr. Ayanna Howard, Professor and Linda J. and Mark C. Smith Endowed Chair in Bioengineering in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. Dr. Howard gave an overview of robotics. Her recent research is in the area of pediatric therapy, using robots in the home for kids with special needs, which is important for people who cannot afford regular therapy sessions. The lessons learned from this work are that humans trust robots and that our intelligent machines are inheriting our human biases.

Next, Dr. Brenda Darden Wilkerson, the President and CEO, presented the 2017 Lifetime Achievement Award to Dr. Telle Whitney in recognition of her dedication to women in tech. To help honor Dr. Telle, Dr. Fran Berman, the Hamilton Distinguished Professor of Computer Science at Rensselaer Polytechnic Institute and the Chair of the Board of Trustees, talked about all the amazing things that Dr. Telle did in the 15 years she served. Dr. Fran then presented the award to Dr. Telle, who talked about how important it is for us to make a change and contribute to the movement of creating more opportunities for women.

Next, Dr. Jodi Tims, Chair of the ACM Council on Women in Computing, presented the ACM Student Research Competition. GHC hosts one of the largest technical poster sessions in the U.S. This year there were 90 posters in total. The poster winners were then announced.

The winners of the Undergraduate Student Research Competition:

The winners of the Graduate Student Research Competition:

1- Parishad Karimi, WINLAB, Rutgers, The State University of New Jersey, the poster on "SMART: A Distributed Architecture for Dynamic Spectrum Management".
3- Mariam Nouh, University of Oxford, UK, the poster on "CCINT: Cyber-Crime INTelligence Framework for Detecting Online Radical Content".

Next, Dr. Jennifer Chayes, Technical Fellow and Managing Director at Microsoft, presented the Denice Denton Emerging Leader ABIE Award to Dr. Aysegul Gunduz. Dr. Gunduz develops tools and devices that identify neurological disorders.

After that, Dr. Deborah Berebichez, Chief Data Scientist, Metis; Co-host, “Outrageous Acts of Science”, talked about her life story, how important it is to keep going and face all the obstacles that may come in life, and how important it is to encourage each other.

Next, Sherry Ryan, Vice President and Chief Information Security Officer, Juniper Networks, presented the A. Richard Newton Educator ABIE Award to Dr. Marie desJardins, Professor of Computer Science and Electrical Engineering, University of Maryland. Dr. desJardins has been dedicated to supporting women in computing. She has given guidance, support, care, and feedback to female faculty as they prepare their tenure packages.

The next speaker was Maureen Fan, CEO and Co-founder, Baobab Studios. Maureen talked about her passion for virtual reality and animation, and how it inspires you to dream. She also talked about storytelling and its power to make you care about a character, and about how to create your own path and follow your dreams even if it is not an easy path to take.

Nora M. Denze ended the keynote with a quote from Grace Hopper herself, "Ships in the harbor are safe, but that’s not what ships are made for".

Friday Keynote Speakers GHC17: Dr. Ayanna Howard, Professor and Linda J. and Mark C. Smith Endowed Chair, Bioengineering, School of Electrical and Computer Engineering, Georgia Institute of Technology, Aysegul Gunduz, ABIE Award Winner, Dr. Deborah Berebichez, Chief Data Scientist, Metis; Co-host, “Outrageous Acts of Science”, Dr. Marie desJardins, ABIE Award Winner, Maureen Fan, CEO and Co-founder, Baobab Studios

Friday Night Celebration
During the Friday Night Celebration, we all went to dance, socialize, get swag from Google, eat food, and celebrate Grace Hopper. I met incredible people and enjoyed my time.

My Recommendations and Final Thoughts
My recommendation for future attendees is to plan the talks you’re going to attend before the conference, and to make sure you know where the talks are, as people line up almost 15 minutes before each talk and spaces fill up fast. It is important to have a plan B talk. Also, some popular talks are repeated during the day, so if you could not attend the first session you can attend the repeat. In addition, do not miss the opportunity to socialize with other attendees or speakers. If you are looking for a job or internship opportunity, upload your CV to the GHC database and bring a printout of your CV to the career fair. Finally, you will receive a lot of swag during the conference, so be prepared to make space in your bag.

It was amazing to see all the women in computing; it made me feel that I am not alone in this field. I would like to thank Grace Hopper for this amazing conference. I enjoyed every bit, from meeting amazing women in computing to listening to the inspirational talks, dancing, enjoying Orlando, and seeing my sister, who came all the way from Saudi Arabia.

--Lulwah Alkwai