2016-04-24: WWW 2016 Trip Report
SAVE-SD 2016
I began the conference at the SAVE-SD workshop, focusing on the semantics, analytics, and visualization of scholarly data. They had 6 full research papers, 2 position papers, and 2 poster papers. The acceptance rate for this conference is relatively high. The conference was kicked off by Alejandra Gonzales-Beltran and Francesco Osborne. They encouraged the use of Research Articles in Simplified HTML.
Alex Wade gave us an introduction to the Microsoft Academic Service (MAS) and a sneak peek at the new features offered by Microsoft Academic, such as the Microsoft Academic Graph. They are in the process of adding semantic, rather than keyword, search, with the intention of understanding academic user intent when searching for papers. They have opened up their dataset to the community and provide APIs for future community research projects.

On the right below, Kata Gábor demonstrated "A Typology of Semantic Relations Dedicated to Scientific Literature Analysis". Her poster shows a model for extracting facts about the state of the art for a particular research field using semantic relations derived from pattern mining and natural language processing techniques.

In closing, the SAVE-SD 2016 workshop organizers mentioned that selected papers could be resubmitted to PeerJ.
TempWeb 2016
The morning opened with a keynote by Wolfgang Nejdl of the Alexandria Project. He discussed the work at L3S and how they were trying to consider all aspects of the web, from the technical to its effects on community and society. He discussed how social media has become a powerful force, but tweets and posts link to items that can disappear, losing the context of the original post. This reminded me of some other work I had seen in the past. He mentioned how important it was to archive these items.
He then went on to cover other aspects of searching the archived web, detailing challenges encountered by project BUDDAH, including the problem of ranking temporal search results. Seen below, he demonstrates an alternative way of visualizing temporal search results using the HistDiv project. This visualization helps one understand the changing nature of a topic. In this case, we see how the results of searching for the term Rudolph Giuliani change with time: as the person's career (and career aspirations) change, so does the content of the archived pages about them. He closed by discussing the use of curated archiving collections in Archive-It in the collaborative search and sharing platform ArchiveWeb, which allows one to find archive collections pertinent to their search query.
Both researchers used the following lunch to discuss temporal graphs at length. I wondered if one could model TimeMaps in this way and use these tools to discover interesting connections between archived web pages.
Sir Tim Berners-Lee
Sir Tim Berners-Lee spoke of the importance of decentralizing the web, ensuring that users own their own data, web security, work to standardize and improve the ease of payments on the web, and finally the Internet of Things (IoT).
Mentioning the efforts of projects like Solid, he highlighted the need for users to retain their data in order to protect their privacy. The idea is that a user can tell the service where to store their data, and then they have ownership of and responsibility for that data.
He mentioned that, in the past, the Internet had to be deployed by sending tapes through the mail, but now we are heading to a point where the web platform, because it allows you to deploy a full computing platform very quickly, may become the rollout platform for the future. Because of this ability, security is becoming more and more important, and he wants to focus on a standard for security that uses the browser, rather than external systems, as the central point for asking a user for their credentials, thereby helping guard against trojans and malicious web sites. He said that the move from HTTP to HTTPS has been harder than expected, considering many HTTPS pages are "mixed", containing references to HTTP URIs. This results in three different worlds: HTTP pages, HTTPS pages, and pages using upgrade insecure requests, which are still mixed pages, but ones endorsed by the author.
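To make the "mixed" page idea concrete, here is a minimal sketch in Python of how one might sort a page into these worlds. It is only an illustration under my own assumptions, not anything shown in the talk: the function and class names are hypothetical, only `src` attributes and `<link href>` are checked, and the presence of the `upgrade-insecure-requests` Content-Security-Policy directive is passed in as a plain header string. Unlike the three worlds described above, the sketch also reports plain mixed pages that carry no such directive.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class SubresourceCollector(HTMLParser):
    """Hypothetical helper: collects URLs from src attributes and <link href>."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            # src covers img, script, iframe, etc.; link href covers stylesheets.
            if name == "src" or (tag == "link" and name == "href"):
                self.urls.append(value)


def classify_page(page_url, html, csp_header=""):
    """Return 'http', 'https', 'mixed', or 'mixed-upgraded' for a page.

    Simplification: any page not served over HTTPS is labeled 'http'.
    """
    if urlparse(page_url).scheme != "https":
        return "http"
    collector = SubresourceCollector()
    collector.feed(html)
    # Relative URLs have no scheme and inherit the page's HTTPS scheme.
    insecure = [u for u in collector.urls if urlparse(u).scheme == "http"]
    if not insecure:
        return "https"
    # The author has asked the browser to upgrade the HTTP references.
    if "upgrade-insecure-requests" in csp_header.lower():
        return "mixed-upgraded"
    return "mixed"


if __name__ == "__main__":
    page = '<script src="http://cdn.example.com/x.js"></script>'
    print(classify_page("https://example.com/", page))
    print(classify_page("https://example.com/", page, "upgrade-insecure-requests"))
```

A real checker would also need to look at CSS, inline scripts, and redirects, so treat this only as a way to picture the categories.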
Next, he spoke about making web payments standardized, comparing it to authentication. There is a wide variety of different solutions for web payments, and there needs to be a standard interface. There is also an increasing call to allow customers to pay smaller amounts than before, which many current systems do not handle. Of course, customers will need to know when they are being phished, hence the security implications of a standardized system.
Finally, he covered the Internet of Things (IoT), indicating there are connections to data ownership, privacy, and security.
In the following Q&A session, I asked Sir Tim Berners-Lee about the steps toward browser adoption for technologies such as Memento. He said the first step is to discuss them at conferences like WWW, then engage in working groups, workshops, and other venues. He noted that one also needs to define the users for such new technologies so they can help with the engagement.
Later, during the student Q&A session the following day, Morgannis Graham from McGill University asked Sir Tim Berners-Lee about his thoughts on the role of web archives. He replied that "personally, I am a pack rat and am always concerned about losing things". He highlighted that while the general web users are thinking of the present, it is the role of libraries and universities to think about the future, hence their role in archiving the web. He stated that universities and libraries should work more closely together in archiving the web so that if one university falls, others exist having the archives of the one that was lost. He also stated that we all have a role in ensuring that legislation exists to protect archiving efforts. Finally, he tied his answer back to one of his current projects: what happens to your data when the site you have given it to goes out of business.
Lady Martha Lane-Fox
Wednesday evening ended with an inspiring talk from Lady Martha Lane-Fox. She works for the UK in a variety of roles advancing the use of technology in society. She states that a country that can: (1) improve gender balance in tech, (2) improve the technical skills of the populace, and (3) improve the ability to use tech in the public sector, will be the most competitive.
She went further in explaining how the current gender balance is very depressing, noting that in spite of the freedom offered by technology, old hierarchies and structures have been re-established. She indicated that there are studies showing that companies with more diverse boards are more successful, and how we need to tackle this problem, not only from a technical, but also a social perspective.
She discussed the challenges of bringing technology to everyday lives and applauded South Korea's success while highlighting the challenges still present in the UK. She relayed stories of encounters with the citizenry, some of whom were reluctant to embrace the web, but after doing so felt they had more freedom and capability in their lives than ever before. She praised the UK for putting coding on the school curriculum and looking toward the needs of future generations.
She then talked about re-imagining public services entirely through the use of technology. The idea is to make government agencies digital by default in an effort to save money and provide more capability. She highlighted a project where a UK hospital once had 700 administrators and 17 nurses and, through adopting technology, was then able to take the same money and hire 700 nurses to work with 17 administrators, thus providing better service to patients.
She closed by discussing her program DotEveryone, which is a new organization promoting the promise of the Internet in the UK for everyone and by everyone. Her goal is for the UK to be the most connected, most digitally literate, and most gender equivalent nation on earth. In a larger sense, she wants to kick off a race among countries to use technology to create the best countries for their citizens.
Mary Ellen Zurko
Wednesday morning started with a keynote by Mary Ellen Zurko, from Cisco. She discussed security on the web. Her first lesson: "The future will be different; so will the attacks and attackers, but only if you are wildly successful". Her point was that the success of the web has made it a target. She then covered the history of basic authentication, S-HTTP, and finally SSL/TLS in HTTPS.
She then discussed the social side of security, indicating that users are often confused about how to respond to web browser warnings about security. There is a 90% ignore rate on such warnings, and 60% of those are related to certificates. She highlighted how difficult it is for users to know whether or not a domain is legitimate and if the certificate shown is valid. She also highlighted that most users, even expert users, do not fully understand the permissions they are granting when asked, due to the cryptic and sometimes misleading descriptions given to them, mentioning that only 17% of Android users actually pay attention to permissions during installation and only 3% are able to answer questions on what the security permissions mean.
Reiterating the results of a study by Google, she stated that 70% of users clicked through malware warnings in Chrome, but Firefox had more participation. The Google study found that the Firefox warnings provided a better user experience, and thus users were more apt to pay attention and understand them. Following this study, Google changed its warnings in Chrome.
She said that the open web is an equal opportunity environment for both attackers and defenders, detailing how fraudulent tech support scams are quite lucrative. This was discovered in recent work by Cisco, "Reverse Social Engineering Social Tech Support Scammers", where Cisco engineers actively bluffed tech support scammers in order to gather information on their whereabouts and identities.
Of note, she also mentioned that there is a largely unexploited partnership between web science and security.
Peter Norvig
On Friday morning, Peter Norvig gave an engaging speech on the state of the Semantic Web. He mentioned that his job is to bring information retrieval and distributed systems together. He went through a history of information retrieval, discussing WAIS and the World Wide Web, as well as ARCHIE. Before Google, several projects were trying to tame the nascent web.
After Google, the Semantic Web was developed as a way to extract information from the many pages that existed. He talked about how Tim Berners-Lee was a proponent, whereas Cory Doctorow highlighted that there were nothing but obstacles in its path. Peter said that Cory had several reasons for why it would fail, but the main ones were (1) people lie, (2) people are lazy, and (3) people are stupid, indicating that the information gathered from such a system would consist of intentional misinformation, incomplete information, or misinformation due to incompetence.
Peter then highlighted several instances where this came about. Initially, excellent expressiveness was produced by highly trained logicians, giving us DAML, OWL, RDFa, FOAF, etc. Unfortunately, they found a 40% page error rate in practice, indicating that Cory was correct on all 3 fronts. Peter's conclusion was that the highly trained logicians did not seem to solve the identified problems.
Peter then posited "what about a highly trained webmaster?". In 2010, search companies promoted the creation of schema.org with the idea of keeping it simple. The search engines promised that if a site were marked up, then they would show it immediately in search results. This gave users an incentive to mark up their pages and now has resulted in technologies that can better present things like hotel reservations and product information. This led most to conclude that schema.org was an unexpected success.
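As a rough illustration of what "marking up" a page with schema.org can look like, the following Python sketch builds a small JSON-LD description of a product. This is my own example rather than one from the talk: the helper names and the sample values are hypothetical, while `Product`, `Offer`, `name`, `offers`, `price`, and `priceCurrency` are existing schema.org terms.

```python
import json


def product_jsonld(name, price, currency):
    """Build a schema.org Product description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element a webmaster would embed in a page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(as_script_tag(product_jsonld("Example Widget", "19.99", "USD")))
```

The appeal for a webmaster is that this is just a few key-value pairs pasted into the page, with no logic formalism to learn.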
Peter closed by saying that obstacles still remain, seeing as most of the data comes from web site owners, still leading to misinformation in some cases. He talked about the need to be able to connect different sources together so that one can, for example, not only find a book on Amazon, but also a listing of the author's interests on Facebook. He hopes that neural networks could be combined with semantic and syntactic approaches to solve some of these large connection problems.
W3C Track

Poster Session
Of course, I was here to present a poster, "Persistent URIs Must Be Used to be Persistent", developed by Herbert Van de Sompel, Martin Klein, and me, which indicates important consequences for the use of persistent URIs such as DOIs.
Thanks to everyone at #www2016 who came by to discuss and experience our poster: https://t.co/XkFueTMqZZ pic.twitter.com/59LgAHSdZB— Shawn M. Jones (@shawnmjones) April 13, 2016
In looking at the data from "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot", we reviewed 1.6 million web references from 1.8 million articles and discovered 3 things:
- use of web references is increasing in scholarly articles
- frequently authors use publisher web pages (locating URI) rather than DOIs (persistent URI) when creating references
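The distinction between a locating URI and a persistent URI in the findings above can be sketched in a few lines of Python. This is only an illustration and not the method used in our study: the function names are hypothetical, the DOI pattern is a deliberately loose assumption, and only the `doi.org` and `dx.doi.org` resolver hosts are recognized.

```python
import re
from urllib.parse import urlparse

# Assumption: a loose DOI shape ("10.", a 4-9 digit prefix, "/", a suffix).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Assumption: only these two resolver hosts count as persistent.
DOI_RESOLVER_HOSTS = {"doi.org", "dx.doi.org"}


def doi_to_uri(doi):
    """Turn a bare DOI into its persistent HTTPS URI form."""
    if not DOI_PATTERN.match(doi):
        raise ValueError("not a DOI: " + doi)
    return "https://doi.org/" + doi


def is_persistent_reference(uri):
    """True if the reference goes through a DOI resolver, not a publisher page."""
    parsed = urlparse(uri)
    if parsed.netloc.lower() not in DOI_RESOLVER_HOSTS:
        return False
    return bool(DOI_PATTERN.match(parsed.path.lstrip("/")))


if __name__ == "__main__":
    print(doi_to_uri("10.1000/xyz123"))
    print(is_persistent_reference("https://doi.org/10.1000/xyz123"))
    print(is_persistent_reference("https://publisher.example.com/article/123"))
```

A reference that fails the second check is a locating URI, which is the kind that breaks if the publisher reorganizes its site.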
I appreciate the visit from Sarven Capadisli and Amy Guy who work on Solid. Many others came by to see our work, like Takeru Yokoi, Hideaki Takeda, Lee Giles, and Pieter Colpaert. Most appreciated the idea, noting it as "simple" with some asking "why don't we have this already?".
WWW Conference Presentations
Even though I attended many additional presentations, I will only detail a few of interest.

Of course, I did not merely enjoy the presentations and posters. Between the Monday night SAVE-SD dinner, the Thursday night Gala, and lunch each day, I took the opportunity to acquaint myself with many field experts. Google, Yahoo!, and Microsoft were also there looking to discuss data sharing, collaboration, and employment opportunities.
I always had lunch company thanks to the efforts of Erik Wilde, Michael Nolting, Roland Gülle, Eike Von Seggern, Francesco Osborne, Bahar Sateli, Angelo Salatino, Marc Spaniol, Jannik Strötgen, Erdal Kuzey, Matthias Steinbauer, Julia Stoyanovich, Jan Jones, and more.
Furthermore, the Gala introduced me to other attendees, like Chris LaRoche, Marc-Olivier Lamothe, Ashutosh Dhekne, Mensah Alkebu-Lan, Salman Hooshmand, Li'ang Yin, Alex Jeongwoo Oh, Graham Klyne, and Lukas Eberhard. Takeru Yokoi introduced me to Keiko Yokoi from the University of Tokyo, who was familiar with many aspects of digital libraries and quite interested in Memento. I also had a fascinating discussion about Memento and the Semantic Web with Michel Gagnon and Ian Horrocks, who suggested I read "Introduction to Description Logic" to understand more of the concepts behind the semantic web and artificial intelligence.
In Conclusion
As my first academic conference, the WWW 2016 conference was an excellent experience, bringing me in touch with paragons on the forefront of web research. I now have a much better understanding of where we are in the many aspects of the web and scholarly communications.
Even as we left the conference and said our goodbyes, I knew that many of us had been encouraged to create a more open, secure, available, and decentralized web.
Au revoir Montréal pic.twitter.com/nSIpC2UvUT— Shawn M. Jones (@shawnmjones) April 16, 2016