Thursday, May 30, 2013

2013-05-30: World Wide Web Conference WWW2013 in Rio de Janeiro, Brazil, Trip Report

After a long overnight flight, I landed in sunny and beautiful Rio de Janeiro. A couple of months earlier, my paper entitled “Carbon Dating the Web: Estimating the Age of Web Resources” was accepted at the third annual Temporal Web Analytics Workshop (TempWeb03), which is associated with the 22nd World Wide Web Conference (WWW2013). My colleague Ahmed Al Sum’s paper, entitled “Archival HTTP Redirection Retrieval Policies”, was accepted as well. Ahmed wrote a beautifully detailed post about the workshop, which I encourage everyone to read.

I arrived on the morning of Monday the 13th at 6 AM and immediately took a taxi to the Windsor Barra hotel, where the conference was held and where I would be staying for the next 5 days. My colleague Ahmed had arrived a day earlier, so he was bragging that he got the chance to relax and see the sunset on the beautiful beach. After a quick shower I went downstairs to the registration area to receive my ID tag and the conference kit. Everything went completely smoothly and the volunteers were extremely helpful. To put the reader in perspective, the main language spoken in Brazil is Portuguese, and only a small percentage of the population knows English. Prior to my trip, I taught myself the regular greetings in Portuguese and thought that with my knowledge of Spanish, English, French, and Arabic I might get by, but unfortunately I was wrong. I had to sign-language my way out of the airport! Brazilian people are very nice and hospitable, but the language stood as a communication barrier. After talking with the young volunteers a little, I salute the organizing committee for the idea of recruiting them: the volunteers were all fluently bilingual, knowledgeable about the city, and mostly college students in computer science or engineering.

Despite my exhaustion, I rushed to attend the tutorial entitled “Measuring User Engagement” by Mounia Lalmas, Heather O’Brien, and Elad Yom Tov, who couldn’t make it. I met Dr. Lalmas last October at TPDL 2012 in Cyprus. As my work focuses on several aspects of user analysis on the web, I was personally interested in her work, and the tutorial was definitely worth skipping nap time for. They started by explaining what user engagement is, why it is important, and how to measure it. After introducing the concept to the audience, they covered the basic foundations of user engagement in the first part of the talk: Forrester’s 4 I’s, the theory of flow, and how it relates to user engagement. After that they covered how to measure user engagement: by self-reporting, by cognitive engagement via physiological measures, or by interaction engagement, i.e., web analytics. They explained every branch in detail, along with their approaches and the experiments they conducted. They wrapped up with the second part of the tutorial, which covered the advanced aspects through mobile user engagement and information seeking. Finally, the session ended with a Q&A period with the attendees. The beauty of this tutorial lies in the way Mounia and Heather presented everything: through a natural sequence, they took a regular attendee who knows nothing about user engagement or user studies per se, explained all the details, and ended with the state-of-the-art experiments in the field. In conclusion, the talk was informative and engaging, and definitely worth attending.

After lunch, Ahmed and I headed to the Temporal Web 2013 workshop, which was one of the 10 concurrent workshops at WWW. Dr. Marc Spaniol was the chair of the workshop, and he introduced the keynote speaker, Dr. Omar Alonso. My colleague Ahmed wrote a detailed report about the workshop, which I encourage you to read. I was fortunate enough to be invited by Dr. Spaniol to chair one of the three sessions of the workshop, which was an honor and a delight. After the workshop ended, we all headed out to have a Brazilian dinner at a nearby restaurant and bar.

The next morning we started early, and after breakfast we headed to the second day of track tutorials. Throughout the conference, since there were about 7-10 tracks running simultaneously, it was very hard to pick the sessions I wished to attend, as several of them were really interesting and related to my work. I attended the “(Big) Usage Data in Web Search” talk by Ricardo Baeza-Yates and Yoelle Maarek, which was presented by the latter. Dr. Maarek described how much queries differ from documents and noted that the overlap between queries and documents is very small. She elaborated on the stages of the query flow-graph via an example about Barcelona. The stages were correcting (not barelona but barcelona), specializing (F.C. Barcelona, not just barcelona), generalizing (barcelona cheap hotels), and parallel move (F.C. Barcelona and Real Madrid). She explained the reasons behind separating query sessions by task rather than by time (going to Rio: where to stay, what to eat, and what to visit?) and calling the result a research session.

After lunch we headed to the second set of workshops. I picked the 2nd International Workshop on Real-Time Analysis and Mining of Social Streams (RAMSS). The keynote speech, “Real-time User Modeling and Prediction: Examples from YouTube”, was presented by Ramesh Sarukkai from Google. After a group of fascinating presentations, Dr. Gianmarco De Francisci Morales from Yahoo Research Barcelona gave an awesome closing keynote speech presenting SAMOA, a platform that mines big streaming data using MapReduce on Hadoop.

The next morning was the first day of the conference, and since it didn’t start till 1pm, Ahmed and I decided to go to the beach. I picked Copacabana beach, so we spent the morning there. After we returned to the hotel, we had lunch and headed to the opening session of the conference. After a few words from a distinguished panel of Brazilian technology leaders and the head of W3C, and a few minor hiccups with the translation headsets, Dr. Luis von Ahn was introduced to start his keynote speech. It is safe to say that, after all these years of attending talks, his speech was the best I have ever attended. Looking around me and checking the Twitter #www2013 feed, I could see the audience sharing my enthusiasm and focusing on every word he said. Dr. von Ahn talked about reCAPTCHA and how it puts human computation, in the form of everyday authentication, to work digitizing and transcribing books. To date, 1.1 billion users have helped digitize books using reCAPTCHA, resulting in 2 million digitized books annually, which I found fascinating.

Demonstrating the power of the people in human computation, and to help individuals expand their horizons by learning a new language effectively and for free, he introduced Duolingo, a free language education tool for the world. Learning a new language can be costly and is often unavailable to people in moderate- to low-income areas, so the motive was to provide a tool that helps an individual learn a new language via a computer or phone. Rosetta Stone has been doing that for years, Dr. von Ahn argued, but it costs about 1,000 USD. The motive was to provide a similar tool, if not a more exciting and easier one, free of charge. The game changer was finding a way to fund this project without burdening the users with subscriptions. Following the same paradigm as reCAPTCHA, which utilizes the computational power of the people, Dr. von Ahn analyzed the possibility of using the learning and testing phases of the collective users to translate content from one language to another, and selling these translations to fund Duolingo. He argued that if 1 million native Spanish speakers were starting to learn English, they could translate the entire English content of Wikipedia in less than 80 hours. Also, after several months of studying people’s learning curves and how their skill levels improve per language, the Duolingo team was able to improve the learning steps for each language. They also reached several interesting results, for example that Italian women learn English 10% faster than Italian men, and that 34 hours on Duolingo is equivalent to a semester of language learning. After the fascinating keynote speech, Dr. von Ahn met the entrepreneurs in a meet and greet session, which he started by saying: “I am not gonna charge the users”.

Shortly after, the attendees dispersed to attend the sessions that best suited their interests. I have never wanted to exist in two places at once like I did in the following three days of the conference: interesting work, fascinating findings, and exciting topics, all opening my research eyes to new horizons. Since my interests may not match those of every reader, I encourage you to explore the proceedings. In the next few paragraphs, I will talk about highlights of the sessions I attended over those three days.

From the Social Web Engineering research track I attended “Pick-A-Crowd: Tell Me What You Like, and I’ll Tell You What To Do”. The authors argued that the pull methodology, in which workers themselves pick the Human Intelligence Tasks (HITs) to perform on Amazon’s Mechanical Turk, is suboptimal. They argued that recommending workers based on their social profiles is a better way to perform task-to-worker matching. By building an inverted index of the workers through a Facebook app called Open Turk, they reached a set of very interesting results. The next paper was entitled “Groundhog Day: Near-Duplicate Detection on Twitter”. As the title shows, the aim of this study was to extract near-duplicate tweets and to classify them as exact copies, nearly exact copies, strong near-duplicates, weak near-duplicates, and finally low overlap. They manually labelled nearly 55,000 tweets using DBpedia and WordNet. For the next presentation, I had to migrate quickly to another research track, “Trust and Enterprise Social Networks”, to attend the presentation of a paper entitled “Mining Expertise and Interests from Social Media” by researchers from IBM Research. After that I attended the last presentation in another track, “Privacy and Personalization”, entitled “I Know the Shortened URLs You Clicked on Twitter: Inference Attack using Public Click Analytics and Twitter Metadata”, where the authors claim to be the first to perform a click history study on Twitter.
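
For readers unfamiliar with the idea, the inverted index of workers mentioned for Pick-A-Crowd can be pictured with a small sketch. This is only my own illustration and not the authors’ implementation: the worker names, the liked topics, and the scoring rule (count how many of a task’s topics a worker likes) are all made-up assumptions.

```python
from collections import defaultdict


def build_inverted_index(worker_likes):
    """Map each liked topic to the set of workers who like it."""
    index = defaultdict(set)
    for worker, likes in worker_likes.items():
        for topic in likes:
            index[topic.lower()].add(worker)
    return index


def rank_workers(index, task_topics):
    """Rank workers by how many of the task's topics appear in their likes."""
    scores = defaultdict(int)
    for topic in task_topics:
        # index.get avoids creating empty entries for unknown topics.
        for worker in index.get(topic.lower(), ()):
            scores[worker] += 1
    # Highest score first; ties are broken alphabetically by worker name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical profiles, invented for this example only.
worker_likes = {
    "alice": ["Soccer", "Cooking"],
    "bob": ["Soccer", "Movies", "Rio de Janeiro"],
    "carol": ["Movies"],
}

index = build_inverted_index(worker_likes)
ranking = rank_workers(index, ["soccer", "rio de janeiro"])
print(ranking)
```

The point of the index is that a task only touches the workers who share at least one topic with it, instead of scanning every profile for every task.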

Next we went to the posters room for a half-hour coffee break. The number of discussions and ideas I was exposed to while talking to the researchers and poster authors was definitely educational. The following session started at 5pm, and I attended the “Negative Links and Anomalies in OSN” track. The first two papers, “What Is the Added Value of Negative Links in Online Social Networks?” and “Predicting Positive and Negative Links in Signed Social Networks by Transfer Learning”, were very interesting and informative. The third paper, which I personally found fascinating, was entitled “CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks”. Alex Beutel from CMU presented the work he did during his internship at Facebook, analyzing page likes to spot spammers and fake accounts. For the last paper in the session, I hopped to the neighboring room and attended the last presentation in the “Transforming UIs/Personal & Mature Data” track. The paper, entitled “Rethinking the Web as a Personal Archive”, was presented by Siân Lindley as a joint collaboration with Cathy Marshall from Microsoft Research.

Ahmed and I did not attend the last session as we, and other students, were invited to a meet and greet session with the one and only Sir Tim Berners-Lee, where we got the opportunity to ask him several questions about science, research, and industry. Answering Ahmed, he explained the Memento project and talked about web preservation with the audience. After the session, Ahmed and I went for a walk in a neighboring area and had dinner in a small hole-in-the-wall place, which was delightful.

The next morning I decided it was going to be pure relaxation, as I was already exhausted. So I spent the morning on the beach opposite the hotel. After lunch we attended the second keynote speech by Dr. Miguel Nicolelis, Professor of Neuroscience at the Duke University School of Medicine. The speech was about brain-computer interfaces and the experiments his lab has performed in this field, which was both refreshing and fascinating. He ended his speech with an initiative his lab is working on, the Walk Again Project, which aims to make a paraplegic person walk and give the initial kick at the 2014 World Cup in Brazil, which I found mind-blowing. Following the keynote speech, a panel entitled “Net Neutrality and Internet Freedom” was held.

At 5 we started the first sessions, and I attended multiple presentations in different tracks. I started with “Wisdom in the Social Crowd: an Analysis of Quora” from the “OSN Analysis and Characterization” track, then attended the “User Behavior” track, and finally moved to the user and behavior modelling track to attend the presentation “Towards a Robust Modeling of Temporal Interest Change Patterns for Behavioral Targeting”. After a coffee break and more of the poster session, the second session started, where I attended the “Mining Collective Intelligence in Groups” presentation in the “Web Mining” track and then the last three presentations from the “Recommender Systems” track.

After the sessions, the organizers led us to the buses taking us to the Typical Brazilian Gala Dinner. After approximately an hour’s drive through beautiful Rio, we arrived at the restaurant Porcão. To the melodies of sweet samba we were welcomed into the open area of the restaurant, where a traditional band was playing. After all the buses arrived, we were led to the dining room, where I tasted some of the most amazing beef I have ever had (Brazilian Picanha). The dinner ended with a reenactment of the famous carnival dances, but on a smaller scale. Finally, an amazing band of 18 young performers made the audience dance all night to the songs of The Beatles in a samba infusion that was so captivating.

The next day was the last day of the conference. After a few announcements and the closing ceremony, Dr. Jon Kleinberg gave the closing keynote speech. His work on memorable quotes, and the experiments he conducted to identify them from movies, was quite fascinating, as was his discussion of how the probability of adopting a behavior depends on the number of network neighborhoods that are adopting that behavior. After the keynote speech, Ahmed and I went to the developers track to attend the session on “ResourceSync: Leveraging Sitemaps for Resource Synchronization”, which is joint work between Cornell, Michigan, Los Alamos National Lab, and our one and only Old Dominion University. After a short coffee break I attended Cathy Marshall’s presentation on “Saving, Reusing, and Remixing Web Video: Using Attitudes and Practices to Reveal Social Norms”. After this presentation, with our bags packed, Ahmed and I left the hotel so we could race through the traffic to catch our flights back home.

A rather funny but unfortunate thing happened at the Miami airport border control. They kept me waiting for 4.5 hours while they processed my papers. Well, I was watching a special about Michael Jordan on ESPN, so I can’t complain much.

Overall, it was a very successful conference and trip. We got to present our work, represent the research group and Old Dominion University, attend several enlightening sessions, make great contacts, and exchange a lot of ideas.

-- Hany M. SalahEldeen
