Monday, September 3, 2018

2018-09-03: Trip Report for useR! 2018 Conference




This year I was really lucky to have my abstract and poster accepted for the useR! 2018 conference. useR! is an annual worldwide conference for the international community of R users and developers. The fourteenth annual conference was held in the Southern Hemisphere, in Brisbane, Australia, from July 10-13, 2018. The four-day conference consisted of nineteen 3-hour tutorials, seven keynote speeches, and more than 200 contributed talks, lightning talks, and posters on using, extending, and deploying R. This year, the program gathered almost 600 users of the data analysis language R from all corners of the world, at every level of R expertise.

Distribution map of useR! 2018 participants across the globe


Fortunately, I was also granted a travel scholarship from useR! 2018, so I could attend the conference, including the tutorial sessions, for free (thanks, useR! 2018!).

Day 1 (July 10, 2018): Registration and Tutorial

The conference was held at the Brisbane Convention and Exhibition Centre (BCEC). Each participant registered at the secretariat desk and received a goodie bag containing a t-shirt, a pair of socks, and (if lucky) a lanyard. Name tags could be picked up from a board where they were ordered by last name.

The Secretariat Desk


T-shirt and name tag from useR! 2018


The visual identity of useR! 2018 was the hexagon, which could be found everywhere: on the name tags, on the hex stickers, and, of course, in the amazing Hexwall designed by Mitchell O'Hara-Wild. He also wrote a blog post about how he created the hexwall. There was also a hexwall photo contest in which conference attendees were invited to take a picture with the hexwall and post it on Twitter with the hashtag #hexwall.

Me and the hexwall


The R tutorials were conducted in parallel sessions from Tuesday to Wednesday morning (July 10 - 11, 2018). Each participant could attend at most three tutorials. The first tutorial I attended was Wrangling Data in the Tidyverse by Simon Jackson.

This was my first time using the tidyverse, and once I got familiar with it, I found it really helpful for data transformation and visualization. Using example data from booking.com, we got hands-on experience with various data wrangling techniques such as handling missing values, reshaping, filtering, and selecting data. The thing I love most about the tidyverse is the dplyr package. It comes with a very handy feature, the pipe (%>%, which dplyr re-exports from the magrittr package), which allows us to chain many operations together.
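The pipe simply feeds the result of one step into the next. Since dplyr is an R package, here is only a minimal Python sketch of the same chaining idea. The helpers `pipe`, `drop_missing`, `filter_by`, and `select` are hypothetical, written just for this illustration (dplyr's own filter() and select() play similar roles in R), and the hotel records are made up rather than taken from the booking.com data used in the tutorial.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed `value` through each step in order, like value %>% f() %>% g()."""
    return reduce(lambda acc, step: step(acc), steps, value)


def drop_missing(rows):
    # Keep only rows with no missing (None) values.
    return [r for r in rows if all(v is not None for v in r.values())]


def filter_by(column, wanted):
    # Returns a step that keeps rows where `column` equals `wanted`.
    return lambda rows: [r for r in rows if r[column] == wanted]


def select(*columns):
    # Returns a step that keeps only the named columns.
    return lambda rows: [{c: r[c] for c in columns} for r in rows]


# Hypothetical hotel records, made up for illustration.
hotels = [
    {"city": "Brisbane", "price": 120, "rating": 8.7},
    {"city": "Sydney", "price": None, "rating": 9.1},
    {"city": "Brisbane", "price": 95, "rating": 7.9},
]

result = pipe(hotels, drop_missing, filter_by("city", "Brisbane"), select("price", "rating"))
print(result)  # [{'price': 120, 'rating': 8.7}, {'price': 95, 'rating': 7.9}]
```

Each step reads left to right in the order it runs, which is the same readability benefit the pipe gives in R compared with nesting calls like select(filter(drop_missing(x))).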

In the second tutorial, Statistical Models for Sport in R by Stephanie Kovalchik, we learned how to use R to implement statistical models that are common in sports statistics. The tutorial consisted of three parts:
  1. web scraping to gather and clean public sports data using RSelenium and rvest
  2. exploring data with graphics
  3. implementing models: Bradley-Terry paired comparison models, the Pythagorean Theorem, Generalized Additive Models, and Forecasting with Bayes.
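Two of these models have compact textbook forms. Assuming the standard formulations (Bradley-Terry: P(i beats j) = p_i / (p_i + p_j); Pythagorean expectation: PF^k / (PF^k + PA^k), with k = 2 in Bill James's original baseball version), a minimal Python sketch looks like this. In practice, the strengths and the exponent are estimated from match data; here they are simply given, and the tutorial's own R code may differ.

```python
import math


def bradley_terry_prob(strength_i, strength_j):
    """P(i beats j) when strengths are on the log scale: logistic(b_i - b_j).

    Equivalent to p_i / (p_i + p_j) with p = exp(b).
    """
    return 1.0 / (1.0 + math.exp(-(strength_i - strength_j)))


def pythagorean_expectation(points_for, points_against, exponent=2.0):
    """Expected winning fraction PF^k / (PF^k + PA^k)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


# Hypothetical inputs: a player one log-strength unit stronger than the opponent,
# and a team that scored 700 points while conceding 600.
print(round(bradley_terry_prob(1.0, 0.0), 3))  # 0.731
print(round(pythagorean_expectation(700, 600), 3))  # 0.576
```

Working on the log scale keeps the Bradley-Terry strengths unconstrained, which is why the model is often fitted as a logistic regression on strength differences.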
During the tutorial sessions, I met three other Indonesians who are currently studying in Australia as Ph.D. students (small world!).
Indonesian students at useR! 2018

Day 2 (July 11, 2018): Tutorial, Opening Ceremony, and Poster Presentation

Tutorial

The morning session was filled with tutorials continuing the series that began the day before. I attended the tutorial Follow Me: Introduction to social media analysis in R by Maria Prokofieva, Saskia Freytag, and Anna Quaglieri.

Dr. Maria Prokofieva talked about social media analytics using R

During this 2.5-hour tutorial, we learned how to use the R packages twitteR and rtweet to extract data from Twitter, and then how to split the tweet text column into tokens using tidytext. In general, the whole process is similar to what I learned in the Web Science class taught by Dr. Michael Nelson at Old Dominion University (ODU), except that everything was done in R instead of Python. At the end of the session, we were given a challenge: compare tweets mentioning Harry with tweets mentioning Meghan over the time series of the royal wedding. Answers were to be posted on Twitter with the hashtags #useR2018, #rstats, and #socialmediachallenge. All tutorial materials are available on R-Ladies Melbourne's GitHub.
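tidytext's unnest_tokens() turns a text column into a table with one token per row, lowercasing and dropping punctuation by default. A rough Python sketch of that idea, applied to the Harry-versus-Meghan challenge, is below. The tweets are made up, and `unnest_tokens` here is a hypothetical stand-in, not tidytext's implementation.

```python
import re
from collections import Counter

# Hypothetical tweets, made up for illustration (the tutorial pulled real ones).
tweets = [
    {"id": 1, "text": "Harry and Meghan look so happy! #RoyalWedding"},
    {"id": 2, "text": "That dress, Meghan!"},
    {"id": 3, "text": "Watching the royal wedding with tea"},
]


def unnest_tokens(rows, column="text"):
    """Return one (id, token) pair per word: lowercased, punctuation dropped."""
    out = []
    for row in rows:
        for token in re.findall(r"[a-z0-9']+", row[column].lower()):
            out.append((row["id"], token))
    return out


tokens = unnest_tokens(tweets)
# Deduplicate (id, token) pairs so each tweet counts at most once per name.
mentions = Counter(tok for _, tok in set(tokens) if tok in {"harry", "meghan"})
print(mentions)
```

Grouping the same counts by tweet timestamp instead of overall would give the time series the challenge asked for.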

R-Ladies Gathering

An R-Ladies gathering took place over lunch after the tutorial session. It was an excellent opportunity to meet other amazing R-Ladies members who have done various projects and research in R and have had their R packages published on CRAN. It was really inspiring to hear their stories of promoting gender diversity in the R community. There are 75 R-Ladies groups spread across the globe. Unfortunately, there is no R-Ladies group in Indonesia at the moment. Maybe I should start one?
With Jenny Bryan during the R-Ladies meeting

Opening Ceremony, Keynote Speeches, and Poster Lightning Talk

At 1.30 pm, all conference attendees gathered in the auditorium for the Opening Ceremony. The event started with a Welcome to Country performance by Songwoman Maroochy, followed by an opening speech from the useR! 2018 chief organizer, Professor Di Cook of the Department of Econometrics and Business Statistics at Monash University. In her remarks, Professor Cook encouraged all attendees to enjoy the meeting, learn as much as they could, and be mindful of ensuring that others have a good experience, too.

Opening speech by Professor Di Cook

By the way, for those who are curious, here's a sneak peek of the Songwoman Maroochy performance.



Next, Steph de Silva delivered the keynote Beyond Syntax, Towards Culture: The Potentiality of Deep Open Source Communities.

After the keynote, there was a poster lightning talk session in which every presenter was given a chance to briefly advertise their work and encourage everyone to come and see it during the poster session.
My poster lightning talk
Before the opening ceremony ended, Kelly O'Briant of RStudio gave another keynote speech titled RStudio 2018 - Who We are and What We Do.

Poster Session

The poster session wrapped up the day. I am so grateful that useR! 2018 used all-electronic posters, so we did not have to bother printing a large poster and carrying it across the globe all the way to Australia. There were two poster sessions, one on Wednesday evening and another during lunch on Thursday. For the poster presentations, the conference committee provided twenty 47-inch TVs with HDMI connections for our laptops. This way, if someone asked, we could do a live demo or show a specific part of our code directly on the TV.

At this conference, I presented a poster titled AnalevR: An Interactive R-Based Analysis Environment for Utilizing BPS-Statistics Indonesia Data. The project idea originated from a challenge we face at BPS-Statistics Indonesia. BPS produces a massive amount of strategic data every year. However, these data are still underutilized by the public because of several issues, such as bureaucratic procedures, fees, and long waits for requested data to be processed. That’s why we introduced AnalevR, an online R-based analysis environment that allows anyone, anywhere, to access BPS data and perform analyses by typing R code into a notebook-like interface and getting the output immediately. This project is still a prototype and is currently under development. The poster and the code are available on my GitHub.
Me during the poster session

Day 3 (July 12, 2018): Keynote Speech, Talk, Poster Presentation, and Conference Dinner

The agenda for day 3 was packed with two keynote speeches, several talks, a poster session, and the conference dinner.

Keynote Speech

The first keynote speech was The Grammar of Animation by Thomas Lin Pedersen (video, slides). In his speech, Pedersen explained that any visualization falls somewhere between the three dimensions of DataViz nirvana: static, interactive, and animated. Each dimension has its own pros and cons. Mara Averick's tweet below gives us a clearer illustration of this.
Pedersen implemented this grammar by rewriting the gganimate package, which extends ggplot2 with descriptions of animation such as transitions, views, and shadows. He made his presentation even more engaging by showing an example that channels Hans Rosling's 200 Countries, 200 Years, 4 Minutes visualization. The example was built with the transition_time() function in the gganimate package.

The second keynote speech was Adventures with R: Two Stories of Analyses and a New Perspective on Data by Bill Venables. He discussed two recent analyses, one from psycholinguistics and the other from fisheries, that show the versatility of R in tackling the full range of challenges facing the statistician/modeler adventurer. He also compared statistics and data science and discussed how they relate to each other. The emerging field of data science is not simply a natural successor to statistics; there are subtle differences between them. Professor Venables said that both are important, connected domains, but that we should think of them as having branched apart to some extent, with neither taking on the other's role. Things work best when the domain expert and the analyst work hand in hand.
Professor Venables ended his speech with two quotes that I would like to repeat here:

"The relationship between Mathematics and Statistics is like that between chemistry and winemaking. You can bring as much chemistry as you can to winemaking, but it takes more than chemistry to make a drinkable dry red wine." 

"Everyone here is smart, distinguish yourself by being kind."


There was a tribute to Bill Venables at the end of the event.

The Talk Sessions

There were 18 parallel talk sessions running from 10.30 am to 4.50 pm, held in three blocks interspersed with two tea breaks and a lunch break. I managed to attend eight talks covering data handling and visualization.
  1. Statistical Inference: A Tidy Approach using R by Chester Ismay.
    Chester Ismay from DataCamp introduced the infer package, which implements common classical inferential techniques in a tidyverse-friendly framework that expresses the underlying procedure. The package has four main objectives:
    1. Dataframe in, dataframe out
    2. Compose tests and intervals with pipes
    3. Unite computational and approximation methods
    4. Reading a chain of infer code should describe the inferential procedure
  2. Data Preprocessing using Recipes by Max Kuhn.
    Max Kuhn of RStudio gave a talk about the recipes package, which preprocesses data for predictive modeling. Recipes works in three steps (recipe → prepare → bake):
    1. Create a recipe, which is the blueprint of how your data will be processed. No data has been modified at this point.
    2. Prepare the recipe using the training set. 
    3. Bake the training set and the test set. At this step, the actual modification will take place.
  3. Build Scalable Shiny Applications for Employee Attrition Prediction on Azure Cloud by Le Zhang
    Le Zhang of Microsoft delivered a talk about building an employee attrition prediction model and deploying the analytical solution as a Shiny-based web service on the Azure cloud. The project is available on GitHub.
  4. Moving from Prototype to Production in R: A Look Inside the Machine Learning Infrastructure at Netflix by Bryan Galvin
    Bryan Galvin of Netflix gave the audience a look inside the machine learning infrastructure at Netflix. Galvin briefly explained how Netflix moves from prototype to production using R and a microframework named Metaflow. Here's the link to the slides.
  5. Rjs: Going Hand in Hand with Javascript by Jackson Kwok
    rjs is a package designed for combining JavaScript's visualization libraries with R's modeling packages to build tailor-made interactive apps. I think this package is super cool, and it was an absolute highlight of useR! 2018 for me. I will definitely spend some time learning it. Below is an example of an rjs implementation. Check the complete project on GitHub.
  6. Shiny meets Electron: Turn your Shiny App into a Standalone Desktop App in No Time by Katie Sasso
    Katie Sasso of Columbus Collaboratory shared how the Columbus Collaboratory team overcame the barriers to using Shiny in large enterprise consulting by coupling R Portable and Electron. The result is a Shiny app in a stand-alone executable format. The details of her presentation, along with the source code and a tutorial video, are available on her GitHub.
  7. Combining R and Python with GraalVM by Stepan Sindelar
    Stepan Sindelar of Oracle Labs told us how to combine R and Python into a polyglot application running on GraalVM. GraalVM lets both languages operate on the same data without copying it when crossing language boundaries.
  8. Large Scale Data Visualization with Deck.gl and Shiny by Ian Hansel.
    Ian Hansel of Verge Labs talked about how to integrate deck.gl, a web data visualization framework released by Uber, with Shiny using the R package deckard.
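To make the infer objectives from Chester Ismay's talk above more concrete: infer expresses a hypothesis test as a chain of verbs (specify(), hypothesize(), generate(), calculate()). Below is a minimal Python sketch of the same computational procedure, a permutation test for a difference in means, with comments mapping each part to those verbs. It is not infer's API, it does not keep the dataframe-in, dataframe-out contract, and the scores are made up.

```python
import random
from statistics import mean


def diff_in_means(values, groups, a="A", b="B"):
    # Test statistic: mean of group a minus mean of group b.
    xs = [v for v, g in zip(values, groups) if g == a]
    ys = [v for v, g in zip(values, groups) if g == b]
    return mean(xs) - mean(ys)


def permutation_test(values, groups, reps=1000, seed=42):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    # "specify" + "calculate": the observed statistic for response ~ group.
    observed = diff_in_means(values, groups)
    # "hypothesize" independence, then "generate" the null distribution by
    # shuffling group labels, which are exchangeable under that hypothesis.
    null_stats = []
    for _ in range(reps):
        shuffled = list(groups)
        rng.shuffle(shuffled)
        null_stats.append(diff_in_means(values, shuffled))
    # p-value: share of null statistics at least as extreme as the observed one.
    extreme = sum(abs(s) >= abs(observed) for s in null_stats)
    return observed, extreme / reps


# Hypothetical scores for two groups of five.
scores = [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
labels = ["A"] * 5 + ["B"] * 5
obs, p_value = permutation_test(scores, labels)
print(obs, p_value)
```

Some texts add one to both the count and the number of replicates so the p-value is never exactly zero; the plain proportion is used here to keep the sketch short.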
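The recipe → prepare → bake flow from Max Kuhn's recipes talk above is essentially a fit-then-transform pattern. In the recipes package these steps are the recipe(), prep(), and bake() functions; what follows is only a hedged Python sketch of the idea with a single center-and-scale step. The `Recipe` class and the data are hypothetical. The key point is that prep() learns its parameters from the training set alone, so the test set is transformed with training statistics.

```python
from statistics import mean, stdev


class Recipe:
    """Blueprint for centering and scaling numeric columns; holds no data."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.params = None  # filled in by prep()

    def prep(self, training):
        # Estimate each column's mean and sample standard deviation
        # from the training set only.
        self.params = {}
        for col in self.columns:
            values = [row[col] for row in training]
            self.params[col] = (mean(values), stdev(values))
        return self

    def bake(self, data):
        # Apply the trained parameters; this is the step that changes data.
        if self.params is None:
            raise RuntimeError("call prep() before bake()")
        baked = []
        for row in data:
            new_row = dict(row)
            for col, (mu, sd) in self.params.items():
                # Constant columns are mapped to 0.0 to avoid dividing by zero.
                new_row[col] = (row[col] - mu) / sd if sd > 0 else 0.0
            baked.append(new_row)
        return baked


# Hypothetical training and test sets.
train = [{"x": 1.0, "label": "a"}, {"x": 3.0, "label": "b"}]
test = [{"x": 2.0, "label": "c"}]

rec = Recipe(["x"]).prep(train)  # learn the mean and sd of x from train
print(rec.bake(train))
print(rec.bake(test))  # test rows reuse the training mean and sd
```

Separating prep() from bake() is what prevents information from the test set leaking into the preprocessing.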

Conference Dinner

The conference dinner ticket

The conference dinner was open only to ticket holders. I was fortunate because, as a scholarship recipient, I got a free ticket for the dinner (again, thank you, useR! 2018 and R-Ladies Melbourne). There was a trivia quiz at the end of the dinner. Attendees were grouped by the table they were sitting at, and each team had to answer all the questions on the question sheets. The solution for the quiz can be found here. The winning teams got free books as prizes.

The conference dinner and the trivia quiz

Day 4 (July 13, 2018): Keynote Speech, Talk, and Closing Ceremony

Keynote Speech

The last day of the conference started with the keynote Teaching R to New Users: From tapply to Tidyverse by Roger Peng. In his talk, Dr. Peng discussed teaching R, and selling R, to new users. It can be difficult to describe R's value proposition to someone who has never seen it before. Is it an interactive system for data analysis, or is it a sophisticated programming language for software developers? To answer this, Dr. Peng quoted a remark from John Chambers (one of the creators of the S language):

"The ambiguity [of the S language] is real and goes to a key objective: we wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important."

I think this is the beauty of R that attracts me: I do not have to jump straight into developing things, but can instead transition gradually into programming. To sum up, Dr. Peng shared keywords that could be useful in selling R to new users: free, open source, graphics, reproducibility - reporting - automation, R packages + community, RStudio, transferable skills, and jobs ($$).

Some tips for selling R by Dr. Roger Peng
The second keynote speech was R for Psychological Science by Danielle Navarro (video, slides). Dr. Navarro shared her experience teaching R to psychology students. Fear, it turns out, is the main obstacle that keeps students from learning. She also talked about how hard it was to find a good textbook for her class, which eventually led her to write her own lecture notes. The notes tried to address student fears by using a relaxed style. This worked so well that she ended up with her own book and won a teaching award. Dr. Navarro ended her talk by encouraging everyone to conquer their fears and climb the mountain of R. It might not be easy to avoid the 'dragon' at the top, but there are always people who will support and help us. She reminded our community that we are stronger when we are kind to each other.
The third and last keynote was Code Smells and Feels by Jenny Bryan. She shared tips and tricks for writing code that is easier to understand and cheaper to modify. Some code smells even have official names, such as Primitive Obsession and Inappropriate Intimacy.
Here are some tips that I summarized from her talk:
  1. Write simple conditions
  2. Use helper functions
  3. Handle class properly
  4. Return and exit early
  5. Use polymorphism
  6. Use switch() if you need to dispatch different logic based on a string.
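Tips 4 and 6 combine naturally. R's switch() has no direct Python equivalent, so the minimal sketch below uses a dictionary lookup instead (a match statement would also work on Python 3.10+). `summarize` is a hypothetical function written for illustration, not code from the talk.

```python
from statistics import mean, median


def summarize(values, method="mean"):
    # Return early for the edge case instead of nesting the main logic.
    if not values:
        return None
    # Dispatch on a string with a lookup table, the Python analogue of R's switch().
    dispatch = {
        "mean": mean,
        "median": median,
        "max": max,
        "min": min,
    }
    try:
        fn = dispatch[method]
    except KeyError:
        raise ValueError(f"unknown method: {method!r}") from None
    return fn(values)


print(summarize([1, 2, 3, 10], "median"))  # 2.5
```

Adding a new method means adding one dictionary entry rather than another else-if branch.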
Besides the three great keynotes above, I also attended several short talks:
  1. Tidy forecasting in R by Rob Hyndman
  2. jstor: An R Package for Analysing Scientific Articles by Thomas Klebel
  3. What is in a name? 20 Years of R Release Management by Peter Dalgaard
  4. Sustainability Community Investment in Action - A Look at Some of the R Consortium Funded Grant Projects and Working Groups by Joseph Rickert
  5. What We are Doing in the R Consortium Funds by various funded researchers

Closing Ceremony

The closing speech was delivered by Professor Di Cook from the Department of Econometrics and Business Statistics at Monash University. There was also a small handover ceremony between Di Cook and Nathalie Vialaneix, who will organize next year's useR! 2019 in Toulouse, France.
At the end of the ceremony, the winners of the hexwall photo contest, who were chosen at random, were announced.
It was indeed a delightful experience for me. I went home happy, with a list of homework and new packages to learn. For those who did not make it to the useR! 2018 conference, there is no need to feel FOMO: all talks and keynote speeches are posted online on the R Consortium's YouTube account.

I would like to thank Professor Di Cook of Monash University as well as R-Ladies Melbourne for giving me a scholarship and making it possible for me to attend this conference. I would also like to congratulate the whole useR! 2018 organizing committee on their brilliant efforts to make this event a great success. I am really looking forward to joining next year's useR! 2019, which will be held on July 9 - 12, 2019, in Toulouse, France. So, do not miss the updates: check its website and follow the Twitter account @UseR2019_Conf and the hashtag #useR2019.

@erikaris

2018-09-03: Let's compare memento damage measures!

It is always nice getting a Google Scholar alert that one of my papers has been cited. In this case, I learned that the paper "Reproducible Web Corpora: Interactive Archiving with Automatic Quality Assessment" (to appear in the ACM Journal of Data and Information Quality) cited a paper that I wrote during my doctoral studies with fellow PhD students Mat Kelly and Hany SalahEldeen and our advisors Michael Nelson and Michele Weigle. More specifically, the Reproducible Web Corpora paper (by Johannes Kiesel, Florian Kneist, Milad Alshomary, Benno Stein, Matthias Hagen, and Martin Potthast) is a very important and well-executed follow-on to our paper "Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources" (a best student paper at JCDL 2014, subsequently published in the International Journal on Digital Libraries).

In this blog post, I will be providing a quick recap and analysis of the Kiesel paper from the perspective of an author of the paper that provides the Brunelle15 metric used as the benchmark measure in the Kiesel 2018 paper.

Sunday, September 2, 2018

2018-09-02: Sampath Jayarathna (Assistant Professor, Computer Science)

I am really excited to be part of Old Dominion University and the WS-DL group. I joined the faculty at Old Dominion University in 2018. Before that, I was a tenure-track assistant professor for two years at California State Polytechnic University (Cal Poly Pomona). I am truly grateful to Frank Shipman, Oleg Komogortsev, Richard Furuta, Dilma Da Silva, and Cecilia Aragon for their help throughout this faculty search. It is sad to say goodbye to my colleagues at Cal Poly, but I am excited to have an amazing bunch of mentors and colleagues here at ODU: Michael, Michele, Nikos, Ravi, Jian, Cong, Shubham, Anne, and many more. It's truly amazing that I was able to team up and put together two NSF proposals (CRII and REU Site) within a short period of time.

I received my Ph.D. in Computer Science from Texas A&M University in 2016, advised by Frank Shipman. I was a member of the Center for the Study of Digital Libraries (CSDL). In 2012, I did a six-month internship at Knowledge Based Systems Inc., College Station, TX, building a Collaborative Analysis tool for the JackalFish enterprise search tool. I earned my MS degree from Texas State University-San Marcos in 2010. I worked with Oleg Komogortsev in the area of oculomotor systems research, eye tracking, and biometrics using eye movements. I spent summer 2009 at Lawrence Berkeley National Lab, working with Cecilia Aragon (currently a professor at UW Seattle) and Deb Agarwal on a very cool eye-movement-based biometric project.

My undergraduate degree is a B.S. in Computer Science (First Class Honors, similar to the Latin honor summa cum laude) from the University of Peradeniya, Sri Lanka, in 2006. 
I am an avid gardener; my wife says I have a “green thumb”, something to do with coming from a tropical island. Most of my plants did not survive the 20-day journey from the west coast to the east coast. 

Sri Lankan "King Coconut"


I grow a variety of vegetables, including tomatoes, watermelons, leafy greens, chilies, and some exotic tropical fruits and veggies. It is exciting to see what I can do with long, hot summers and four-season weather. 

My academic genealogy, bucket list, Goodreads bookshelf, YouTube playlist, and IMDb lists of favorite TV shows and movies.

--Sampath