2020-06-19: Data Visualization Fall 2019 Projects

(Previous semester Information Visualization highlights posts: Fall 2017, Spring 2017, Spring 2016, Spring 2015, Spring/Fall 2013, Fall 2012, Fall 2011)

In Fall 2019, I introduced CS 625: Data Visualization, a new graduate-level visualization course. (This course was taught in a flipped+hybrid manner, as I described in an earlier blog post.) We used the same textbook, Tamara Munzner's Visualization Analysis and Design, as in my previous CS 725/825 Information Visualization courses, but this course was designed to be a gentler introduction to visualization and data analysis. We focused on basic visualization design principles and on how to ask good questions, rather than on D3 programming. Students were allowed to use whatever tool they wished, but I emphasized clear design no matter which tool was used. Over the course of two assignments (HW7, HW8), students posed questions about real-world data, created a draft visualization, and then refined it based on feedback. I want to highlight three of the resulting visualizations.

"How does race of the household owner affect household income?"
Created by Tina Heinich

This visualization explored trends in American household income brackets using data from the US Census Bureau (source: https://www.census.gov/data/tables/2019/demo/income-poverty/p60-266.html). Through her exploration of the data, Tina was surprised to discover that the percentage of Asian households making over $200k per year has increased dramatically since 1987, especially compared to the percentages of White and Black households.

Tina used small multiples to compare different aspects of the dataset. The x- and y-axis ranges are the same across all three charts, so the charts can be compared directly. Since the trends were the most important point, she used multiple line charts and highlighted the most interesting attribute ("Over 200k") with a thicker line. She also wrote a headline that grabbed the reader's attention and expressed the main point of the chart. The subtitle explained the data shown in the charts, and she included a link to the data source below the chart.

Tina used Excel to create this small multiples chart and ColorBrewer to help pick the colors.

"What is the impact of global warming on storm frequency over the years?"

To answer this question, Himarsha combined data from two datasets:
Himarsha used line charts with aligned x-axes to compare the number of tropical storms per year with the global temperature per year. To show the trend in hurricanes of different intensities over the years, she included small multiples of the number of storms in each hurricane category (Cat 1 weakest, Cat 5 strongest), each with a trend line. Finally, to make a direct comparison of the number of tropical storms and global temperature, she included a scatterplot of those two variables along with a trend line that shows a positive correlation: as the temperature increases, so does the number of tropical storms. Himarsha's headline emphasized the main point of the visualization, and the subtitle provided additional detail.

Himarsha used R to create these charts and Data Color Picker to help choose the colors.

"Do the most popular genres of games have a correlation between having a high critical score and earning many sales?"
Created by Kenneth Diedrich

Kenneth investigated video game ratings, sales data, and release dates available on Kaggle that had been scraped from Metacritic and VGChartz (sources: https://www.kaggle.com/kendallgillies/video-game-sales-and-ratings and https://www.kaggle.com/rgwegwegwe/vgsaledata).

Kenneth focused on the three most popular genres (Action, Shooter, and Sports), as measured by the number of games released, and only on games released between 2010 and 2015. He built a matrix of scatterplots broken down by genre (horizontal grouping) and ESRB rating (vertical grouping). Each chart showed critic scores on the x-axis and global lifetime sales on the y-axis. Since this grid of scatterplots is a form of small multiples, the x- and y-axis ranges were the same across all of the charts. Kenneth did a nice job of using color to highlight the genre, with red for Action, blue for Shooter, and green for Sports. The trend lines showed that games in the Action genre tended to sell better when they had higher critic scores, a pattern that did not hold consistently for the other genres. Kenneth wrote a headline that expressed the main finding of the visualization and a subtitle that explained the constraints on the data in the charts. He also included information about the sources of his data.

Kenneth used R to create this chart.