2025-01-06: Data Visualization Class Projects - Fall 2024

It's been a couple of years since I last posted on my visualization students' projects, but I wanted to continue the tradition with highlights from my most recent class. (Previous semesters' posts are available via the VisProjects label on the blog.)

Fall 2024 was the first time that I taught CS 625 Data Visualization in a fully asynchronous mode. During summer and fall, I recorded supplemental videos to go along with Dr. Tamara Munzner's Visualization Analysis and Design textbook and YouTube lectures. Most of the students in the class were working full-time while taking courses towards their master's degrees in Computer Science or Data Science at ODU. It was a great experience, and I plan to continue teaching this course asynchronously every year (we have other faculty teaching the course in person).

The final project asks students to create a static explanatory visualization (i.e., a single chart or group of charts presented as a faceted chart) that reveals something interesting. Students also created a demo video that walked through and explained their final visualization. I'm highlighting three of the top visualizations here.

Higher Batting Averages Lead to Higher Win Percentages (MLB)

Created by David Lambertson


David used the Seaborn library in Python to create a scatterplot that compared Major League Baseball (MLB) team batting averages with team winning percentages from 2020 to 2024, based on data from Baseball-Reference.com. He used color hue and shape to indicate whether a team made the playoffs (blue dot) or not (orange x).

David's chart reveals a positive correlation: teams with higher batting averages generally achieved higher winning percentages. Further, his analysis highlighted that playoff teams typically had batting averages above 0.24. However, David acknowledged that batting average is not the only factor determining a team's success, and pointed to examples like the 2020 Cincinnati Reds, who made the playoffs despite a low batting average. He concluded that while a strong offense, represented by a high batting average, is valuable, other factors such as pitching and defense also contribute to a team's overall performance.

David made good use of color and annotations to highlight outliers in his chart. He added a trend line to emphasize the division between non-playoff teams and playoff teams. He also used the chart title to effectively make his main point.

COVID Adjustments to Diploma Requirements Increased Advanced Regents' Diplomas

Created by June Troyer

June created a horizontal stacked bar chart in Vega-Lite to show differences in high school diploma types in New York City earned before and after COVID. She used data from the City of New York covering graduation rates from 2016 to 2023.

June focused on the effects of Regents Exam exemptions during the pandemic. Her horizontal stacked bar chart revealed that the policy led to an increase in Advanced Regents diplomas in 2020, 2021, and 2022, without significantly impacting overall graduation rates. This suggests that the policy primarily benefited students who would have otherwise earned a lower-level diploma. June found this result after initially exploring the impact of COVID on various demographic subgroups, which did not reveal any significant trends. 

June effectively used color to differentiate between the types of diplomas, wrote a title that served as a headline for her chart, and added annotations to indicate the years when the Regents Exam requirement was waived during COVID.
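As a rough illustration of this chart type, the sketch below builds a Vega-Lite specification for a horizontal stacked bar chart as a Python dictionary and prints it as JSON. The file name and field names (graduation.csv, cohort_year, diploma_type, students) are hypothetical stand-ins, not June's actual data or spec, and the annotations and title are left out.

```python
import json


def stacked_bar_spec(data_url, year_field, type_field, count_field):
    """Return a Vega-Lite v5 spec (as a dict) for a horizontal stacked bar chart."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},
        "mark": "bar",
        "encoding": {
            # Putting the ordinal field on y makes the bars horizontal.
            "y": {"field": year_field, "type": "ordinal"},
            # Summed counts on x, stacked from zero within each bar.
            "x": {
                "aggregate": "sum",
                "field": count_field,
                "type": "quantitative",
                "stack": "zero",
            },
            # One color segment per diploma type.
            "color": {"field": type_field, "type": "nominal"},
        },
    }


# Hypothetical file and column names, used only for illustration.
spec = stacked_bar_spec("graduation.csv", "cohort_year", "diploma_type", "students")
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the Vega-Lite online editor once the data URL and field names are swapped for real ones. Changing "stack" to "normalize" would show each year's diploma types as shares of the total instead of counts.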


Obesity Rises with Poverty and Lower Income More Frequently in the South

Created by La'Tisa Ward


La'Tisa used Tableau to create an annotated scatterplot that showed how obesity rates correlate with US state poverty rates and median household incomes in 2022. She used color hue to group the states into four geographic regions (Midwest, Northeast, South, and West). Her project used data from the 2022 Census and the Trust for America's Health 2023 Obesity Report.

La'Tisa's analysis reveals that higher obesity rates correlate strongly with lower incomes and higher poverty levels, particularly in Southern states. Her report also included other types of visualizations that were part of her analysis, including a choropleth map and bar charts. She was alarmed to find that 88% of states reported obesity rates above 30% in 2022.

La'Tisa effectively used color to highlight the differences among regions of the country, which made it possible to see that states in the South had higher obesity and poverty rates. She also added text annotations to provide context on some of the outlier states: New Mexico, Vermont, and Delaware. Like David's and June's, her title was effective in emphasizing the main finding of her chart.

Summary

All three of these projects highlight the use of complex data sources to identify and communicate an interesting finding in a single chart.

-Michele
