2021-09-15: Data Science High School Summer Camp 2021



Data science is becoming the lingua franca of the 21st century, as there are more data sources today than ever before. To prepare the younger generation for the world of data science and artificial intelligence, Dr. Sampath Jayarathna at Old Dominion University (ODU), in collaboration with the ODU Computer Science (CompSci) Department, organized a 10-day data science summer camp for about 14 high school students in Norfolk, Virginia, from August 9th to August 20th, 2021, funded by PRA Group Inc. The program was a hybrid of onsite and virtual training sessions.

This intensive program was intended to prepare high school students to work with different kinds of data, both structured and unstructured, such as text and images. The students also learned Python, which is currently the most popular programming language for data science. Because the training was more hands-on than theoretical, we gave the students activities to work on during each session based on the topics covered.


The first two days of the summer camp were conducted on campus. On the first day, Dr. Jayarathna introduced the students to the basics of Python programming. They learned about Python variables, strings, lists, and functions. In the first half of the second day, the students were introduced to conditional statements in Python. They learned about “if-elif-else” and how to use it to solve real-life problems.
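As a rough illustration of the kind of “if-elif-else” exercise described above (this is not the camp's actual material; the function name, thresholds, and sample values are made up for the example):

```python
def describe_temperature(temp_f):
    """Return a short label for a temperature in degrees Fahrenheit."""
    # Conditions are checked from top to bottom; the first true one wins.
    if temp_f < 50:
        return "cold"
    elif temp_f < 80:
        return "mild"
    else:
        return "hot"


readings = [42, 68, 95]  # made-up sample values
labels = [describe_temperature(t) for t in readings]
print(labels)
```

Because the branches are evaluated in order, a reading of exactly 50 falls through the first test and is labeled "mild".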



During the second half of the second day, the students were introduced to NumPy (Numerical Python), one of the most fundamental data science libraries. They learned about NumPy arrays and how to use them in data science projects.
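A minimal sketch of what working with NumPy arrays can look like, assuming a small set of made-up quiz scores (the variable names are hypothetical and not taken from the camp notebooks):

```python
import numpy as np

# Made-up quiz scores stored in a one-dimensional array.
scores = np.array([72, 85, 90, 66, 78])

# Arithmetic applies to every element at once (no explicit loop needed).
curved = scores + 5

# Summary statistics are available as array methods.
mean_score = scores.mean()

# A boolean mask selects only the elements that satisfy a condition.
above_mean = scores[scores > mean_score]

print(curved)
print(mean_score)
print(above_mean)
```

Element-wise arithmetic and boolean masking are two of the features that make arrays more convenient than plain Python lists for numerical work.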

After the second day, the program continued virtually for five days. It was amazing how well the students grasped these high-level concepts in online meetings, as was evident from their engagement in the class activities. They began the virtual classes with another fundamental data science library, pandas. pandas is a Python data analysis library widely used by data scientists and data analysts for manipulating and analyzing structured data. In this session, the students were taught how to create their own data in table format. They also learned how to load data in different formats, both from their local computer and from online sites such as GitHub.
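The two ideas in this session, building a table by hand and loading one from a file, can be sketched as follows. The column names and values are invented for illustration, and an in-memory buffer stands in for a real file so the example runs offline:

```python
import io

import pandas as pd

# Build a small table from a dictionary of columns (made-up values).
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cho"],
    "grade": [10, 11, 12],
    "score": [88.5, 92.0, 79.5],
})

# pd.read_csv accepts a local path, a URL, or a file-like object.
# A StringIO buffer is used here in place of a file on disk or online.
csv_text = students.to_csv(index=False)
loaded = pd.read_csv(io.StringIO(csv_text))

print(loaded.head())
print(loaded.shape)
```

To load a CSV hosted online, the same `pd.read_csv` call can be given the file's URL instead of the buffer.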

 

The pandas session concluded with data understanding and data manipulation, where the students explored different datasets and used the skills learned during the session to answer business questions.

On day 4 of week 1, the students were introduced to data visualization using seaborn. Seaborn is a Python data visualization library based on matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics. In my opinion, data visualization is one of the most sought-after skills among data analysis enthusiasts, because it helps to uncover hidden insights in a given dataset.

In this session, the students learned how to create various visualizations with structured data. They learned how to visualize univariate data, such as bar graphs for a single categorical variable and histograms for a single quantitative variable. They also learned how to create scatter plots to study the relationship between two numeric variables. This session showed the students how to use visualization to tell stories and share insights about data.

 

The students had a very interesting final day of week 1. Day 5 started with a talk by the invited speaker Dr. Wu of WS-DL about Natural Language Processing and "Training Computers to Understand Humans".

After lunch, the students listened to an inspiring talk by another invited speaker, Dr. Meghan Chandarana, about her journey to becoming a NASA engineer. The high school students were also introduced to data storage topics and to LaTeX using Overleaf on the same day.


The second week of the summer camp started with the topic of data wrangling by Dr. Jayarathna. In this session, he introduced how to perform data cleaning with Python pandas. He also introduced the concept of machine learning and the various processes involved in building a simple machine learning model.
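A minimal sketch of typical data cleaning steps with pandas, assuming a tiny made-up table with inconsistent labels, a duplicate row, and a missing value (the column names, values, and the choice of median imputation are illustrative assumptions, not the camp's exercise):

```python
import numpy as np
import pandas as pd

# Made-up raw data with three common problems:
# inconsistent text ("alpha " vs "Alpha"), a duplicate row, and a missing value.
raw = pd.DataFrame({
    "region": ["Alpha", "alpha ", "Beta", "Gamma", "Delta"],
    "cases": [120, 120, np.nan, 60, 85],
})

clean = raw.copy()

# Normalize text: strip stray whitespace and use consistent capitalization.
clean["region"] = clean["region"].str.strip().str.title()

# Drop rows that are exact duplicates after normalization.
clean = clean.drop_duplicates()

# Fill the missing value with the median of the observed values.
clean["cases"] = clean["cases"].fillna(clean["cases"].median())

clean = clean.reset_index(drop=True)
print(clean)
```

Whether to fill missing values or drop the affected rows depends on the dataset and the question being asked; the median is used here only as one simple option.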

 

During the second week, the students were also taught how to use Weka (machine learning software) to build machine learning models and design workflows using different kinds of data. There were also research presentations on eye tracking from PhD students in the Web Science and Digital Libraries group (WS-DL) at ODU: Yasith Jayawardana, Gavindya, and Bhanuka Mahanama.


The second week also featured a talk by Dr. Sawood Alam, an alumnus of the Web Science and Digital Libraries group (WS-DL) at Old Dominion University (ODU). During his presentation, he shared his experiences as a Web and Data Scientist at the Internet Archive.

 

In addition to the regular lectures, students were teamed up to work on a project using the Google Colab environment. On Day 4 of week 2, students were given a list of 15 topics and datasets to choose from to design a project. During the last two days, they applied the coding and data science skills acquired during the camp to experiment with the chosen data. On the final day, the students presented their projects in nine teams.


Team 1: Trending YouTube Videos

Team 2: Predicting Prices Of Bitcoin

Team 3: Analysis Of Covid Cases By Counties

Team 4: Analysis Of Covid Cases By Counties

Team 5: Analysis Of Covid Cases By Counties

Team 6: A virus with no boundary: Visualizing Covid-19's impact on the World

Team 7: Analysis Of Covid Cases By Counties

Team 8: White Wine Samples

Team 9: Covid Death and Summarization


The students favored the topics based on “Covid-19”, “YouTube videos”, and “wine samples”. Students enjoyed creating data visualizations, such as bar plots showing the view counts of trending YouTube channels and Covid cases per county. Some students even used advanced concepts like text analysis and Web scraping. It was impressive to see the growth in coding skills and confidence of these high school students. Some students went from writing their first “hello world” program in Python to experimenting with datasets and creating visualizations.

 


By: Ajayi Kehinde Peter (@AjayiKehindep), Himarsha Jayanetti (@HimarshaJ), & Kritika Garg (@kritika_garg) 

An ODU College of Science News article about this work is also available: "Norfolk High School Students Unlock Potential in Data Science Summer Camp"
