2019-06-03: Metadata on Datasets Saves You Time

When I joined ODU this Spring 2019, I explored datasets in digital libraries with the hope of discovering ways to enable users to discover data, and for data to find its ways to users as my first task. This led to some interesting findings that I will elaborate in this post. First things first, let's take a look at what tools and platforms are available that attempt to make things easier for users to find and visualize data. A quick Google Search provided a link to this awesome GitHub repository which contains a list of topic-centric public dataset repositories. This collection proved useful to gather the types of dataset descriptions available at present. The first dataset collection I explored was Kaggle. Here, the most upvoted dataset (as of May 31, 2019) was a CSV file with the topic "Credit Card Fraud Detection". Taking a quick look at the data, the first two columns provides a textual description of the content, but not the rest. Since I'm not the mai