
Showing posts with the label Multi-Rater Agreement

2025-03-26: A Battle of Opinions: Tools vs. Humans (and Humans vs. Humans) in Sentiment Analysis

Introduction

We analyzed the sentiment of 100 tweets using three sentiment analysis tools (TextBlob, VADER, and a RoBERTa-base model) and six human raters. To measure agreement, we calculated Cohen’s Kappa for each pair of raters (including both humans and tools) and Fleiss’ Kappa across all raters. The results? Let’s just say consensus was hard to find. Even the human raters struggled to agree, so we took a majority vote among them and compared it with the tools. Notably, the RoBERTa-base model showed the best alignment with the human ratings.

Our dataset consists of 100 tweets collected using the keyword “Site C, Khayelitsha” to study residents’ perceptions of safety and security in Khayelitsha Township, South Africa, as part of the Minerva Research Initiative Grant awarded by the U.S. Department of Defense in 2022. We then collected sentiment labels by running the sentiment analysis tools listed below and by gathering ratings from six human raters. This data is ava...
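As a rough sketch of the workflow described above (not the post’s actual code), the snippet below computes pairwise Cohen’s Kappa with scikit-learn, Fleiss’ Kappa with statsmodels, and a human majority vote that is then compared against each tool. The rating arrays are hypothetical stand-ins for the 100-tweet labels, and only three human raters are shown for brevity.

```python
# A minimal sketch of the agreement computations described above.
# The rating arrays below are toy data, not the post's dataset.
from collections import Counter
from itertools import combinations

from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# ratings[name] is one sentiment label per tweet, in a fixed tweet order.
ratings = {
    "human_1":  ["pos", "neg", "neu"],
    "human_2":  ["pos", "neg", "neg"],
    "human_3":  ["neu", "neg", "neg"],
    "textblob": ["neu", "neg", "neu"],
    "vader":    ["pos", "neg", "pos"],
    "roberta":  ["pos", "neg", "neg"],
}

# Pairwise Cohen's Kappa for every pair of raters (humans and tools).
for a, b in combinations(ratings, 2):
    kappa = cohen_kappa_score(ratings[a], ratings[b])
    print(f"Cohen's kappa ({a} vs {b}): {kappa:.3f}")

# Fleiss' Kappa across all raters: convert the raters-by-tweets labels
# into a tweets-by-categories count table first.
tweets_by_raters = list(zip(*ratings.values()))        # one row per tweet
table, _categories = aggregate_raters(tweets_by_raters)
print(f"Fleiss' kappa (all raters): {fleiss_kappa(table):.3f}")

# Majority vote among the human raters (ties broken by first occurrence),
# then each tool scored against that reference label.
humans = [v for k, v in ratings.items() if k.startswith("human")]
majority = [Counter(labels).most_common(1)[0][0] for labels in zip(*humans)]
for tool in ("textblob", "vader", "roberta"):
    kappa = cohen_kappa_score(majority, ratings[tool])
    print(f"Cohen's kappa ({tool} vs human majority): {kappa:.3f}")
```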