Showing posts from September, 2022

2022-09-29: Theory Entity Extraction for Social and Behavioral Sciences Papers using Distant Supervision

In this blog post, I will talk about our recent paper "Theory Entity Extraction for Social and Behavioral Sciences Papers using Distant Supervision", which was published at the DocEng conference. In this paper, we proposed an automated framework based on distant supervision that leverages entity mentions from Wikipedia to build a ground truth corpus of more than 4500 automatically annotated sentences containing theory/model mentions. We compared four deep learning architectures and found that RoBERTa-BiLSTM-CRF performed best, with a precision as high as 89.72%. The code and data are publicly available on GitHub. You can also check the slides.

Introduction

Scientific literature has grown exponentially over the past decades. To understand the literature more quickly, people can review abstracts and high-level key phrases, but these do not provide enough detail. Theories and models extracted from the body text can provide more. While abstracts and

2022-09-28: Using Web Archives in Disinformation Research

Figure: Example of content label added by the Internet Archive

Following on Lesley Frew's post looking at how journalists use web archives, I wanted to highlight some of the ways that web archives have been used to study disinformation over the past few years and to bring together some of the work that our Web Science and Digital Libraries (WS-DL) research group has done in this area. My interest in the intersection of web archives and disinformation has largely stemmed from developing a Disinformation lecture for our CS 432/532 Web Science course. Much of my reading was seeded by the work of Kate Starbird (starting with her ICWSM 2017 paper) and Amelia Acker (notably, her 2018 Data Craft report). My colleague Michael Nelson has built a graduate course, Web Archiving Forensics, around some of the topics discussed here.

Figure: Mentions of "Wayback Machine" in news stories

Webpages and social media posts can be modified or deleted, or authors can be banned from t

2022-09-15: Querying the Politwoops Search Engine for Deleted Tweets

As a part of the ODU Research Experience for Undergraduates (REU) site in disinformation detection and analytics, we began research into querying certain sites for evidence of correct attribution of a tweet as a part of the “Did They Really Say That?” project. One site that proved particularly useful for this project was Politwoops, a project created by ProPublica that serves as an archive for deleted tweets made by accounts belonging to political officials. Politwoops only tracks the accounts of candidates and current elected officials. Since this project focuses on verifying attribution for social media posts, Politwoops is a particularly fruitful source of evidence for us. For one, the existence of a tweet on Politwoops serves as direct evidence that the tweet was actually made. It is also worth noting that even though Politwoops is limited to tracking only political officials, it still serves a valuable purpose for us because these types of public figu