2023-03-29: A Summary of the Misinformation & Disinformation Workshop at Old Dominion University

The Old Dominion University library organized a workshop on Misinformation and Disinformation (see the video and presentation slides included in this post) on Feb 23, 2023. It was a two-hour workshop session led by six professors from the Department of Computer Science (CS): Dr. Jian Wu, Dr. Sampath Jayarathna, Dr. Michele Weigle, Dr. Faryaneh Poursardar, Dr. Vikas Ashok, and Dr. Yi He. The workshop had more than 50 participants, online and in person. Each professor presented their research on misinformation and disinformation. This blog post briefly summarizes the research projects presented at the workshop.

Misinformation and Disinformation Workshop
(Credit: Himarsha R. Jayanetti)


Video: Misinformation / Disinformation

(Source: https://odumedia.mediaspace.kaltura.com/media/Misinformation+Disinformation+Panel+/1_0hgps6aa)

Presentation Topics

  • Research on Fake News -- presented by Dr. Jian Wu, Assistant Professor, CS.
  • Research Experiences for Undergraduates (REU) program on Disinformation Detection and Analytics -- presented by Dr. Sampath Jayarathna, Assistant Professor, CS.
  • Detecting Review Manipulations -- presented by Dr. Faryaneh Poursardar, Assistant Professor, CS.
  • Exploring Banned Instagram Accounts using Web Archives -- presented by Dr. Michele C. Weigle, Professor, CS.
  • Can Blind People Easily Identify Deceptive Web Content with Present Assistive Technologies? -- presented by Dr. Vikas Ashok, Assistant Professor, CS.
  • Will Hallucination in ChatGPT pollute Public Knowledgebase? -- presented by Dr. Yi He, Assistant Professor, CS.

Research on Fake News

Dr. Jian Wu talked broadly about fake news (deceptive content presented under the guise of legitimate journalism), a worldwide information accuracy and integrity problem. He introduced seven types of mis- and disinformation: parody, misleading content, imposter content, fabricated content, false connection, false context, and manipulated content. He shared statistics indicating that about 10% of top news is generated by social media platforms such as Twitter and Facebook, of which about 42% is fake news. To identify fake news, Dr. Wu introduced three approaches: human-only (verification by fact-checkers, aided by resources such as the 80+ fake news websites listed on Wikipedia, Google Image Search, and TinEye), machine-only (using artificial intelligence (AI)), and human-plus-machine (a machine first identifies potential fake news, which is then verified by a human). Furthermore, Dr. Wu described AI methods for detecting fake news. From a data perspective, detection can focus on news content (knowledge-based, style-based) or social context (stance-based, propagation-based). From a method perspective, detection can be machine learning based (e.g., feature engineering and training a classifier) or database based (e.g., fact-checking). To conclude his talk, he mentioned that most fake news detection currently follows database approaches; however, with recent advancements in AI, there is still room for research on detecting fake news with machine learning.
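
To make the machine learning route concrete, here is a minimal sketch of a style-based classifier using TF-IDF features and logistic regression. It assumes the scikit-learn library, and the toy labeled examples are invented for illustration; this is not the system from Dr. Wu's talk.

# A minimal style-based fake news classifier sketch (illustrative only).
# Assumes scikit-learn; the toy texts/labels stand in for a real labeled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "City council approves new budget after public hearing",
    "SHOCKING: miracle cure doctors don't want you to know about",
    "University researchers publish peer-reviewed climate study",
    "You won't believe what this celebrity said about vaccines!!!",
]
labels = [0, 1, 0, 1]  # 0 = legitimate, 1 = fake/clickbait

# Feature engineering (word n-grams weighted by TF-IDF) + a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Miracle weight-loss trick the government is hiding"]))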

Research Experiences for Undergraduates (REU) program on Disinformation Detection and Analytics

Dr. Sampath Jayarathna briefly introduced the summer Research Experiences for Undergraduates (REU) program on Disinformation Detection and Analytics at ODU, funded by the National Science Foundation (NSF). Through this program, NSF provides grants to universities primarily to support undergraduate students in meaningful research experiences. He provided a website link where students can search for active REU Site programs. The main goal of ODU's REU Site is to engage participating students in real-world projects studying disinformation from the perspectives of data analytics, information retrieval, applied machine learning, web archiving, and social computing. Dr. Jayarathna also talked about the qualifications and benefits of the program. Applicants must be US citizens and undergraduate students enrolled in a degree program, ideally with some programming experience; the benefits include a $6,000 stipend, free housing and meals, and up to $600 in travel support to Norfolk. Moreover, he covered the information needed to apply to the REU Site program, the evaluation process, and the REU Site mentors. Dr. Jayarathna concluded his talk by introducing the summer 2022 mentees and their REU projects.

Detecting Review Manipulations

Dr. Faryaneh Poursardar introduced crowdsource-based online platforms such as Amazon, IMDb, and Yelp, where users provide reviews. She emphasized the importance of these reviews, since users rely on them to make decisions. However, reviews are vulnerable to manipulation; for example, some companies pay people to write reviews. Such fake and fraudulent reviews create credibility issues for online platforms. The goal, then, is to find strategies for detecting manipulated reviews and uncovering fraudulent reviewers by investigating features such as review length, account verification, similarity to other reviews, overly positive sentiment, and review burstiness. She also emphasized the impact of fake reviews: fake internet reviews have a $152 billion direct impact on worldwide online purchases. Dr. Poursardar further described the characteristics of fake reviews. For instance, a review may be very short, contain misspellings, be written in exchange for a free product, or be one of several reviews the same person wrote in a single day. To address this issue, Dr. Poursardar introduced a project on review manipulation led by one of the REU students in the summer of 2022. The research tasks involved understanding and identifying the critical features, building different classifiers (regression-based models and two deep learning models, a deep convolutional neural network and a bidirectional LSTM (long short-term memory) network), training the models on two datasets (Amazon Reviews, spanning 1996 to 2018, and Yelp), and comparing the results. Finally, Dr. Poursardar concluded the talk by discussing the results and the critical features of review manipulation that can impact online communities and decision-making.
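
As an illustration of the deep learning models mentioned above, here is a minimal bidirectional-LSTM review classifier sketch in Keras. The vocabulary size, sequence length, and layer widths are assumptions made for the sketch, not the configuration used in the REU project.

# A minimal bidirectional-LSTM review classifier sketch (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000  # assumed vocabulary size
MAX_LEN = 200        # assumed maximum review length in tokens

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),      # token ids -> dense vectors
    layers.Bidirectional(layers.LSTM(64)),  # reads each review in both directions
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(review is manipulated)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()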

Presentation: How Are Misinformation and Disinformation Related to You?

Exploring Banned Instagram Accounts using Web Archives

Dr. Michele C. Weigle began her talk with the challenges of exploring banned Instagram accounts using web archives. One of her PhD students started this research, and an REU student in the summer of 2022 explored it further and generated results. The motivation for this research is that during the 2016 presidential election, the Russian Internet Research Agency (IRA) used social media to manipulate public opinion about the election. A year later, the US Senate Select Committee on Intelligence conducted an analysis and discovered that the IRA employed 1,000+ people with a budget of over $25 million to influence people and potentially affect the outcome of the presidential election. Interestingly, while people knew there were many fake posts on Facebook and Twitter, the intelligence committee uncovered that much of the manipulation was done on Instagram. For example, the largest share of engagement with IRA-generated fake posts occurred on Instagram (about 187 million total engagements). Minimizing such false information relies on media literacy and educating the public about disinformation and misinformation tactics. For example, one tactic the IRA used was to grow communities and build public trust first, then slowly inject misinformation so that the public would distrust information from the government. Thus, the research project aims to study the users who post this misinformation on Instagram. However, Instagram cannot be studied the way Twitter or Facebook can; collecting Instagram data is challenging for the following reasons:
  • Instagram has far more active users than Twitter.
  • The Twitter API allows searching and downloading public tweets; Instagram offers no equivalent.
  • Instagram has no native sharing feature, whereas Twitter has retweets.
  • On Instagram, access to public posts and accounts is limited for logged-out users and bots, whereas Twitter does not impose this restriction.
Dr. Weigle provided a few more examples showing that Facebook and Instagram have tried to ban accounts that spread misinformation. For example, Instagram banned 10 of the 12 accounts of the "Disinformation Dozen," a group linked to most of the vaccine hoaxes on social media. Banning these accounts means that researchers cannot study the posts by visiting the live web. The only option is to analyze such data using web archives, which provide access to past versions of webpages. So, if those banned Instagram account pages were archived before they were banned, it is possible to study their content and learn more about the tactics used to spread misinformation. Dr. Weigle also introduced the Internet Archive's (IA) Wayback Machine, which can replay past versions of webpages, as sketched below. She concluded her presentation by sharing the results of collecting mementos from the IA and the challenges Instagram poses for web archiving (e.g., it is hard to archive), using the banned accounts as an example.
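
As a sketch of how such a study can start, the Internet Archive's CDX API lists the mementos (archived captures) of a given URL. The handle some_banned_account below is a hypothetical placeholder, not one of the accounts from the study.

# List mementos of an Instagram account page via the Internet Archive's CDX API.
# "some_banned_account" is a hypothetical placeholder handle.
import requests

handle = "some_banned_account"
resp = requests.get(
    "https://web.archive.org/cdx/search/cdx",
    params={
        "url": f"instagram.com/{handle}/",
        "output": "json",
        "fl": "timestamp,original,statuscode",
    },
    timeout=30,
)
rows = resp.json()

# The first row is a header; each remaining row describes one capture.
for timestamp, original, status in rows[1:]:
    # Each memento can be replayed at https://web.archive.org/web/<timestamp>/<url>
    print(f"https://web.archive.org/web/{timestamp}/{original}  (HTTP {status})")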

Can Blind People Easily Identify Deceptive Web Content with Present Assistive Technologies?

Dr. Vikas Ashok began his presentation by describing how deceptive content affects blind people. The definition of deception or misinformation is slightly different for blind people than for sighted people, because blind people interact with computers and web pages differently. Dr. Ashok described the scale of the problem: there are one million blind people in the US and 49.1 million worldwide. To interact with web pages, blind people use "screen readers," which read the content of a web page aloud using a synthesized voice; users listen through speakers or headphones and navigate the content using keyboard shortcuts. Although screen readers help, accessibility and usability remain significant problems. When interacting with web pages, blind people may encounter poor web structure (since many websites are designed for sighted interaction), keyboard-only navigation, inaccessible images, unclear link text, lack of feedback, tedious and frustrating content navigation, and misleading or deceptive content. Dr. Ashok noted that, among these problems, misleading or deceptive content on web pages has received the least attention. To illustrate, he played a demo video showing that blind people can easily be deceived because they cannot see the content and can only hear it. For example, a sighted user can easily skip an irrelevant advertisement or a malware link, whereas a blind user has to listen to the entire ad to determine that it is irrelevant. Blind users can also be deceived when keyboard shortcuts activate content before the screen reader announces that it is a virus or malware, so they may unknowingly download and install malicious software. Many kinds of deceptive content can be found on web pages, including fraudulent online advertisements, phishing websites, social media posts with false news, clickbait, and promotions. Dr. Ashok provided many examples to emphasize that the problem is significant and needs more research attention. Lastly, he introduced assistive technologies that use AI techniques to identify misleading content, discussed the challenges, and concluded his talk with a few remarks.
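
As a toy illustration of that AI-assisted direction, the sketch below flags page regions whose class names or ids hint at ads or sponsored content, so an assistive tool could warn the user before the screen reader reads them aloud. The keyword list and the flag_suspicious_regions helper are hypothetical; this is not Dr. Ashok's system.

# Toy heuristic for flagging likely ad/sponsored regions (illustrative only).
# Assumes the beautifulsoup4 package; real systems use far richer signals.
from bs4 import BeautifulSoup

AD_HINTS = {"ad", "ads", "advert", "sponsor", "sponsored", "promo", "banner"}

def flag_suspicious_regions(html: str) -> list:
    soup = BeautifulSoup(html, "html.parser")
    flagged = []
    for tag in soup.find_all(True):
        # Tokenize class names and ids, e.g. "sponsored-banner" -> {"sponsored", "banner"}.
        tokens = set()
        for value in tag.get("class", []) + [tag.get("id") or ""]:
            tokens.update(value.lower().replace("-", " ").replace("_", " ").split())
        if tokens & AD_HINTS:
            flagged.append(tag.get_text(" ", strip=True)[:80])
    return flagged

html = '<div class="sponsored-banner">You are our lucky winner! Click here!</div>'
print(flag_suspicious_regions(html))  # -> ['You are our lucky winner! Click here!']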

Will Hallucination in ChatGPT pollute Public Knowledgebase?

Dr. Yi He gave a talk on whether hallucination in ChatGPT may pollute public knowledge bases. Dr. He introduced roughly 60 years of chatbot history and asked what distinguishes ChatGPT from earlier chatbots: its popularity stems largely from the well-known GPT (Generative Pretrained Transformer) family of models in AI. Dr. He explored ChatGPT while working on a new initiative to deploy a ChatGPT competitor on a welfare information forum where customers interact with chatbots. ChatGPT is AI-powered and learns from data without expert supervision, and Dr. He walked through multiple examples of its capabilities and limitations. ChatGPT is trained on a 570 GB corpus of text that includes existing literature, websites, Wikipedia, and online forums, and it uses a GPT transformer with 175 billion parameters. Such training reportedly cost OpenAI about 12 million dollars in compute resources; researchers and scientists with limited budgets would hesitate to build a system at that cost. ChatGPT can perform multi-round dialogue, information retrieval, document summarization, multilingual translation, essay writing, and coding. It also answers questions in a well-formatted, contextual dialogue, and the underlying model (GPT-3) is designed to refuse malicious inputs. One ethical issue is that asking ChatGPT to write code and essays can harm academic integrity, since students may copy the output without actually learning the material. Despite these remarkable capabilities, to examine whether ChatGPT will pollute public knowledge bases, Dr. He explained hallucination: text generated by a language model that is nonsensical or unfaithful to the ground truth. For example, when asked to provide Python code for a decision tree classifier from scratch, without any existing libraries, ChatGPT produced code that failed to run. A student who copies such code into an assignment without understanding it is likely to get into trouble. Dr. He provided further examples of hallucination: asked to balance a chemical equation, ChatGPT gave an answer with steps but balanced the equation incorrectly; asked for references to academic papers in the medical domain, it produced fake references. These hallucinations happen because the transformer model reproduces what it has learned, generating output by predicting the most probable combinations of words; it has limited capacity for logical reasoning over domain knowledge and no access to the current research frontier. Moreover, ChatGPT learns its content from the internet, which raises the question: what is the probability that the answers it produces are not themselves misinformation or fabrications? ChatGPT's own fabricated answers could, in turn, pollute public knowledge bases. To mitigate these problems, Dr. He stressed the need to distinguish human-generated from AI-generated text. He concluded by suggesting that we not rely on chatbot outputs and instead actively seek credible data sources to prevent misinformation.
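
One common heuristic for the human-vs-AI distinction is perplexity under a language model: text a model finds highly predictable is a weak signal of machine generation. Below is a minimal sketch assuming the Hugging Face transformers and torch packages, with GPT-2 standing in for a generic model; this is an illustration of the idea, not the approach from the talk.

# Perplexity-based heuristic for spotting AI-generated text (illustrative only).
# Assumes the transformers and torch packages; GPT-2 stands in for a generic LM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # The model scores each token given the preceding ones; the loss is the
        # average negative log-likelihood, and exp(loss) is the perplexity.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Lower perplexity = more predictable text, one (imperfect) hint of AI origin.
print(perplexity("The quick brown fox jumps over the lazy dog."))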

Acknowledgment

I thank the ODU library, especially Lucinda Wittkower, Dr. Jennifer Hoyt, and Alisa Moore, for arranging this workshop. I would also like to acknowledge Kehinde Ajayi, Danielle Bertulfo, and Tiffany Whitfield for their efforts during this workshop. Further, thanks to all the panelists for presenting misinformation and disinformation from their research perspectives and for providing solutions for identifying misinformation and disinformation on social media platforms, online forums, and public knowledge bases.

-- Muntabir Choudhury (@TasinChoudhury)
