Web Science and Digital Libraries Research Group


Showing posts with the label NER

2022-09-29: Theory Entity Extraction for Social and Behavioral Sciences Papers using Distant Supervision

By Xin - September 29, 2022

In this blog post, I discuss our recent paper "Theory Entity Extraction for Social and Behavioral Sciences Papers using Distant Supervision", which was published at the DocEng conference. In this paper, we proposed an automated framework based on distant supervision that leverages entity mentions from Wikipedia to build a ground truth corpus consisting of more than 4500 automatically annotated sentences containing theory/model mentions. We compared four deep learning architectures and found that RoBERTa-BiLSTM-CRF performed best, with a precision as high as 89.72%. The code and data are publicly available on GitHub. You can also check the slides. Introduction: Scientific literature has grown exponentially over the past decades. To understand the literature more quickly, readers can review abstracts and high-level key phrases, but these do not provide enough detail. Theories and models extracted from body te...
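To make the idea of distant supervision concrete, here is a minimal sketch in Python. It is not the pipeline from the paper: it assumes a small, hand-written list of theory names (standing in for entity mentions harvested from Wikipedia) and labels sentences by exact, case-insensitive token matching. The function name `bio_tag`, the `THEORY` label, and the example names are all hypothetical.

```python
from typing import List, Tuple


def bio_tag(sentence: str, entity_names: List[str]) -> List[Tuple[str, str]]:
    """Label tokens with BIO tags by matching known entity names.

    Assumption: whitespace tokenization and exact (case-insensitive) matching.
    A real system would need a proper tokenizer and fuzzier matching.
    """
    tokens = sentence.split()
    tags = ["O"] * len(tokens)
    # Normalize tokens for matching: lowercase and drop surrounding punctuation.
    normalized = [t.lower().strip(".,;:()") for t in tokens]

    # Try names with more words first so they take precedence over shorter
    # names that overlap with them.
    for name in sorted(entity_names, key=lambda n: -len(n.split())):
        parts = name.lower().split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            span_is_free = all(tag == "O" for tag in tags[i:i + n])
            if normalized[i:i + n] == parts and span_is_free:
                tags[i] = "B-THEORY"
                for j in range(i + 1, i + n):
                    tags[j] = "I-THEORY"
    return list(zip(tokens, tags))


# Hypothetical seed list; in the paper the mentions come from Wikipedia.
seed_names = ["social learning theory", "prospect theory", "learning theory"]
example = "We apply social learning theory and prospect theory to the data."
for token, tag in bio_tag(example, seed_names):
    print(f"{token}\t{tag}")
```

Sentences labeled this way can serve as automatically annotated training data for a sequence tagger, at the cost of some label noise from missed or spurious matches.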

2022-02-25: Evaluating MAN, the Tool that Utilizes Google Translate to Normalize Arabic Names' Transliterations in Cross-Language Information Retrieval

By Hussam Hallak - February 25, 2022

Introduction: The increased use of Natural Language Processing (NLP) techniques is fueled by the need to process massive amounts of data, the demand for clever chatbots, and other human-computer interaction tasks. Named Entity Recognition (NER) is one of the most important techniques in NLP. The extracted named entities offer computers a way to classify documents, perform semantic analysis on textual information, etc. In other words, NLP allows machines to understand human language(s). Speaking of languages, Cross-Language Information Retrieval (CLIR) has gained traction over the past two decades or so due to the unprecedented rise in globalization, transnational companies, international news outlets, social media, and internet use. CLIR requires a translation service because it retrieves information written in languages different from the language of the user's query. In August 2020, I proposed an approach for extracting named entities from Arabic tex...

2022-01-27: MAN, A New Tool For Normalizing Transliterations of Arabic Named Entities in Cross-Language Information Retrieval

By Hussam Hallak - January 27, 2022

Natural Language Processing (NLP): Recent advances in Natural Language Processing (NLP) have allowed machines to process massive amounts of data found on the internet and elsewhere. The data revolution isn’t only about numbers because, in addition to numbers, data include words, images, videos, etc. Therefore, researchers are working on teaching machines how to process natural languages to interact with humans, summarize data, extract information, etc. The fact that machines now have to interpret human languages opens the door to new opportunities for NLP software that facilitates interactions between humans and computers. Named Entity Recognition and Classification (NERC) is one of the most important techniques in NLP. Names of persons, locations, and organizations extracted from a document enable computers to understand the content of the document. Cross-Language Information Retrieval (CLIR): The importance of Cross-Language Information Retrieval (CLIR) comes from t...