2022-01-19: Leveraging Google Translate for matching Arabic names written in English

Introduction: There is a significant body of research papers and tools for named-entity recognition (NER); however, only a small portion of it addresses Arabic text, and even fewer tools exist for extracting named entities from documents written in Arabic. In August 2020, I proposed an approach for extracting named entities from Arabic text using a combination of two tools, Google Translate and Stanford NERC, which produced results comparable to those of the Arabic Linguistic Pipeline (ALP). The implementation of my approach, GTS, is available on GitHub. In December 2020, I wrote a blog post outlining tools and libraries for matching Arabic names written in English, which is important for Entity Linking, a subtask of Natural Language Processing (NLP). While discussing the importance of Entity Linking is beyond the scope of this post, merging Arabic named entities written in English is the first step for Entity Linking when processing English documents. This is because discrepancie...