2021-12-31: Installing Several Open Source and Commercial Optical Character Recognition (OCR) Tools on a PC
Optical Character Recognition (OCR) tools extract text from images. There are many off-the-shelf OCR tools to choose from. In a previous blog post, I compared the performance of several open-source and commercial OCR tools. Here I'd like to go a step further and summarize how to install them. This post covers the installation of Tesseract, ABBYY, Amazon Textract, and Google Cloud Vision.

Tesseract:

Tesseract is a free software package that accepts a wide range of image formats, such as JPEG, PNG, TIFF, and BMP. The installation on a Windows 10 system is as follows:

Step 1: Download the Tesseract executable, tesseract.exe, from this website. Double-click the file and the installer will guide you through the installation.

Step 2: Download the language package into the installation directory of the Tesseract executable. The language package must be compatible with your version of tesseract.exe. This is the language package for Tesseract version 4.0. You can download tessdata