Posts

Showing posts with the label OCR

2020-12-02: Comparing Four OCR Tools on US Patent Figure Label Recognition

The task is to extract labels from US patent figures. Patent figures differ from natural images: they are usually drawings of an object or diagrams such as circuits. A figure file may contain one or more figures, each of which has a label. We need a software tool that can reliably identify these figure labels. All figures are in TIFF format when downloaded from the USPTO patent repository. In the following experiments, I use OCR tools to extract figure labels, with the whole figure file as the input. The candidates I compare are Tesseract, ABBYY, the Amazon Textract API, and the Google Cloud Vision API. Below are figure samples and my comments. Figure #1 is a standard type of figure with one drawing and one label. Figure #2 represents figures with multiple drawings and labels; we need to extract both labels. The dotted lines at the bottom of the outsole may be mistaken for words. Figure #3 represents more abstract drawings with numbe

2014-09-25: Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages

The Internet Archive (IA) and Open Library offer over 6 million fully accessible public domain eBooks. While casually browsing the scanned book collection, I searched for the term "dictionary" to see how many dictionaries they have, and found several in various languages. From the search results I randomly picked A Dictionary of the English Language (1828) by Samuel Johnson, John Walker, and Robert S. Jameson. I opened the dictionary in fullscreen mode using IA's open-source online BookReader application. This book reader has the common tools for browsing an image-based book, such as flipping pages, seeking to a page, zooming, and changing the layout. Its toolbar also has some interesting features, like reading aloud and full-text search. I wondered how it could possibly search the text of, and read aloud, a book made of scanned raster images. I looked into the page source code, which pointed me to some documentation pages. I realized it is using