
2020-06-05: Math formula extraction from scholarly papers using ScanSSD

In order to detect mathematical expressions in PDF document images, we implemented ScanSSD. This document records each step needed to run ScanSSD in detail: preparation, constructing the directory structure, retraining the model, implementing the model, and visualizing the results. An example of a specific project (detecting) is also given in this document.

We download and use the ScanSSD model from Parag Mali’s GitHub. To retrain and test the ScanSSD model, we also download the testing data (some PDF files and utility Python programs) from here.

Installation of some Python modules

The server runs Ubuntu 18.04. The ScanSSD model is implemented in Python. We have Python 3.6.9 and pip 9.0.1 installed on the Ubuntu server. The python3 command starts Python 3.6.9, and the pip3 command runs pip to install modules for Python 3.6.

1. Install PyTorch

The following command uses pip3 to install the PyTorch 1.5 package for Python with CUDA 10.2 on Linux, w