2020-06-05: Math formula extraction from scholarly papers using ScanSSD
In order to detect mathematical expressions in PDF document images, we implemented ScanSSD. This document records each step of running ScanSSD in detail, including preparation, constructing the directory structure, retraining the model, running the model, and visualizing the results. It also gives an example of a specific project (detecting). We download and use the ScanSSD model from Parag Mali’s GitHub. To retrain and test the ScanSSD model, we also download the testing data (some PDF files and utility Python programs) from here.

Installation of some Python modules

The server runs Ubuntu 18.04. ScanSSD is implemented in Python. We have Python 3.6.9 and pip 9.0.1 installed on the Ubuntu server. The python3 command starts Python 3.6.9, and the pip3 command runs pip to install modules for Python 3.6.

1. Install PyTorch

The following command uses pip3 to install the PyTorch 1.5 package for Python with CUDA 10.2 on Linux, w