Sunday, March 4, 2018

2018-03-04: Installing Stanford CoreNLP in a Docker Container

Fig. 1: Example of Text Labeled with the CoreNLP Part-of-Speech, Named-Entity Recognizer and Dependency Annotators.
The Stanford CoreNLP suite provides a wide range of important natural language processing applications, such as Part-of-Speech (POS) tagging and Named-Entity Recognition (NER). CoreNLP is written in Java, but it can also be accessed from other programming languages. I tested a couple of the latest Python wrappers that provide access to CoreNLP, but was unable to get them working because of various environment-related complications. Fortunately, with the help of Sawood Alam, our very able Docker campus ambassador at Old Dominion University, I was able to create a Dockerfile that installs and runs the CoreNLP server (version 3.8.0) in a container. This eliminated the headaches of installing the server and also provided a simple method of accessing CoreNLP services through HTTP requests.
How to run the CoreNLP server on localhost port 9000 from a Docker container
  1. Install Docker if not already available
  2. Pull the image from the repository and run the container:
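As a minimal sketch of step 2, the Python helper below assembles a `docker run` command that publishes the server on localhost port 9000. The image name `example/corenlp-image` and the container name `corenlp` are hypothetical placeholders, not the names used for the actual image, and both ports are assumed to be 9000.

```python
import subprocess


def build_run_command(image, name="corenlp", host_port=9000, container_port=9000):
    """Build a `docker run` command that publishes the CoreNLP server port.

    `image` and `name` are placeholders; substitute the real image name.
    """
    return [
        "docker", "run",
        "--rm",                                  # remove the container once it stops
        "-d",                                    # run in the background (detached)
        "-p", f"{host_port}:{container_port}",   # map host port to container port
        "--name", name,                          # name used later to stop the container
        image,
    ]


def start_server(image, **kwargs):
    """Run the command; `docker run` pulls the image first if it is not local."""
    return subprocess.run(build_run_command(image, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical image name, shown only to illustrate the command layout.
    print(" ".join(build_run_command("example/corenlp-image")))
```

Calling `start_server(...)` with the real image name would start the container; the `__main__` block only prints the command so that nothing runs by accident.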
Using the server
The server can be used from the browser, the command line, or custom scripts:
  1. Browser: To use the CoreNLP server from the browser, open your browser and visit http://localhost:9000/. This presents the user interface (Fig. 1) of the CoreNLP server.
  2. Command line (NER example):
    Fig. 2: Sample request URL sent to the Named Entity Annotator 
    To use the CoreNLP server from the terminal, learn how to send requests to a particular annotator from the CoreNLP usage webpage, or inspect the request URL that the browser (1.) sends to the server. For example, this request URL was sent to the server from the browser (Fig. 2), and corresponds to the following command, which uses the Named-Entity Recognition system to label the supplied text:
  3. Custom script (NER example): I created a Python function nlpGetEntities() that uses the NER annotator to label a user-supplied text.
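The command-line and script usage above can be sketched in Python with only the standard library. This is a rough illustration and not the `nlpGetEntities()` function itself: the function names `build_request_url`, `extract_entities` and `get_entities_sketch` are hypothetical, the server is assumed to be listening on localhost port 9000, and the response is assumed to be the server's JSON output, in which each token carries a `word` and an `ner` field.

```python
import json
import urllib.parse
import urllib.request


def build_request_url(host="localhost", port=9000, annotators="ner"):
    """Build a request URL carrying the annotator settings as `properties`."""
    props = json.dumps({"annotators": annotators, "outputFormat": "json"})
    return f"http://{host}:{port}/?" + urllib.parse.urlencode({"properties": props})


def extract_entities(response):
    """Group consecutive tokens that share the same non-"O" NER label."""
    entities = []
    for sentence in response.get("sentences", []):
        words, label = [], None
        for token in sentence.get("tokens", []):
            ner = token.get("ner", "O")
            if ner != label and words:
                # The label changed, so the entity collected so far is complete.
                entities.append({"entity": " ".join(words), "class": label})
                words = []
            if ner != "O":
                words.append(token["word"])
            label = ner
        if words:
            entities.append({"entity": " ".join(words), "class": label})
    return entities


def get_entities_sketch(text, host="localhost", port=9000):
    """POST the text to the server and return the labeled entities."""
    request = urllib.request.Request(
        build_request_url(host, port), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_entities(json.loads(reply.read().decode("utf-8")))


if __name__ == "__main__":
    # Requires the CoreNLP server to be running on localhost:9000.
    print(get_entities_sketch("Sawood Alam studies at Old Dominion University."))
```

Keeping the URL construction and the response parsing in separate functions means both can be checked against a hand-written response without a running server.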
To stop the server, issue the following command: 
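A minimal sketch of stopping the container from Python, assuming it was started under the hypothetical name `corenlp` (use whatever name or ID `docker ps` reports for the running container):

```python
import subprocess


def build_stop_command(name="corenlp"):
    """Build a `docker stop` command for the named container (name is assumed)."""
    return ["docker", "stop", name]


def stop_server(name="corenlp"):
    """Stop the container; raises CalledProcessError if Docker reports a failure."""
    return subprocess.run(build_stop_command(name), check=True)


if __name__ == "__main__":
    print(" ".join(build_stop_command()))
```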
The Dockerfile I created targets CoreNLP version 3.8.0 (2017-06-09). There is a newer version of the service (3.9.1). I believe it should be easy to adapt the Dockerfile to install the latest version by replacing all occurrences of "2017-06-09" with "2018-02-27" in the Dockerfile. However, I have not tested this change: version 3.9.1 differs only marginally from version 3.8.0 for my use case, and I have not yet run it against my application benchmark.
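The replacement described above could be scripted roughly as follows. This is an untested-against-the-real-file sketch: the two sample lines are hypothetical stand-ins for the Dockerfile's contents, and the function name `retarget_dockerfile` is made up for illustration.

```python
def retarget_dockerfile(text, old_date="2017-06-09", new_date="2018-02-27"):
    """Return the updated text and the number of occurrences replaced."""
    return text.replace(old_date, new_date), text.count(old_date)


if __name__ == "__main__":
    # Hypothetical Dockerfile lines, used only to show the substitution.
    sample = (
        "RUN unzip stanford-corenlp-full-2017-06-09.zip\n"
        "WORKDIR stanford-corenlp-full-2017-06-09\n"
    )
    updated, replaced = retarget_dockerfile(sample)
    print(f"replaced {replaced} occurrence(s)")
    print(updated)
```

Reporting the replacement count makes it easy to notice if the date string does not appear where expected.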


