2017-12-03: Introducing Docker - Application Containerization & Service Orchestration

For the last few years, Docker, the application containerization technology, has been gaining a lot of traction in the DevOps community, and lately it has made its way into the academic and research community as well. I have been following it since its inception in 2013. For the last couple of years, it has been a daily driver for me. At the same time, I have been encouraging my colleagues to use Docker in their research projects. As a result, we are gradually moving away from one virtual machine (VM) per project to a swarm of nodes running containers of various projects and services. If you have accessed MemGator, CarbonDate, Memento Damage, Story Graph or some other WS-DL services lately, you have been served from our Docker deployment. We even have an on-demand PHP/MySQL application deployment system using Docker for the CS418 - Web Programming course.

I (@ibnesayeed) have been selected as the @Docker Campus Ambassador for Old Dominion University! /cc @ODU @oducs @WebSciDl
— Sawood Alam (@ibnesayeed) June 28, 2017

Last summer, Docker Inc. selected me as the Docker Campus Ambassador for Old Dominion University. While I had already given Docker talks to a few smaller, more focused groups, with the campus ambassador hat on I decided to organize an event from which grads and undergrads of the Computer Science department at large could benefit.

Special CS colloquium this Wed (12pm, ECS 1st floor auditorium). PhD student Sawood Alam (@ibnesayeed) will be introducing us to #Docker and handing out free stuff! pic.twitter.com/iSGcxvreKo
— ODU Computer Science (@oducs) November 28, 2017

The CS department accepted it as a colloquium, scheduled for Nov 29, 2017. We were anticipating about 50 participants, but many more showed up. The growing interest of students in containerization technology can be taken as an indicator of its usefulness, and perhaps it should be included in some courses offered in the future.

.@ibnesayeed presenting about @Docker (#archivespark -- https://t.co/DR4bjLG7NZ) cc @WebSciDL @helgeho pic.twitter.com/MxrfUXsfPH
— Michael L. Nelson (@phonedude_mln) November 29, 2017

The session lasted a little over an hour. It started with slides that motivated the topic with a Dockerization story and a set of problems that Docker can potentially solve. The slides then introduced some basics of Docker and illustrated how a simple script can be packaged into an image and distributed using DockerHub. The presentation was followed by a live demo of the step-by-step evolution of a simple script into a multi-container application using a micro-service architecture, demonstrating various aspects of Docker at each step. Finally, the session was opened for questions and answers.

Introducing Docker - Application Containerization & Service Orchestration by Sawood Alam

For the purpose of illustration I prepared an application that scrapes a given web page to extract links from it. The demo code has folders for various steps as it progresses from a simple script to a multi-service application stack. Each folder has a README file to explain changes from the previous step and instructions to run the application. The code is made available on GitHub. Additionally, this illustration is made available as an interactive tutorial on the official Docker training web site. Following is a brief summary of the demo.

Step 0

Step 0 has a simple linkextractor.py Python script (as shown below) that accepts a URL as an argument and prints all the hyperlinks on the page.

	#!/usr/bin/env python

	import sys
	import requests
	from bs4 import BeautifulSoup

	res = requests.get(sys.argv[-1])
	soup = BeautifulSoup(res.text, "html.parser")
	for link in soup.find_all("a"):
	    print(link.get("href"))

However, running this rather simple script might raise some of the following issues:

Is the script executable? (chmod a+x linkextractor.py)
Is Python installed on the machine?
Can you install software on the machine?
Is "pip" installed?
Are "requests" and "beautifulsoup4" Python libraries installed?

Step 1

Step 1 adds a simple Dockerfile to automate the installation of all the requirements and build an isolated, self-contained image.

	FROM python
	LABEL maintainer="Sawood Alam <@ibnesayeed>"

	RUN pip install beautifulsoup4
	RUN pip install requests

	WORKDIR /app
	COPY linkextractor.py /app/
	RUN chmod a+x linkextractor.py

	ENTRYPOINT ["./linkextractor.py"]

Inclusion of this Dockerfile ensures that the script will run without any hiccups in a Docker container as a one-off command.

Step 2

Step 2 makes three changes to the Python script: 1) it converts extracted paths to full URLs, 2) it extracts both links and anchor texts, and 3) it moves the main logic into a function that returns an object, so that the script can be used as a module in other scripts.

This step illustrates that new changes in the code will not affect any running containers and will not impact an image that was built already (unless overridden). Building a new image with a different tag allows co-existence of both the versions that can be run as desired.
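
The three script changes in this step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the demo's actual code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, it takes HTML text rather than fetching a URL, and the names LinkCollector and extract_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href and anchor text of every <a> element (hypothetical name)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # finished {"text": ..., "href": ...} records
        self._href = None   # resolved href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # Change 1: resolve relative paths against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # Change 2: keep the anchor text along with the link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"text": text, "href": self._href})
            self._href = None


def extract_links(html, base_url):
    """Change 3: return a list of records so other scripts can import this."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Because the logic lives in a function that returns data instead of printing it, another script can import it and decide how to present the result.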

Step 3

Step 3 adds another Python file, main.py, that uses the module written in the previous step to expose link extraction as a web service API that returns a JSON response. The required libraries are listed in the requirements.txt file. The Dockerfile is updated to accommodate these changes and to run the server by default rather than the script as a one-off command.

This step demonstrates how host and container ports are mapped to expose the service running inside a container.
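
A JSON API of this kind might look like the sketch below. It is only an illustration: it uses the standard library's http.server rather than whatever web framework main.py actually uses, the /api/ route prefix is an assumption, and extract_links here is a placeholder stub standing in for the Step 2 module.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

API_PREFIX = "/api/"  # assumed route prefix, not confirmed by the post


def extract_links(url):
    """Placeholder for the Step 2 module; returns a fixed sample result."""
    return [{"text": "Example", "href": url}]


def build_response(path):
    """Map a request path to an (HTTP status, JSON body) pair."""
    if not path.startswith(API_PREFIX) or len(path) == len(API_PREFIX):
        return 404, json.dumps({"error": "expected /api/<url>"})
    target = path[len(API_PREFIX):]
    return 200, json.dumps(extract_links(target))


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=5000):
    """Listen on all interfaces so a published container port can reach it."""
    HTTPServer(("0.0.0.0", port), ApiHandler).serve_forever()
```

Calling serve() inside a container makes the server listen on container port 5000. Publishing that port when starting the container (for example with the -p 5000:5000 option of docker run) is what maps it to a port on the host.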

Step 4

Step 4 moves all the code written so far for the JSON API into a separate folder to build an independent image. In addition, it adds a PHP file, index.php, in a separate folder that serves as a front-end application and internally communicates with the Python API for link extraction. To glue these services together, a docker-compose.yml file is added as shown below.

	version: '3'

	services:
	  api:
	    image: api:python
	    build: ./api
	    ports:
	      - "5000:5000"
	  web:
	    image: php:7-apache
	    ports:
	      - "80:80"
	    environment:
	      - API_ENDPOINT=http://api:5000/api/
	    volumes:
	      - ./www:/var/www/html

This step demonstrates how multiple services can be orchestrated using Docker Compose. We did not create a custom image for the PHP application; instead, we demonstrated how the code can be mounted inside a container (in this case a container based on the official php:7-apache image). This allows any modification of the code to be reflected immediately inside the running container, which can be very handy during development.
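
The front-end in the demo is written in PHP, but the way it can locate the API through the API_ENDPOINT environment variable can be sketched in Python as below. The function names, the localhost fallback, and the convention of appending the page URL to the endpoint path are all assumptions made for illustration.

```python
import json
import os
from urllib.request import urlopen

DEFAULT_ENDPOINT = "http://localhost:5000/api/"  # assumed fallback for local runs


def api_url_for(page_url, environ=os.environ):
    """Build the API request URL from the API_ENDPOINT environment variable."""
    endpoint = environ.get("API_ENDPOINT", DEFAULT_ENDPOINT)
    # Assumption: the API takes the target page URL as the rest of the path.
    return endpoint.rstrip("/") + "/" + page_url


def fetch_links(page_url):
    """Call the API service and decode its JSON response."""
    with urlopen(api_url_for(page_url)) as res:
        return json.load(res)
```

Inside the Compose network, the hostname api in http://api:5000/api/ resolves to the container of the api service, so the front-end needs no hard-coded IP address. Reading the endpoint from the environment keeps the same code usable outside Compose.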

Step 5

Step 5 adds another Dockerfile to build a custom image of the front-end PHP application. The Python API server is updated to use Redis for caching. Additionally, the docker-compose.yml file is updated to reflect the changes in the front-end application (the "web" service block) and to include a Redis service from its official Docker image.

This step illustrates how easy it is to progressively add components to compose a multi-container service stack. At this stage, the demo application architecture reflects what is illustrated in the title image of this post (the first figure).
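
The caching added to the API in this step can be sketched as a simple look-aside cache. This is a hedged illustration, not the demo's code: DictCache is an in-memory stand-in used so the example runs without a Redis server, and cached_links is a hypothetical name.

```python
import json


class DictCache:
    """In-memory stand-in exposing only the get/set subset used below."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached_links(cache, url, extractor):
    """Return (links, was_cached), calling extractor only on a cache miss."""
    hit = cache.get(url)
    if hit is not None:
        # Cached values are stored as JSON strings (or bytes), so decode them.
        return json.loads(hit), True
    links = extractor(url)
    cache.set(url, json.dumps(links))
    return links, False
```

A redis-py client object also provides get and set methods, so under that assumption it could be passed in place of DictCache, connecting to the Redis service by its Compose service name.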

Step 6

Step 6 completely replaces the Python API service component with an equivalent Ruby implementation. Some slight modifications are made to the docker-compose.yml file to reflect these changes. Additionally, a "logs" directory is mounted in the Ruby API service as a volume for persistent storage.

This step illustrates how easily any component of a micro-service architecture application stack can be swapped out with an equivalent service. Additionally, it demonstrates volumes for persistent storage so that containers can remain stateless.

The video recording of the session is available on YouTube as well as on the colloquium recordings page of the department (the latter has more background noise). The slides and demo code are available under permissive licenses that allow modification and reuse.

Resources

Slides (CC Attribution License)
Demo Source Code (MIT License)
Video Recording
Interactive Tutorial in the Play with Docker classroom

UPDATE [February 28, 2019]: Added link to the interactive play with Docker tutorial.

Sawood Alam

Web Science and Digital Libraries Research Group