1. Overview
In this article, we'll review how to work with Docker to manage databases.
In the first chapter, we'll cover installing a database on our local machine. Then we'll discover how data persistence works across containers.
To conclude, we'll discuss the reliability of running databases in Docker in production environments.
2. Running a Docker Image Locally
2.1. Starting With a Standard Docker Image
First, we have to install Docker Desktop. Then, we should find an existing image of our database on Docker Hub. Once we find it, we'll copy the docker pull command from the top right corner of the page.
In this tutorial, we'll work with PostgreSQL, so the command is:
$ docker pull postgres
When the download is complete, the docker run command will create a running database within a Docker container. For PostgreSQL, the POSTGRES_PASSWORD environment variable must be specified with the -e option:
$ docker run -e POSTGRES_PASSWORD=password postgres
Next, we'll test our database container connection.
2.2. Connecting a Java Project to the Database
Let's try a simple test. We'll connect a local Java project to the database using a JDBC datasource. The connection string should use the default PostgreSQL port 5432 on localhost:
jdbc:postgresql://localhost:5432/postgres?user=postgres&password=password
An error should inform us that the port is not open. Indeed, the database is listening for connections inside the container network, and our Java project is running outside of it.
To fix it, we need to map the container port to our localhost port. We'll use the default port 5432 for PostgreSQL:
$ docker run -p 5432:5432 -e POSTGRES_PASSWORD=password postgres
The connection is working now, and we should be able to use our JDBC data source.
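To double-check the connection from code, we can run a minimal test class. The following is a sketch, assuming the PostgreSQL JDBC driver (org.postgresql:postgresql) is on the classpath; the class name is ours:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectionTest {

    public static void main(String[] args) throws Exception {
        // same connection string as before: default port, user, and database
        String url = "jdbc:postgresql://localhost:5432/postgres?user=postgres&password=password";
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT version()")) {
            resultSet.next();
            // prints the server version if the containerized database is reachable
            System.out.println(resultSet.getString(1));
        }
    }
}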
2.3. Running SQL Scripts
Now, we can connect to our database from a shell, for example, to run an initialization script.
First, let's find our running container id:
$ docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED          STATUS          PORTS                    NAMES
65d9163eece2   postgres   "docker-entrypoint.s…"   27 minutes ago   Up 27 minutes   0.0.0.0:5432->5432/tcp   optimistic_hellman
Then, we'll run the docker exec command with the -it option to open an interactive shell inside the container:
$ docker exec -it 65d9163eece2 bash
Finally, we can connect to the database instance with the command-line client and paste our SQL script:
root@65d9163eece2:/# psql -U postgres
postgres=# CREATE DATABASE TEST;
postgres=# \c test
test=# CREATE TABLE PERSON(
    ID INTEGER PRIMARY KEY,
    FIRST_NAME VARCHAR(1000),
    LAST_NAME VARCHAR(1000)
);
...
However, if we have a large dump file to load, copy-pasting isn't practical. Instead, we can run the import directly from the host with the docker exec command, adding the -i option so that the script is piped to the container's standard input:
$ docker exec -i 65d9163eece2 psql -U postgres < dump.sql
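Alternatively, if our tooling is Java-based, we can stream the script through JDBC. This is only a sketch: it assumes a plain SQL file without psql meta-commands (such as \c), and the class name is ours:

import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SqlScriptImport {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/postgres?user=postgres&password=password";
        // read the whole file; the PostgreSQL driver accepts multiple semicolon-separated statements
        String script = Files.readString(Path.of("dump.sql"));
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement()) {
            statement.execute(script);
        }
    }
}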
3. Persist Data With a Docker Volume
3.1. Why Do We Need Volumes?
Our basic setup works as long as we reuse the same container, stopping and starting it with docker container stop/start each time we need to reboot. If we use docker run again, a new, empty container will be created, and we'll lose our data. Indeed, by default, Docker persists the data inside an anonymous volume, and each new container gets a fresh one.
Now, we'll learn how to modify this volume mapping.
3.2. Docker Volumes Setup
The first task is to inspect our container to see which volume is used by our database:
$ docker inspect -f "{{ .Mounts }}" 65d9163eece2
[{volume f1033d3 /var/lib/docker/volumes/f1033d3/_data /var/lib/postgresql/data local true }]
We can see that the volume f1033d3 maps the container directory /var/lib/postgresql/data to an automatically generated directory, /var/lib/docker/volumes/f1033d3/_data, in the host filesystem.
We have to modify this mapping by adding the -v option to the docker run command we used in chapter 2.1:
$ docker run -v C:\docker-db-volume:/var/lib/postgresql/data -e POSTGRES_PASSWORD=password postgres
Now, we can see the database files created in the C:\docker-db-volume directory. We can find advanced volume configuration in this dedicated article.
As a result, each time we use the docker run command, the data will persist across container executions.
Also, we may want to share the configuration between team members or across different environments, using a Docker Compose file that creates new containers each time. In this case, volumes are mandatory.
The following chapter will cover the specifics of running a Docker database in a production environment.
4. Working With Docker in Production
Docker Compose is great for sharing configuration and managing containers as stateless services. If a service fails or can't handle the workload, we can configure Docker Compose to create new containers automatically. This is very useful for building a production cluster for REST back-ends, which are stateless by design.
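As an illustration, here's a sketch of such a configuration for a stateless service (the service and image names are placeholders; deploy options like this apply when deploying to a Docker Swarm, while plain docker-compose offers a similar top-level restart option):

version: '3'
services:
  api:
    image: 'my-rest-backend' # placeholder image for a stateless REST back-end
    deploy:
      replicas: 3 # scale out across the cluster
      restart_policy:
        condition: on-failure # recreate the container if it crashes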
However, databases are stateful, and their management is more complex: let's review the different contexts.
4.1. Single Instance Database
Let's suppose we're building a non-critical environment, for testing or for production, that tolerates periods of downtime (during deployments, backups, or failure).
In this case, we don't need a high-availability cluster, and we can simply use Docker Compose for a single-instance database:
- We can use a simple volume for the data storage because the containers will be executed on the same machine
- We can limit it to run one container at a time using the global mode
Let's see a minimalist working example:
version: '3'
services:
  database:
    image: 'postgres'
    deploy:
      mode: global
    environment:
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"
    volumes:
      - "C:/docker-db-volume:/var/lib/postgresql/data"
Using this configuration, our production setup will create only one container at a time and reuse the data files from our C:\docker-db-volume directory.
However, regular backups are even more important in this configuration: in case of a configuration error, this directory could be erased or corrupted by the container.
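For example, we can take a dump from the host with pg_dump, reusing the container id from chapter 2.3 (with Compose, we'd target the database service's container instead):

$ docker exec 65d9163eece2 pg_dump -U postgres postgres > backup.sql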
4.2. Replicated Databases
Let's assume now that our production environment is critical.
In this case, orchestration tools like Docker Swarm and Kubernetes are beneficial with stateless containers: they offer vertical and horizontal clustering, with load balancing, fail-over, and auto-scaling capabilities.
Unfortunately, our database containers are stateful, and these solutions don't provide a volume replication mechanism.
On the other hand, it's dangerous to build homemade configurations because it can lead to severe data loss. For example:
- Using shared storage like NFS or NAS for volumes can't guarantee that there will be no data loss when the database is restarted in another instance
- On master-slave clusters, it's a common error to let the Docker orchestrator elect more than one master node, which leads to data corruption
So far, our different options are:
- Don't use Docker for the database, and implement a database-specific or hardware replication mechanism
- Don't use Docker for the database, and subscribe to Platform-as-a-Service solutions like OpenShift, Amazon AWS, or Azure
- Use a Docker-specific replication mechanism like KubeDB or Portworx
5. Conclusion
In this article, we reviewed basic configurations suitable for development, testing, and non-critical production.
Finally, we concluded that Docker has drawbacks when used in high-availability environments. Therefore, it should be avoided or coupled with solutions specialized in database clusters.