
PostgreSQL Backup and Restore Between Docker-composed Containers

The importance of backup and recovery really only becomes clear in the face of catastrophic data loss. I’ve got a slick little Padrino app that’s starting to generate traffic (and ad revenue). As such, it would be a real shame if my data got lost and I had to start from scratch.

docker-compose

This is what I’m working with:

# docker-compose.yml
nginx:
  restart: always
  build: ./
  volumes:
    # Page content
    - ./:/home/app/webapp
  links:
    - postgres
  environment:
    - PASSENGER_APP_ENV=production
    - RACK_ENV=production
    - VIRTUAL_HOST=example.com
    - LETSENCRYPT_HOST=example.com
    - LETSENCRYPT_EMAIL=daniel@example.com
postgres:
  restart: always
  image: postgres
  environment:
    - POSTGRES_USER=root
    - POSTGRES_PASSWORD=secretpassword
  volumes_from:
    - myapp_data

It’s the old Compose Version 1 syntax, but what follows should still apply. As with all such compositions, I write database data to a data-only container. Though the data persists apart from the Dockerized Postgres container, the composition still needs to be up and running (e.g., docker-compose up -d) for the commands below to work.
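
For reference, the myapp_data data-only container can be created like this (a sketch; the volume path assumes the official postgres image’s default data directory):

docker create --name myapp_data -v /var/lib/postgresql/data postgres /bin/true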

Dump the data

Assuming the containers are up and running, the appropriate command looks like this (note the -T flag, which disables pseudo-TTY allocation; without it, the binary dump written to stdout can get mangled):

docker-compose exec -T -u <your_postgres_user> <postgres_service_name> pg_dump -Fc <database_name_here> > db.dump

Given the composition above, the command I actually execute is this:

docker-compose exec -T --user root postgres pg_dump -Fc myapp_production > db.dump

At this point, the db.dump file can be transferred to a remote server through whatever means are appropriate (I set this all up in Capistrano to make it super easy).
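
For example, a plain scp does the job (hypothetical host and path; adjust to taste):

scp db.dump deploy@backup.example.com:/home/deploy/backups/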

Restore the data

Another assumption: a new database is up and running on the remote backup machine (ideally using the same docker-compose.yml file above).

The restore command looks like this (again with -T, since we’re piping the dump into the container’s stdin; Compose Version 1’s exec has no -i flag):

docker-compose exec -T -u <your_postgres_user> <postgres_service_name> pg_restore -C -d postgres < db.dump

The command I execute is this:

docker-compose exec -T -u root postgres pg_restore -C -d postgres < db.dump
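
To make sure the restore actually landed, list the databases inside the container (a quick sanity check, assuming the same service and user as above):

docker-compose exec -u root postgres psql -l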

Done!


Dockerized Etherpad-Lite with PostgreSQL

There is surprisingly little information out there on how to deploy etherpad-lite with postgres. There is,
on the other hand, quite a bit of information on how to deploy etherpad with docker. At the time of writing,
there is nothing on how to do it with docker-compose specifically. I rarely use docker apart from
docker-compose, and will go to great lengths to ensure I can dockerize my composition, be it for
etherpad or any other docker-appropriate application.

The following comprises my collection of notes on what it took to build an etherpad docker image and link
it to a postgres container with docker-compose. Much of it was inspired by (i.e., shamelessly plagiarized
from) the fine work done by tvelocity on GitHub.

The following assumes that the required software (e.g., docker, docker-compose, etc.) is installed on an
Ubuntu 16.04 machine. It’s meant to be simple enough that the stated goal can be accomplished by simply
copying and pasting content into the various required files and executing the commands specified.

Setup

Create a directory in which to organize your docker composition.

mkdir my-etherpad && cd my-etherpad

Dockerfile

Using your favourite text editor (mine’s vim), copy and paste the following into your Dockerfile:

FROM node:0.12
MAINTAINER Some Guy, someguy@example.com

# For postgres
RUN apt-get update
RUN apt-get install -y libpq-dev postgresql-client

# Clone the latest etherpad version
RUN cd /opt && git clone https://github.com/ether/etherpad-lite.git etherpad

WORKDIR /opt/etherpad

RUN bin/installDeps.sh && rm settings.json

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

EXPOSE 9001

ENTRYPOINT ["/entrypoint.sh"]
CMD ["bin/run.sh", "--root"]

The node:0.12 image upon which this image is built was chosen purposefully. At the moment, etherpad-lite
does not run on node versions 6.0 and 6.1, which is
what you get if you build off the latest node image.

Also note the ENTRYPOINT ["/entrypoint.sh"] line. This means we’ll need this entrypoint.sh file to
run every time we fire up a container.

entrypoint.sh

Create a file called entrypoint.sh and paste the following:

#!/bin/bash
set -e

if [ -z "$POSTGRES_PORT_5432_TCP_ADDR" ]; then
    echo >&2 'error: missing POSTGRES_PORT_5432_TCP_ADDR environment variable'
    echo >&2 '  Did you forget to --link some_postgres_container:postgres ?'
    exit 1
fi

# If we're linked to PostgreSQL, and we're using the root user, and our linked
# container has a default "root" password set up and passed through... :)
: ${ETHERPAD_DB_USER:=root}
if [ "$ETHERPAD_DB_USER" = 'root' ]; then
    : ${ETHERPAD_DB_PASSWORD:=$POSTGRES_ENV_POSTGRES_ROOT_PASSWORD}
fi
: ${ETHERPAD_DB_NAME:=etherpad}

# Swap out dots in the database name; they'd need quoting in SQL otherwise
ETHERPAD_DB_NAME=$( echo $ETHERPAD_DB_NAME | sed 's/\./_/g' )

if [ -z "$ETHERPAD_DB_PASSWORD" ]; then
    echo >&2 'error: missing required ETHERPAD_DB_PASSWORD environment variable'
    echo >&2 '  Did you forget to -e ETHERPAD_DB_PASSWORD=... ?'
    echo >&2
    echo >&2 '  (Also of interest might be ETHERPAD_DB_USER and ETHERPAD_DB_NAME.)'
    exit 1
fi

: ${ETHERPAD_TITLE:=Etherpad}
: ${ETHERPAD_PORT:=9001}
: ${ETHERPAD_SESSION_KEY:=$(
    node -p "require('crypto').randomBytes(32).toString('hex')")}

# Check if the database already exists
RESULT=`PGPASSWORD=${ETHERPAD_DB_PASSWORD} psql -U ${ETHERPAD_DB_USER} -h postgres \
    -c "\l ${ETHERPAD_DB_NAME}"`

if [[ "$RESULT" != *"$ETHERPAD_DB_NAME"* ]]; then
    # The database does not exist, so create it
    echo "Creating database ${ETHERPAD_DB_NAME}"

    PGPASSWORD=${ETHERPAD_DB_PASSWORD} psql -U ${ETHERPAD_DB_USER} -h postgres \
        -c "create database ${ETHERPAD_DB_NAME}"
fi

OWNER=`PGPASSWORD=${ETHERPAD_DB_PASSWORD} psql -U ${ETHERPAD_DB_USER} -h postgres \
    -c "SELECT u.usename
        FROM pg_database d
        JOIN pg_user u ON (d.datdba = u.usesysid)
        WHERE d.datname = (SELECT current_database());"`

if [[ "$OWNER" != *"$ETHERPAD_DB_USER"* ]]; then
    # The database is owned by another user, so hand it over
    echo "Setting database owner to ${ETHERPAD_DB_USER}"
    PGPASSWORD=${ETHERPAD_DB_PASSWORD} psql -U ${ETHERPAD_DB_USER} -h postgres \
        -c "alter database ${ETHERPAD_DB_NAME} owner to ${ETHERPAD_DB_USER}"
fi

if [ ! -f settings.json ]; then

    cat << EOF > settings.json
{
  "title": "${ETHERPAD_TITLE}",
  "ip": "0.0.0.0",
  "port": ${ETHERPAD_PORT},
  "dbType": "postgres",
  "dbSettings": {
    "user": "${ETHERPAD_DB_USER}",
    "host": "postgres",
    "password": "${ETHERPAD_DB_PASSWORD}",
    "database": "${ETHERPAD_DB_NAME}"
  },
EOF

    if [ -n "$ETHERPAD_ADMIN_PASSWORD" ]; then
        : ${ETHERPAD_ADMIN_USER:=admin}
        cat << EOF >> settings.json
  "users": {
    "${ETHERPAD_ADMIN_USER}": {
      "password": "${ETHERPAD_ADMIN_PASSWORD}",
      "is_admin": true
    }
  },
EOF
    fi

    cat << EOF >> settings.json
}
EOF
fi

exec "$@"

This does a whole bunch of stuff. Most importantly, it creates the settings.json file, which
configures the whole etherpad-lite application. The settings produced above connect the application
to the postgres database and set up an administrative user. There’s plenty more that could be done
to make this more configurable (e.g., setting up SSL certs, et al.). I’m all ears.
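
For a concrete picture, given the environment variables set in the docker-compose.yml in the next section, the generated settings.json comes out roughly like this (the trailing commas are what the script actually emits; etherpad-lite’s settings parser tolerates them):

{
  "title": "Etherpad",
  "ip": "0.0.0.0",
  "port": 9001,
  "dbType": "postgres",
  "dbSettings": {
    "user": "etherpad",
    "host": "postgres",
    "password": "foobarbaz",
    "database": "store"
  },
  "users": {
    "admin": {
      "password": "foobarbaz",
      "is_admin": true
    }
  },
}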

docker-compose.yml

You may have noticed a bunch of environment variables in the previous entrypoint.sh file.
Those get set in docker-compose.yml. Create that file and paste the following:

etherpad:
  restart: always
  build: ./
  ports:
    - "9001:9001"
  links:
    - postgres
  environment:
    - ETHERPAD_DB_USER=etherpad
    - ETHERPAD_DB_PASSWORD=foobarbaz
    - ETHERPAD_DB_NAME=store
    - ETHERPAD_ADMIN_PASSWORD=foobarbaz
postgres:
  restart: always
  image: postgres
  environment:
    - POSTGRES_USER=etherpad
    - POSTGRES_PASSWORD=foobarbaz
  volumes_from:
    - etherpad_data

This sets everything up so that the etherpad Dockerfile gets built and linked to a
postgres container… well, everything except for one bit: it doesn’t create
the required etherpad_data data-only container.

Data-only container

Create the data-only container from the command line so that your data won’t be erased if your
other containers get smoked by accident or on purpose. The volume path matches the official
postgres image’s data directory, so the database files actually land in the volume:

docker create --name etherpad_data -v /var/lib/postgresql/data postgres /bin/true

Finally, fire ‘er all up!

docker-compose up

If you did everything correctly, your etherpad-lite application will be accessible on http://localhost:9001.
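
A quick smoke test from the host confirms it’s listening (assuming curl is installed):

curl -I http://localhost:9001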


Populating Dockerized PostgreSQL with dumped data from host

I needed to populate a local SQLite database to do some development testing for a client. Her web app has a lot of moving parts and left something to be desired in terms of testing. Most of these steps were adapted from this wonderful post.

Long story short: I needed to recreate an issue discovered in production, but didn’t have any data with which to write tests or even recreate the issue manually. The whole thing is a huge pain in the butt, so I dumped the production database from Heroku and attempted a brute-force reconstruction of the issue. The Heroku instructions produced a file called latest.dump.
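
For reference, the Heroku side boils down to something like this (assuming the Heroku CLI and an app named your-app-name; the download is saved as latest.dump):

heroku pg:backups:capture --app your-app-name
heroku pg:backups:download --app your-app-name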

Create/run the container

I like using Docker for this kind of stuff. I don’t even run any databases on my host machine anymore.

From the Docker PostgreSQL documentation…

docker run --name some-postgres -e POSTGRES_PASSWORD=secret -p 5432:5432 -d postgres

Restore to local database

This is where the dumped data gets written to the Dockerized PostgreSQL database.

pg_restore --verbose --clean --no-acl --no-owner -h localhost -U postgres --create --dbname=postgres latest.dump

Login

There are a couple of ways to find the database name. Either log in as follows, or go to the Heroku database dashboard. The password was set to secret when the container was created. It’s a good idea to log in anyway, if only to make sure the database is actually there.

psql -h localhost -p 5432 -U postgres --password
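
The same list can be had non-interactively; psql’s -l flag prints all databases:

psql -h localhost -p 5432 -U postgres --password -l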

Dump the local database

I didn’t want to mess with the previous developer’s setup, so I took the wussy way out and used the existing SQLite configuration to run the app locally. This requires creating an SQLite DB with the PostgreSQL data dump. This command dumps that data:

pg_dump -h localhost -p 5432 -U postgres --password --data-only --inserts YOUR_DB_NAME > dump.sql

Create SQLite database

Again, these steps were adapted from this post.

Modify dump.sql

My favourite editor is vim. I used it to modify dump.sql.

Remove all the SET statements at the top of the file.

It’ll look something like this:

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

SET search_path = public, pg_catalog;

Get rid of it all.
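
If you’d rather not delete them by hand, a sed one-liner works (assuming, as in a standard pg_dump file, that each SET statement starts at the beginning of a line):

sed -i '/^SET /d' dump.sql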

Remove all the setval queries

The post that I’m plagiarizing says these set the auto-increment IDs. SQLite doesn’t need them. Find and remove everything that looks like this (i.e., everything with a call to setval):

SELECT pg_catalog.setval('friendly_id_slugs_id_seq', 413, true);
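
These can be cleared out with sed as well (assuming each setval call sits on its own line):

sed -i '/setval/d' dump.sql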

Replace true with 't' and false with 'f'

Anything that looks like this:

INSERT INTO article_categories VALUES (4, 9, 13, true, '2011-11-22 06:29:07.966875', '2011-11-22 06:29:07.966875');
INSERT INTO article_categories VALUES (26, 14, NULL, false, '2011-12-07 09:09:52.794238', '2011-12-07 09:09:52.794238');

Needs to look like this:

INSERT INTO article_categories VALUES (4, 9, 13, 't', '2011-11-22 06:29:07.966875', '2011-11-22 06:29:07.966875');
INSERT INTO article_categories VALUES (26, 14, NULL, 'f', '2011-12-07 09:09:52.794238', '2011-12-07 09:09:52.794238');
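
A naive sed pass handles the bulk of the substitutions, though it’s worth eyeballing the result afterwards: this pattern only matches booleans between commas, misses a boolean in the final column position, and would also rewrite a literal ", true," inside string data:

sed -i "s/, true,/, 't',/g; s/, false,/, 'f',/g" dump.sql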

Wrap it all in a single transaction

If you don’t do this, the import will take forever. At the top of the file put

BEGIN;

and at the very bottom of the file put

END;
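
Both edits can be made from the shell as well (the 1i insert form is a GNU sed extension, which Ubuntu’s sed supports):

sed -i '1i BEGIN;' dump.sql
echo 'END;' >> dump.sql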

Create the SQLite database

Finally, after all that hard work:

sqlite3 development.sqlite < dump.sql

The app was already set up to work with the development.sqlite database. There were a couple of error messages when creating it, but they didn’t seem to impact the app. Everything worked fine, and I found a nifty new way to use Docker.