
Commit 276a848

Use service provided healthcheck, with fallbacks (#870)
Two main issues with the existing healthcheck setup:

- It only supports `wget` to `/health` out of the box. This requires some projects to overwrite the healthcheck line, which, while minor, is still a custom deviation from the template that they now have to carry and potentially deal with on updates.
- It requires an extra production runtime dependency (`wget`) that projects may not need.

We could add some configurable options, as we've done in other places[1], but rather than support a limited set of configurable healthcheck options, prefer a defined `healthcheck` executable that the service image provides (since the service knows best what "healthy" means for itself), falling back through some common HTTP options if a `healthcheck` executable is not provided.

And rather than (additionally) injecting the `container_port` Terraform variable into the healthcheck line, we can rely on the runtime `$PORT` env var, which is part of the required service interface.

Longer term, this also better supports folks trimming down their image size: if they are running an HTTP service, their existing stack very likely supports making an HTTP request, which they can use to provide the healthcheck without any additional dependencies.

Misc. tweaks while here:

- Move `update-docker-digest` from `/template-only-bin/` to `/template-only-app/bin/`
- Fix the casing of `AS` on the `FROM` line in `/template-only-app/Dockerfile`

[1] navapbc/platform-test@6274c14#diff-2f885dc9fa1b819ce453d1f1e8f6a4c575cf19f7e7199135010961c69edf7340
1 parent 971a5a7 commit 276a848

12 files changed, 161 additions, 6 deletions

.github/workflows/template-only-ci-app.yml

Lines changed: 8 additions & 0 deletions
```diff
@@ -22,3 +22,11 @@ jobs:
       - uses: actions/checkout@v4
       - name: Run build
         run: make release-build
+
+  healthcheck-script-tests:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Test healthcheck examples
+        run: ./bin/test-healthchecks
```

infra/modules/service/main.tf

Lines changed: 4 additions & 1 deletion
```diff
@@ -103,8 +103,11 @@ resource "aws_ecs_task_definition" "app" {
 interval = 30,
 retries = 3,
 timeout = 5,
+# If a `healthcheck` executable is available in the container's $PATH,
+# use that, otherwise fall back to the first available of: wget, curl, or
+# bash.
 command = ["CMD-SHELL",
-"wget --no-verbose --tries=1 --spider http://localhost:${var.container_port}/health || exit 1"
+"([ -x \"$(command -v healthcheck)\" ] && healthcheck) || ([ -x \"$(command -v wget)\" ] && wget --quiet --output-document=/dev/null http://127.0.0.1:$PORT/health) || ([ -x \"$(command -v curl)\" ] && curl --fail --silent http://localhost:$PORT/health > /dev/null) || ([ -x \"$(command -v bash)\" ] && bash -c \"exec 3<>/dev/tcp/127.0.0.1/$PORT;echo -e 'GET /health HTTP/1.1\\r\\nHost: http://localhost\\r\\nConnection: close\\r\\n\\r\\n' >&3;grep -q '^HTTP/.* 200 OK' <&3\") || exit 1"
 ]
 },
 environment = local.environment_variables,
```
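For readability, the `CMD-SHELL` one-liner can be unrolled into a POSIX `sh` sketch. This is illustrative only and not part of the commit: the function names (`have`, `try_healthcheck`, `try_wget`, `try_curl`, `try_bash_tcp`, `container_healthcheck`) are hypothetical, and the bash step uses `printf` with a plain `Host: localhost` header instead of `echo -e` with `Host: http://localhost`. One consequence of chaining with `||` is worth noting: a `healthcheck` executable that exists but exits non-zero does not fail the check on its own, because the chain then falls through to the next available tool, which probes `/health` directly.

```sh
#!/bin/sh
# Illustrative, unrolled version of the CMD-SHELL fallback chain in
# infra/modules/service/main.tf (hypothetical names; assumes PORT is set).

# Succeeds if the named command resolves to an executable file on $PATH.
have() { [ -x "$(command -v "$1")" ]; }

# Each try_* step succeeds only if its tool exists AND its check passes.
try_healthcheck() { have healthcheck && healthcheck; }

try_wget() {
  have wget && wget --quiet --output-document=/dev/null "http://127.0.0.1:${PORT}/health"
}

try_curl() {
  have curl && curl --fail --silent "http://localhost:${PORT}/health" > /dev/null
}

# Raw HTTP over bash's /dev/tcp; grep looks for a "200 OK" status line.
try_bash_tcp() {
  have bash && bash -c "exec 3<>/dev/tcp/127.0.0.1/${PORT}; printf 'GET /health HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n' >&3; grep -q '^HTTP/.* 200 OK' <&3"
}

# Same shape as `a || b || c || d || exit 1`: a failing step falls through
# to the next one, and an overall failure is normalized to status 1.
container_healthcheck() {
  try_healthcheck || try_wget || try_curl || try_bash_tcp || return 1
}
```

Under these assumptions, an image whose `healthcheck` fails but which also ships `wget` is still reported healthy as long as `wget` can fetch `/health` successfully.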

template-only-app/Dockerfile

Lines changed: 4 additions & 1 deletion
```diff
@@ -1,5 +1,5 @@
 # Run `make update-docker-digest` to update the image
-FROM python:3-alpine@sha256:657dbdb20479a6523b46c06114c8fec7db448232f956a429d3cc0606d30c1b59 as release
+FROM python:3-alpine@sha256:657dbdb20479a6523b46c06114c8fec7db448232f956a429d3cc0606d30c1b59 AS release

 RUN adduser --system --disabled-password --no-create-home app

@@ -15,6 +15,7 @@ COPY requirements.txt ./
 RUN pip3 install --no-cache-dir -r requirements.txt

 COPY db-migrate /usr/bin/
+COPY bin/healthcheck-netcat /usr/bin/healthcheck
 COPY migrations.sql /app/
 COPY *.py /app/
 COPY /templates /app/templates
@@ -26,5 +27,7 @@ ENV HOST=0.0.0.0
 # Run as non-root user
 USER app

+HEALTHCHECK CMD /usr/bin/healthcheck
+
 # Create a basic webserver and run it until the container is stopped
 CMD ["python", "-m", "app"]
```
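`HEALTHCHECK CMD /usr/bin/healthcheck` uses Docker's default timing (a 30 second interval and 3 retries), and the ECS task definition sets `interval = 30` and `retries = 3` explicitly (on ECS, the task definition's health check takes precedence over the image's `HEALTHCHECK`). As a simplified model, assuming one success marks the container healthy and `retries` consecutive failures mark it unhealthy (start periods and timeouts ignored), the status evolves as below. `health_state` is a hypothetical helper for illustration, not a Docker or ECS API.

```sh
#!/bin/sh
# Hypothetical, simplified model of how consecutive healthcheck results
# become a container health status. Ignores start periods and timeouts.
#
# Usage: health_state RETRIES EXIT_CODE...
# Prints "starting", "healthy", or "unhealthy" after replaying the exit codes.
health_state() {
  retries=$1
  shift
  status=starting
  failures=0
  for code in "$@"; do
    if [ "$code" -eq 0 ]; then
      # A single success marks the container healthy and resets the streak.
      status=healthy
      failures=0
    else
      failures=$((failures + 1))
      # Only RETRIES consecutive failures flip the status to unhealthy.
      if [ "$failures" -ge "$retries" ]; then
        status=unhealthy
      fi
    fi
  done
  printf '%s\n' "$status"
}
```

For example, `health_state 3 0 1 1` prints `healthy`: two failures in a row are not yet enough to reach the retry limit of 3.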

template-only-app/Makefile

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,4 +9,4 @@ release-build:
 	.

 update-docker-digest:
-	../template-only-bin/update-docker-digest Dockerfile
+	./bin/update-docker-digest Dockerfile
```
Lines changed: 25 additions & 0 deletions
```diff
@@ -0,0 +1,25 @@
+#!/usr/bin/env bash
+# https://tldp.org/LDP/abs/html/devref1.html
+
+# open connection
+exec 3<>/dev/tcp/127.0.0.1/${PORT}
+
+# send request
+echo -e "GET /health HTTP/1.1\r\nHost: http://localhost\r\nConnection: close\r\n\r\n" >&3
+
+# read response
+#
+# one could use grep, like:
+#
+# grep "HTTP/1.1 200 OK" <&3
+#
+# but we'll rely purely on bash builtins
+
+# read the first line
+read -u 3 status_line
+
+# be a good citizen, close the file descriptor as soon as we are done with it
+exec 3<&-
+
+# check for a 200 OK status response
+[[ "$status_line" =~ ^HTTP/.*\ 200\ OK ]]
```
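The bash example accepts any status line matching `^HTTP/.*\ 200\ OK`. That tolerates different HTTP versions (Python's `http.server`, which the test script uses, replies with `HTTP/1.0` by default), but it requires the `OK` reason phrase, and the unanchored `.*` would also accept a line such as `HTTP/1.1 500 Not 200 OK`. A stricter alternative, sketched in POSIX `sh` as a hypothetical `is_http_200` helper (not part of the commit), checks only the status-code token and first strips the trailing carriage return that `read` leaves on CRLF-terminated lines.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the given HTTP status line has code 200.
cr=$(printf '\r')

is_http_200() {
  line=${1%"$cr"}                 # drop a trailing CR from CRLF line endings
  case $line in
    HTTP/*' '*) ;;                # must look like "HTTP/<version> <code> ..."
    *) return 1 ;;
  esac
  rest=${line#HTTP/* }            # remove the shortest "HTTP/<version> " prefix
  case $rest in
    200 | '200 '*) return 0 ;;    # code 200, with or without a reason phrase
    *) return 1 ;;
  esac
}
```

In the bash script, the final `[[ ... =~ ... ]]` test could then become `is_http_200 "$status_line"`.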
Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+#!/usr/bin/env sh
+# https://curl.se/docs/manpage.html
+
+curl --fail --silent http://localhost:"${PORT}"/health > /dev/null
```
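With `--fail`, curl exits non-zero when the server answers with an HTTP status of 400 or above, and `--silent` suppresses progress output, so the exit status alone carries the result. When debugging a failing check, it can help to translate that status. The codes below are documented curl exit codes; the `explain_curl_exit` helper itself is hypothetical and not part of the commit.

```sh
#!/bin/sh
# Hypothetical helper: describe a few documented curl exit codes that a
# `curl --fail --silent` healthcheck commonly returns.
explain_curl_exit() {
  case $1 in
    0) echo "healthy" ;;
    6) echo "could not resolve host" ;;
    7) echo "failed to connect to host (nothing listening on the port?)" ;;
    22) echo "server returned an HTTP status of 400 or above" ;;
    28) echo "operation timed out" ;;
    *) echo "other curl error ($1)" ;;
  esac
}
```

For example, run the script's `curl` command and then call `explain_curl_exit "$?"`.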
Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+#!/usr/bin/env sh
+# https://busybox.net/downloads/BusyBox.html#nc
+# https://man.openbsd.org/nc.1
+# https://manpages.debian.org/unstable/netcat-traditional/nc.traditional.1.en.html
+
+printf 'GET /health HTTP/1.1\r\nHost: http://localhost\r\nConnection: close\r\n\r\n' | nc 127.0.0.1 "${PORT}" | grep -q '^HTTP/.* 200 OK'
```
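The netcat example writes a raw HTTP/1.1 request: a request line, header lines, and an empty line, each terminated by CRLF. `Connection: close` asks the server to close the connection after responding, so `nc` can see end-of-stream and exit rather than wait on an open connection. The hypothetical `health_request` helper below builds the same request with one deviation: HTTP's `Host` header carries a host and optional port (for example `localhost:3000`), not a URL, so it sends that form. The scheme-prefixed `http://localhost` works with lenient servers such as Python's `http.server`, but stricter servers may reject it.

```sh
#!/bin/sh
# Hypothetical helper: print the raw HTTP/1.1 health request for a port.
# Every line ends in CRLF, and an empty CRLF line terminates the headers.
health_request() {
  printf 'GET /health HTTP/1.1\r\n'
  printf 'Host: localhost:%s\r\n' "$1"
  printf 'Connection: close\r\n'
  printf '\r\n'
}
```

Used in place of the inline `printf`: `health_request "${PORT}" | nc 127.0.0.1 "${PORT}" | grep -q '^HTTP/.* 200 OK'`.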
Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+#!/usr/bin/env sh
+# https://busybox.net/downloads/BusyBox.html#wget
+
+# --spider causes a HEAD request instead of GET, which while not that different
+# and in most frameworks where you implement a GET handler it will automatically
+# handle HEAD requests as well, some checkers (like AWS ALBs) specifically send
+# GET requests, so we should match.
+#
+# So use --output-document instead, throwing away the response.
+wget --quiet --output-document=/dev/null http://127.0.0.1:"${PORT}"/health
```
Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+#!/usr/bin/env sh
+# https://www.gnu.org/software/wget/manual/wget.html
+
+# --spider causes a HEAD request instead of GET, which while not that different
+# and in most frameworks where you implement a GET handler it will automatically
+# handle HEAD requests as well, some checkers (like AWS ALBs) specifically send
+# GET requests, so we should match.
+#
+# So use --output-document instead, throwing away the response.
+wget --tries=1 --quiet --output-document=/dev/null http://127.0.0.1:"${PORT}"/health
```
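Unlike the BusyBox variant, the GNU wget example passes `--tries=1`, limiting it to a single attempt (GNU wget's default is 20 tries, although fatal errors such as "connection refused" or a 404 are not retried). GNU wget also reports the kind of failure through its exit status. The codes below come from the GNU wget manual's exit status list; the `explain_wget_exit` helper is hypothetical, and BusyBox wget is not guaranteed to use the same codes.

```sh
#!/bin/sh
# Hypothetical helper: describe GNU wget exit statuses (per the GNU wget
# manual). BusyBox wget is not guaranteed to follow this table.
explain_wget_exit() {
  case $1 in
    0) echo "healthy" ;;
    1) echo "generic error" ;;
    2) echo "parse error (for example, bad command-line options)" ;;
    3) echo "file I/O error" ;;
    4) echo "network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "username/password authentication failure" ;;
    7) echo "protocol error" ;;
    8) echo "server issued an error response (such as a 4xx or 5xx status)" ;;
    *) echo "unknown exit status ($1)" ;;
  esac
}
```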
Lines changed: 86 additions & 0 deletions
```diff
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+#
+# Check that the healthcheck-* examples have consistent basic behavior
+
+set -euo pipefail
+
+SCRIPT_DIR=$(dirname "$0")
+
+PORT=3000
+export PORT
+
+trap_background_jobs() {
+  trap 'trap - SIGTERM && kill $(jobs -p)' SIGINT SIGTERM EXIT
+}
+
+start_server_ok_response() {
+  SERVER_DIR=$(mktemp -d)
+
+  pushd "${SERVER_DIR}" > /dev/null
+  # create the file so the python server will return a 200 response for
+  # requests to /health
+  touch health
+  python -m http.server ${PORT} &> /dev/null &
+  popd > /dev/null
+
+  # Give the server time to start
+  sleep 1
+}
+
+start_server_fail_response() {
+  pushd "${SERVER_DIR}" > /dev/null
+  # trigger a non-200 response for requests to /health
+  rm -f health
+  popd > /dev/null
+}
+
+run_healthchecks() {
+  local run_after_each_healthcheck=$1
+  local exit_code
+
+  for healthcheck in "${SCRIPT_DIR}"/healthcheck-*; do
+    echo "${healthcheck}"
+    "${healthcheck}" && exit_code=$? || exit_code=$? && :
+    echo "Exit code: ${exit_code}"
+    ${run_after_each_healthcheck} "${exit_code}"
+    echo ""
+  done
+}
+
+# shellcheck disable=SC2317
+fail_if_non_zero() {
+  local exit_code=$1
+  [ "${exit_code}" == "0" ] || { FAIL_TEST=true && echo "Failed"; }
+}
+
+# shellcheck disable=SC2317
+fail_if_zero() {
+  local exit_code=$1
+  [ "${exit_code}" != "0" ] || { FAIL_TEST=true && echo "Failed"; }
+}
+
+# Start tests
+
+trap_background_jobs
+
+FAIL_TEST=false
+
+# Healthy response
+start_server_ok_response
+
+echo "::group::Test handling of healthy responses"
+run_healthchecks fail_if_non_zero
+echo "::endgroup::"
+
+# Unhealthy response
+start_server_fail_response
+
+echo "::group::Test handling of unhealthy responses"
+run_healthchecks fail_if_zero
+echo "::endgroup::"
+
+if [ "${FAIL_TEST}" == "true" ]; then
+  exit 1
+else
+  exit 0
+fi
```
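`run_healthchecks` captures each script's exit status with `"${healthcheck}" && exit_code=$? || exit_code=$? && :`, which keeps `set -e` from aborting the loop when a check fails. An equivalent pattern that may read more plainly puts the command in an `if` condition: `set -e` does not apply to commands tested there, and `$?` at the start of the `else` branch still holds the command's status. `record_status` and `LAST_STATUS` are hypothetical names, not part of the commit.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: run a command and store its exit status in
# LAST_STATUS without letting `set -e` abort the script on failure.
record_status() {
  if "$@"; then
    LAST_STATUS=0
  else
    # $? here is still the exit status of the command tested above.
    LAST_STATUS=$?
  fi
}
```

Inside the loop, the capture line would become `record_status "${healthcheck}"; exit_code=${LAST_STATUS}`.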

template-only-docs/application-requirements.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -5,14 +5,14 @@ In order to use the template infrastructure, you need an application that meets
 * The application's source code lives in a folder that lives in the project root folder e.g. `/app`.
 * The application folder needs to have a `Makefile` that has a build target `release-build` that takes in a Makefile variable `OPTS`, and passes those `OPTS` as options to the `docker build` command. The top level [Makefile](/Makefile) in this repo will call the application's `release-build` make target passing in release tags to tag the docker image with.
 * The web application needs to listen on the port defined by the environment variable `PORT`, rather than hardcode the `PORT`. This allows the infrastructure to configure the application to listen on a container port specified by the infrastructure. See [The Twelve-Factor App](https://12factor.net/) to learn more about designing applications to be portable to different infrastructure environments using environment variables.
-* The web application needs to have a health check endpoint at `/health` that returns an HTTP 200 OK response when the application is healthy and ready to accept requests.
-* The Docker image needs to have `wget` installed. This is used in the container task definition's healthcheck configuration in order to ping the application's `/health` endpoint. If you want to use a different healthcheck command (e.g. `curl`) then you'll need to modify the `healthCheck` configuration in the `aws_ecs_task_definition` resource in [modules/service/main.tf](/infra/modules/service/main.tf).
+* The web application needs to have a health check endpoint `GET /health` that returns an HTTP 200 OK response when the application is healthy and ready to accept requests.
+* Provide an executable named `healthcheck` in the container's `$PATH` which exits with code `0` if your service is healthy, and non-zero if not healthy (some examples in `/template-only-app/bin/`). Or have `wget`, `curl`, or `bash+grep` available, in which case the application's `/health` endpoint will be pinged for container healthchecks.

 ## Database Requirements

 If your application needs a database, it must also:

-* Have a `db-migrate` command available in the container's PATH for running migrations. If you use a migration framework like [Alembic](https://alembic.sqlalchemy.org/) or [Flyway](https://flywaydb.org/) you can create a `db-migrate` script that then calls your framework's binary.
+* Have a `db-migrate` command available in the container's `$PATH` for running migrations. If you use a migration framework like [Alembic](https://alembic.sqlalchemy.org/) or [Flyway](https://flywaydb.org/) you can create a `db-migrate` script that then calls your framework's binary.
 * Both the application service container and the container running the `db-migrate` script will receive the following environment variables that are needed to [connect to the database using IAM authentication](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.Connecting.html):
   * `DB_HOST` - the hostname to connect to
   * `DB_PORT` - the port that the database is listening on
```
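To see which option an image would satisfy under the updated requirements, a small diagnostic could be run inside it, for example by appending a `which_healthcheck` call and piping the script into `docker run --rm -i --entrypoint sh IMAGE` (`IMAGE` is a placeholder). The sketch reports the first option found, in the same order as the task definition. It also checks for `grep` before reporting the bash fallback, since that fallback pipes the response through `grep`; the task definition's own check only tests for `bash`. The `which_healthcheck` helper is hypothetical.

```sh
#!/bin/sh
# Hypothetical diagnostic: print which healthcheck mechanism the task
# definition's fallback chain would try first in this environment.
have() { [ -x "$(command -v "$1")" ]; }

which_healthcheck() {
  if have healthcheck; then
    echo "healthcheck"
  elif have wget; then
    echo "wget"
  elif have curl; then
    echo "curl"
  elif have bash && have grep; then
    # The bash fallback also needs grep to inspect the response.
    echo "bash+grep"
  else
    echo "none"
  fi
}
```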
