This repo hosts a Docker Compose implementation for Apache Airflow.
The docker-compose.yaml is the file that specifies what images are required, what ports they need to expose, whether they have access to the host filesystem, what commands should be run when they start up, and so on.
A .env file must be created in the project folder, using .env.git as a template. It must define the following environment variables:
| key | value | description |
|---|---|---|
| AIRFLOW_IMAGE_NAME | string | Image version used for Apache Airflow |
| AIRFLOW_UID | number | Airflow user identifier |
| AIRFLOW_PROJ_DIR | string | Absolute path to this Apache Airflow implementation |
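
As a quick sanity check before building, the required keys in the table above can be verified with a few lines of Python. This is only a sketch: `parse_env` and `missing_keys` are hypothetical helpers that are not part of this repository, and the parser handles only plain `KEY=VALUE` lines (no multi-line values or variable interpolation).

```python
# Hypothetical helper, not part of this repository.
# The three keys listed in the table above.
REQUIRED_KEYS = ("AIRFLOW_IMAGE_NAME", "AIRFLOW_UID", "AIRFLOW_PROJ_DIR")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or empty, in table order."""
    env = parse_env(text)
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Placeholder values for illustration only.
sample = "AIRFLOW_UID=50000\nAIRFLOW_PROJ_DIR=/home/user/airflow\n"
print(missing_keys(sample))  # ['AIRFLOW_IMAGE_NAME']
```

In practice the text would come from the real file, for example `missing_keys(open(".env").read())`.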
The airflow_cwl_utils.py file is a utility module shared across all DAGs in the Apache Airflow setup. It acts as the bridge between Airflow and CWL (Common Workflow Language). It has two responsibilities:
- resolve_inputs(): Reads a step's input YAML file and resolves references before execution.
- create_bash_command(): Builds the shell command string that Airflow's BashOperator will execute for each step.
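
To make the second responsibility concrete, here is a minimal sketch of how a command string for BashOperator could be assembled safely. It is not the actual code in airflow_cwl_utils.py: the parameter names, their order, and the script location are assumptions made for illustration.

```python
import shlex


def create_bash_command(script: str, workflow: str, step: str, inputs_file: str) -> str:
    """Build one shell-safe command string for a BashOperator task.

    All four parameters are hypothetical; the real function may take
    different arguments in a different order.
    """
    parts = [script, workflow, step, inputs_file]
    # shlex.quote protects values containing spaces or shell metacharacters.
    return " ".join(shlex.quote(part) for part in parts)


# The script path below is an assumed location inside the container.
print(create_bash_command("/opt/airflow/cwl_run.sh", "my wf", "step1", "inputs.yml"))
# /opt/airflow/cwl_run.sh 'my wf' step1 inputs.yml
```

Quoting each argument separately keeps a workflow or file name with spaces from being split into several shell words.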
cwl_run.sh is the shell script that executes a single CWL workflow step. Every BashOperator task calls it through the command built by create_bash_command() in airflow_cwl_utils.py.
Why docker_wrapper.sh? cwltool normally calls docker run directly, but the Airflow worker is itself a container. The wrapper remaps paths from the container's /opt/airflow/... namespace to the host's real paths, so Docker-in-Docker mounts work correctly.
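
The remapping idea can be illustrated with a short Python sketch. The wrapper itself is a shell script and its actual logic is not shown here; `to_host_path` is a hypothetical function, and the assumption that the host root corresponds to AIRFLOW_PROJ_DIR is ours.

```python
# Prefix named in the text above; everything else here is illustrative.
CONTAINER_ROOT = "/opt/airflow"


def to_host_path(path: str, host_root: str) -> str:
    """Rewrite a path under CONTAINER_ROOT so it points below host_root.

    Paths outside CONTAINER_ROOT are returned unchanged.
    """
    if path == CONTAINER_ROOT or path.startswith(CONTAINER_ROOT + "/"):
        return host_root.rstrip("/") + path[len(CONTAINER_ROOT):]
    return path


# host_root is assumed to be the value of AIRFLOW_PROJ_DIR (placeholder here).
print(to_host_path("/opt/airflow/dags/step.cwl", "/home/user/airflow"))
# /home/user/airflow/dags/step.cwl
print(to_host_path("/tmp/scratch", "/home/user/airflow"))
# /tmp/scratch
```

Matching on the prefix followed by a slash avoids rewriting unrelated paths such as /opt/airflowx.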
x-airflow-common is a YAML anchor: a reusable configuration block that avoids repeating the same settings across every Airflow service. In this implementation it has been customized to install cwltool, which is needed to run the BioExcel Building Blocks on Apache Airflow.
The CWL Airflow Dockerfile installs cwltool automatically in each of the Apache Airflow services.
First, go to the project root folder.
To build the services with Docker Compose, run:
docker compose build
Deploy the services:
docker compose up -d
List the containers:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<ID> docker-airflow-worker "/usr/bin/dumb-init …" 16 hours ago Up 16 hours (healthy) 8080/tcp <NAME>
<ID> docker-airflow-apiserver "/usr/bin/dumb-init …" 16 hours ago Up 16 hours (healthy) 0.0.0.0:8080->8080/tcp <NAME>
<ID> docker-airflow-triggerer "/usr/bin/dumb-init …" 16 hours ago Up 16 hours (healthy) 8080/tcp <NAME>
<ID> docker-airflow-scheduler "/usr/bin/dumb-init …" 16 hours ago Up 16 hours (healthy) 8080/tcp <NAME>
<ID> docker-airflow-dag-processor "/usr/bin/dumb-init …" 16 hours ago Up 16 hours (healthy) 8080/tcp <NAME>
<ID> docker-airflow-init "/bin/bash -c 'if [[…" 16 hours ago Exited (0) 16 hours ago <NAME>
<ID> postgres:16 "docker-entrypoint.s…" 16 hours ago Up 16 hours (healthy) 5432/tcp <NAME>
<ID> nginx:alpine "/docker-entrypoint.…" 16 hours ago Up 16 hours 0.0.0.0:8888->80/tcp <NAME>
<ID> redis:7.2-bookworm "docker-entrypoint.s…" 16 hours ago Up 16 hours (healthy) 6379/tcp <NAME>
Open a browser and type:
http://localhost:8080/
Apache Airflow doesn't serve output files because it was designed as a workflow orchestrator, not a data platform. Its core philosophy is:
"Airflow schedules and monitors tasks. What those tasks do with data is not Airflow's concern."
This implementation therefore provides an Nginx server that exposes the output files over HTTP.
Once a workflow has run, open a browser and type:
http://localhost:8888/<WF NAME>/outputs/<STEP>/<FILE NAME>
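
When output files are fetched from a script rather than a browser, the URL can be assembled from its parts. The sketch below follows the pattern shown above; `output_url` is a hypothetical helper, and the default base URL assumes the Nginx port mapping 8888->80 from the container listing.

```python
from urllib.parse import quote


def output_url(workflow: str, step: str, filename: str,
               base: str = "http://localhost:8888") -> str:
    """Build <base>/<workflow>/outputs/<step>/<filename>, percent-encoding each part."""
    segments = [workflow, "outputs", step, filename]
    # safe="" also encodes "/" so a value cannot add extra path levels.
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


# Placeholder workflow, step and file names.
print(output_url("my_wf", "step1", "out file.pdb"))
# http://localhost:8888/my_wf/outputs/step1/out%20file.pdb
```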
Shut down all the Airflow services:
docker compose down
Clear a DAG from the database (stale DAGs are sometimes cached and must be removed, e.g. after changing the DAG name):
docker compose exec airflow-apiserver airflow dags delete <dag_name_to_remove> -y
This software has been developed in the MMB group at the IRB for the European BioExcel, funded by the European Commission (EU Horizon Europe 101093290, EU H2020 823830, EU H2020 675728).
- (c) 2015-2026 Institute for Research in Biomedicine
Licensed under the Apache License 2.0, see the file LICENSE for details.


