
Izel/duck-pipeline-dev


About duck-pipeline-dev

Production-grade batch ETL platform built entirely on GCP and provisioned with Terraform as Infrastructure as Code (IaC). It demonstrates enterprise patterns including credential management via Secret Manager and KMS for key rotation, end-to-end CI/CD with Cloud Build, containerised deployment on Cloud Run, and scheduled execution via Cloud Scheduler. Data is persisted to PostgreSQL via Cloud SQL.


Tools

Usage                  Tool
Cloud Provider         GCP
Platform Building      Terraform
Pipeline Construction  Python

Architecture

  • Service: Google Cloud Run (HTTP-based)
  • Trigger: Cloud Scheduler (Cron)
  • Region: us-central1 (recommended)
  • Networking: Public Ingress (IAM-authenticated)
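
To make the architecture above concrete, here is a minimal sketch of an HTTP-triggered service of the kind Cloud Run expects: a process that listens on the port given in the PORT environment variable (8080 by default) and runs the job once per request. It uses only the Python standard library. The names run_etl, EtlHandler and make_server are hypothetical, and the repository's actual service may be built on a different framework.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def run_etl() -> str:
    """Hypothetical placeholder for the real extract/transform/load step."""
    return "ETL run completed"


class EtlHandler(BaseHTTPRequestHandler):
    """Runs the ETL once per incoming request, as a scheduler call would."""

    def _handle(self) -> None:
        body = run_etl().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # Accept GET and POST: the scheduler job's HTTP method is an assumption here.
    do_GET = _handle
    do_POST = _handle


def make_server(port: Optional[int] = None) -> HTTPServer:
    """Bind to the port Cloud Run passes in PORT, defaulting to 8080."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), EtlHandler)
```

Calling make_server().serve_forever() would start the service. Keeping the ETL step behind a single function makes it easy to test without an HTTP layer.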


Prerequisites

  • Terraform >= 1.0.0
  • Google Cloud SDK (gcloud)

Project setup

  • Permissions (for your personal account or SA): roles/run.admin, roles/cloudscheduler.admin, roles/iam.serviceAccountUser
  • Via web console, create a new Project and attach it to a Billing Account.
  • Create a Cloud Storage bucket to serve as the Terraform backend, where the Terraform state files are stored.
  • Set this bucket name in the file /terraform/envs/dev/backend.tf
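
As an illustration of what the backend file might contain, the sketch below renders a Terraform gcs backend block for a given bucket name. The helper render_backend_tf, the bucket name used in the example and the prefix value are all hypothetical. The actual backend.tf in the repository may differ.

```python
def render_backend_tf(bucket: str, prefix: str = "terraform/state") -> str:
    """Return the text of a backend.tf that stores state in a GCS bucket.

    `bucket` is the state bucket created in the step above; `prefix` is an
    assumed folder-like path inside that bucket.
    """
    return (
        "terraform {\n"
        '  backend "gcs" {\n'
        f'    bucket = "{bucket}"\n'
        f'    prefix = "{prefix}"\n'
        "  }\n"
        "}\n"
    )


# Example with a placeholder bucket name (replace with your own).
print(render_backend_tf("my-terraform-state-bucket"))
```

The bucket has to exist before terraform init runs, because Terraform does not create the backend bucket itself.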

Connect the GitHub Repository to Cloud Build

  1. Go to the Cloud Build Triggers page.
  2. Click Manage Repositories -> Connect Repository.
  3. Select GitHub (Cloud Build GitHub App).
  4. Follow the prompts to authorise your account and select your ducks-pipeline repository.
  5. Create the repository connection in the same region used for the project resources (us-central1).

Google Cloud connection

  1. Initialise the gcloud CLI with the command below and choose the account you want to use for this configuration.
gcloud init
  2. Pick the cloud project to use.
  3. Log in to the cloud by following the link provided after executing the command below.
gcloud auth login
  4. Select your Google account.
  5. Allow Google Cloud to access your account by clicking Allow.

Configure Docker

gcloud auth configure-docker <Project_region>-docker.pkg.dev
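
The registry host passed to the command above follows the pattern <region>-docker.pkg.dev, and images pushed to Artifact Registry are addressed as host/project/repository/image:tag. The sketch below builds both strings. The function names and the example project, repository and image values are hypothetical.

```python
def registry_host(region: str) -> str:
    """Artifact Registry Docker host for a region, e.g. us-central1."""
    return f"{region}-docker.pkg.dev"


def image_path(region: str, project_id: str, repository: str,
               image: str, tag: str = "latest") -> str:
    """Full image reference as used by docker push and by Cloud Run."""
    return f"{registry_host(region)}/{project_id}/{repository}/{image}:{tag}"


# Placeholder values for illustration only.
print(registry_host("us-central1"))
print(image_path("us-central1", "my-project", "my-repo", "ducks-etl"))
```

A value built this way is the kind of string that would be supplied to the var.image_path variable used in the Deployment section.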

Deployment

The deployment process involves the steps below:

Note

A circular reference will be created between the Cloud Run Service and Artifact Registry. The first requires an image that has not been created yet, and the second requires a Service to create an image. To avoid this circular reference, execute step 0 (Set Up) only the first time or after executing terraform destroy.

  0. Set Up. In the terraform/modules/services/main.tf file, uncomment the first line below and comment out the second. This creates the service using a basic, ready-to-use Google image.
#image = "us-docker.pkg.dev/cloudrun/container/hello"  # <-- Uncomment this line
image = var.image_path  # <-- Comment out this line
  1. Cloud login and authorization. Execute the command below to log in to the cloud provider.
gcloud auth login
  2. Set the default project
gcloud config set project <YOUR_PROJECT_ID>
  3. Initialize Terraform:
terraform init
  4. Plan the infrastructure
terraform plan -out=tfplan
  5. Apply changes
terraform apply -var="project_id=<YOUR_PROJECT_ID>"
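
The Terraform steps above can also be scripted. The sketch below is one possible variant, not the repository's own tooling: it passes project_id at plan time and then applies the saved plan file, since variable values are fixed when a plan is saved and cannot be changed when that saved plan is applied. The function names and the workdir argument are hypothetical. The directory that holds the root module (possibly terraform/envs/dev, where backend.tf lives) is an assumption.

```python
import subprocess
from typing import List


def terraform_commands(project_id: str, plan_file: str = "tfplan") -> List[List[str]]:
    """Return the init/plan/apply command lines in execution order."""
    return [
        ["terraform", "init"],
        # Variables are supplied here so they are captured in the saved plan.
        ["terraform", "plan", f"-var=project_id={project_id}", f"-out={plan_file}"],
        # Applying a saved plan takes the plan file instead of -var flags.
        ["terraform", "apply", plan_file],
    ]


def deploy(project_id: str, workdir: str) -> None:
    """Run each command in workdir, stopping at the first failure."""
    for command in terraform_commands(project_id):
        subprocess.run(command, cwd=workdir, check=True)
```

Separating the command list from its execution keeps the sequence easy to inspect or log before anything is changed in the project.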

Local Reproduction

To test the ETL logic locally without deploying to Cloud Run:

  1. Build the Docker image:
docker build -t ducks-etl .
  2. Run the container:
docker run -p 8080:8080 ducks-etl
  3. Trigger the process:
curl localhost:8080
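
For scripted local checks, the curl call above has a straightforward Python equivalent using only the standard library. This is a minimal sketch. The function name trigger_local_etl and the 30 second timeout are hypothetical, and it assumes the container answers a plain GET on the root path, as the curl example suggests.

```python
import urllib.request
from typing import Tuple


def trigger_local_etl(url: str = "http://localhost:8080",
                      timeout: float = 30.0) -> Tuple[int, str]:
    """Send a GET request to the local container and return (status, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

With the container from step 2 running, trigger_local_etl() returns the HTTP status code and the response text. A non-2xx response raises urllib.error.HTTPError.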

Scheduled Execution

  1. The job is configured to run automatically via Cloud Scheduler.
     Target URI: https://duck-pipeline-service-dev-248136157540.us-central1.run.app
  2. Manual trigger. To manually invoke the deployed pipeline with authentication:
gcloud scheduler jobs run daily-ducks-etl-job-dev --location=us-central1
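
Because the service uses IAM-authenticated ingress, a direct call to the target URI needs an identity token in the Authorization header. The sketch below shows one way to do that from Python, assuming the gcloud CLI is installed and the logged-in account is allowed to invoke the service (for example through roles/run.invoker). The function names are hypothetical.

```python
import subprocess
import urllib.request


def fetch_identity_token() -> str:
    """Ask the gcloud CLI for an identity token for the active account."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Attach the token as a Bearer credential, as Cloud Run expects."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the result of build_authenticated_request(target_uri, fetch_identity_token()) to urllib.request.urlopen would send the call. The Cloud Scheduler command above remains the simpler route, since the scheduler job supplies its own credentials.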

Future Improvements

Possible improvements to this project:

  • Add monitoring and alerting
  • Add functionality to promote to other environments (TST, PRE, PRD) 🚧

About

GCP Data Platform ETL pipeline deployed on Cloud Run via Terraform, with CI/CD using Cloud Build and Cloud Scheduler orchestration.
