- 🔍 Description
- 🚀 Project Overview
- 📁 `modern-data-pipeline-gcp` – Project Root
- 🛠️ Tech Stack
- 🚀 Quick Start
- 🐳 Using Docker Compose with Airflow
- 🧪 Testing
- 📡 Monitoring & Logging
- 🚀 CI/CD
- 🗺️ Roadmap
- 🤝 Contributing Guidelines
- 🧰 Resources
- 👩‍💻 Author
## 🔍 Description

A production-grade, modular ETL workflow built with Apache Airflow, DBT, and Google Cloud Platform (GCP) services.
Designed for orchestrated extraction, transformation, and loading, with integrated data quality, monitoring, and CI/CD best practices.
## 🚀 Project Overview

This repository contains an end-to-end ETL workflow designed for scalability, maintainability, and cloud readiness.
It showcases how to orchestrate data pipelines using Airflow, enrich and transform data with DBT, and deploy the solution using containerized environments and CI/CD pipelines.
The pipeline provides:
- Extraction from a PostgreSQL source and external exchange rate API
- Enrichment and Transformation using DBT models
- Loading to CSV, Google Sheets, and optionally to BigQuery
- Orchestration via Airflow DAGs — modular and containerized
- Built-in data quality checks, logging, and alerts
- Modular structure: `extract`, `transform`, `load`, `validate`, `notify`
- Docker + Docker Compose for consistent local execution
- DBT for SQL-based modeling and schema tests
- Airflow DAG to sequence the pipeline steps (see the sketch after this list)
- Integration with GCP services:
- BigQuery
- Sheets API
- Secret Manager
- Robust logging and optional Stackdriver integration
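To make the orchestration concrete, here is a minimal DAG sketch in the spirit of `dags/dag_etl.py` from the tree below. Task IDs, callables, and the schedule are illustrative assumptions, not the repository's actual code:

```python
# Minimal sketch of how an Airflow DAG could sequence the pipeline steps.
# Callables and schedule are illustrative placeholders, not the repo's actual code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Placeholder: pull source tables from PostgreSQL and fetch exchange rates."""


def transform():
    """Placeholder: trigger the DBT models."""


def validate():
    """Placeholder: run the data quality checks."""


def load():
    """Placeholder: export to CSV, Google Sheets, and BigQuery."""


with DAG(
    dag_id="etl_pipeline_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ style; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_validate >> t_load
```

In the real project, each placeholder would delegate to the corresponding script under `transformations/scripts/` or to DBT.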
## 📁 `modern-data-pipeline-gcp` – Project Root

.
├── .venv # Virtual environment
├── transformations # Data transformation logic
│ ├── dags
│ │ ├── dag_etl.py
│ │ ├── dag_rates.py
│ │ └── dag_reports.py
│ ├── data # CSV exports
│ │ ├── mock_order_items.csv
│ │ ├── mock_orders.csv
│ │ ├── mock_products.csv
│ │ ├── mock_rates.csv
│ │ └── mock_users.csv
│ ├── dbt
│ │ └── .dbt
│ │ ├── .user.yml
│ │ └── profiles.yml
│ ├── models # DBT models
│ │ └── mock
│ │ ├── mock_order_items.sql
│ │ ├── mock_orders.sql
│ │ ├── mock_products.sql
│ │ ├── mock_users.sql
│ │ └── schema.yml
│ ├── reports
│ │ ├── active_clients_without_sales.csv
│ │ ├── order_by_status.csv
│ │ └── sales_by_clients.csv
│ ├── scripts # Python scripts for pipeline steps
│ │ ├── export_csv.py
│ │ ├── export_sheets.py
│ │ ├── load_exchange_rates.py
│ │ ├── push_to_bigquery.py
│ │ ├── run.sh
│ │ └── upload_tables.py
│ ├── utils # Custom utility functions
│ │ └── quality_checks.py
│ ├── airflow
│ ├── dbt_project.yml # DBT project config
│ ├── requirement-dev.txt # Development dependencies
│ ├── requirements.txt # Python dependencies
│ ├── run_pipeline.py
│ └── wait-for-postgres.sh
├── .gitignore
├── LICENSE
└── README.md # Project documentation
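`utils/quality_checks.py` is not reproduced here; the sketch below illustrates the kind of helpers such a module typically exposes. Function names, columns, and thresholds are hypothetical:

```python
# Hypothetical sketch of data quality helpers like those in utils/quality_checks.py.
# Function names, columns, and thresholds are illustrative assumptions.
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def check_not_null(df: pd.DataFrame, columns: list[str]) -> bool:
    """Return True if none of the given columns contain nulls; log offenders."""
    ok = True
    for col in columns:
        nulls = int(df[col].isna().sum())
        if nulls:
            logger.error("Column %s has %d null values", col, nulls)
            ok = False
    return ok


def check_row_count(df: pd.DataFrame, minimum: int = 1) -> bool:
    """Return True if the frame has at least `minimum` rows."""
    return len(df) >= minimum


if __name__ == "__main__":
    orders = pd.read_csv("transformations/data/mock_orders.csv")
    assert check_row_count(orders), "mock_orders.csv is empty"
```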
## 🛠️ Tech Stack

| Layer | Technologies |
|---|---|
| Orchestration | Airflow on Docker (locally) or Cloud Composer |
| Transformation | DBT models deployed to BigQuery |
| Source data | PostgreSQL |
| Destinations | CSV, Google Sheets, BigQuery |
| Cloud infra | GCP: BigQuery, Sheets API, Secret Manager |
| Containerization | Docker & Docker Compose |
| Language | Python |
| CI/CD | GitHub Actions |
## 🚀 Quick Start

- Docker & Docker Compose
- Python 3.9+ (for local development)
- GCP project with access to BigQuery, Sheets API, and Secret Manager
- PostgreSQL instance for source data
- `make` (optional, if using Makefile shortcuts)
git clone https://github.com/CamilaJaviera91/modern-data-pipeline-gcp.git
cd modern-data-pipeline-gcp
Copy `.env.example` to `.env` and supply:
# ⚙️ Airflow
AIRFLOW_UID=...
AIRFLOW__CORE__EXECUTOR=...
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=...
AIRFLOW__CORE__DAGS_FOLDER=...
AIRFLOW__LOGGING__BASE_LOG_FOLDER=...
AIRFLOW__WEBSERVER__SECRET_KEY=...
# 🐘 DBT / Database (PostgreSQL)
DBT_HOST=...
DBT_HOST_TEST=...
DBT_USER=...
DBT_PASSWORD=...
DBT_DBNAME=...
DBT_SCHEMA=...
DBT_PORT=...
# 💱 Exchange Rates API
EXCHANGE_API_KEY=...
# ☁️ Google Cloud / BigQuery
GOOGLE_CREDENTIALS_PATH=...
BQ_PROJECT_ID=...
BQ_DATASET=...
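How the scripts consume these variables is not shown above; the sketch below illustrates one plausible pattern, assuming `python-dotenv`, `psycopg2`, and `google-cloud-bigquery` (library choices are assumptions based on the stack, not confirmed by the repository):

```python
# Sketch of reading the .env configuration from a pipeline script.
# Library choices (python-dotenv, psycopg2, google-cloud-bigquery) are assumptions.
import os

import psycopg2
from dotenv import load_dotenv
from google.cloud import bigquery

load_dotenv()  # picks up the .env file in the project root

# PostgreSQL source connection for extraction steps
pg_conn = psycopg2.connect(
    host=os.getenv("DBT_HOST"),
    port=os.getenv("DBT_PORT", "5432"),
    user=os.getenv("DBT_USER"),
    password=os.getenv("DBT_PASSWORD"),
    dbname=os.getenv("DBT_DBNAME"),
)

# BigQuery client using the service-account credentials path
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.getenv("GOOGLE_CREDENTIALS_PATH", "")
bq_client = bigquery.Client(project=os.getenv("BQ_PROJECT_ID"))
print(f"Target dataset: {os.getenv('BQ_DATASET')}")
```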
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
sudo systemctl start docker
sudo systemctl status docker # verify that it is running
docker-compose up --build
cd transformations
dbt init
For a daily production run:
./run.sh run
Mock mode only:
./run.sh run --select enrich_exchange_rates
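`run_pipeline.py` is listed in the project tree but not reproduced here; one plausible shape is a thin wrapper that shells out to the same DBT commands. This is a sketch under that assumption, not the actual script:

```python
# Hypothetical sketch of run_pipeline.py as a thin wrapper around the DBT commands
# shown above; the real script may work differently.
import argparse
import subprocess
import sys


def run(cmd: list[str]) -> None:
    """Run a shell command and abort the pipeline on failure."""
    print(f"-> {' '.join(cmd)}")
    if subprocess.run(cmd).returncode != 0:
        sys.exit(1)


def main() -> None:
    parser = argparse.ArgumentParser(description="Run the DBT stage of the pipeline")
    parser.add_argument("--select", help="optional DBT selector, e.g. enrich_exchange_rates")
    args = parser.parse_args()

    run_cmd = ["dbt", "run"]
    test_cmd = ["dbt", "test"]
    if args.select:
        run_cmd += ["--select", args.select]
        test_cmd += ["--select", args.select]

    run(run_cmd)
    run(test_cmd)


if __name__ == "__main__":
    main()
```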
## 🐳 Using Docker Compose with Airflow

This guide shows the basic commands to start and manage Airflow using Docker Compose.
docker compose down -v --remove-orphans
- Stops and removes containers, networks, and volumes.
- Use this to reset your environment completely.
docker compose run airflow-init
- Runs a one-time container to set up Airflow’s database and config.
- Run this once before starting Airflow.
docker compose build
docker compose up -d
- Starts all services defined in `docker-compose.yml`.
- Runs containers in detached mode (background).
Run these commands in order to start fresh:
docker compose down -v --remove-orphans
docker compose run airflow-init #just once
docker compose build
docker compose up -d --remove-orphans
- Make sure Docker and Docker Compose are installed.
- To see logs, use:
docker compose logs -f
## 🧪 Testing

- DBT tests for schema, uniqueness, and relationships
- Airflow DAG validation: `airflow dags list`, `airflow dags test`
- Unit tests for custom Python functions in `/scripts` or `/dags`
- CI pipeline (planned): Linting, formatting, DAG validation, DBT compile
Run tests to validate your pipeline:
./run.sh test
# or
dbt test --select mock
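Beyond `airflow dags test`, DAG imports can also be validated from pytest; a minimal sketch, assuming the DAGs live under `transformations/dags` as in the project tree:

```python
# Minimal pytest sketch that asserts all DAGs import cleanly.
# The dag_folder path is an assumption based on the project tree.
from airflow.models import DagBag


def test_dags_import_without_errors():
    dag_bag = DagBag(dag_folder="transformations/dags", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"
    assert len(dag_bag.dags) >= 3  # dag_etl, dag_rates, dag_reports
```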
## 📡 Monitoring & Logging

- Airflow task logs viewable via the web UI
- Custom loggers for API responses and ETL steps
- Optionally integrates with Stackdriver Logging and Alerting
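The custom loggers themselves are not shown here; a minimal sketch of how one could be set up with the standard `logging` module (format and handler choice are assumptions; Stackdriver/Cloud Logging would be wired in via its own handler):

```python
# Sketch of a custom ETL logger; the format and handler are assumptions.
import logging


def get_etl_logger(name: str = "etl") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated imports
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


logger = get_etl_logger("exchange_rates")
logger.info("Fetched %d exchange rates", 42)
```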
## 🚀 CI/CD

- GitHub Actions workflows include:
- Linting + formatting checks
- DBT compilation and tests
- Docker image builds
- Deployment to Cloud Composer (planned)
## 🗺️ Roadmap

- ✅ Initialize core modular pipeline
- 🧪 Add unit tests for Python & DBT logic
- 🔁 Implement full CI/CD with automated deploy to GCP
- 🔄 Extend support to additional sinks (Snowflake, S3, etc.)
- ⏰ Enable scheduling on Cloud Composer
## 🤝 Contributing Guidelines

Thank you for your interest in contributing to this project!
- Fork the repository.
- Clone your fork: `git clone https://github.com/<your-username>/modern-data-pipeline-gcp.git`
- Create a new branch: `git checkout -b feature/your-feature-name`
- Make your changes and commit: `git commit -m "Add new feature"`
- Push to your fork: `git push origin feature/your-feature-name`
- Submit a pull request to the `main` branch.
## 👩‍💻 Author

Camila Javiera Muñoz Navarro
Data Engineer & Analyst | BigQuery | Airflow | Python | GCP
GitHub | LinkedIn | Portfolio
⭐ If you find this project useful, give it a ⭐️ and share your feedback or ideas in Issues!
This project is licensed under the MIT License.