A simple, production-grade ETL pipeline for cleaning and validating raw boat sales data; the test data comes from Kaggle. This script:
- Cleans UTF-8 encoded CSV files with non-ASCII characters.
- Converts currency and year fields into structured formats.
- Validates schema using Pandera.
- A companion Jupyter notebook contains an exploratory analysis of the same data, with graphs.
You can run the ETL pipeline as follows:

```shell
boat-etl \
  -i data/boat_data.csv \
  -o output/validated_boat_data.csv
```
You can also run the ETL pipeline inside a Docker container for reproducibility and ease of deployment. First, build the image:

```shell
docker build -t boat-etl .
```
Run the pipeline using the default parameters (as set by `CMD` in the Dockerfile):

```shell
docker run --rm boat-etl
```
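For reference, the `CMD` defaults might be wired up like this hypothetical Dockerfile sketch; the real base image, install step, and entrypoint may differ:

```dockerfile
# Hypothetical sketch -- the project's actual Dockerfile may differ.
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir .
# The CLI itself is the entrypoint; CMD supplies default arguments,
# which any arguments passed to `docker run` will override.
ENTRYPOINT ["boat-etl"]
CMD ["-i", "data/boat_data.csv", "-o", "output/validated_boat_data.csv"]
```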
Or specify input and output paths, mounting the local `data` and `output` directories into the container:

```shell
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/output:/app/output \
  boat-etl \
  -i data/boat_data.csv \
  -o output/validated_boat_data.csv
```