This project is part of my training in Data and AI at AiCore.
🚨 Note: Unlike many learning projects, the code in this repository is not tutorial-driven. Every line of code and every design decision is my own, following the high-level guidelines set by the AiCore instructors.
This project builds an ETL pipeline for disparate data sources, mirroring what you would find in a data-immature retail organisation. Using OOP principles, I wrote code to extract, clean and upload the data to a PostgreSQL database.
ℹ️ Task: The core objective is to manage data drawn from six distinct sources, each with its own characteristics, required extraction/cleaning steps and key fields. An overview of each source is given further below.
Each source needs a tailored strategy for extraction, cleaning and use because of its unique challenges; an illustrative extraction sketch follows the source list.
🤔 Approach: I began by writing and testing functions for each data source in notebooks. Once verified, the code was organised into scripts following OOP principles to keep it modular and readable.
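A minimal sketch of how such a script layout might look is shown below; the class and method names are illustrative placeholders, not necessarily the ones used in this repository.

```python
# Illustrative skeleton only; class and method names are placeholders.
import pandas as pd


class DataExtractor:
    """Pulls raw data from one source (RDS, PDF, S3, API, ...) into a DataFrame."""

    def extract(self) -> pd.DataFrame:
        raise NotImplementedError


class DataCleaner:
    """Applies the source-specific cleaning rules (types, nulls, invalid rows)."""

    def clean(self, df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError


class DatabaseConnector:
    """Uploads a cleaned DataFrame to the local PostgreSQL database."""

    def upload(self, df: pd.DataFrame, table_name: str) -> None:
        raise NotImplementedError
```

Keeping extraction, cleaning and upload in separate classes means each data source only needs its own extract/clean pair, while the upload path stays shared.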
- Table: `order_table`
  - Relevance: High; contains crucial sales information.
  - Fields to Use: `date_uuid`, `user_uuid`, `card_number`, `store_code`, `product_code`, `product_quantity`.
- Table: `dim_users`
  - Final Table Primary Key: `user_uuid`
  - Source: PDF available through an S3 public link.
- Table: `dim_card_details`
  - Reading Method: the `tabula` package for PDF extraction.
  - Final Table Primary Key: `card_number`
- Table: `dim_product`
  - Reading Method: `boto3` for data retrieval.
  - Final Table Primary Key: `product_code`
- Table: `dim_store_details`
  - Reading Method: GET requests to the API.
  - Data Format: JSON; converted to a Pandas DataFrame.
  - Final Table Primary Key: `store_code`
- Table: `dim_date_times`
  - Reading Method: JSON available through a public link; converted to a Pandas DataFrame.
  - Final Table Primary Key: `date_uuid`
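
The extraction calls referenced above might look roughly like the sketch below. Every URL, bucket name, object key and header name in it is a placeholder assumption, not the project's real endpoint or credential.

```python
# Hedged sketch of the extraction steps named above; all URLs, bucket names and
# keys are placeholders, not the project's real endpoints.
import boto3
import pandas as pd
import requests
import tabula

# dim_card_details: read every page of a PDF exposed via a public S3 link with tabula.
pdf_tables = tabula.read_pdf("https://example-bucket.s3.amazonaws.com/card_details.pdf", pages="all")
card_df = pd.concat(pdf_tables, ignore_index=True)

# dim_product: download an object from S3 with boto3, then load it into pandas.
s3 = boto3.client("s3")
s3.download_file("example-bucket", "products.csv", "products.csv")
product_df = pd.read_csv("products.csv")

# dim_store_details: GET a store record from the API and build a DataFrame from the JSON.
headers = {"x-api-key": "your_api_key"}  # placeholder header name
response = requests.get("https://example-api.com/store_details/1", headers=headers)
store_df = pd.DataFrame([response.json()])

# dim_date_times: JSON file behind a public link, converted straight to a DataFrame.
date_df = pd.read_json("https://example-bucket.s3.amazonaws.com/date_details.json")
```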
- Language: Python
- Packages: Tabula, Boto3, Pandas, Tree, SQLAlchemy, Argparse, PyYAML, JSON, Requests, OS
- Cloud Services: AWS RDS, S3
- Database Management System: PostgreSQL (pgAdmin)
To run this project, you need to create a `.env` file in your root directory and add your API key:

```
API_KEY=your_api_key
```
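A minimal sketch of how the key could be read at runtime, assuming the `.env` values are loaded into the environment (for example with a loader such as python-dotenv, which is not part of the package list above):

```python
# Sketch only: read the API key from the environment once the .env file is loaded.
import os

api_key = os.getenv("API_KEY")
if api_key is None:
    raise RuntimeError("API_KEY is not set - check your .env file")

headers = {"x-api-key": api_key}  # placeholder header name for the API requests
```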
You also need to create a `db_creds.yaml` file in your root directory and add your PostgreSQL and RDS details:

```yaml
RDS:
  HOST:
  PASSWORD:
  USER:
  DATABASE:
  PORT:
LOCAL:
  HOST:
  PASSWORD:
  USER:
  DATABASE:
  PORT:
```
These credentials are required for extraction from RDS and for the final centralisation of tables into PostgreSQL.
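A hedged sketch of how `db_creds.yaml` might be consumed with PyYAML and SQLAlchemy is shown below; the helper name is hypothetical, and the keys mirror the template above.

```python
# Sketch of building database engines from db_creds.yaml; the helper is hypothetical.
import yaml
from sqlalchemy import create_engine


def make_engine(section: str):
    """Build an engine for either the 'RDS' or 'LOCAL' block of db_creds.yaml."""
    with open("db_creds.yaml") as f:
        creds = yaml.safe_load(f)[section]
    url = (
        f"postgresql://{creds['USER']}:{creds['PASSWORD']}"
        f"@{creds['HOST']}:{creds['PORT']}/{creds['DATABASE']}"
    )
    return create_engine(url)


rds_engine = make_engine("RDS")      # used for extraction from AWS RDS
local_engine = make_engine("LOCAL")  # used to upload cleaned tables to PostgreSQL
```

Reading the credentials from a YAML file keeps secrets out of the scripts and out of version control.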
```
python main.py user
```

This cleans and centralises the user table in your PostgreSQL database; you specify which table you want to extract, transform and load.
```
python main.py choices
```

This shows the valid arguments to pass in.
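The command-line handling in `main.py` might be set up with `argparse` along the lines of the sketch below; the exact list of valid choices is an assumption and may differ from the repository's.

```python
# Illustrative argparse setup; the actual choices in main.py may differ.
import argparse

parser = argparse.ArgumentParser(
    description="Extract, clean and centralise one source table into PostgreSQL."
)
parser.add_argument(
    "table",
    choices=["user", "card", "store", "product", "orders", "date", "choices"],
    help="which table to run through the ETL pipeline, or 'choices' to list valid options",
)
args = parser.parse_args()
print(f"Selected table: {args.table}")
```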
Some business queries run on the cleaned and centralised data can be found here.
This project is licensed under the MIT License - see the LICENSE file for details.