This repository is the first phase of a comprehensive data analytics project leveraging the Kiva Loan datasets. In this phase, we focus on transitioning raw data from Excel to a structured MySQL database. The aim is to create a clean, normalized database ready for analysis in subsequent phases.
- Clean datasets: Prepare and clean raw data in Excel.
- Database schema creation: Design the database structure in MySQL.
- Data import: Load the cleaned datasets into MySQL.
- Normalization: Normalize tables (up to 3NF) to eliminate redundancy and establish relationships.
- ERD creation: Generate an Entity-Relationship Diagram (ERD) to visualize the database structure.
- SQL Scripts:
  - `create_schema.sql`: Defines the database schema and creates skeleton tables for all datasets.
  - `data_import.sql`: Imports cleaned data into the database.
  - `normalization.sql`: Normalizes tables and establishes relationships.
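As an illustration, a skeleton table in `create_schema.sql` could look like the sketch below. The column names and types are assumptions based on the dataset descriptions, not the project's actual schema:

```sql
-- Hypothetical sketch: create the database and one skeleton staging table.
-- Column names and types are assumptions for illustration only.
CREATE DATABASE IF NOT EXISTS kiva_loans_db;
USE kiva_loans_db;

-- Skeleton table for the Kiva Loans dataset (a subset of its 20 columns).
CREATE TABLE IF NOT EXISTS kiva_loans (
    id               INT PRIMARY KEY,
    loan_amount      DECIMAL(12,2),
    activity         VARCHAR(100),
    sector           VARCHAR(100),
    country          VARCHAR(100),
    borrower_genders TEXT,
    posted_time      DATETIME
);
```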
- ERD Diagram: A diagram illustrating the relationships between tables.
- Kiva Loans
  - Rows: 671,205
  - Columns: 20
  - Contains information on loans, amounts, activities, sectors, and borrower details.
- Kiva MPI Region Location
  - Rows: 2,772
  - Columns: 9
  - Contains regional information, geographic coordinates, and Multidimensional Poverty Index (MPI) data.
- Loan Theme IDs
  - Rows: 779,093
  - Columns: 4
  - Metadata about loan themes and their types.
- Loan Themes by Region
  - Rows: 15,736
  - Columns: 21
  - Provides details about loan themes categorized by region.
- MySQL Workbench: For database creation, data import, normalization, and ERD creation and visualization.
- Microsoft Excel: For cleaning and preparing datasets.
- Duplicate each dataset before cleaning.
- Standardize column names, data formats, and values.
- Replace missing values and perform data validation.
- Save the cleaned files in the `datasets` directory.
- Run the `create_schema.sql` script in MySQL Workbench to create the database and skeleton tables for the datasets.
- Execute the `data_import.sql` script to load data into the tables. For guidance on importing large datasets into MySQL without losing or corrupting data, see: Here
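With 671,205 rows in the loans dataset alone, a bulk load via `LOAD DATA LOCAL INFILE` is usually far faster than row-by-row `INSERT` statements. A hedged sketch, in which the CSV path and column list are assumptions rather than the contents of `data_import.sql`:

```sql
-- Hypothetical sketch: bulk-load a cleaned CSV into the staging table.
-- The file path and column list are assumptions for illustration only.
-- Requires local_infile to be enabled on both server and client.
LOAD DATA LOCAL INFILE 'datasets/kiva_loans_clean.csv'
INTO TABLE kiva_loans
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES  -- skip the header row
(id, loan_amount, activity, sector, country, borrower_genders, posted_time);
```

Listing the columns explicitly keeps the load robust if the CSV column order ever differs from the table definition.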
- Execute the `normalization.sql` script to normalize the tables and define relationships.
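A typical step on the way to 3NF is extracting a repeated attribute, such as the sector name, into its own lookup table. The sketch below shows the general pattern; the table and column names are assumptions, not the contents of `normalization.sql`:

```sql
-- Hypothetical sketch of one normalization step:
-- move repeated sector names into a lookup table and link by foreign key.
CREATE TABLE sectors (
    sector_id   INT AUTO_INCREMENT PRIMARY KEY,
    sector_name VARCHAR(100) UNIQUE
);

-- Populate the lookup table from the distinct values already loaded.
INSERT INTO sectors (sector_name)
SELECT DISTINCT sector FROM kiva_loans;

-- Add the foreign-key column and backfill it.
ALTER TABLE kiva_loans ADD COLUMN sector_id INT;

UPDATE kiva_loans kl
JOIN sectors s ON s.sector_name = kl.sector
SET kl.sector_id = s.sector_id;

-- Drop the redundant text column and enforce the relationship.
ALTER TABLE kiva_loans
    DROP COLUMN sector,
    ADD CONSTRAINT fk_loans_sector
        FOREIGN KEY (sector_id) REFERENCES sectors (sector_id);
```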
- Use MySQL Workbench to create the ERD and export it as `kiva_erd.png`.
(To be added after normalization)
- Add constraints to improve data integrity.
- Optimize queries for faster data retrieval.
Olamide Quzeem
This project is licensed under the MIT License.