Project Overview

This project is part of the Rakamin Academy Virtual Internship in collaboration with Kalbe Nutritionals. As a Data Scientist intern, the objective of this project is to perform various data analysis and machine learning tasks. The project involves the following key tasks:

Data Ingestion: Importing and preparing the necessary data for analysis.
Exploratory Data Analysis (EDA) Using SQL: Exploring and understanding the data using SQL queries.
Visualizing Data Dashboard Using Tableau: Creating interactive and insightful data visualizations using Tableau.
Predictions Using ARIMA Models: Building time series forecasting models with ARIMA to predict future trends.
Clustering Using KMeans: Utilizing KMeans algorithm for customer segmentation and analysis.

Project Goals

Forecast the estimated quantity of products sold, enabling the inventory team to create sufficient daily stock supply.
Create customer segments. These customer segments will be utilized by the marketing team to provide personalized promotions and sales treatment.

Getting Started

To Collaborate on this project u need to install these requirements

Tools Used

Libraries Used

Dataset Used

Install Dependencies

Pandas

  pip install pandas

Numpy

   pip install numpy

Matplotlib

  pip install matplotlib

pmdarima

  pip install pmdarima

Seaborn

  pip install seaborn

Project Result

Data Ingestion and Exploratory Data Analysis using PostgreSQL and DBeaver

First we need to install PostgreSQL and DBeaver in local computer and import dataset and use ; separator and this is the DBSchema

This is the challenge result

Average Age Based on Gender
Average Age Based on Marital Status
Store Name with Most Quantity
Product with Highest Selling

also you can view the complete challenge query in here Challenge_Query

Visualizing Dashboard using Tableau

Make a account in Tableau Public
Create Web Authoring and import the dataset given
Make 4 worksheet
- Total quantity sold Month to Month
- Total amount revenue daily
- Total quantity sold by products
- Total revenue by store name
Make dashboard and this is the result

you can check the interactive dashboard from this link Tableau Dashboard.

Forecast using ARIMA Models

First step before we go to forecast is Data Cleansing. In this step we merge data into 1 table and clear null values and this is the sample data from merged table
Next we make new DataFrame for Forecasting value which involves grouping by date and aggregating the 'qty' column using the sum function
In order to use the ARIMA model, the data must be in a stationary condition. We check for data stationarity by examining the ACF and PACF plots, as well as conducting the ADF test. The ACF and PACF plot results show a horizontal scatter pattern, indicating that the data is stationary. Additionally, the ADF test result reveals p-values close to zero, further confirming that the data is stationary.
The next step involves modeling using ARIMA, where we need to determine the appropriate values for parameters p, d, and q. To achieve this, we explore two approaches: trial-and-error and auto-fit ARIMA. In this project, we used auto ARIMA method and found that best parameter is ARIMA(5,1,0)
This is the result for 1 month quantity sold forecast and the mean is 50 quantity product sold
This is the result for 1 month quantity sold forecast for each product

you can check the source code here Forecast_Notebook

Clustering using KMeans

First step we make new DataFrame for Clstering value which involve grouping by CustomerID and aggregating the 'transaction id' column using the count function, 'qty' column using the sum function , and 'totalamount' column using the sum function
Next the data is checked for outliers and standardized, as these steps are crucial for the clustering model.
Next, the elbow method is applied to determine the optimal number of clusters, resulting in three clusters
After that we plot the cluster and visualize it

Based on the customer segmentation with 3 clusters we can segmentize each clusters for better personalized promotion dan sales treatment.
- Cluster 0 is new customers
- Cluster 1 is loyal customers
- Cluster 2 is potential loyal customers
For further analysis we use RFM Analysis to make recommendation based on the visualizations

So we can conclude that each cluster has its different characteristics and we can make personalized promotion dan sales treatment like this.
Cluster 0 is customers that may need incentives to re-engage with our brand.
- Strategies that can be used
- Give Extra Discount
- Give Extra Benefit After Purchasing Product Like Free Shipping
- Personalized Promotions
Cluster 1 is customers that are valuable to our business and should be targeted with exclusive offers to maintain loyalty.
- Strategies that can be used
- Give Loyalty Exclusive Offer
- Give Loyalty Program and Benefits
- Early Access to Product and Special Discount
Cluster 2 is customers that have potential for upselling, as they already purchase occasionally.
- Strategies that can be used
- Bundle Offers
- Offer Loyalty Program with Various Benefits
- Special Discount

you can check the source code here Clustering Notebook

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Project Overview

Project Goals

Getting Started

Tools Used

Libraries Used

Dataset Used

Install Dependencies

Project Result

Data Ingestion and Exploratory Data Analysis using PostgreSQL and DBeaver

Visualizing Dashboard using Tableau

Forecast using ARIMA Models

Clustering using KMeans

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
Clustering		Clustering
Dashboard		Dashboard
Dataset		Dataset
EDA_SQL		EDA_SQL
Predictions		Predictions
README.md		README.md

Fallennnnnn/KalbeNutritionals_VIX_DS

Folders and files

Latest commit

History

Repository files navigation

Project Overview

Project Goals

Getting Started

Tools Used

Libraries Used

Dataset Used

Install Dependencies

Project Result

Data Ingestion and Exploratory Data Analysis using PostgreSQL and DBeaver

Visualizing Dashboard using Tableau

Forecast using ARIMA Models

Clustering using KMeans

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages