Skip to content

This repository contains code for Rakamin Academy x Kalbe Nutritional Virtual Internships. Focuses on customer segmentation using clustering and predicting product quantity sold using KMeans models.

Notifications You must be signed in to change notification settings

Fallennnnnn/KalbeNutritionals_VIX_DS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project Overview

This project is part of the Rakamin Academy Virtual Internship in collaboration with Kalbe Nutritionals. As a Data Scientist intern, the objective of this project is to perform various data analysis and machine learning tasks. The project involves the following key tasks:

  • Data Ingestion: Importing and preparing the necessary data for analysis.
  • Exploratory Data Analysis (EDA) Using SQL: Exploring and understanding the data using SQL queries.
  • Visualizing Data Dashboard Using Tableau: Creating interactive and insightful data visualizations using Tableau.
  • Predictions Using ARIMA Models: Building time series forecasting models with ARIMA to predict future trends.
  • Clustering Using KMeans: Utilizing KMeans algorithm for customer segmentation and analysis.

Project Goals

  • Forecast the estimated quantity of products sold, enabling the inventory team to create sufficient daily stock supply.
  • Create customer segments. These customer segments will be utilized by the marketing team to provide personalized promotions and sales treatment.

Getting Started

To Collaborate on this project u need to install these requirements

Tools Used

Libraries Used

Dataset Used

Install Dependencies

Pandas

  pip install pandas

Numpy

   pip install numpy

Matplotlib

  pip install matplotlib

pmdarima

  pip install pmdarima

Seaborn

  pip install seaborn

Project Result

Data Ingestion and Exploratory Data Analysis using PostgreSQL and DBeaver

First we need to install PostgreSQL and DBeaver in local computer and import dataset and use ; separator and this is the DBSchema
Schema

This is the challenge result

  • Average Age Based on Gender
    avg_gender
  • Average Age Based on Marital Status
    avg_marital
  • Store Name with Most Quantity
    store
  • Product with Highest Selling
    prodsell

also you can view the complete challenge query in here Challenge_Query

Visualizing Dashboard using Tableau

  • Make a account in Tableau Public
  • Create Web Authoring and import the dataset given
  • Make 4 worksheet
    • Total quantity sold Month to Month
    • Total amount revenue daily
    • Total quantity sold by products
    • Total revenue by store name
  • Make dashboard and this is the result

dash

you can check the interactive dashboard from this link Tableau Dashboard.

Forecast using ARIMA Models

  • First step before we go to forecast is Data Cleansing. In this step we merge data into 1 table and clear null values and this is the sample data from merged table
    mergedtable
  • Next we make new DataFrame for Forecasting value which involves grouping by date and aggregating the 'qty' column using the sum function
    forecast
    forecast
  • In order to use the ARIMA model, the data must be in a stationary condition. We check for data stationarity by examining the ACF and PACF plots, as well as conducting the ADF test. The ACF and PACF plot results show a horizontal scatter pattern, indicating that the data is stationary. Additionally, the ADF test result reveals p-values close to zero, further confirming that the data is stationary.
    stat
    adf
  • The next step involves modeling using ARIMA, where we need to determine the appropriate values for parameters p, d, and q. To achieve this, we explore two approaches: trial-and-error and auto-fit ARIMA. In this project, we used auto ARIMA method and found that best parameter is ARIMA(5,1,0)
  • This is the result for 1 month quantity sold forecast and the mean is 50 quantity product sold
    qtysold
  • This is the result for 1 month quantity sold forecast for each product
    adf
    you can check the source code here Forecast_Notebook

Clustering using KMeans

  • First step we make new DataFrame for Clstering value which involve grouping by CustomerID and aggregating the 'transaction id' column using the count function, 'qty' column using the sum function , and 'totalamount' column using the sum function
    elbow

  • Next the data is checked for outliers and standardized, as these steps are crucial for the clustering model.

  • Next, the elbow method is applied to determine the optimal number of clusters, resulting in three clusters
    elbow

  • After that we plot the cluster and visualize it
    mergedtable
    cluster
    Based on the customer segmentation with 3 clusters we can segmentize each clusters for better personalized promotion dan sales treatment.

    • Cluster 0 is new customers
    • Cluster 1 is loyal customers
    • Cluster 2 is potential loyal customers
  • For further analysis we use RFM Analysis to make recommendation based on the visualizations
    RFM
    So we can conclude that each cluster has its different characteristics and we can make personalized promotion dan sales treatment like this.

  • Cluster 0 is customers that may need incentives to re-engage with our brand.

    • Strategies that can be used
    • Give Extra Discount
    • Give Extra Benefit After Purchasing Product Like Free Shipping
    • Personalized Promotions
  • Cluster 1 is customers that are valuable to our business and should be targeted with exclusive offers to maintain loyalty.

    • Strategies that can be used
    • Give Loyalty Exclusive Offer
    • Give Loyalty Program and Benefits
    • Early Access to Product and Special Discount
  • Cluster 2 is customers that have potential for upselling, as they already purchase occasionally.

    • Strategies that can be used
    • Bundle Offers
    • Offer Loyalty Program with Various Benefits
    • Special Discount

you can check the source code here Clustering Notebook

About

This repository contains code for Rakamin Academy x Kalbe Nutritional Virtual Internships. Focuses on customer segmentation using clustering and predicting product quantity sold using KMeans models.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published