🚨 Insurance Fraud Detection 🚨

A robust machine learning project to detect fraudulent insurance claims using Python (EDA, feature engineering, model development) and Power BI (business dashboard).
The solution balances recall (fraud detection coverage) and precision (investigation efficiency) while providing interpretable results for business stakeholders.

Project Overview

Fraudulent claims cause significant financial losses for insurers.
This project demonstrates how machine learning can help identify high-risk claims, reduce losses, and provide transparent explanations for fraud decisions.

Key components:

Exploratory Data Analysis (EDA) – cleaning, binning, handling leakage.
Modeling – Logistic Regression, Random Forest, and tuned XGBoost with SMOTE for class imbalance.
Threshold Optimization – F2/F3 score and precision-recall trade-offs.
Interpretability – Global & Local SHAP explanations.
Business Dashboard – One-page Power BI dashboard with KPIs, trends, fraud by category, and top fraud drivers.

Features

Machine Learning
- Handles severe class imbalance with SMOTE.
- Hyperparameter tuning with Optuna.
- Evaluation using ROC-AUC, PR-AUC, precision, recall, F2/F3.
Explainability
- Global SHAP (top contributing fraud drivers).
- Local SHAP (top reasons per claim flagged as fraud).
- Aggregated outputs for business users.
Power BI Dashboard
- Live Dashboard Link
- KPIs: Fraud Rate %, Fraud Detected, Investigation Efficiency %, Fraud Detection Coverage %.
- Fraud trends over time.
- Fraud by categories (vehicle usage, region, etc.).
- Top fraud drivers (interpretable).
- Per-claim fraud explanations.

Repository Structure

. ├── notebooks/

│ └── Insurance_Fraud_Detection.ipynb

├── data/

│ ├── FactClaims.csv

│ ├── FactModelScores.csv

│ ├── FactLocalReasons.csv

│ ├── FactLocalReasonsAgg.csv

│ ├── DimFeature.csv

│ └── DimDate.csv

├── r/

│ └── Load and Export Dataset from R.R

├── dashboard/

│ └── Insurance_Fraud_Dashboard.pbix

├── src/

├── requirements.txt

└── README.md

Data Source & Preparation

This project uses the freMPL6 insurance claims dataset, available through the CASdatasets package in R.

We provide two options:

Option A — Quick Start (use the bundled CSV) We’ve already exported the dataset for you: data/raw/freMPL6.csv. Load it directly in Python with: import pandas as pd df_raw = pd.read_csv("data/raw/freMPL6.csv")

Option B — Reproduce the export with R (full reproducibility) Use the provided R script: r/Load and Export Dataset from R.R. Run it from the repo root: Rscript "r/Load and Export Dataset from R.R"

⚠️ Note on provenance: The dataset originates from actuarial research and teaching materials and is not owned by this repository. We credit the original authors and references below.

References Agresti, Alan. 2013. Categorical Data Analysis, 3rd Edition. Charpentier, Arthur. 2014. Computational Actuarial Science with R. The R Series. Chapman & Hall/CRC.

Link Source Project and Author Frequency analysis with a Zero Inflated Regression of a French Motor Third Party Liability dataset Author: Siharath Julien Published: October 1, 2024 Source Link (file:///C:/Users/Source/OneDrive%20-%20IESEG/Documents/Data%20Projects/Insurance%20Fraud%20Detection/Frequency%20analysis%20with%20a%20Zero%20Inflated%20Regression%20of%20a%20French%20Motor%20Third%20Party%20Liability%20dataset.htm)

Installation

Clone the repo:

git clone https://github.com/your-username/insurance-fraud-detection.git cd insurance-fraud-detection

Create a virtual environment and install dependencies: python -m venv venv source venv/bin/activate # or venv\Scripts\activate on Windows pip install -r requirements.txt

Usage

Data Preparation & Modeling
Run the Jupyter notebook: jupyter notebook notebooks/Insurance_Fraud_Detection.ipynb
Export Model Outputs for Power BI The notebook generates CSVs: FactClaims FactModelScores FactLocalReasons FactLocalReasonsAgg DimFeature DimDate
Dashboard (Published and Public) Open Insurance_Fraud_Dashboard.pbix in Power BI Desktop. Or view the published dashboard here:

Insurance Fraud Detection Dashboard: Live Dashboard Link

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🚨 Insurance Fraud Detection 🚨

Project Overview

Features

Repository Structure

Data Source & Preparation

Installation

Usage

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
DimDate.csv		DimDate.csv
DimFeature.csv		DimFeature.csv
FactClaims.csv		FactClaims.csv
FactLocalReasons.csv		FactLocalReasons.csv
FactLocalReasonsAgg.csv		FactLocalReasonsAgg.csv
FactModelScores.csv		FactModelScores.csv
Insurance Fraud Detection Dasboard - Power BI.pdf		Insurance Fraud Detection Dasboard - Power BI.pdf
Insurance Fraud Detection Dasboard.pbix		Insurance Fraud Detection Dasboard.pbix
Insurance Fraud Detection.ipynb		Insurance Fraud Detection.ipynb
Load and Export Dataset from R.R		Load and Export Dataset from R.R
README.md		README.md
freMPL6.csv		freMPL6.csv

gaurav1nemani/Insurance-Fraud-Detection

Folders and files

Latest commit

History

Repository files navigation

🚨 Insurance Fraud Detection 🚨

Project Overview

Features

Repository Structure

Data Source & Preparation

Installation

Usage

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages