
A FastAPI-powered machine learning API for predicting health insurance charges based on demographic and lifestyle factors. Includes data preprocessing, model training, Dockerized deployment, and a simple web interface.


Just-Aymz/US-Insurance-Data


US Health Insurance Dataset

This project involves cleaning and preprocessing the US Health Insurance Dataset, training a machine learning model, and deploying it as an API using FastAPI. The API allows users to make predictions based on input data.


Dataset Identification

About this File

This dataset contains 1,338 rows of insured data, where insurance charges are given against the following attributes: Age, Sex, BMI, Number of Children, Smoker, and Region. The attributes consist of both numerical and categorical variables.

Link to the dataset: Kaggle Dataset

| Feature Name | Data Type | Description |
|--------------|-----------|-------------|
| Age | Integer | Age of primary beneficiary |
| Sex | String | Gender of the insurance contractor (male/female) |
| BMI | Float | Body mass index, indicating weight category |
| Children | Integer | Number of children covered by health insurance |
| Smoker | String | Whether the individual is a smoker (yes/no) |
| Region | String | Residential area in the U.S. (northeast, southeast, southwest, northwest) |
| Charges | Float | Individual medical costs billed by health insurance |
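
To make the column types concrete, here is a minimal sketch that reads rows shaped like the dataset and converts each field to the type listed for it. It assumes the CSV header uses lowercase column names (age, sex, bmi, children, smoker, region, charges). The two sample rows are made up for illustration and are not taken from insurance.csv, and `read_insurance_rows` is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Column name -> converter, following the feature list.
# Assumption: the CSV header uses these lowercase names.
COLUMN_TYPES = {
    "age": int,
    "sex": str,
    "bmi": float,
    "children": int,
    "smoker": str,
    "region": str,
    "charges": float,
}


def read_insurance_rows(handle):
    """Yield one dict per CSV row with each value cast to its listed type."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield {name: cast(row[name]) for name, cast in COLUMN_TYPES.items()}


# Made-up rows for illustration only (not real records from the dataset).
SAMPLE = """age,sex,bmi,children,smoker,region,charges
25,male,24.5,2,no,southwest,1234.56
40,female,30.1,0,yes,northeast,20000.0
"""

rows = list(read_insurance_rows(io.StringIO(SAMPLE)))
print(rows[0])
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("Dataset/insurance.csv", newline="")`.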

Project Overview

  • Data Cleaning & Preprocessing: Handled in the Preprocessing/Data_Preprocessing.ipynb notebook.
  • Model Training: The best-performing model is selected and saved, together with its preprocessor, as a pickle file (best_model_and_preprocessor.pkl).
  • FastAPI Implementation: The API is built using FastAPI (API_Files/main.py).
  • Schema Definition: API_Files/Schema.py defines input data validation using Pydantic.
  • Deployment: The API is containerized for local deployment using Docker (DockerFiles/Dockerfile).
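
As an illustration of the schema-definition step, the sketch below shows what Pydantic input validation for these six features could look like. It is not the contents of Schema.py: the class name `InsuranceInput`, the helper `validate_payload`, and the numeric bounds are assumptions chosen for the example, while the allowed category values come from the dataset description.

```python
from typing import Literal, Optional, Tuple

from pydantic import BaseModel, Field, ValidationError


class InsuranceInput(BaseModel):
    """Hypothetical request schema; the bounds are illustrative assumptions."""

    age: int = Field(..., ge=0)
    bmi: float = Field(..., gt=0)
    children: int = Field(..., ge=0)
    sex: Literal["male", "female"]
    smoker: Literal["yes", "no"]
    region: Literal["northeast", "southeast", "southwest", "northwest"]


def validate_payload(payload: dict) -> Tuple[Optional[InsuranceInput], Optional[str]]:
    """Return (model, None) when the payload is valid, else (None, error text)."""
    try:
        return InsuranceInput(**payload), None
    except ValidationError as exc:
        return None, str(exc)


good = {"age": 25, "bmi": 24.5, "children": 2,
        "sex": "male", "smoker": "no", "region": "southwest"}
bad = dict(good, region="midwest")  # not one of the four listed regions

model, error = validate_payload(good)
rejected, reason = validate_payload(bad)
```

When a model like this is used as the type of a FastAPI route parameter, FastAPI runs the same validation on the request body and responds with a 422 error for invalid input.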

Built With

  • Python – Programming language.
  • NumPy – Numerical computing.
  • SciPy – Scientific computing algorithms.
  • Pandas – Data manipulation and analysis.
  • scikit-learn – Machine learning algorithms.
  • Matplotlib / Seaborn – Data visualization.
  • FastAPI – Web framework for building APIs in Python.

Installation

To run the API locally, follow these steps:

Prerequisites

Ensure you have the following installed:

  • Python (>=3.8)
  • FastAPI
  • Uvicorn
  • Pandas, NumPy, scikit-learn
  • Docker (for containerized deployment)

Cloning the Repository

git clone https://github.com/Just-Aymz/US-Insurance-Data.git
cd US-Insurance-Data

Creating a Virtual Environment

python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate  # On Windows

Install Dependencies

pip install -r requirements.txt

Running the API Locally

uvicorn API_Files.main:app --reload

API available at: http://127.0.0.1:8000

Running the API with Docker

Build the Docker Image

docker build -f DockerFiles/Dockerfile -t ml-api .

Run the Docker Container

docker run -p 8000:8000 ml-api

API Endpoints

Home Route

  • URL: /
  • Method: GET
  • Response:
    {"message": "Welcome to the Machine Learning API"}

Prediction Route

  • URL: /predict
  • Method: POST
  • Request Body:
    {
      "age": 25,
      "bmi": 24.5,
      "children": 2,
      "sex": "male",
      "smoker": "no",
      "region": "southwest"
    }
  • Response:
    {"prediction": 1234.56}
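
For calling the prediction route from Python, a minimal client sketch using only the standard library might look like the following. It assumes the API is running locally at http://127.0.0.1:8000 as described under "Running the API Locally". `build_predict_request` and `predict` are hypothetical helper names, not part of this repository.

```python
import json
import urllib.request

# Assumption: the API is served locally on the default Uvicorn port.
BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the /predict route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url=BASE_URL):
    """Send the request and return the decoded JSON response.

    Requires the API server to be running at base_url.
    """
    with urllib.request.urlopen(build_predict_request(payload, base_url)) as resp:
        return json.load(resp)


example = {
    "age": 25,
    "bmi": 24.5,
    "children": 2,
    "sex": "male",
    "smoker": "no",
    "region": "southwest",
}

# Building the request does not touch the network.
request = build_predict_request(example)
```

With the server running, `predict(example)` should return a dictionary shaped like the documented response, with a single "prediction" key.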

File Structure

├── API_Files
│   ├── __pycache__/
│   ├── .gitkeep
│   ├── main.py
│   └── Schema.py
├── Dataset
│   ├── .gitkeep
│   └── insurance.csv
├── DockerFiles
│   ├── .dockerignore
│   ├── .gitkeep
│   └── Dockerfile
├── Preprocessing
│   ├── .gitkeep
│   └── Data_Preprocessing.ipynb
├── Static
│   ├── .gitkeep
│   ├── risk_management.jpg
│   └── style.css
├── Templates
│   ├── .gitkeep
│   ├── form.html
│   └── home.html
├── best_model_and_preprocessor.pkl
├── LICENSE
├── README.md
└── requirements.txt

Future Enhancements

  • CI/CD pipeline for automated deployment
  • Cloud deployment options (AWS, GCP, DigitalOcean)
  • Expanding input validation and error handling

Contact

For questions or collaborations, please reach out via:

License

This project is open-source under the MIT License.
