US Health Insurance Dataset

This project cleans and preprocesses the US Health Insurance Dataset, trains a machine learning model on it, and serves the model as an API built with FastAPI. The API accepts a beneficiary's attributes as JSON and returns a prediction.

Dataset Identification

About this File

The dataset contains 1,338 rows of insured data. Each row gives the insurance charges alongside six attributes: Age, Sex, BMI, Number of Children, Smoker, and Region. The attributes are a mix of numerical and categorical variables.

Link to the dataset: Kaggle Dataset

| Feature Name | Data Type | Description |
| --- | --- | --- |
| Age | Integer | Age of primary beneficiary |
| Sex | String | Gender of the insurance contractor (male/female) |
| BMI | Float | Body mass index, indicating weight category |
| Children | Integer | Number of children covered by health insurance |
| Smoker | String | Whether the individual is a smoker (yes/no) |
| Region | String | Residential area in the U.S. (northeast, southeast, southwest, northwest) |
| Charges | Float | Individual medical costs billed by health insurance |
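
The constraints in the table can be checked with a few lines of plain Python. The sketch below is illustrative only: validate_record and ALLOWED are hypothetical names, the lowercase keys follow the request body used by the API, and the range checks on age, bmi, and children are assumptions rather than rules stated by the dataset. The project itself validates input with Pydantic in Schema.py.

```python
# Allowed categorical values, taken from the feature table above.
ALLOWED = {
    "sex": {"male", "female"},
    "smoker": {"yes", "no"},
    "region": {"northeast", "southeast", "southwest", "northwest"},
}


def validate_record(record):
    """Return a cleaned copy of one row; raise ValueError on bad values."""
    # Coerce the numeric fields; int()/float() raise ValueError on junk input.
    cleaned = {
        "age": int(record["age"]),
        "bmi": float(record["bmi"]),
        "children": int(record["children"]),
    }
    # Assumed sanity ranges (not specified by the dataset description).
    if cleaned["age"] < 0 or cleaned["children"] < 0 or cleaned["bmi"] <= 0:
        raise ValueError("age and children must be >= 0 and bmi must be > 0")
    # Normalise case and whitespace, then check against the allowed values.
    for field, allowed in ALLOWED.items():
        value = str(record[field]).strip().lower()
        if value not in allowed:
            raise ValueError(f"{field}={value!r} is not one of {sorted(allowed)}")
        cleaned[field] = value
    return cleaned
```

A row with region set to "midwest", for example, is rejected with a ValueError, while "Male" is normalised to "male".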

Project Overview

Data Cleaning & Preprocessing: Handled in the Preprocessing/Data_Preprocessing.ipynb notebook.
Model Training: The best model is selected and saved, together with its preprocessor, as a pickle file (best_model_and_preprocessor.pkl).
FastAPI Implementation: The API is built using FastAPI (API_Files/main.py).
Schema Definition: API_Files/Schema.py defines input data validation using Pydantic.
Deployment: The API is containerized for local deployment using Docker (DockerFiles/Dockerfile).
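
As a rough illustration of the preprocessing step, the sketch below one-hot encodes the categorical features and passes the numeric ones through as floats. It is an assumption about a typical approach, not the notebook's actual code; CATEGORIES, NUMERIC, and encode_row are hypothetical names.

```python
# Category levels in a fixed order so every row yields the same column layout.
CATEGORIES = {
    "sex": ["female", "male"],
    "smoker": ["no", "yes"],
    "region": ["northeast", "northwest", "southeast", "southwest"],
}
NUMERIC = ["age", "bmi", "children"]


def encode_row(row):
    """Flatten one record into floats: numeric fields first, then one-hot columns."""
    features = [float(row[name]) for name in NUMERIC]
    for name, levels in CATEGORIES.items():
        # One column per level: 1.0 where the row matches the level, else 0.0.
        features.extend(1.0 if row[name] == level else 0.0 for level in levels)
    return features
```

Each record becomes a list of 11 numbers (3 numeric values plus 2 + 2 + 4 indicator columns), which is the kind of input a scikit-learn regressor expects.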

Built With

Python – Programming language.
NumPy – Numerical computing.
SciPy – Scientific computing algorithms.
Pandas – Data manipulation and analysis.
scikit-learn – Machine learning algorithms.
Matplotlib / Seaborn – Data visualization.
FastAPI – Web framework for building APIs in Python.

Installation

To run the API locally, follow these steps:

Prerequisites

Ensure you have the following installed:

Python (>=3.8)
FastAPI
Uvicorn
Pandas, NumPy, scikit-learn
Docker (for containerized deployment)

Cloning the Repository

git clone https://github.com/Just-Aymz/US-Insurance-Data.git
cd US-Insurance-Data

Creating a Virtual Environment

python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate  # On Windows

Install Dependencies

pip install -r requirements.txt

Running the API Locally

uvicorn API_Files.main:app --reload

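The API is then available at http://127.0.0.1:8000. By default, FastAPI also serves interactive documentation at http://127.0.0.1:8000/docs.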

Running the API with Docker

Build the Docker Image

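docker build -t ml-api .
docker build -t ml-api -f DockerFiles/Dockerfile .  # If the Dockerfile is kept under DockerFiles/, as in the file structure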

Run the Docker Container

docker run -p 8000:8000 ml-api

API Endpoints

Home Route

URL: /
Method: GET

Response:

```
{"message": "Welcome to the Machine Learning API"}
```

Prediction Route

URL: /predict
Method: POST

Request Body:

```
{
  "age": 25,
  "bmi": 24.5,
  "children": 2,
  "sex": "male",
  "smoker": "no",
  "region": "southwest"
}
```

Response:
```
{"prediction": 1234.56}
```
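
To call the prediction route from Python, a minimal client might look like the sketch below. It uses only the standard library and assumes the API is running at the local address given above and accepts the JSON body shown; build_predict_request and predict are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local address used in this README


def build_predict_request(payload, base_url=BASE_URL):
    """Build a POST request for /predict without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url=BASE_URL):
    """Send the request and return the "prediction" field of the JSON reply."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["prediction"]


# Usage (requires the API to be running locally):
# predict({"age": 25, "bmi": 24.5, "children": 2,
#          "sex": "male", "smoker": "no", "region": "southwest"})
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running server.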

File Structure

├── API_Files
│   ├── __pycache__/
│   ├── .gitkeep
│   ├── main.py
│   └── Schema.py
├── Dataset
│   ├── .gitkeep
│   └── insurance.csv
├── DockerFiles
│   ├── .dockerignore
│   ├── .gitkeep
│   └── Dockerfile
├── Preprocessing
│   ├── .gitkeep
│   └── Data_Preprocessing.ipynb
├── Static
│   ├── .gitkeep
│   ├── risk_management.jpg
│   └── style.css
├── Templates
│   ├── .gitkeep
│   ├── form.html
│   └── home.html
├── best_model_and_preprocessor.pkl
├── LICENSE
├── README.md
└── requirements.txt

Future Enhancements

CI/CD pipeline for automated deployment
Cloud deployment options (AWS, GCP, DigitalOcean)
Expanding input validation and error handling

Contact

For questions or collaborations, please reach out via:

Email: [email protected]
LinkedIn: Amogelang More

License

This project is open-source under the MIT License.
