This project cleans and preprocesses the US Health Insurance Dataset, trains a machine learning model to predict individual insurance charges, and serves that model as an API built with FastAPI. Users send a beneficiary's attributes to the API and receive a predicted charge in return.
- Dataset Identification
- Project Overview
- Built With
- Installation
- Running the API Locally
- Running the API with Docker
- API Endpoints
- File Structure
- Future Enhancements
- Contact
- License
The dataset contains 1,338 records of insured individuals. Each record gives the insurance charges alongside six attributes: age, sex, BMI, number of children, smoker status, and region. The attributes are a mix of numerical and categorical variables.
Link to the dataset: Kaggle Dataset
Feature Name | Data Type | Description |
---|---|---|
Age | Integer | Age of primary beneficiary |
Sex | String | Gender of the insurance contractor (male/female) |
BMI | Float | Body mass index (kg/m²), indicating weight relative to height |
Children | Integer | Number of children covered by health insurance |
Smoker | String | Whether the individual is a smoker (yes/no) |
Region | String | Residential area in the U.S. (northeast, southeast, southwest, northwest) |
Charges | Float | Individual medical costs billed by health insurance (prediction target) |
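A minimal sketch of loading the data with pandas so that the column types match the table above. It assumes the CSV at `Dataset/insurance.csv` uses lowercase column headers (`age`, `sex`, `bmi`, `children`, `smoker`, `region`, `charges`); the helper name `load_insurance` is hypothetical, and the two sample rows in the usage example are made up rather than taken from the dataset.

```python
import io

import pandas as pd

# Expected dtype per column, mirroring the table above (lowercase headers assumed).
DTYPES = {
    "age": "int64",
    "sex": "category",
    "bmi": "float64",
    "children": "int64",
    "smoker": "category",
    "region": "category",
    "charges": "float64",
}


def load_insurance(source) -> pd.DataFrame:
    """Read the insurance CSV and enforce the documented column types."""
    df = pd.read_csv(source, dtype=DTYPES)
    missing = set(DTYPES) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# Usage with two hypothetical rows; for the real file call
# load_insurance("Dataset/insurance.csv") from the repository root.
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,22.1,1,no,northwest,4500.0\n"
    "45,male,31.4,0,yes,southeast,39000.0\n"
)
df = load_insurance(sample)
```

Storing `sex`, `smoker`, and `region` as `category` keeps the small fixed set of allowed values explicit before any encoding step.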
- Data Cleaning & Preprocessing: Handled in the notebook `Preprocessing/Data_Preprocessing.ipynb`.
- Model Training: The best-performing model is selected and saved, together with its preprocessor, as a pickle file (`best_model_and_preprocessor.pkl`).
- FastAPI Implementation: The API is built with FastAPI (`API_Files/main.py`).
- Schema Definition: `API_Files/Schema.py` defines the input data model, which Pydantic uses to validate requests.
- Deployment: The API is containerized with Docker for local deployment (`DockerFiles/Dockerfile`).
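The training code itself is not reproduced in this README. As a hedged sketch of what saving a model together with its preprocessor can look like, the snippet below wraps scaling of the numeric columns, one-hot encoding of the categorical columns, and a `LinearRegression` in a single scikit-learn `Pipeline`, then round-trips it through `pickle`. `LinearRegression` is only a stand-in (the model the project actually selects is not stated here), `build_pipeline` is a hypothetical helper, and the toy training rows are made up.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["age", "bmi", "children"]
CATEGORICAL = ["sex", "smoker", "region"]


def build_pipeline() -> Pipeline:
    """Preprocessor and regressor in one object, so both are pickled together."""
    preprocessor = ColumnTransformer(
        [
            ("num", StandardScaler(), NUMERIC),
            # Ignore unseen categories at prediction time instead of raising.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ]
    )
    return Pipeline([("preprocessor", preprocessor), ("model", LinearRegression())])


# Hypothetical toy data with the same columns as the dataset.
train = pd.DataFrame(
    {
        "age": [25, 40, 52, 33, 61, 19],
        "bmi": [24.5, 30.1, 28.7, 22.0, 35.2, 27.3],
        "children": [2, 0, 1, 3, 0, 1],
        "sex": ["male", "female", "male", "female", "male", "female"],
        "smoker": ["no", "yes", "no", "no", "yes", "no"],
        "region": ["southwest", "northeast", "southeast",
                   "northwest", "southwest", "northeast"],
        "charges": [3500.0, 28000.0, 11000.0, 6000.0, 45000.0, 2500.0],
    }
)

pipeline = build_pipeline()
pipeline.fit(train[NUMERIC + CATEGORICAL], train["charges"])

# Serialize and reload, as an API might do once at startup.
restored = pickle.loads(pickle.dumps(pipeline))

new_row = pd.DataFrame([{"age": 25, "bmi": 24.5, "children": 2,
                         "sex": "male", "smoker": "no", "region": "southwest"}])
prediction = float(restored.predict(new_row)[0])
```

Bundling the encoder with the model means the API only has to unpickle one object and can pass raw request fields straight to `predict`.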
- Python – Programming language.
- NumPy – Numerical computing.
- SciPy – Scientific computing algorithms.
- Pandas – Data manipulation and analysis.
- scikit-learn – Machine learning algorithms.
- Matplotlib / Seaborn – Data visualization.
- FastAPI – Web framework for building APIs in Python.
To run the API locally, follow these steps:
Ensure you have the following installed:
- Python (>=3.8)
- FastAPI
- Uvicorn
- pandas, NumPy, scikit-learn
- Docker (for containerized deployment)
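Clone the repository:
```bash
git clone https://github.com/Just-Aymz/US-Insurance-Data.git
cd US-Insurance-Data
```
Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate     # On Windows
```
Install the dependencies:
```bash
pip install -r requirements.txt
```
Start the development server (`--reload` restarts it whenever the code changes):
```bash
uvicorn API_Files.main:app --reload
```
The API is then available at http://127.0.0.1:8000.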
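Once the server is running, any HTTP client can call the API. The sketch below uses only Python's standard library (`urllib.request`) to build a `POST /predict` request with the documented body. `build_predict_request` and `predict` are hypothetical helper names, and the base URL assumes the default local address.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed default local address


def build_predict_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for /predict."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, base_url: str = BASE_URL) -> float:
    """Send the request and return the 'prediction' field of the JSON response."""
    with urllib.request.urlopen(build_predict_request(payload, base_url), timeout=10) as resp:
        return json.loads(resp.read())["prediction"]


example = {"age": 25, "bmi": 24.5, "children": 2,
           "sex": "male", "smoker": "no", "region": "southwest"}
req = build_predict_request(example)
# With the server running: print(predict(example))
```

Separating request construction from sending makes the payload easy to inspect without a live server.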
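Alternatively, build and run the container from the repository root. Because the Dockerfile lives in `DockerFiles/`, pass its path with `-f`:
```bash
docker build -f DockerFiles/Dockerfile -t ml-api .
docker run -p 8000:8000 ml-api
```
The `-p 8000:8000` flag maps port 8000 in the container to port 8000 on the host.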
- URL: `/`
- Method: `GET`
- Response:
```json
{"message": "Welcome to the Machine Learning API"}
```
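- URL: `/predict`
- Method: `POST`
- Request Body:
```json
{
  "age": 25,
  "bmi": 24.5,
  "children": 2,
  "sex": "male",
  "smoker": "no",
  "region": "southwest"
}
```
- Response (`prediction` is the predicted insurance charge; the value shown is illustrative):
```json
{"prediction": 1234.56}
```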
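`Schema.py` is described as validating input with Pydantic, but its contents are not shown here. A minimal sketch of a model matching the request body above might look like the following. The class name `InsuranceInput` and the numeric bounds are assumptions; the allowed category values come from the dataset table.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class InsuranceInput(BaseModel):
    """Hypothetical request model for POST /predict."""

    age: int = Field(..., ge=0)      # assumed bound: non-negative age
    bmi: float = Field(..., gt=0)    # assumed bound: positive BMI
    children: int = Field(..., ge=0)
    sex: Literal["male", "female"]
    smoker: Literal["yes", "no"]
    region: Literal["northeast", "southeast", "southwest", "northwest"]


valid = InsuranceInput(age=25, bmi=24.5, children=2,
                       sex="male", smoker="no", region="southwest")

# An out-of-vocabulary category is rejected by validation.
try:
    InsuranceInput(age=25, bmi=24.5, children=2,
                   sex="male", smoker="maybe", region="southwest")
    rejected = False
except ValidationError:
    rejected = True
```

When a model like this is declared as the body parameter of a FastAPI route, FastAPI validates incoming JSON against it and responds with HTTP 422 when validation fails.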
```
US-Insurance-Data/
├── API_Files
│   ├── __pycache__/
│   ├── .gitkeep
│   ├── main.py
│   └── Schema.py
├── Dataset
│   ├── .gitkeep
│   └── insurance.csv
├── DockerFiles
│   ├── .dockerignore
│   ├── .gitkeep
│   └── Dockerfile
├── Preprocessing
│   ├── .gitkeep
│   └── Data_Preprocessing.ipynb
├── Static
│   ├── .gitkeep
│   ├── risk_management.jpg
│   └── style.css
├── Templates
│   ├── .gitkeep
│   ├── form.html
│   └── home.html
├── best_model_and_preprocessor.pkl
├── LICENSE
├── README.md
└── requirements.txt
```
- CI/CD pipeline for automated deployment
- Cloud deployment options (AWS, GCP, DigitalOcean)
- Expanding input validation and error handling
For questions or collaborations, please reach out via:
- Email: [email protected]
- LinkedIn: Amogelang More
This project is open-source under the MIT License.