FaNa-AI/logisticRegression

🚒 Titanic Survival Prediction using Logistic Regression

This project trains a logistic regression model to predict passenger survival on the Titanic. The data is preprocessed with a scikit-learn pipeline, and the model is evaluated with accuracy, a confusion matrix, a ROC curve, and a classification report. It uses pandas, scikit-learn, and Seaborn.


πŸ“ Dataset

  • The dataset used is a cleaned Excel file: PreProccessing.Titanic.xlsx
  • Target column: survived
  • Removed columns: name, ticket (not used as features for modeling)
  • Missing values in features are imputed (numerical: median, categorical: most frequent)
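The dataset steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper name `split_features_target` is hypothetical, and the small hand-made frame only stands in for the Excel file so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def split_features_target(df: pd.DataFrame):
    """Drop the identifier columns and separate features from the target.

    Hypothetical helper; column names follow the README.
    """
    # name and ticket are the columns the README lists as removed
    df = df.drop(columns=["name", "ticket"], errors="ignore")
    X = df.drop(columns=["survived"])
    y = df["survived"]
    return X, y


# In the project the frame would presumably come from the Excel file:
#   df = pd.read_excel("PreProccessing.Titanic.xlsx")  # requires openpyxl
# The rows below are made up purely for illustration.
demo = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "ticket": ["T1", "T2", "T3"],
        "pclass": [1, 3, 2],
        "age": [29.0, np.nan, 40.0],
        "sex": ["female", "male", "female"],
        "survived": [1, 0, 1],
    }
)

X, y = split_features_target(demo)
print(list(X.columns))
```

Missing values (such as the empty `age` above) are left in place at this stage; imputation happens later inside the preprocessing pipeline.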

🧠 Model Overview

  • Model: Logistic Regression

  • Preprocessing:

    • Numerical features (pclass, age, sibsp, parch, fare): median imputation + scaling
    • Categorical features (sex, embarked): mode imputation + one-hot encoding
  • Evaluation:

    • Accuracy
    • Confusion Matrix
    • ROC Curve & AUC
    • Classification Report

🔧 How to Run

1. Install Dependencies

pip install pandas matplotlib seaborn scikit-learn openpyxl

2. Prepare the Dataset

Place the PreProccessing.Titanic.xlsx file in the project directory.

3. Run the Script

python titanic_logistic_regression.py

📊 Outputs

Confusion Matrix

A heatmap showing true positives, true negatives, false positives, and false negatives.

ROC Curve

Plots the true positive rate against the false positive rate across classification thresholds and reports the area under the curve (AUC).

Classification Report

Per-class metrics: precision, recall, and F1-score for both classes.
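As a rough sketch of how these outputs can be computed with scikit-learn's metrics module: the labels and probabilities below are made up for illustration and are not results from this project, and the 0.5 threshold is an assumption. Plotting (heatmap, ROC curve) is left out to keep the example short.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)

# Illustrative ground truth and predicted probabilities for class 1.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.45, 0.9, 0.7, 0.6]

# Turn probabilities into hard predictions (threshold 0.5 is an assumption).
y_pred = [int(p >= 0.5) for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# The ROC curve and AUC use the probabilities, not the hard predictions.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
auc_score = roc_auc_score(y_true, y_prob)

print(f"Accuracy: {accuracy:.2f}")
print(f"AUC: {auc_score:.4f}")
print(cm)
print(classification_report(y_true, y_pred))
```

The `fpr` and `tpr` arrays are what a ROC plot would draw, and `cm` is the matrix a heatmap would display.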


βš™οΈ Preprocessing Pipeline

  • Built with ColumnTransformer and Pipeline
  • Numerical and categorical data handled separately
  • Keeps imputation, scaling, and encoding in one fitted object, so the same transformations are applied at training and prediction time
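One way such a pipeline could be assembled is sketched below. The column lists and imputation strategies come from the Model Overview; `StandardScaler`, `handle_unknown="ignore"`, `max_iter=1000`, the step names, and the `build_model` helper are assumptions for this sketch, since the README does not name them. The toy rows are made up so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["pclass", "age", "sibsp", "parch", "fare"]
CATEGORICAL = ["sex", "embarked"]


def build_model() -> Pipeline:
    """Hypothetical helper: preprocessing plus logistic regression."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),  # assumption: README only says "scaling"
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),  # assumption
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),  # max_iter is an assumption
    ])


# Made-up rows with a few missing values, for illustration only.
toy = pd.DataFrame({
    "pclass": [1, 3, 2, 3, 1, 3],
    "age": [29.0, np.nan, 40.0, 22.0, np.nan, 35.0],
    "sibsp": [0, 1, 0, 0, 1, 0],
    "parch": [0, 0, 1, 0, 0, 2],
    "fare": [80.0, 7.25, 26.0, 8.05, 52.0, np.nan],
    "sex": ["female", "male", "female", "male", "female", "male"],
    "embarked": ["C", "S", np.nan, "S", "S", "Q"],
})
labels = [1, 0, 1, 0, 1, 0]

model = build_model().fit(toy, labels)
predictions = model.predict(toy)
print(predictions)
```

Because the imputers and scaler are fitted inside the pipeline, calling `model.predict` on new rows reuses the medians, modes, and scaling learned from the training data.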

📈 Example Output

Accuracy: 0.81

Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.88      0.86       105
           1       0.76      0.70      0.73        74

    accuracy                           0.81       179
   macro avg       0.80      0.79      0.79       179
weighted avg       0.81      0.81      0.81       179
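The two average rows can be reproduced from the per-class rows. The short check below recomputes the precision averages from the rounded figures in the report; because the inputs are already rounded to two decimals, recomputed values for other columns may differ in the last digit.

```python
# Per-class precision and support copied from the report above.
precision = {0: 0.84, 1: 0.76}
support = {0: 105, 1: 74}

total = sum(support.values())

# Macro average: plain mean over classes, ignoring class sizes.
macro_precision = sum(precision.values()) / len(precision)

# Weighted average: each class contributes in proportion to its support.
weighted_precision = sum(precision[c] * support[c] for c in precision) / total

print(total)
print(round(macro_precision, 2))
print(round(weighted_precision, 2))
```

The macro average treats both classes equally, while the weighted average leans toward class 0 because it has more test samples.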

📌 Features

  • Clean and readable pipeline-based preprocessing
  • Visualizations: Confusion matrix and ROC curve
  • Evaluation metrics: accuracy, AUC, and a per-class classification report
  • Easy to extend with other models (e.g., SVM, Random Forest)
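As an illustration of the last point, the final step of a scikit-learn pipeline can be swapped with `set_params` while the preprocessing steps stay unchanged. This sketch uses synthetic data from `make_classification` and a simplified scaler-only pipeline; the step names `scale` and `clf` and the candidate settings are assumptions, not taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data, not the Titanic dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    # Replace only the classifier step; preprocessing is reused as-is.
    pipe.set_params(clf=estimator)
    # Training accuracy only, to keep the sketch short; use a held-out
    # split or cross-validation for a real comparison.
    scores[name] = pipe.fit(X, y).score(X, y)

print(scores)
```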
