🧼 Titanic Data Cleaning Script

This project contains a Python script for preprocessing the original Titanic dataset (titanic_original.csv). The goal is to clean and prepare the data for further analysis or machine learning tasks.

📄 Dataset

Input file: titanic_original.csv
Output file: titanic_cleaned.xlsx (cleaned version saved in Excel format)

🧹 Cleaning Steps

1. Drop Unnecessary Columns
The following columns were removed because they contain many missing values or are not relevant for modeling:
- cabin
- boat
- body
- home.dest

2. Handle Missing Values in Age
- Missing values in the age column were filled with the mean age.
- All age values were then rounded to the nearest integer.

3. Fix Missing Embarked Values
- Missing values in the embarked column were filled with 'S', the most frequent port of embarkation.

4. Correct Fare Values
- Negative fare values were clamped to 0.
- Zero fares, including the clamped ones, were replaced with the mean of the positive fares only.

5. Remove Duplicates
- Duplicate records were identified and removed from the dataset.
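
The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the contents of titanic_cleaning.py: the function name clean_titanic is hypothetical, the column names are assumed to be the lowercase ones listed above, and missing (NaN) fares are left untouched because the steps do not describe them.

```python
import pandas as pd

# Columns the README lists as dropped.
DROP_COLUMNS = ["cabin", "boat", "body", "home.dest"]


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies the listed cleaning steps to a copy of df."""
    # drop() returns a new DataFrame, so the caller's data is not modified.
    df = df.drop(columns=DROP_COLUMNS, errors="ignore")

    # Fill missing ages with the mean age, then round to whole years.
    df["age"] = df["age"].fillna(df["age"].mean()).round().astype(int)

    # 'S' is the most frequent port of embarkation.
    df["embarked"] = df["embarked"].fillna("S")

    # Replace zero or negative fares with the mean of the positive fares.
    # Clamping negatives to 0 first would give the same final result.
    df["fare"] = df["fare"].astype(float)
    positive_mean = df.loc[df["fare"] > 0, "fare"].mean()
    df.loc[df["fare"] <= 0, "fare"] = positive_mean

    # Remove exact duplicate rows and renumber the index.
    return df.drop_duplicates().reset_index(drop=True)
```

Under these assumptions, a call such as clean_titanic(pd.read_csv("titanic_original.csv")) would return the cleaned table, ready to be written to disk.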

💾 Output

Cleaned dataset saved as: titanic_cleaned.xlsx
Format: Excel (uses openpyxl engine)
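
As a rough sketch of the save step, writing a DataFrame to Excel with the openpyxl engine could look like this. The helper name save_cleaned is hypothetical, and index=False is an assumption; the actual script may keep the index column.

```python
import pandas as pd


def save_cleaned(df: pd.DataFrame, path: str = "titanic_cleaned.xlsx") -> None:
    """Hypothetical helper: write the cleaned data to an Excel workbook."""
    # index=False keeps the pandas row index out of the spreadsheet (assumed).
    df.to_excel(path, index=False, engine="openpyxl")
```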

▶️ How to Use

1. Install Dependencies

pip install pandas openpyxl

2. Run the Script

Ensure the titanic_original.csv file is in the same directory, then run:

python titanic_cleaning.py

After execution, titanic_cleaned.xlsx will be generated in the same directory.

📌 Notes

The script is designed to be a lightweight and simple preprocessor for Titanic data.
It’s easily extensible for further cleaning or feature engineering.
