This data science project analyzes survey data from Open Sourcing Mental Illness (OSMI) to understand attitudes towards mental health and the prevalence of mental health disorders in the tech industry. The data covers five survey years (2014, 2016, 2017, 2018, and 2019) and offers insight into the mental health challenges faced by professionals in the tech workplace.
The main objective of this project is to understand the mental health landscape in the tech industry and to turn that understanding into actionable recommendations. Through a rigorous data analysis process, I seek to advocate for improved mental health initiatives and policies that benefit tech professionals' overall well-being and productivity.
The data was loaded using SQLite and Pandas. SQLite is a lightweight, serverless database management system that allows easy access and manipulation of the data. Pandas, a powerful data manipulation library in Python, was used for data cleaning and preprocessing.
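As a minimal sketch of this loading step, the snippet below builds a small in-memory SQLite database and reads it into a DataFrame. The table name `answers`, its columns, and the sample rows are hypothetical stand-ins; this README does not state the actual database file or schema.

```python
import sqlite3

import pandas as pd

# Hypothetical schema and rows: the real database file, table names and
# columns are not given in this README.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE answers (
        survey_year INTEGER,
        user_id     INTEGER,
        question    TEXT,
        answer      TEXT
    );
    INSERT INTO answers VALUES
        (2014, 1, 'Have you sought treatment?', 'Yes'),
        (2014, 2, 'Have you sought treatment?', 'No'),
        (2016, 3, 'Have you sought treatment?', 'Yes');
    """
)

# read_sql_query runs the SQL statement and returns the result set as a DataFrame.
df = pd.read_sql_query(
    "SELECT survey_year, user_id, question, answer FROM answers", conn
)
conn.close()

print(df.shape)
```

With a real database file, `sqlite3.connect` would take the file path instead of `":memory:"`.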
The cleaning and preprocessing of the raw data involved several steps:
- Grouping similar questions together for consistency and analysis.
- Ensuring consistency in answer values (e.g., mapping 1 to 1.0) to facilitate analysis.
- Correcting spelling errors to ensure data integrity.
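The three cleaning steps above could look roughly like the following. This is only a sketch: the question wording, the `QUESTION_GROUPS` and `ANSWER_FIXES` mappings, and the sample answers are all invented for illustration and are not taken from the project's notebook.

```python
import pandas as pd

# Invented sample rows; the real survey questions and answers will differ.
raw = pd.DataFrame(
    {
        "question": [
            "Do you have a mental health disorder?",
            "Do you currently have a mental health disorder?",
            "What is your gender?",
            "What is your gender?",
        ],
        "answer": ["1", "1.0", "Femail", "male"],
    }
)

# Step 1: group differently worded questions under one topic label (hypothetical mapping).
QUESTION_GROUPS = {
    "Do you have a mental health disorder?": "current_disorder",
    "Do you currently have a mental health disorder?": "current_disorder",
    "What is your gender?": "gender",
}

# Steps 2 and 3: unify answer encodings and fix spelling errors (hypothetical mapping).
ANSWER_FIXES = {"1": "1.0", "femail": "female"}

clean = raw.copy()
clean["topic"] = clean["question"].map(QUESTION_GROUPS)

# Normalize whitespace and case first so one fix covers every variant of a value.
normalized = clean["answer"].str.strip().str.lower()
clean["answer"] = normalized.replace(ANSWER_FIXES)

print(clean[["topic", "answer"]])
```

Keeping the mappings in plain dictionaries makes each correction easy to review and extend as new variants turn up.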
The exploratory data analysis (EDA) phase involved extensive data exploration and visualization. Key elements of EDA included:
- Statistical Summaries: Computing summary statistics to understand data distribution, central tendencies, and dispersion.
- Charts and Visualizations: Creating various charts (e.g., bar plots, histograms, boxplots) to visualize patterns and identify outliers.
- Correlation Analysis: Checking for correlations and relationships between variables to identify potential insights.
- Testing for Anomalies: Identifying any unusual or unexpected patterns in the data that may require further investigation.
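For illustration, the summary statistics, correlation check, and anomaly test listed above can be sketched with pandas as follows. The columns `age` and `sought_treatment`, the sample values, and the 1.5 × IQR outlier rule are assumptions made for this example; the README does not say which variables or anomaly test the notebook uses.

```python
import pandas as pd

# Invented sample data; column names and values are assumptions for this example.
df = pd.DataFrame(
    {
        "age": [25, 31, 29, 42, 35, 38, 27, 99],
        "sought_treatment": [0, 1, 0, 1, 1, 1, 0, 1],
    }
)

# Statistical summary: count, mean, standard deviation, min, quartiles, max.
summary = df["age"].describe()

# Pearson correlation between the two numeric columns.
corr = df["age"].corr(df["sought_treatment"])

# Anomaly check using the common 1.5 * IQR rule (one possible choice of test).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(summary)
print(f"correlation: {corr:.2f}")
print(outliers)
```

In this sample only the age of 99 falls outside the IQR fences, which is the kind of value that would be flagged for further investigation.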
The data analysis phase focused on deriving meaningful insights and patterns from the dataset. Techniques such as aggregations, grouping, and filtering were used to extract relevant information.
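A small sketch of filtering, grouping, and aggregating with pandas is shown below. The columns `survey_year`, `works_in_tech`, and `sought_treatment` and the sample rows are hypothetical; they only illustrate the pattern.

```python
import pandas as pd

# Invented responses; column names are hypothetical.
responses = pd.DataFrame(
    {
        "survey_year": [2014, 2014, 2014, 2016, 2016, 2016],
        "works_in_tech": [True, True, False, True, True, True],
        "sought_treatment": [1, 0, 1, 1, 1, 0],
    }
)

# Filtering: keep only respondents who work in tech.
tech_only = responses[responses["works_in_tech"]]

# Grouping and aggregation: respondent count and treatment rate per survey year.
by_year = tech_only.groupby("survey_year")["sought_treatment"].agg(
    respondents="count",
    treatment_rate="mean",
)

print(by_year)
```

Named aggregations keep the output columns self-describing, which helps when the table feeds directly into a chart or a written finding.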
Effective data visualization is crucial for communicating results clearly. The project utilized Matplotlib and Seaborn to create visually appealing and informative charts and plots that complemented the analysis.
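As one example of the kind of chart involved, the sketch below draws a bar chart of answer counts with Seaborn on a Matplotlib axis. The `answer` column, the sample values, and the chart title are made up for illustration and do not reproduce a figure from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented answers to a single hypothetical survey question.
df = pd.DataFrame({"answer": ["Yes", "No", "Yes", "Maybe", "Yes", "No"]})

fig, ax = plt.subplots(figsize=(6, 4))

# countplot draws one bar per category, with bar height equal to the row count.
sns.countplot(data=df, x="answer", order=["Yes", "No", "Maybe"], ax=ax)

ax.set_xlabel("Answer")
ax.set_ylabel("Number of respondents")
ax.set_title("Responses to a sample survey question")
fig.tight_layout()
```

Calling `fig.savefig` with a file name writes the chart to disk. In a notebook the backend line can be dropped, since the figure renders inline.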
Clear and concise explanations were provided in the notebook to guide the reader through the analysis process. The results and their implications were thoroughly explained to facilitate a better understanding of the findings.
Based on the analysis, concrete suggestions and recommendations were offered on how to improve mental health support and awareness in the tech industry. These recommendations were aimed at policymakers, companies, and professionals to foster a healthier and more supportive work environment.
This data science project examined mental health in the tech industry through the analysis of survey data. By leveraging Python, SQL, and Excel, I was able to clean and manipulate the data effectively. Through exploratory data analysis and data visualization, I gained valuable insights into attitudes towards mental health in the tech workplace.
The analysis presented in this project can serve as a basis for further research and policy decisions aimed at improving mental health support and well-being in the tech industry. Continuous efforts to address mental health challenges can lead to a more inclusive, supportive, and resilient work environment for tech professionals.
I welcome pull requests for this project. If you plan to make significant changes, I recommend that you open an issue first to discuss your proposed changes. Please ensure that you add or update tests as appropriate.