Big Data in Action: a customer sentiment prediction project built with PySpark and Databricks, covering more than 49,000 e-commerce orders and reviews.
To understand what drives positive and negative customer reviews, we engineered 261 features from product, order, payment, and shipping data. By combining Random Forest and Gradient Boosting models in an ensemble, we achieved an accuracy above 0.86 in predicting customer sentiment.
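One common way to combine two classifiers like these is soft voting: average the positive-class probability from each model, then apply a threshold. The sketch below is a plain-Python illustration under that assumption, not the project's actual pipeline, since the combination rule used is not stated here. The function names, the `rf_weight` and `threshold` parameters, and the sample scores are all hypothetical. In a PySpark workflow the two probability lists would typically come from the fitted models' probability output columns.

```python
from typing import List, Sequence


def soft_vote(
    rf_probs: Sequence[float],
    gbt_probs: Sequence[float],
    rf_weight: float = 0.5,
    threshold: float = 0.5,
) -> List[int]:
    """Blend two models' positive-class probabilities and threshold the result.

    Assumption: both sequences score the same rows in the same order.
    """
    if len(rf_probs) != len(gbt_probs):
        raise ValueError("both models must score the same rows")
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must be between 0 and 1")
    # Weighted average of the two probabilities for each row.
    blended = [
        rf_weight * p_rf + (1.0 - rf_weight) * p_gbt
        for p_rf, p_gbt in zip(rf_probs, gbt_probs)
    ]
    # 1 = predicted positive review, 0 = predicted negative review.
    return [1 if p >= threshold else 0 for p in blended]


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of rows where the prediction matches the label."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and aligned")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up scores for four reviews, for illustration only.
rf_probs = [0.90, 0.40, 0.20, 0.70]
gbt_probs = [0.80, 0.70, 0.10, 0.40]
labels = [1, 1, 0, 1]

preds = soft_vote(rf_probs, gbt_probs)
print(preds, accuracy(preds, labels))
```

Averaging lets one model correct the other on rows where they disagree. Any weight or threshold other than the defaults should be tuned on held-out data rather than on the test set.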
Beyond the model, the project delivered actionable insights:
Product description and shipping transparency emerged as key influencers of positive reviews.
Higher product weights and higher shipping costs were linked to negative experiences.
Payment methods and order values revealed behavioral patterns that informed retention strategies.