The film industry runs on one of the more successful business models, with results meeting demand more often than not. Box offices draw large audiences, and when a movie's objectives align with its target audience, revenue can be very high. For instance, 2018 was a strong year at the box office, with sales marginally crossing $41.7 billion. Figures like this indicate the potential for growth in the film industry and, with it, in the revenue collected by box offices. This research aims to identify the factors that drive box office success by analysing metadata from over 7000 films, both to understand the major contributors to a successful release and to predict worldwide box office revenue. The project collects data points such as cast, crew, plot keywords, budgets, posters, release dates and several more, and uses them to predict revenue. A comparative analysis was performed on Linear Regression, Random Forest, XGBoost and LightGBM models to find the most suitable model for revenue prediction. The exploratory data analysis section introduces the data cleaning, label encoding and data pre-processing steps; the four regressor models are then discussed. The evaluation metrics used to assess the performance of our models are covered in the evaluation and results section. Finally, the last section draws conclusions from this research and describes future scope.
Competition Link: https://www.kaggle.com/c/tmdb-box-office-prediction
My Submission Link: https://www.kaggle.com/harshvr15/datawrang-eda-models-lr-rf-xgb-lgb-gridsearch
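
The label encoding and model comparison steps described above can be sketched in a few lines of plain Python. This is an illustration, not the project's actual code. The helper names (`label_encode`, `rmsle`, `best_model`) and the toy numbers are hypothetical. RMSLE is assumed as the comparison metric, since the text above does not name one. The predictions are hard-coded stand-ins for the output of fitted regressors.

```python
import math


def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def rmsle(actual, predicted):
    """Root mean squared logarithmic error (assumed metric).

    log1p is used so that a revenue of zero is still a valid input.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def best_model(actual, predictions_by_model):
    """Score every model's predictions and return the name with the lowest RMSLE."""
    scores = {
        name: rmsle(actual, preds) for name, preds in predictions_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical categorical column (original language of each film).
languages = ["en", "fr", "en", "ja"]
encoded, mapping = label_encode(languages)

# Hypothetical revenues and model outputs, for illustration only.
actual_revenue = [1_000_000, 250_000, 40_000_000, 5_000_000]
predictions = {
    "linear_regression": [3_000_000, 100_000, 15_000_000, 9_000_000],
    "random_forest": [1_200_000, 300_000, 35_000_000, 4_000_000],
}
winner, scores = best_model(actual_revenue, predictions)
print(encoded, mapping)
print(winner, scores)
```

In practice the predictions would come from regressors evaluated on a held-out split, and a library encoder would usually replace the hand-written one. The selection logic stays the same: score each model on the same data and keep the one with the lowest error.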