Movies Project

"Movies capture the imagination of audiences, and data reveals the patterns that keep them coming back."

project_1_main

Project Objectives


I built an end-to-end project using Excel and Python to clean the data and create an ML model, and finished it with a dashboard built in Tableau. The project objectives are as follows:

  • Analyse a list of movies to determine which Genre is the most profitable, and look at the data related to Actors and Directors.
  • Check for trends in the number of movies released per year and highlight the movies with a higher IMDb Score.
  • Understand whether the Box Office results can be predicted. To do so, I investigate the relationships between variables and develop a predictive model to forecast the Box Office results.
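
The first objective can be sketched with a simple aggregation per genre. This is only an illustration: the column names (`genre`, `budget`, `box_office`) and the toy values are hypothetical, and Earnings is assumed to be Box Office minus Budget, which the project does not state explicitly.

```python
import pandas as pd

# Toy values for illustration only; not taken from the project dataset.
movies = pd.DataFrame({
    "genre": ["Action", "Action", "Comedy", "Comedy", "Drama"],
    "budget": [100, 150, 20, 30, 15],
    "box_office": [300, 350, 60, 40, 45],
})

# Assumption: Earnings = Box Office - Budget.
movies["earnings"] = movies["box_office"] - movies["budget"]

# Total earnings, average earnings and number of movies per genre,
# sorted so the most profitable genre comes first.
by_genre = (
    movies.groupby("genre")["earnings"]
    .agg(total="sum", average="mean", movies="count")
    .sort_values("total", ascending=False)
)
print(by_genre)
```

Sorting by `total` ranks genres by overall profit, while `average` shows profit per movie; the two rankings can differ when one genre has many more releases than another.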

Questions (KPIs)


  1. Is a movie's Box Office related to any of the other variables?
  2. Does a bigger budget influence the Box Office?
  3. Are the movies with more Awards more profitable?
  4. Which genre is the most profitable?
  5. Which genre is the most likely to get awards?
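
Questions 1 and 2 can be explored with a correlation matrix. The sketch below assumes a pandas DataFrame with hypothetical column names and toy values (not project data), and again assumes Earnings is Box Office minus Budget.

```python
import pandas as pd

# Toy values for illustration only; not taken from the project dataset.
movies = pd.DataFrame({
    "budget": [10, 20, 50, 100, 150],
    "box_office": [15, 45, 90, 260, 400],
    "running_time": [95, 120, 101, 140, 110],
})

# Assumption: Earnings = Box Office - Budget.
movies["earnings"] = movies["box_office"] - movies["budget"]

# Pearson correlation between every pair of numeric columns.
corr = movies.corr(method="pearson")

# Correlation of each other variable with Box Office, strongest first.
box_office_corr = (
    corr["box_office"].drop("box_office").sort_values(ascending=False)
)
print(box_office_corr)
```

A coefficient close to 1 or -1 indicates a strong linear relationship; it does not show that one variable causes the other.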

Process


  • Checked the data for duplicates and missing values.
  • Made sure all data is consistent in terms of data types, formats and values.
  • Created a separate dataset containing only the Actors information, combining the 3 actor variables into 1.
  • Created new variables to add value to the dataset.
  • Created visual representations of multiple variables.
  • Ran Statistical Analysis using T-Test and Chi-Square Test techniques.
  • Created 2 ML models (Regression & Classification).
  • Ran a Feature Importance technique to confirm the Statistical and ML results.
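
The cleaning and statistical steps above could look roughly like the following sketch. The column names, the toy rows, the median split used to build the Chi-Square table and the choice of Welch's T-Test are all assumptions made for illustration; the project's actual notebook may differ.

```python
import pandas as pd
from scipy import stats

# Toy rows for illustration only; column names are hypothetical.
raw = pd.DataFrame({
    "title": ["A", "A", "B", "C", "D", "E", "F", "G"],
    "budget": [10, 10, 60, 80, 5, 120, 15, 90],
    "box_office": [30, 30, 200, 150, 4, 500, 12, 310],
    "imdb_score": [7.9, 7.9, 6.1, 8.2, 5.4, 8.0, 6.0, 7.7],
    "awards": [2, 2, 0, 3, 0, 1, 0, 4],
})

# Drop exact duplicate rows and rows with missing values.
movies = raw.drop_duplicates().dropna().reset_index(drop=True)

# New variables (assumption: Earnings = Box Office - Budget).
movies["earnings"] = movies["box_office"] - movies["budget"]
movies["has_award"] = movies["awards"] > 0

# Welch's T-Test: IMDb Score of movies with awards vs. movies without.
winners = movies.loc[movies["has_award"], "imdb_score"]
others = movies.loc[~movies["has_award"], "imdb_score"]
t_stat, t_p = stats.ttest_ind(winners, others, equal_var=False)

# Chi-Square test of independence on two binned variables:
# budget above the median vs. box office above the median.
movies["high_budget"] = movies["budget"] > movies["budget"].median()
movies["high_box_office"] = movies["box_office"] > movies["box_office"].median()
table = pd.crosstab(movies["high_budget"], movies["high_box_office"])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(f"T-Test: t={t_stat:.2f}, p={t_p:.3f}")
print(f"Chi-Square: chi2={chi2:.2f}, p={chi_p:.3f}, dof={dof}")
```

The Chi-Square test needs categorical inputs, which is why the numeric columns are binned first. With so few toy rows the p-values are not meaningful; the point is only the shape of the workflow.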

Dashboard


movies_project_dashboard

Project Insights


  • The dataset has a total of 3,907 movies (after removing duplicates)
  • The movies were released between 1929 and 2016
  • 68 movies broke even and 985 made a loss, so the remaining 2,854 movies made a profit
  • 88% of the movies were released after 1990
  • 28.6% of the movies have a budget between 20 and 50 million
  • 2006 is the year with the most movies released
  • The strongest correlation is between Box Office and Earnings, which means movies with a higher Box Office tend to be more profitable
  • Movies with a higher Budget tend to have a higher Box Office and higher Earnings (Correlation and Chi-Square Test)
  • The more nominations a movie has, the higher its chance of winning an award
  • Nominations and Awards are not related to a movie's Running Time, Box Office, Earnings or IMDb Score
  • Action is the most profitable Genre, with a total of $108.21B in Earnings, an average of $114.87 million per movie
  • Even though Action is the most profitable genre, Comedy is the genre with the most movies released (1,002)
  • Drama is the genre with the most Nominations/Awards
  • Action, Animation and Adventure are the genres that produce the most movies with a high Box Office
  • The actor with the most appearances is Robert De Niro, with a total of 53 movies
  • The T-Test confirmed that movies with awards have a higher IMDb Score
  • Finally, I created a Random Forest Regressor model that confirmed my previous analysis; with an R^2 of 0.94, the model explains most of the variability in Box Office
  • To finalize the analysis, I ran a Feature Importance technique on the Random Forest Regressor model and concluded that Earnings is the most important variable for predicting the Box Office
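
The last two points can be illustrated with a minimal scikit-learn sketch. The data here is synthetic and generated so that budget drives box office, and the feature names and hyperparameters are assumptions; it shows the technique, not the project's actual model or its R^2 of 0.94.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; not the project dataset.
rng = np.random.default_rng(0)
n = 500
budget = rng.uniform(1, 200, n)
running_time = rng.uniform(80, 180, n)
imdb_score = rng.uniform(3, 9, n)
# The synthetic target depends mostly on budget, plus random noise.
box_office = 2.5 * budget + rng.normal(0, 20, n)

feature_names = ["budget", "running_time", "imdb_score"]
X = np.column_stack([budget, running_time, imdb_score])

# Hold out 20% of the rows to measure R^2 on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, box_office, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

r2 = r2_score(y_test, model.predict(X_test))

# Impurity-based importances; they sum to 1 across the features.
importances = dict(zip(feature_names, model.feature_importances_))

print(f"R^2 on the test set: {r2:.2f}")
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Earnings is left out of this sketch on purpose: if Earnings is computed from Box Office, using it as a feature lets the model reconstruct the target almost directly, which can inflate both R^2 and its importance score.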

Final Conclusion


Movies with a higher Budget tend to have a higher Box Office and, consequently, are more profitable, which is fairly obvious. What is surprising is that a higher Box Office and higher Earnings do not mean a movie will get more Golden Globes nominations or win more awards. Running time is also not related to Nominations and Awards, which suggests that quality matters more than quantity. Even though the results were easy to predict, it was really interesting to analyse this dataset in depth. Movies have a real impact on our lives, and knowing what it takes to create a profitable movie makes watching them even more interesting. From now on I will watch movies differently, looking not only at the story but also at the numbers and their impact on society.