Refer to the scenario in Problem 52 regarding the identification of movies that win the Best Picture
Question:
Refer to the scenario in Problem 52 regarding the identification of movies that win the Best Picture Oscar. Apply k-nearest neighbors to classify observations as winning best picture or not by using Winner as the target (or response) variable. Use 100% of the data for training and validation (do not use any data as a test set).
a. Based on all the input variables, determine the value of k that maximizes the AUC in a validation procedure.
b. Note that each year there is only one winner of the Best Picture Oscar. Knowing this, examine the confusion matrix and explain what is wrong with classifying a movie based on a cutoff value?
c. What is the best way to use the model to predict the annual winner?
Problem 52
Each year, the American Academy of Motion Picture Arts and Sciences recognizes excellence in the film industry by honoring directors, actors, and writers with awards (called “Oscars”) in different categories. The most notable of these awards is the Oscar for Best Picture. Data has been collected on a sample of movies nominated for the Best Picture Oscar. The variables include total number of Oscar nominations across all award categories, number of Golden Globe awards won (the Golden Globe award show precedes the Academy Awards), the genre of the movie, and whether or not the movie won the Best Picture Oscar award.
Apply logistic regression with lasso regularization to classify winners of the Best Picture Oscar by using Winner as the target (or response) variable. Use 100% of the data for training and validation (do not use any data as a test set).
Step by Step Answer:
Business Analytics
ISBN: 9780357902219
5th Edition
Authors: Jeffrey D. Camm, James J. Cochran, Michael J. Fry, Jeffrey W. Ohlmann