
Question


In Chapter 24 and Lectures #17 and #18, we worked through a logistic model to predict scoring attempts based on a single independent variable, shot_distance, as well as a second model that used multiple independent variables, ['shot_distance', 'minute', 'action_type', 'shot_type', 'opponent']. It was noted that the prediction accuracy increased from 0.6 using just shot_distance to 0.725 using the entire list. Are all of those additional variables necessary to get the increased accuracy?

For this program, write a function that identifies which variable increases the accuracy of the original model the most.

  • def bestForPredict(df, columns, x_col = "shot_distance", y_col = "shot_made", test_size = 40, random_state = 42):: This function has six inputs:
    • df: a DataFrame that includes the specified columns.
    • columns: a list of column names of the specified DataFrame.
    • x_col: a column name of the specified DataFrame. It is one of the independent variables for the model (the other is taken from the list columns). Its default value is "shot_distance".
    • y_col: a column name of the specified DataFrame. This is the dependent variable (what's being predicted) in the model. Its default value is "shot_made".
    • test_size: the size of the test set created when the data is divided into test and training sets with train_test_split. The default value is 40.
    • random_state: the random seed used when the data is divided into test and training sets with train_test_split. The default value is 42.
    The function returns the highest prediction accuracy found among the given columns, as well as the name of the column that increases prediction accuracy the most.
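
The specification above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the graded solution: it assumes scikit-learn's LogisticRegression as the model (the assignment only says "logistic model"), adds max_iter=1000 to help the solver converge, and resolves ties in favor of the earlier column. None of these choices come from the assignment, so the accuracies it produces may differ from the ones quoted in the examples.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def bestForPredict(df, columns, x_col="shot_distance", y_col="shot_made",
                   test_size=40, random_state=42):
    """Return (best accuracy, column name) over models built on [x_col, c]."""
    best_acc, best_col = -1.0, None
    y = df[y_col]
    for c in columns:
        # One dict per row; string values are one-hot encoded by
        # DictVectorizer, numeric values pass through unchanged.
        rows = df[[x_col, c]].to_dict("records")
        X = DictVectorizer(sparse=False).fit_transform(rows)

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=random_state)

        # Assumption: scikit-learn's LogisticRegression with a raised
        # iteration cap; the assignment does not name a specific model.
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)

        # score() returns the fraction of test rows predicted correctly.
        acc = model.score(X_test, y_test)
        if acc > best_acc:  # strict '>' keeps the earlier column on ties
            best_acc, best_col = acc, c
    return best_acc, best_col
```

Each candidate column is paired with x_col on its own, so the loop fits one two-variable model per entry in columns and keeps the one with the highest test accuracy.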

For example, assuming your function bestForPredict() is in the file p37.py and the data is in lebron.csv, the code:

    import pandas as pd
    import p37

    df = pd.read_csv('lebron.csv')
    columns = ['minute', 'action_type', 'shot_type', 'opponent']
    acc, col_name = p37.bestForPredict(df, columns)
    print(f'The highest accuracy, {acc}, was obtained by including column, {col_name}.')

would print:

The highest accuracy, 0.725, was obtained by including column, action_type.

Another example with the same DataFrame:

    columns = ['minute', 'opponent']
    acc, col_name = p37.bestForPredict(df, columns, test_size = 100, random_state = 17)
    print(f'The highest accuracy, {acc}, was obtained by including column, {col_name}.')

would print:

The highest accuracy, 0.6, was obtained by including column, opponent.
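
Both examples pass test_size as an integer. With train_test_split, an integer test_size is an absolute number of test rows rather than a fraction, so if accuracy is measured on the test set (an assumption here), 0.725 with 40 test rows corresponds to 29 of 40 correct, and 0.6 with 100 test rows to 60 of 100. The short sketch below uses made-up data (250 rows, not taken from lebron.csv) only to show the split sizes and that a fixed random_state reproduces the same split.

```python
from sklearn.model_selection import train_test_split

# Made-up data for illustration only: 250 rows with one feature each.
X = [[i] for i in range(250)]
y = [i % 2 for i in range(250)]

# An integer test_size is the number of rows held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=17)
print(len(X_train), len(X_test))

# Repeating the call with the same random_state gives the same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=100, random_state=17)
print(X_test == X_test2)
```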

Note: you should submit a file with only the standard comments at the top, this function, and any helper functions you have written. The grading scripts will then import the file for testing.

Hints:

  • Some of the code in the textbook is deprecated. In particular, as_matrix and orient='row' do not work in newer versions. Omit the former and replace the latter with:
    rows = df[[x_col,c]].to_dict('records')
    onehot = DictVectorizer(sparse=False).fit(rows)
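
To see what the hint's two lines produce, here is a small sketch on a toy DataFrame. The three rows are invented for illustration and are not from lebron.csv; the column names follow the assignment. Numeric values are kept as a single feature, while each distinct string value becomes its own 0/1 column named column=value.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Toy rows for illustration only; not taken from lebron.csv.
toy = pd.DataFrame({
    "shot_distance": [2, 25, 14],
    "shot_type": ["2PT Field Goal", "3PT Field Goal", "2PT Field Goal"],
})

# One dict per row, e.g. {'shot_distance': 2, 'shot_type': '2PT Field Goal'}.
rows = toy[["shot_distance", "shot_type"]].to_dict("records")

# fit() learns the feature names; transform() builds the numeric matrix.
onehot = DictVectorizer(sparse=False).fit(rows)
X = onehot.transform(rows)

print(onehot.feature_names_)  # learned column names, sorted alphabetically
print(X)                      # one row per shot, one column per feature
```

The resulting array X can be passed directly to train_test_split and to a model's fit method, which is why the older as_matrix call is no longer needed.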
