Question


Posted on Aug 26, 2024


TODO 13

Complete the following Standardization class. Recall that we want to compute the mean and STD for each column! Think about what value we need to set the 'axis' argument equal to in order to achieve this.
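To see what the axis argument changes, here is a small NumPy illustration on a made-up 3x2 array. With axis=0 the reduction runs down the rows, so you get one value per column (feature). With axis=1 it runs across the columns, so you get one value per row (sample).

```python
import numpy as np

# A toy 3x2 matrix: 3 samples (rows), 2 features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# axis=0 collapses the rows: one statistic per column
col_means = np.mean(X, axis=0)  # one value per column: [2.0, 20.0]
col_stds = np.std(X, axis=0)

# axis=1 collapses the columns: one statistic per row
row_means = np.mean(X, axis=1)  # one value per row: [5.5, 11.0, 16.5]

print(col_means, col_stds, row_means)
```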

In the fit() method, compute the mean of the input X using np.mean(). Store the output into the variable self.mean.

In the fit() method, compute the STD of the input X using np.std(). Store the output into the variable self.std.

In the transform() method, compute and return the standardized version of the input X. In other words, translate the general standardization formula into code and return the result.
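For reference, the general standardization (z-score) formula, written per column j over n rows, is shown below. The 1/n inside the standard deviation matches np.std's default behavior (ddof=0).

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
```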

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class Standardization(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        # TODO 13.1
        self.mean =
        # TODO 13.2
        self.std =
        # Always return self
        return self

    def transform(self, X):
        # TODO 13.3
        return
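One possible way to fill in TODO 13.1 to 13.3 is sketched below, assuming X is a 2-D numeric array. To keep it runnable without sklearn, the BaseEstimator and TransformerMixin base classes are left out, and a small hand-written fit_transform() stands in for the one TransformerMixin would supply. The sample arrays are made up.

```python
import numpy as np


class Standardization:
    """Column-wise z-score scaler (NumPy-only sketch).

    In the exercise this class also inherits from sklearn's BaseEstimator
    and TransformerMixin; the mixin is what normally supplies fit_transform().
    """

    def fit(self, X, y=None):
        # axis=0 -> one mean and one std per column (feature)
        self.mean = np.mean(X, axis=0)
        self.std = np.std(X, axis=0)
        # Returning self allows chaining: Standardization().fit(X).transform(X)
        return self

    def transform(self, X):
        # Apply the stored training statistics; broadcasting subtracts and
        # divides column by column. Note: a constant column (std == 0) would
        # yield inf/nan here, which this sketch does not guard against.
        return (X - self.mean) / self.std

    def fit_transform(self, X, y=None):
        # Stand-in for what TransformerMixin would provide
        return self.fit(X, y).transform(X)


# Usage: fit on "training" data, then reuse its statistics on new data
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

scaler = Standardization()
X_train_std = scaler.fit_transform(X_train)
X_new_std = scaler.transform(X_new)
print(X_train_std.mean(axis=0), X_train_std.std(axis=0), X_new_std)
```

After fitting, each training column has mean 0 and standard deviation 1, while new data is scaled with the training statistics rather than its own.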

TODO 14

Define an instance of sklearn's ColumnTransformer class that applies our Standardization class to ALL of our features in X_train and X_test.

Define an instance of sklearn's ColumnTransformer class, which takes a list of tuples. Since we only have 1 data transformation, our list will only have 1 tuple. Define the tuple as follows:

For the first element, provide the string 'scaler', which will act as the name for this transformation stage.

For the second element of the tuple, pass an instance of the Standardization class (as you wrote in TODO 13).

For the third element, pass the column names of X_train. Recall that a DataFrame stores its column names in its columns attribute.

Store the output into the variable after_pipe.

Using after_pipe, call the fit_transform() method on our training data X_train. Store the output into the variable X_train_scaled.

Using after_pipe, call the transform() method on our test data X_test. Store the output into the variable X_test_scaled.
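A minimal sketch of the same pattern on a hypothetical two-column DataFrame follows. To keep it self-contained it uses sklearn's StandardScaler in place of the exercise's Standardization class (both divide by the population standard deviation, ddof=0). Note that fit_transform() and transform() return NumPy arrays by default rather than DataFrames, which is why the checks index the result as X_train_scaled[1, [1, -1]].

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real X_train / X_test
X_train = pd.DataFrame({"temp": [10.0, 20.0, 30.0], "wind": [1.0, 2.0, 3.0]})
X_test = pd.DataFrame({"temp": [20.0, 40.0], "wind": [2.0, 5.0]})

# One (name, transformer, columns) tuple covering every column of X_train.
# StandardScaler substitutes for the exercise's Standardization class.
after_pipe = ColumnTransformer([
    ("scaler", StandardScaler(), X_train.columns),
])

# Learn mean/std from the training data only, then scale it
X_train_scaled = after_pipe.fit_transform(X_train)
# Reuse the training statistics on the test data (no refitting)
X_test_scaled = after_pipe.transform(X_test)

print(type(X_train_scaled), X_train_scaled.shape)
print(np.round(X_test_scaled, 3))
```

Calling fit_transform() only on the training split, and plain transform() on the test split, keeps test-set statistics from leaking into the scaling.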

from sklearn.compose import ColumnTransformer

# TODO 14.1
after_pipe =

todo_check([
    (isinstance(after_pipe.transformers, list), 'after_pipe did not receive input arguments as a list'),
    (isinstance(after_pipe.transformers[0], tuple), 'after_pipe did not receive tuples as elements inside the list.'),
    (isinstance(after_pipe.transformers[0][1], Standardization), 'after_pipe does not seem to contain the Standardization class!'),
    (np.all(after_pipe.transformers[0][-1] == X_train.columns), 'after_pipe did not receive the correct column names! Make sure the 3rd element of the tuple contains ALL the column names from X_train!'),
])

# TODO 14.2
X_train_scaled =
print(X_train_scaled)

todo_check([
    (X_train_scaled.shape == (413, 29), 'X_train_scaled shape is not the correct shape of (413, 29)'),
    (np.all(np.isclose(X_train_scaled[1, [1, -1]], np.array([-0.71611487, -0.06781709]), rtol=.01)), 'The values of X_train_scaled were not correct! Make sure you used fit_transform()!'),
])

# TODO 14.3
X_test_scaled =
print(X_test_scaled)

todo_check([
    (X_test_scaled.shape == (104, 29), 'X_test_scaled shape is not the correct shape of (104, 29)'),
    (np.all(np.isclose(X_test_scaled[1, [1, -1]], np.array([1.396424, 2.42182367]), rtol=.01)), 'The values of X_test_scaled were not correct! Make sure you used transform()!'),
])

TODO 15

Complete the definition of ColumnTransformer below.

Finish the code below by adding a 2nd tuple that performs the standardization only on the numerical features of our data. Define the tuple as follows:

For the first element, provide the string 'scaler', which will act as the name for this transformation stage.

For the second element of the tuple, pass an instance of the Standardization class.

For the third element, pass the column names of ONLY our numerical features using X_train. Recall that you'll need to slice the DataFrame starting from 'X' (column 19, counting from 0) and going to 'rain' (the last column)! For reference, the numerical columns are 'X', 'Y', 'FFMC', 'DMC', 'DC', 'ISI', 'temp', 'RH', 'wind', and 'rain'.

Store the output into the variable after_pipe.

after_pipe = ColumnTransformer([
    ('categorical_pass', 'passthrough', X_train.iloc[:, :19].columns),
    # TODO 15.1

])

todo_check([
    (isinstance(after_pipe.transformers, list), 'after_pipe did not receive input arguments as a list'),
    (len(after_pipe.transformers) == 2, 'after_pipe list of transformers should have length of 2!'),
    (isinstance(after_pipe.transformers[1], tuple), 'after_pipe did not receive tuples as elements inside the list.'),
    (isinstance(after_pipe.transformers[1][1], Standardization), 'after_pipe does not seem to contain the Standardization class!'),
    (np.all(after_pipe.transformers[1][-1].values == np.array(['X', 'Y', 'FFMC', 'DMC', 'DC', 'ISI', 'temp', 'RH', 'wind', 'rain'], dtype=object)), 'after_pipe did not receive the correct column names! Make sure the 3rd element of the tuple contains ONLY the numerical column names from X_train!'),
])
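The slicing idea can be sketched on a hypothetical four-column frame in which the first two columns stand in for the 19 one-hot categorical columns. As before, sklearn's StandardScaler substitutes for the Standardization class, and n_cat is an illustrative name, not part of the exercise. ColumnTransformer stacks its outputs in the order the tuples are listed, so the passed-through columns come first in the result.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical frame: 2 one-hot "categorical" columns followed by 2 numeric ones
X_train = pd.DataFrame({
    "month_aug": [1, 0, 0],
    "month_sep": [0, 1, 1],
    "temp": [10.0, 20.0, 30.0],
    "rain": [0.0, 0.0, 0.6],
})
n_cat = 2  # hypothetical; plays the role of 19 in the exercise

after_pipe = ColumnTransformer([
    # Leave the first n_cat columns unchanged
    ("categorical_pass", "passthrough", X_train.iloc[:, :n_cat].columns),
    # Standardize every column from position n_cat through the last one
    ("scaler", StandardScaler(), X_train.iloc[:, n_cat:].columns),
])

X_train_scaled = after_pipe.fit_transform(X_train)
print(np.round(X_train_scaled, 3))
```

With the real data, the same open-ended slice starting at position 19 would select 'X' through 'rain'.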