
Question


Predictive data mining models

 

Continuous - 'Children', 'Income', 'Churn', 'Tenure', 'MonthlyCharge', 'Item1', 'Item2', 'Item3', 'Item4', 'Item5', 'Item6', 'Item7', 'Item8'

Categorical - 'Marital', 'Gender', 'Techie', 'Contract', 'Port_modem', 'Tablet', 'InternetService', 'Phone', 'Multiple', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'PaperlessBilling', 'PaymentMethod'

 

Steps to prepare the data for analysis are:

 # Import libraries needed for analysis

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

 # Import dataset

data = 'churn_clean.csv'

df = pd.read_csv(data)

 

# Convert the target variable into a binary numeric variable

df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})

# Balance labels so there are equal churn vs non-churn

churners_number = len(df[df['Churn'] == 1])

print("Number of churners", churners_number)

churners = (df[df['Churn'] == 1])

non_churners = df[df['Churn'] == 0].sample(n=churners_number)

print("Number of non-churners", len(non_churners))

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job

df2 = pd.concat([churners, non_churners])
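The balancing step above can also be written as a small reusable function. This is only a sketch under assumptions: the helper name downsample_majority, the seed value, and the toy rows are made up for illustration and are not part of the original analysis. Passing a fixed random_state makes the sampled non-churners the same on every run.

```python
import pandas as pd


def downsample_majority(frame, target="Churn", seed=0):
    """Return a frame in which every class has as many rows as the smallest class."""
    smallest = frame[target].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=seed)
        for _, group in frame.groupby(target)
    ]
    # Shuffle so the classes are not stored in contiguous blocks
    return pd.concat(parts).sample(frac=1, random_state=seed)


# Toy data (hypothetical): 3 churners and 7 non-churners
toy = pd.DataFrame({
    "Churn": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "Tenure": list(range(10)),
})

balanced = downsample_majority(toy)
print(balanced["Churn"].value_counts().to_dict())
```

Downsampling discards majority-class rows, so it trades data volume for balance; that trade-off is worth stating when the results are reported.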

 

# Drop unnecessary columns from the balanced data, then convert categorical variables to binary

df2 = df2.drop(['CaseOrder', 'Customer_id', 'Interaction', 'UID', 'Job', 'City', 'State', 'County',

'Zip', 'Lat', 'Lng', 'Population', 'Area', 'TimeZone', 'Age', 'Outage_sec_perweek', 'Email', 'Contacts',

'Yearly_equip_failure', 'Bandwidth_GB_Year'], axis=1)

# Separate the target so that it is not used as a feature

label = df2['Churn']

ml_dummies = pd.get_dummies(df2.drop('Churn', axis=1))

ml_dummies.fillna(value=0, inplace=True)

ml_dummies.head()
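As a minimal sketch of what the dummy-encoding step produces, the example below runs pd.get_dummies on a few made-up rows with the target held out. The values are hypothetical; only the column names 'Churn', 'Tenure', 'Contract' and 'Techie' come from the question.

```python
import pandas as pd

# Hypothetical rows, used only to show the shape of the encoded output
toy = pd.DataFrame({
    "Churn": [1, 0, 1, 0],
    "Tenure": [2.0, 40.5, 5.1, 60.0],
    "Contract": ["Month-to-month", "Two Year", "Month-to-month", "One year"],
    "Techie": ["Yes", "No", "No", "Yes"],
})

# Hold the target out before encoding so it cannot leak into the features
label = toy["Churn"]
features = pd.get_dummies(toy.drop(columns="Churn"))

# Numeric columns pass through; each category becomes its own 0/1 column
print(features.columns.tolist())
```

If 'Churn' were left inside the feature matrix, any classifier could reach near-perfect accuracy by reading the answer directly, which is why the target is removed first.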

 # Add a random column to the dataframe

ml_dummies['randomColumn'] = np.random.randint(0,1000, size=len(ml_dummies))

 

# Perform KNN classification

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(ml_dummies, label, test_size=0.25,

random_state = 8)

 

# Classifiers

from sklearn.neighbors import KNeighborsClassifier

from sklearn.tree import DecisionTreeClassifier

classifiers = [

KNeighborsClassifier(5),

DecisionTreeClassifier(max_depth=5)]

 

# Iterate over classifiers

for item in classifiers:

    classifier_name = type(item).__name__

    print(classifier_name)

    # Create classifier, train and test it

    clf = item

    clf.fit(X_train, y_train)

    pred = clf.predict(X_test)

    score = clf.score(X_test, y_test)

    print(round(score, 3), "\n", "- - - - - ", "\n")

 

# Scale all variables to a range of 0 to 1

from sklearn.preprocessing import MinMaxScaler

features = ml_dummies.columns.values

scaler = MinMaxScaler(feature_range = (0,1))

# Fit and apply the scaler, keeping the column names and the row index

ml_dummies = pd.DataFrame(scaler.fit_transform(ml_dummies), columns=features, index=ml_dummies.index)
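The scaling above is fitted on the full dataset before the train/test split, so the test rows influence the minimum and maximum used for scaling. One possible alternative, sketched here on synthetic stand-in data (not the churn file), is to put the scaler and the classifier in a scikit-learn pipeline so that the scaler is fitted on the training rows only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic data: two features on very different scales
X = np.column_stack([rng.normal(size=200), rng.normal(scale=1000, size=200)])
y = (X[:, 0] > 0).astype(int)  # only the small-scale feature carries signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=8
)

# Fitting the pipeline fits the scaler on X_train only, then trains KNN
scaled_knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(5))
scaled_knn.fit(X_train, y_train)

scaled_score = scaled_knn.score(X_test, y_test)
print(round(scaled_score, 3))
```

Distance-based models such as KNN are sensitive to feature scale, so scaling before KNN usually matters more than it does for tree-based models.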

 

# Create train/test data

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(ml_dummies, label, test_size=0.25, random_state = 8)

 

# Run logistic regression model

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

result = model.fit(X_train, y_train)

 

# Print the prediction accuracy

from sklearn import metrics

prediction_test = model.predict(X_test)

print (metrics.accuracy_score(y_test, prediction_test))

 

# Get the weights of the most impactful variables

weights = pd.Series(model.coef_[0],

index=ml_dummies.columns.values)
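The weights series is easier to read once it is ranked by absolute size, since a large negative coefficient is as influential as a large positive one. A minimal sketch follows; the coefficient values are invented for illustration and are not results from the churn data.

```python
import pandas as pd

# Hypothetical coefficients standing in for model.coef_[0]
weights = pd.Series(
    [0.8, -2.1, 0.05, 1.3],
    index=["MonthlyCharge", "Tenure", "randomColumn", "Techie_Yes"],
)

# Rank by magnitude while keeping the original signs
ranked = weights.reindex(weights.abs().sort_values(ascending=False).index)
print(ranked)
```

Because the features were scaled to the same 0 to 1 range before fitting, the magnitudes are roughly comparable across variables; on unscaled features they would not be.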

 

# Random Forest Algorithm

from sklearn.ensemble import RandomForestClassifier

X_train, X_test, y_train, y_test = train_test_split(ml_dummies, label, test_size=0.25, random_state = 8)

# max_features="auto" was removed in scikit-learn 1.3; "sqrt" is the equivalent setting for classifiers

model_rf = RandomForestClassifier(n_estimators=1000, oob_score=True, n_jobs=-1,

                                  random_state=50, max_features="sqrt",

                                  max_leaf_nodes=30)

model_rf.fit(X_train, y_train)

 

# Make predictions

prediction_test = model_rf.predict(X_test)

print (metrics.accuracy_score(y_test, prediction_test))

 

# Graph of Random Forest results

importances = model_rf.feature_importances_

weights = pd.Series(importances,

index=ml_dummies.columns.values)

weights.sort_values()[-10:].plot(kind = 'barh')
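The question does not say why 'randomColumn' was added. One common use, assumed here, is as a noise baseline: a feature whose importance is no higher than that of a purely random column is probably not informative. The sketch below uses synthetic data and hypothetical column names 'signal' and 'weak_noise'.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(50)
n = 400

# Synthetic features: one informative column and two uninformative ones
X = pd.DataFrame({
    "signal": rng.normal(size=n),
    "weak_noise": rng.normal(size=n),
    "randomColumn": rng.integers(0, 1000, size=n),
})
y = (X["signal"] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=50)
forest.fit(X, y)

importances = pd.Series(forest.feature_importances_, index=X.columns)
baseline = importances["randomColumn"]

# Keep only the features that beat the random-noise baseline
above_baseline = importances[importances > baseline].index.tolist()
print(importances.sort_values(ascending=False))
print(above_baseline)
```

Impurity-based importances can favour high-cardinality columns, so this comparison is a rough filter rather than a formal test.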

 

Part V. Data Summary and Implications

E1. The accuracy of our prediction can be found in the snip below:

Mean Squared Error was computed using the following snip of code:
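The snips referred to above are not included in the question. As a sketch of how both numbers could be computed with scikit-learn, the example below uses short made-up label lists rather than the churn predictions. For 0/1 labels the mean squared error is simply the share of wrong predictions, so it equals 1 minus the accuracy.

```python
from sklearn import metrics

# Hypothetical true and predicted labels (6 of 8 predictions are correct)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = metrics.accuracy_score(y_true, y_pred)
mse = metrics.mean_squared_error(y_true, y_pred)

print("Accuracy:", accuracy)
print("Mean squared error:", mse)
```

Since the two numbers carry the same information for a binary classifier, a confusion matrix or precision and recall would add more to the summary than MSE does.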

 

 

1. Kindly review all of the code and correct it.

2. Discuss one limitation of your random forest data analysis.
