Question
Predictive data mining models
Continuous - 'Children', 'Income', 'Tenure', 'MonthlyCharge', 'Item1', 'Item2', 'Item3', 'Item4', 'Item5', 'Item6', 'Item7', 'Item8'
Categorical - 'Marital', 'Gender', 'Techie', 'Contract', 'Port_modem', 'Tablet', 'InternetService', 'Phone', 'Multiple', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'PaperlessBilling', 'PaymentMethod'
Target - 'Churn' (Yes/No, converted to a binary 0/1 variable in the code below)
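To confirm this split between continuous and categorical variables, a quick dtype check can be run against the dataset (a minimal sketch, assuming churn_clean.csv contains the columns listed above):
import pandas as pd
df = pd.read_csv('churn_clean.csv')
# Numeric columns should match the continuous list; object (string) columns the categorical list
print(df.select_dtypes(include='number').columns.tolist())
print(df.select_dtypes(include='object').columns.tolist())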
Steps to prepare the data for analysis are:
# Import libraries needed for analysis
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Import dataset
data = 'churn_clean.csv'
df = pd.read_csv(data)
# Convert the predictor variable into a binary numeric variable
df['Churn'] = df['Churn'].replace({'Yes': 1, 'No': 0})
# Balance labels so there are equal churn vs non-churn
churners_number = len(df[df['Churn'] == 1])
print("Number of churners", churners_number)
churners = df[df['Churn'] == 1]
non_churners = df[df['Churn'] == 0].sample(n=churners_number)
print("Number of non-churners", len(non_churners))
df2 = pd.concat([churners, non_churners])
# Drop unneeded identifier and demographic columns, then convert categorical variables to dummies
df2 = df2.drop(['CaseOrder', 'Customer_id', 'Interaction', 'UID', 'Job', 'City', 'State', 'County',
                'Zip', 'Lat', 'Lng', 'Population', 'Area', 'TimeZone', 'Age', 'Outage_sec_perweek',
                'Email', 'Contacts', 'Yearly_equip_failure', 'Bandwidth_GB_Year'], axis=1)
# Separate the target label from the features before one-hot encoding
label = df2['Churn']
ml_dummies = pd.get_dummies(df2.drop('Churn', axis=1))
ml_dummies.fillna(value=0, inplace=True)
ml_dummies.head()
# Add a random column to the dataframe as a sanity check for feature importance:
# any genuinely informative feature should rank above pure noise
ml_dummies['randomColumn'] = np.random.randint(0, 1000, size=len(ml_dummies))
# Perform KNN classification
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(ml_dummies, label, test_size=0.25, random_state=8)
# Classifiers
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
classifiers = [
    KNeighborsClassifier(n_neighbors=5),
    DecisionTreeClassifier(max_depth=5)]
# Iterate over classifiers
for item in classifiers:
    classifier_name = type(item).__name__
    print(classifier_name)
    # Create the classifier, train it, and score it on the test set
    clf = item
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    score = clf.score(X_test, y_test)
    print(round(score, 3), "\n", "- - - - - ", "\n")
# Scale all variables to a range of 0 to 1
from sklearn.preprocessing import MinMaxScaler
features = ml_dummies.columns.values
scaler = MinMaxScaler(feature_range = (0,1))
scaler.fit(ml_dummies)
ml_dummies = pd.DataFrame(scaler.transform(ml_dummies))
ml_dummies.columns = features
# Create train/test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(ml_dummies, label, test_size=0.25, random_state = 8)
# Run logistic regression model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
result = model.fit(X_train, y_train)
# Print the prediction accuracy
from sklearn import metrics
prediction_test = model.predict(X_test)
print (metrics.accuracy_score(y_test, prediction_test))
# Get the weights of the most impactful variables
weights = pd.Series(model.coef_[0], index=ml_dummies.columns.values)
print(weights.sort_values())
# Random Forest Algorithm
from sklearn.ensemble import RandomForestClassifier
X_train, X_test, y_train, y_test = train_test_split(ml_dummies, label, test_size=0.25, random_state = 8)
# max_features='auto' was removed in scikit-learn 1.3; 'sqrt' is its equivalent for classifiers
model_rf = RandomForestClassifier(n_estimators=1000, oob_score=True, n_jobs=-1,
                                  random_state=50, max_features="sqrt",
                                  max_leaf_nodes=30)
model_rf.fit(X_train, y_train)
# Make predictions
prediction_test = model_rf.predict(X_test)
print (metrics.accuracy_score(y_test, prediction_test))
# Graph of Random Forest results
importances = model_rf.feature_importances_
weights = pd.Series(importances, index=ml_dummies.columns.values)
weights.sort_values()[-10:].plot(kind = 'barh')
Part V. Data Summary and Implications
E1. The accuracy of each model is printed by the accuracy_score calls in the code above (the screenshot of the output referenced in the original question is not reproduced here).
Mean Squared Error was computed using a snip of code that is likewise missing from the question.
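A minimal sketch of how it could be computed, assuming the model_rf, X_test, and y_test objects defined above:
from sklearn.metrics import mean_squared_error
# Compare the hard 0/1 predictions against the true test labels
prediction_test = model_rf.predict(X_test)
print(mean_squared_error(y_test, prediction_test))
Note that for a binary 0/1 target, the MSE of hard class predictions equals 1 minus the accuracy, so it restates the accuracy result rather than giving an independent metric.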
1. Kindly review all the code and correct it.
2. Discuss one limitation of your random forest data analysis.