Question

1 Approved Answer

Posted on Sep 24, 2024

Naive Bayes and Decision Tree Classification

Part A: Naive Bayes Classification Using RapidMiner

In this part, we will use the Golf sample dataset provided with RapidMiner. You can drag and drop the dataset onto the process window to use it. If you analyse the dataset, you will see that it has both continuous and discrete variables.

For Naive Bayes, an important pre-processing step is to convert the continuous variables into discrete ones. In this case, Temperature and Humidity need to be converted. RapidMiner provides the Numerical to Polynominal operator (spelled this way in RapidMiner) to convert continuous numerical variables into discrete polynominal ones. Set the attribute filter type to subset, as we only want to discretise the selected attributes.

The next important step is to set the role that identifies the class attribute. In this case, it is the "Play" attribute.

Split the data into training and testing data using the Split Data operator. Use a 70:30 ratio.

Now build the model with the Naive Bayes operator: select it and place it in the process window. Use the Apply Model operator to apply the Naive Bayes model to the testing data. Use the Performance operator to see the classifier performance.

The final process will look like this: [Figure: final RapidMiner process]

Change the Split Data ratio and see the results. Is there any effect of changing the testing and training data ratio on the output?

Part B: Decision Tree Classification Using RapidMiner

In this part, we are going to use a customer dataset to predict whether a customer will churn or not.

CHURN MODELING

Why do telecommunication customers churn? Create a churn model based on observed past churn behavior: train, optimize and evaluate a decision tree model using a balanced training data set.

Step 1: Load a customer dataset (samples->Templates->Churn Modeling->Customer Data) that contains customer attributes like:
- Age
- Technology used (4G, fiber, etc.)
- Date since he/she has been a customer
- Average bill last year
- Number of support calls
- Did he/she abandon last year?

Step 2: Edit, transform & learn (ETL) and prepare data: mark the target label column (i.e. the churn indicator) using the Set Role operator, and convert the numerical churn column to binary using the Numerical to Binominal operator.

Step 3: Model validation is key! The Cross Validation operator splits the dataset into a part for training and a part for independent testing. This splitting is done several times to get a better performance estimate. Double-click on the operator to take a look at the training itself.

Step 4: Many more customers stay than churn (hopefully!). In order for our model to learn how churners behave, we re-balance the data to focus on the case we're interested in. This is like a magnifying glass on churn! Take a look at the 'Sample' operator.

Step 5: Let's now add a model trainer, like a Decision Tree. Try different values for the parameters, in particular the 'minimal gain'.

Step 6: The model trained on the training data is applied to the independent test data set, and the model performance is calculated.

Step 7: The performance values obtained on the different folds of the cross-validation are finally averaged to produce an average performance measure as well as a measure of its dispersion, which gives an estimate of the model's stability when applied to different data samples.

Outputs:
- A tree model (trained on the complete input data) that analyzes churn behavior and can be applied to any individual customer to estimate churn probability.
- The original input data.
- The estimated (i.e. cross-validated) performance of the model.

Try adding multiple classification algorithms to one process and compare the differences between them. You can use the Vote operator to create an ensemble classifier that uses a majority vote for classification.
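If it helps to see what the Part A process is doing under the hood, here is a rough Python sketch of the same idea: discretise the numeric columns, split train/test, fit a categorical Naive Bayes, and measure accuracy at several split ratios. This is not RapidMiner's implementation. The rows are hypothetical values shaped like the Golf data, and the cut points (75 and 80), the Laplace smoothing, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical rows shaped like the Golf data:
# (outlook, temperature, humidity, wind, play). Illustrative values only.
ROWS = [
    ("sunny", 85, 85, False, "no"), ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 78, False, "yes"), ("rain", 70, 96, False, "yes"),
    ("rain", 68, 80, False, "yes"), ("rain", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "yes"), ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"), ("rain", 75, 80, False, "yes"),
    ("sunny", 75, 70, True, "yes"), ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"), ("rain", 71, 80, True, "no"),
]

def discretise(value, cut):
    """Turn a number into a two-level nominal value (cut point is an assumption)."""
    return "high" if value > cut else "low"

def prepare(rows, temp_cut=75, hum_cut=80):
    """Return (features, label) pairs with temperature and humidity discretised."""
    return [((o, discretise(t, temp_cut), discretise(h, hum_cut), str(w)), y)
            for o, t, h, w, y in rows]

def split(data, train_ratio=0.7, seed=0):
    """Shuffle and split into (train, test), like a 70:30 Split Data step."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def train(data):
    """Count classes and per-class attribute values."""
    class_counts = Counter(y for _, y in data)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = defaultdict(set)             # attribute index -> distinct values seen
    for x, y in data:
        for i, v in enumerate(x):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def predict(model, x):
    """Pick the class with the highest log posterior (Laplace-smoothed)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n in class_counts.items():
        score = math.log(n / total)
        for i, v in enumerate(x):
            k = len(values[i] | {v})      # unseen values extend the vocabulary
            score += math.log((value_counts[(i, y)][v] + 1) / (n + k))
        if score > best_score:
            best, best_score = y, score
    return best

def accuracy(model, test):
    return sum(predict(model, x) == y for x, y in test) / len(test)

if __name__ == "__main__":
    data = prepare(ROWS)
    for ratio in (0.5, 0.7, 0.9):
        train_set, test_set = split(data, ratio)
        print(ratio, len(test_set), accuracy(train(train_set), test_set))
```

With only 14 rows the test set is tiny, so the accuracy can swing a lot between ratios and seeds. That is worth keeping in mind when answering the question about the effect of the split ratio.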
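Steps 3 to 7 of Part B can be sketched in plain Python as well. The snippet below is only an approximation under stated assumptions: the customers are synthetic (generated so that more support calls means more churn), the learner is a one-level decision tree (a stump) on a single attribute rather than RapidMiner's full Decision Tree, and `minimal_gain` here is an information-gain threshold that merely mimics the idea of the parameter with the same name. Balancing is done by downsampling the majority class inside each training fold, so the test folds keep their original class mix.

```python
import math
import random
import statistics

def make_customers(n=200, seed=1):
    """Synthetic (support_calls, churned) rows; purely illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        calls = rng.randint(0, 9)
        rows.append((calls, int(rng.random() < 0.02 + 0.05 * calls)))
    return rows

def entropy(labels):
    """Binary entropy in bits of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def majority(labels):
    """Most common 0/1 label; ties (and empty input) count as churn (1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(rows, minimal_gain=0.01):
    """Best single threshold split, or a constant model if the gain is too small."""
    labels = [y for _, y in rows]
    base = entropy(labels)
    best_gain, best_thr = 0.0, None
    for thr in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= thr]
        right = [y for x, y in rows if x > thr]
        if not right:
            continue
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
        if base - weighted > best_gain:
            best_gain, best_thr = base - weighted, thr
    if best_thr is None or best_gain < minimal_gain:
        fallback = majority(labels)
        return lambda x: fallback
    low = majority([y for x, y in rows if x <= best_thr])
    high = majority([y for x, y in rows if x > best_thr])
    return lambda x: low if x <= best_thr else high

def balance(rows, rng):
    """Downsample the larger class to the size of the smaller one."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    small, large = sorted((pos, neg), key=len)
    return small + rng.sample(large, len(small))

def cross_validate(rows, k=5, minimal_gain=0.01, seed=0):
    """Return (mean accuracy, standard deviation) over k folds."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_stump(balance(train, rng), minimal_gain)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return statistics.mean(scores), statistics.stdev(scores)

if __name__ == "__main__":
    for gain in (0.0, 0.05, 0.5):
        print(gain, cross_validate(make_customers(), minimal_gain=gain))
```

The mean corresponds to the averaged performance in Step 7 and the standard deviation to its dispersion. Raising `minimal_gain` far enough makes the stump refuse to split at all, which is one way to see why that parameter is worth experimenting with.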
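The majority-vote idea behind the Vote operator is simple enough to show in a few lines. In this sketch the three "classifiers" are hypothetical hand-written rules and the customer fields (`support_calls`, `avg_bill`, `age`) are made-up names. In RapidMiner the voters would be trained models such as Naive Bayes and a Decision Tree.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by most classifiers for input x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical rule-based voters; each returns True if it predicts churn.
def by_calls(customer):
    return customer["support_calls"] >= 5

def by_bill(customer):
    return customer["avg_bill"] > 80

def by_age(customer):
    return customer["age"] < 30

if __name__ == "__main__":
    customer = {"support_calls": 6, "avg_bill": 40, "age": 25}
    # Two of the three rules vote for churn, so the ensemble predicts churn.
    print(majority_vote([by_calls, by_bill, by_age], customer))
```

Using an odd number of voters avoids ties when there are only two classes.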