Please answer question 2 from the right-hand side of the attached image, using R Studio or RapidMiner Studio.
Churn data: https://github.com/albayraktaroglu/Datasets/blob/master/churn.csv
Data Preparation and Exploration

A telephone company is interested in determining which customer characteristics are useful for predicting churn, that is, which customers will leave their service. Your task is to uncover patterns in the customer data that will help the company identify which types of customers are most (least) likely to churn.

The data set used for this project is Churn, which is attached. The fields are as follows:

State                            String, categorical
account length                   Numeric, integer
area code                        Numeric, integer
phone number                     String
international plan               String, categorical
voice mail plan                  String, categorical
number vmail messages            Numeric, integer
total day minutes                Numeric, continuous
total day calls                  Numeric, integer
total day charge                 Numeric, continuous
total eve minutes                Numeric, continuous
total eve calls                  Numeric, integer
total eve charge                 Numeric, continuous
total night minutes              Numeric, continuous
total night calls                Numeric, integer
total night charge               Numeric, continuous
total intl minutes               Numeric, continuous
total intl calls                 Numeric, integer
total intl charge                Numeric, continuous
number customer service calls    Numeric, integer
Churn                            String

For the project, write up a report, including an Executive Summary (at the beginning) with your most salient findings (supporting evidence) and a list of recommendations for the company. The report explains all steps and results clearly and cogently, in a MS Word document (Questions #1 to #10), so that a reasonably intelligent, but statistically naive manager could understand it. You need to include all the graphics in your report and also indicate which software is used for the analysis, e.g., R Studio, RapidMiner Studio, etc. Your narrative should be clear and concise, accompanied by supporting evidence in the form of graphics and tables.

2. Examine the variables graphically.
a. For two interesting categorical variables, construct a distribution of the variable. Comment on each.
b. Examine the distribution of all numeric variables, using histograms. (Need not include in report.) Make a little table listing (in alphabetic order) the variables which are not normally distributed, along with the transformation function needed to induce normality (e.g. log). For those variables, perform the transformation to induce approximate normality. Provide before/after histograms for all such variables.

3. Examine the variables statistically.
a. For all the numeric variables, find the mean, median, standard deviation, min and max. Put the results in a table, with the variables in alphabetical order.
b. Normalize all the numeric variables, using either (i) z-scores, or (ii) min-max normalization [(value-min)/range].

4. Relationships between variables.
a. Plot Day Mins vs. Day Charge. Comment. How shall we deal with this?
b. Construct a scatter plot between any two numeric variables that you find interesting (not those in (a)).
c. Using the statistics node, report any high correlations between any two variables. What would be the effect of keeping two highly correlated variables in the model? What should be done?

5. Data Manipulation
a. Apart from churners/non-churners, are there any interesting subsets of records to be selected for special attention? Selecting and analyzing them may increase the precision of your analysis for important subsets of customers. Why do you find this subset of the data interesting and useful? Provide graphics and descriptive statistics that describe the behavior of your subset.
b. Discretize (make categorical) a relevant numeric variable which you think will be explicatory of churn. This can be done using histograms.

6. With a view to uncovering customer churn patterns, investigate how each relevant variable is associated with Churn
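
To make the transformation step in question 2 concrete (finding a numeric variable that is not normally distributed and applying a function such as log to induce approximate normality), here is a minimal sketch in Python using only the standard library. It is an illustration, not the required R Studio or RapidMiner workflow. The helper names `skewness` and `log_transform` are hypothetical, the sample values are made up rather than taken from churn.csv, and using log(1 + x) instead of plain log is an assumption made so that zero counts stay defined.

```python
import math
import statistics


def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_transform(values):
    """Apply log(1 + x); an assumed variant of 'log' that tolerates zeros."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed sample standing in for a count column such as
# "total intl calls". These are NOT values from churn.csv.
sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20]

before = skewness(sample)
after = skewness(log_transform(sample))

# Skewness closer to 0 suggests a more symmetric, more normal-looking shape.
print(f"skewness before: {before:.2f}, after log1p: {after:.2f}")
```

Comparing skewness before and after is only a numeric stand-in for the before/after histograms the question asks for. In R, the base functions `log1p()` and `hist()` would cover the same two steps.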