Question:

What you need: The R software package and the file assignment1.zip from the Moodle site.

Task 1

Preface: Banks often face the problem of deciding whether or not a client is creditworthy. Banks commonly employ data mining techniques to classify a customer into risk categories such as category A (highest rating) or category C (lowest rating).

A bank collects data from past credit assessments. The file creditworthiness.csv contains 2500 such assessments. Each assessment lists 46 attributes of a customer. The last attribute (the 47th attribute) is the result of the assessment. Open the file and study its contents. You will notice that the columns are coded by numeric values. The meaning of these values is defined in the file definitions.txt. For example, a value of 3 in the 47th column means that the customer's creditworthiness is rated "C". Any attribute value not listed in definitions.txt is "as is".

This poses a "prediction" problem. A machine is to learn from the outcomes of past assessments and, once trained, to assess any customer who has not yet been assessed. For example, the value 0 in column 47 indicates that this customer has not yet been assessed.

Purpose of this task: You are to start with an analysis of the general properties of this dataset by using suitable visualization and clustering techniques (e.g. those introduced during the lectures), and you are to obtain an insight into the degree of difficulty of this prediction task. Then you are to design and deploy an appropriate supervised prediction model (i.e. an MLP) to obtain a prediction of customer ratings.

Question 1: (5 marks) Analyse the general properties of the dataset and obtain an insight into the difficulty of the prediction task. Create a statistical analysis of the attributes and their values, then list 5 of the most interesting (most valuable) attributes. Explain the reasons that make these attributes interesting.
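The column coding described above (rating in the last column, 0 meaning "not yet assessed") can be explored with a short script before any model is trained. The assignment itself uses R; the sketch below is in Python purely for illustration. The sample data, the header line, the three-attribute layout, and the coding 1 = A and 2 = B are assumptions made for this example; only "3 = C" and "0 = not yet assessed" come from the task description.

```python
import csv
import io
from collections import Counter
from statistics import mean, pstdev

# Hypothetical stand-in for creditworthiness.csv: 3 attributes plus the rating
# column (the real file has 46 attributes and the rating in column 47).
# Assumed coding: 1 = A, 2 = B, 3 = C, 0 = not yet assessed.
SAMPLE_CSV = """a1,a2,a3,rating
1,20,3,1
2,35,3,2
1,50,3,3
2,41,3,0
1,28,3,2
"""

def load_rows(text):
    """Parse CSV text into a list of integer rows, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # header line (assumed to be present in this sample)
    return [[int(v) for v in row] for row in reader if row]

def split_by_assessment(rows):
    """Separate assessed rows (rating != 0) from unassessed rows (rating 0)."""
    assessed = [r for r in rows if r[-1] != 0]
    unassessed = [r for r in rows if r[-1] == 0]
    return assessed, unassessed

def summarise_attributes(rows):
    """Return mean, standard deviation and distinct-value count per attribute."""
    n_attrs = len(rows[0]) - 1  # last column is the rating, not an attribute
    summary = []
    for j in range(n_attrs):
        col = [r[j] for r in rows]
        summary.append(
            {"mean": mean(col), "sd": pstdev(col), "distinct": len(set(col))}
        )
    return summary

rows = load_rows(SAMPLE_CSV)
assessed, unassessed = split_by_assessment(rows)
class_counts = Counter(r[-1] for r in assessed)
summary = summarise_attributes(assessed)

for j, s in enumerate(summary, start=1):
    print(f"attribute {j}: mean={s['mean']:.2f} sd={s['sd']:.2f} "
          f"distinct={s['distinct']}")
print("class counts:", dict(class_counts))
print("unassessed rows:", len(unassessed))
```

In this toy data the third attribute has zero standard deviation, so it cannot help separate the ratings; constant or near-constant columns like this are natural candidates to exclude when choosing the most valuable attributes. The class counts also show whether the ratings are balanced, which affects how a later accuracy figure should be read.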
Note: A set of R script files is provided with this assignment (included in the assignment1.zip file). The scripts provided will allow you to produce some first results. However, virtually none of the parameters used in these scripts are suitable for obtaining a good insight into the general properties of the given dataset. Hence your task is to modify the scripts so that informative results can be obtained, from which conclusions about the learning problem can be drawn. Note that finding a good set of parameters is often very time-consuming in data mining. An additional challenge is to interpret the results correctly.

This is what you need to do: Find a good set of parameters (i.e. through a trial-and-error approach), obtain informative results, then offer an interpretation of the results. Write down your approach to conducting the experiments, explain your results, and offer a comprehensive interpretation of the results. Do not forget that you are also to provide an insight into the degree of difficulty of this learning problem (i.e. from the results that you obtained, can it be expected that a prediction model will be able to obtain 100% prediction accuracy?). Always explain your answers.

Question 2: (7 marks) Deploy a prediction model to predict the creditworthiness of customers who have not yet been assessed. The prediction capabilities of the MLP in the lab "Classification" were very poor. Your task is to:

a.) Describe a valid strategy that maximises the accuracy of predicting the credit rating. Explain why your strategy can be expected to maximise the prediction capabilities.

b.) Use your strategy to train MLP(s), then report your results. Give an interpretation of your results. What is the best classification accuracy (expressed in % of correctly classified data) that you can obtain for data that were not used during training (i.e. the test set)?

What you need: The R software package (RStudio is optional) and the file assignment1.zip.
You also need to have successfully completed the lab "Classification". You may use the R script of the lab "Classification" as a basis for attempting this question. Note that in this assignment the term "prediction capabilities" refers to a model's ability to predict the credit rating of samples that were not used to train the model (i.e. samples in a test set).

The answers to this assignment should be provided in a single PDF document. Submit before the due date and follow the submission procedure as described in the header of this assignment.
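Question 2 asks for accuracy on data not used during training, expressed as a percentage of correctly classified samples. A minimal sketch of that evaluation step follows, again in Python for illustration rather than the R scripts the assignment provides. The ratings are made up, the function names are hypothetical, and a majority-class baseline stands in for the MLP: it is not the model the assignment asks for, only a reference point that any trained MLP should beat.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in similar proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Keep at least one test sample per class.
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def accuracy_percent(y_true, y_pred):
    """Percentage of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

# Hypothetical ratings for 20 assessed customers (assumed 1 = A, 2 = B, 3 = C).
labels = [1] * 4 + [2] * 10 + [3] * 6

train_idx, test_idx = stratified_split(labels)

# Baseline "model": always predict the most frequent rating in the training set.
majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
predictions = [majority] * len(test_idx)

y_test = [labels[i] for i in test_idx]
test_accuracy = accuracy_percent(y_test, predictions)
print(f"majority class: {majority}, test accuracy: {test_accuracy:.1f}%")
```

Splitting per class keeps the rating proportions similar in the training and test sets, so the test accuracy is not distorted by a rating that happens to be missing from one of them. Rows with rating 0 belong in neither set, since they have no known outcome to compare against.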
