Question

1 Approved Answer

Posted on Sep 05, 2024

This is a multi-class classification task where you are expected to predict the response variable.

Given the attributes or features of a client, the task of the predictive system is to measure the level of risk in providing insurance to the client. Risk is categorized into 8 levels.

In this project, the challenge is to predict the response variable for a client given the client's history (training.csv). Note that some of the information for a client is not available (missing).
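Since some client information is missing, a first step is usually to measure how much is missing and to choose a fill strategy. The snippet below is only a minimal sketch of that idea using pandas: the toy rows stand in for training.csv, the values are made up, and median imputation is one assumed choice among several, not something the task prescribes.

```python
import numpy as np
import pandas as pd

# Toy stand-in for training.csv; the values are made up for illustration.
train = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "BMI": [0.32, np.nan, 0.41, 0.45],
    "Employment_Info_4": [0.0, 0.1, np.nan, np.nan],
    "Response": [8, 4, 1, 6],
})

# Fraction of missing entries per column, highest first.
missing_share = train.isna().mean().sort_values(ascending=False)

# Median imputation for feature columns only (Id and Response are left alone).
feature_cols = [c for c in train.columns if c not in ("Id", "Response")]
medians = train[feature_cols].median()
train_filled = train.copy()
train_filled[feature_cols] = train[feature_cols].fillna(medians)
```

The medians computed on the training rows would normally be reused to fill testing.csv, so that both files are treated the same way.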

You will be provided with a file (testing.csv) for predicting a response variable for a client.

training.csv: It is a comma-separated training dataset file that contains attributes of a client and the ground-truth response variable (label).

testing.csv: It is a comma-separated testing dataset file that contains attributes of unseen clients for which the response variable has to be predicted.

sample_solution.csv: It is a comma-separated sample solution file that contains the ids of the clients in the test dataset and the response variable predicted for each of them by your algorithm.

For every test instance in testing.csv, the submission file should contain two columns: Id and Response. Details are available on the Kaggle website.
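The two-column layout described above can be produced with pandas roughly as follows. This is a sketch only: the ids and predictions are made-up values, and coding the risk levels as integers is an assumption, since the exact format details are on the Kaggle page.

```python
import io
import pandas as pd

# Hypothetical test ids and predicted risk levels; made-up values.
test_ids = [101, 102, 103]
predictions = [2, 8, 5]

submission = pd.DataFrame({"Id": test_ids, "Response": predictions})

# Written to an in-memory buffer here; pass a file path to to_csv instead
# to create the actual submission file.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing index=False keeps the pandas row index out of the file, so only the Id and Response columns are written.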

You are allowed to use any libraries and packages. You are required to implement at least one algorithm (PCA preferred).
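Because the task asks for at least one implemented algorithm with PCA preferred, here is a minimal sketch of PCA written directly with NumPy's SVD. The function names pca_fit and pca_transform are hypothetical, and the random matrix is only a stand-in for the numeric, already imputed feature matrix.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means, the top principal axes, and their variances."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = (s ** 2) / (X.shape[0] - 1)
    return mean, Vt[:n_components], explained_var[:n_components]

def pca_transform(X, mean, components):
    """Center X with the fitted means and project it onto the principal axes."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix (200 rows, 5 features).
X_train = rng.normal(size=(200, 5))
# Make two features strongly correlated so one component dominates.
X_train[:, 1] = 2.0 * X_train[:, 0] + 0.01 * rng.normal(size=200)

mean, components, explained_var = pca_fit(X_train, n_components=3)
Z_train = pca_transform(X_train, mean, components)
```

The mean and components fitted on the training rows would then be applied unchanged to the testing.csv features via pca_transform, and the reduced matrix passed to whichever classifier is chosen. Features on very different scales are usually standardized before this step.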

DATASET:

In this dataset, you are provided with over a hundred variables describing attributes of life insurance applicants. The task is to predict the "Response" variable for each Id in the test set. "Response" is an ordinal measure of risk that has 8 levels.

File descriptions

training.csv - the training set, contains the Response values

testing.csv - the test set, you must predict the Response variable for all rows in this file

sample_submission.csv - a sample submission file in the correct format

Data fields

Variable - Description

Id - A unique identifier associated with an application.

Product_Info_1-7 - A set of normalized variables relating to the product applied for.

Ins_Age - Normalized age of applicant.

Ht - Normalized height of applicant.

Wt - Normalized weight of applicant.

BMI - Normalized BMI of applicant.

Employment_Info_1-6 - A set of normalized variables relating to the employment history of the applicant.

InsuredInfo_1-6 - A set of normalized variables providing information about the applicant.

Insurance_History_1-9 - A set of normalized variables relating to the insurance history of the applicant.

Family_Hist_1-5 - A set of normalized variables relating to the family history of the applicant.

Medical_History_1-41 - A set of normalized variables relating to the medical history of the applicant.

Medical_Keyword_1-48 - A set of dummy variables relating to the presence or absence of a medical keyword being associated with the application.

Response - This is the target variable, an ordinal variable relating to the final decision associated with an application.

The following variables are all categorical (nominal):

Product_Info_1, Product_Info_2, Product_Info_3, Product_Info_5, Product_Info_6, Product_Info_7, Employment_Info_2, Employment_Info_3, Employment_Info_5, InsuredInfo_1, InsuredInfo_2, InsuredInfo_3, InsuredInfo_4, InsuredInfo_5, InsuredInfo_6, InsuredInfo_7, Insurance_History_1, Insurance_History_2, Insurance_History_3, Insurance_History_4, Insurance_History_7, Insurance_History_8, Insurance_History_9, Family_Hist_1, Medical_History_2, Medical_History_3, Medical_History_4, Medical_History_5, Medical_History_6, Medical_History_7, Medical_History_8, Medical_History_9, Medical_History_11, Medical_History_12, Medical_History_13, Medical_History_14, Medical_History_16, Medical_History_17, Medical_History_18, Medical_History_19, Medical_History_20, Medical_History_21, Medical_History_22, Medical_History_23, Medical_History_25, Medical_History_26, Medical_History_27, Medical_History_28, Medical_History_29, Medical_History_30, Medical_History_31, Medical_History_33, Medical_History_34, Medical_History_35, Medical_History_36, Medical_History_37, Medical_History_38, Medical_History_39, Medical_History_40, Medical_History_41

The following variables are continuous:

Product_Info_4, Ins_Age, Ht, Wt, BMI, Employment_Info_1, Employment_Info_4, Employment_Info_6, Insurance_History_5, Family_Hist_2, Family_Hist_3, Family_Hist_4, Family_Hist_5

The following variables are discrete:

Medical_History_1, Medical_History_10, Medical_History_15, Medical_History_24, Medical_History_32

Medical_Keyword_1-48 are dummy variables.
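The variable types listed above suggest treating the columns differently: nominal columns are often one-hot encoded, while continuous columns can be used as they are. The following is a small sketch with pandas, using short excerpts of the lists and made-up rows; one-hot encoding is an assumed choice, not a requirement of the task.

```python
import pandas as pd

# Short excerpts of the lists above; extend them with the full lists in practice.
nominal_cols = ["Product_Info_1", "InsuredInfo_1"]
continuous_cols = ["Ins_Age", "BMI"]

# Toy rows with made-up values.
df = pd.DataFrame({
    "Product_Info_1": [1, 2, 1],
    "InsuredInfo_1": [1, 3, 2],
    "Ins_Age": [0.2, 0.5, 0.7],
    "BMI": [0.3, 0.4, 0.6],
})

# One indicator column per category; continuous columns pass through unchanged.
encoded = pd.get_dummies(df, columns=nominal_cols, dtype=int)
```

A category that appears only in testing.csv would create a column the training matrix lacks, so the two encoded frames generally need to be reindexed to a common set of columns.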

I am not able to attach the CSV files. Please let me know how to add them.