Question

Posted on Jun 21, 2024

Kindly help me find the solutions to the following questions, please. The dataset given is about the health and economic conditions in different States of a country. Group the States based on how similar their situations are, so that these groups can be provided to the government and appropriate measures can be taken to improve their health and economic conditions.

1.1. Read the data and do exploratory data analysis. Describe the data briefly. (Check the null values, data types, shape, EDA, etc.)

1.2. Do you think scaling is necessary for clustering in this case? Justify your answer.

1.3. Apply hierarchical clustering to the scaled data. Identify the optimum number of clusters using a dendrogram and briefly describe them.

1.4. Apply K-Means clustering on the scaled data and determine the optimum number of clusters. Use the elbow curve and find the silhouette score.

1.5. Describe cluster profiles for the clusters defined. Recommend different priority-based actions that need to be taken for different clusters on the basis of their vulnerability according to their economic and health conditions.
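For questions 1.2 and 1.3, the following is a minimal sketch of why scaling matters and how a tree cut turns into cluster labels. The six rows are made-up placeholder values, not the real State_wise_Health_income.csv, and the Ward linkage and the cut at two clusters are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.preprocessing import StandardScaler

# Placeholder rows (NOT the real dataset). Columns stand in for
# Health_indeces1, Health_indeces2, Per_capita_income, GDP.
X = np.array([
    [400.0, 50.0, 1000.0, 20000.0],
    [420.0, 55.0, 1100.0, 21000.0],
    [410.0, 52.0, 1050.0, 20500.0],
    [3000.0, 700.0, 5000.0, 300000.0],
    [3100.0, 720.0, 5200.0, 310000.0],
    [2900.0, 690.0, 4900.0, 295000.0],
])

# The raw column ranges differ by orders of magnitude, so unscaled
# Euclidean distances would be driven almost entirely by the last column.
raw_ranges = X.max(axis=0) - X.min(axis=0)

# Standardize each column to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)

# Agglomerative clustering with Ward linkage (an assumed choice).
Z = linkage(X_scaled, method="ward")

# Cut the tree into 2 clusters (assumed here; in practice the number
# would be read off the dendrogram).
labels = fcluster(Z, t=2, criterion="maxclust")

print("raw column ranges:", raw_ranges)
print("cluster labels:", labels)
```

To inspect the tree visually, `scipy.cluster.hierarchy.dendrogram(Z)` can be called on the same linkage matrix (plotting requires matplotlib).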

Data Dictionary for State_wise_Health_income:

1. States: names of States

2. Health_indeces1: A composite index rolls several related measures (indicators) into a single score that provides a summary of how the health system is performing in the State.

3. Health_indeces2: A composite index rolls several related measures (indicators) into a single score that provides a summary of how the health system is performing in certain areas of the States.

4. Per_capita_income: Per capita income (PCI) measures the average income earned per person in a given area (city, region, country, etc.) in a specified year. It is calculated by dividing the area's total income by its total population.

5. GDP: GDP provides an economic snapshot of a country/state, used to estimate the size of an economy and growth rate.

Dataset for Problem 1: State_wise_Health_income.csv
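For questions 1.4 and 1.5, a hedged sketch of the elbow and silhouette workflow might look like the following. The column names come from the data dictionary above, but the values are invented placeholders rather than the real CSV, and the range of k tried (2 to 4) is an assumption sized to this toy frame.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Placeholder values (NOT the real data); names follow the data dictionary.
df = pd.DataFrame({
    "Health_indeces1": [400, 420, 410, 3000, 3100, 2900],
    "Health_indeces2": [50, 55, 52, 700, 720, 690],
    "Per_capita_income": [1000, 1100, 1050, 5000, 5200, 4900],
    "GDP": [20000, 21000, 20500, 300000, 310000, 295000],
})
X_scaled = StandardScaler().fit_transform(df)

# Record the within-cluster sum of squares (for the elbow curve) and the
# silhouette score for each candidate number of clusters.
inertias = {}
silhouettes = {}
for k in range(2, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X_scaled, km.labels_)

# Pick the k with the highest silhouette score and refit.
best_k = max(silhouettes, key=silhouettes.get)
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)

# Cluster profile: mean of each original (unscaled) variable per cluster.
profile = df.assign(cluster=final.labels_).groupby("cluster").mean()

print(inertias)
print(silhouettes)
print(profile)
```

Plotting `inertias` against k gives the elbow curve; the per-cluster means in `profile` are one simple basis for describing which group of States is most vulnerable.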

Problem 2: CART-RF-ANN

Mortality Outcomes for Females Suffering Myocardial Infarction

The mifem data frame has 1295 rows and 10 columns. This is a dataset of females having coronary heart disease (CHD). You have to predict, from the given information, whether the female is dead or alive, so as to discover important factors that should be considered crucial in the treatment of the disease. Use CART, RF and ANN, and compare the models' performance on the train and test sets.

2.1. Data Ingestion: Read the dataset. Compute the descriptive statistics, check for null values, and write an inference on the results.

2.2. Encode the data (having string values) for modelling. Data Split: Split the data into test and train sets, and build the classification models: CART, Random Forest, Artificial Neural Network.

2.3. Performance Metrics: Check the performance of the predictions on the train and test sets using accuracy and the confusion matrix, plot the ROC curve, and get the ROC_AUC score for each model.

2.4. Final Model: Compare all the models and write an inference on which model is best/optimized.

2.5. Inference: Based on these predictions, what are the insights and recommendations?

Dataset for Problem 2: mifem.csv
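For questions 2.2 and 2.3, one possible shape of the model comparison is sketched below. Since mifem.csv is not shown here, the features are a synthetic stand-in generated with `make_classification`; the hyperparameters (tree depth, number of trees, hidden layer size, 70/30 split) are assumptions, not values prescribed by the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded mifem features (NOT the real data).
X, y = make_classification(n_samples=400, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Hyperparameters below are illustrative assumptions.
models = {
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
}

splits = {"train": (X_train, y_train), "test": (X_test, y_test)}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {}
    for split, (Xs, ys) in splits.items():
        pred = model.predict(Xs)
        # Probability of the positive class, needed for ROC/AUC.
        proba = model.predict_proba(Xs)[:, 1]
        results[name][split] = {
            "accuracy": accuracy_score(ys, pred),
            "auc": roc_auc_score(ys, proba),
            "confusion": confusion_matrix(ys, pred),
        }
    print(name, {s: round(r["auc"], 3) for s, r in results[name].items()})
```

A large gap between a model's train and test scores suggests overfitting, which is one criterion for the comparison asked for in 2.4. The points of the ROC curve itself can be obtained from `sklearn.metrics.roc_curve`.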

Data Dictionary for mifem.csv:

1. Outcome: mortality outcome: a factor with levels live, dead

2. Age: age at onset

3. Yronset: year of onset (The year of onset is the year in which an individual acquires, develops, or first experiences a condition or symptoms of a disease or disorder)

4. Premi: previous myocardial infarction event, a factor with levels y, n, nk not known

5. Smstat: smoking status, a factor with levels c current, x ex-smoker, n non-smoker, nk not known

6. Diabetes: a factor with levels y, n, nk not known

7. Highbp: high blood pressure, a factor with levels y, n, nk not known

8. Hichol: high cholesterol, a factor with levels y, n for yes and no

9. Angina: a factor with levels y, n, nk not known

10. Stroke: a factor with levels y, n, nk not known
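The factor levels listed in this data dictionary suggest one way to handle the encoding step in 2.2, sketched below. The four rows are placeholders rather than real mifem records, only a subset of columns is shown, and the exact column casing in the CSV is an assumption. Treating "nk" as its own category instead of as missing is a design choice, not something the question mandates.

```python
import pandas as pd

# Placeholder rows (NOT real mifem records). Column names and level codes
# follow the data dictionary; the CSV's exact casing is an assumption.
df = pd.DataFrame({
    "Outcome": ["live", "dead", "live", "dead"],
    "Age": [63, 55, 68, 64],
    "Yronset": [85, 85, 86, 87],
    "Smstat": ["c", "x", "n", "nk"],
    "Diabetes": ["n", "y", "nk", "n"],
})

# Binary target: 1 = dead, 0 = live.
y = df["Outcome"].map({"live": 0, "dead": 1})

# One-hot encode the factor columns. "nk" (not known) becomes its own
# indicator column rather than being dropped or imputed.
X = pd.get_dummies(
    df.drop(columns="Outcome"),
    columns=["Smstat", "Diabetes"],
    dtype=int,
)

print(X.columns.tolist())
print(y.tolist())
```

The same call extends to the remaining factor columns (Premi, Highbp, Hichol, Angina, Stroke) by adding them to the `columns` list.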
