Question


Posted on Sep 26, 2024


R Programming Code: K-Means Algorithm

Due Date: April 20, 2018 at 10:00pm central standard time zone

Write a complete working R program using the K-means algorithm to do the following tasks. Use the data set file that is given with this project and name it. I have also attached my R code to this project for help. I keep getting a few error messages. Please review the data file and try to fix my code to make it run without errors. Please give an explanation of what you did to fix the program. Thanks for your help in advance.

1) Read data to Mdata to get pure data set only
2) Show the head of the Mdata
3) Exclude top value of B1
4) a all variables B1, B2,...B6
5) Apply the same range of the variables.
6) Show the results of the standardization.
7) With K-means method, analyze the data. (K=3 case)
8) Show the results after dividing the data set.
9) Find the size of each group.

Data file to use: MyData2a

     X   B1   B2  B3  B4   B5   B6  B7
1   A1  8.6 72.7  88 401 1162 3910 604
2   A2 10.1 28.4 112 408 1159 2304 267
3   A3  8.1 28.9  80 278 1030 2305 195
4   A4  9.3 43.0 169 437 1908 4337 419
5   A5 11.3 44.9 343 521 1696 3384 762
6   A6  7.0 42.3 145 329 1792 4231 486
7   A7  4.6 23.8 192 205 1198 2758 447
8   A8 31.0 52.4 754 668 1728 4131 975
9   A9  4.9 56.9 124 241 1042 3090 272
10 A10 11.7 52.7 367 605 2221 4373 598
11 A11 11.2 43.9 214 319 1453 2984 430
12 A12  4.8 31.0 106 103 1339 3759 328
13 A13  1.8 12.5  42 179  956 2801 158
14 A14  3.2 20.0  21 178 1003 2800 181
15 A15  8.9 32.4 325 434 1180 2938 628
16 A16  6.0 25.9  90 186  887 2333 328
17 A17  4.4 32.9  80 252 1188 3008 258
18 A18  6.7 23.1  83 222  824 1740 193
19 A19 12.8 40.1 224 482 1461 3417 442
20 A20  3.6 29.7 193 331 1071 2189 906
21 A21  9.0 43.6 304 476 1296 2978 545
22 A22  2.0 14.8  28 102  803 2347 164
23 A23 11.3 67.4 301 424 1509 3378 800
24 A24  2.5 31.8 102 148 1004 2785 288
25 A25  9.2 29.2 170 370 1136 2500 439
26 A26 11.2 25.8  65 172 1076 1845 150
27 A27  2.9 17.3  20 118  783 3314 215
28 A28  8.1 26.4  88 354 1225 2423 208
29 A29  1.0 11.6   7  32  385 2049 120
30 A30  3.1 24.6  51 184  748 2677 168
31 A31  2.2 21.5  24  92  755 2208 228
32 A32  5.2 33.2 269 265 1071 2822 776
33 A33 11.5 46.9 130 538 1845 3712 343
34 A34 12.6 64.9 287 354 1604 3489 478
35 A35 10.7 30.5 514 431 1221 2924 637
36 A36  5.5 38.6 142 235  988 2574 376
37 A37  8.1 36.4 107 285 1787 3142 649
38 A38  6.6 51.1 206 286 1967 4163 402
39 A39  5.5 25.1 152 176  735 1654 354
40 A40  3.5 21.4 119 192 1294 2568 705
41 A41  8.6 41.3  99 525 1340 2846 277
42 A42  4.0 17.7  16  87  554 1939  99
43 A43 10.4 47.0 208 274 1325 2126 544
44 A44 13.5 51.6 240 354 2049 3987 714
45 A45  3.2 25.3  59 180  915 4074 223
46 A46  7.1 26.5 106 167  813 2522 219
47 A47  2.0 21.8  22 103  949 2697 181
48 A48  5.0 53.4 135 244 1861 4267 315
49 A49  3.1 20.1  73 162  783 2802 254
50 A50  5.9 18.9  41  99  625 1358 169
51 A51  5.3 21.9  22 243  817 3078 169

My Code and error messages:

> ## import the MyData2a set
> MyData2a <- read.csv(file.choose(), header = TRUE)
> # Display the pure data
> MyData2a
[output omitted: the same 51 rows as in the data file listing above]
> # shows the heads
> head(MyData2a )
   X   B1   B2  B3  B4   B5   B6  B7
1 A1  8.6 72.7  88 401 1162 3910 604
2 A2 10.1 28.4 112 408 1159 2304 267
3 A3  8.1 28.9  80 278 1030 2305 195
4 A4  9.3 43.0 169 437 1908 4337 419
5 A5 11.3 44.9 343 521 1696 3384 762
6 A6  7.0 42.3 145 329 1792 4231 486
> ##Verifies number of columns
> ncol(MyData2a)
[1] 8
> ##Verifies names of columns
> names(MyData2a)
[1] "X"  "B1" "B2" "B3" "B4" "B5" "B6" "B7"
> ## Excludes B1 column
> MyData2a.new <- MyData2a[-2]
> names(MyData2a.new)
[1] "X"  "B2" "B3" "B4" "B5" "B6" "B7"
> ## New data excluding B1
> head(MyData2a.new )
   X   B2  B3  B4   B5   B6  B7
1 A1 72.7  88 401 1162 3910 604
2 A2 28.4 112 408 1159 2304 267
3 A3 28.9  80 278 1030 2305 195
4 A4 43.0 169 437 1908 4337 419
5 A5 44.9 343 521 1696 3384 762
6 A6 42.3 145 329 1792 4231 486
> #Exclude top value of B1
> MyData2a <- MyData2a [-1,]
> # first row has been removed and head displays remaining rows
> head(MyData2a)
   X   B1   B2  B3  B4   B5   B6  B7
2 A2 10.1 28.4 112 408 1159 2304 267
3 A3  8.1 28.9  80 278 1030 2305 195
4 A4  9.3 43.0 169 437 1908 4337 419
5 A5 11.3 44.9 343 521 1696 3384 762
6 A6  7.0 42.3 145 329 1792 4231 486
7 A7  4.6 23.8 192 205 1198 2758 447
> ## assign all variables B1, B2,...B6
> ## assigns column name as B1 B2 and so on for all
> ## the columns available in MyData2a
> colnames(MyData2a) <- c("B1", "B2","B3","B4","B5", "B6")
> ## standardization
> standardized <- scale(x, scale=TRUE)
Error in scale(x, scale = TRUE) : object 'x' not found

Note: I'm not sure why I'm getting this error. When I trace back I get this:

> traceback()
1: scale(x, scale = TRUE)
> standardized
Error: object 'standardized' not found
> kmeans.MyData2a <- kmeans(standardized, 3)
Error in as.matrix(x) : object 'standardized' not found
> kmeans.data$size
Error: object 'kmeans.data' not found
>
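
The errors in the transcript above all trace back to three things: scale(x) refers to an object x that was never created, the k-means result is stored as kmeans.MyData2a but then read back as kmeans.data, and the colnames() call assigns six names to an eight-column data frame, which relabels the text column X as "B1" and leaves the last two columns without valid names. The label column X also has to be dropped before scale(), since scale() needs numeric input. The sketch below is one possible way to run the nine steps without those errors, not a definitive solution. It rests on several assumptions: "top value of B1" is read as the row with the largest B1 (A8), only B1 to B6 are clustered because the task names only those, the data is pasted inline so the block runs on its own, and the seed and nstart values are arbitrary.

```r
# Inline copy of the posted data so the sketch runs on its own. With the real
# file, read.csv("MyData2a.csv") would replace this (file name is an assumption).
raw <- read.table(header = TRUE, text = "
X B1 B2 B3 B4 B5 B6 B7
A1 8.6 72.7 88 401 1162 3910 604
A2 10.1 28.4 112 408 1159 2304 267
A3 8.1 28.9 80 278 1030 2305 195
A4 9.3 43.0 169 437 1908 4337 419
A5 11.3 44.9 343 521 1696 3384 762
A6 7.0 42.3 145 329 1792 4231 486
A7 4.6 23.8 192 205 1198 2758 447
A8 31.0 52.4 754 668 1728 4131 975
A9 4.9 56.9 124 241 1042 3090 272
A10 11.7 52.7 367 605 2221 4373 598
A11 11.2 43.9 214 319 1453 2984 430
A12 4.8 31.0 106 103 1339 3759 328
A13 1.8 12.5 42 179 956 2801 158
A14 3.2 20.0 21 178 1003 2800 181
A15 8.9 32.4 325 434 1180 2938 628
A16 6.0 25.9 90 186 887 2333 328
A17 4.4 32.9 80 252 1188 3008 258
A18 6.7 23.1 83 222 824 1740 193
A19 12.8 40.1 224 482 1461 3417 442
A20 3.6 29.7 193 331 1071 2189 906
A21 9.0 43.6 304 476 1296 2978 545
A22 2.0 14.8 28 102 803 2347 164
A23 11.3 67.4 301 424 1509 3378 800
A24 2.5 31.8 102 148 1004 2785 288
A25 9.2 29.2 170 370 1136 2500 439
A26 11.2 25.8 65 172 1076 1845 150
A27 2.9 17.3 20 118 783 3314 215
A28 8.1 26.4 88 354 1225 2423 208
A29 1.0 11.6 7 32 385 2049 120
A30 3.1 24.6 51 184 748 2677 168
A31 2.2 21.5 24 92 755 2208 228
A32 5.2 33.2 269 265 1071 2822 776
A33 11.5 46.9 130 538 1845 3712 343
A34 12.6 64.9 287 354 1604 3489 478
A35 10.7 30.5 514 431 1221 2924 637
A36 5.5 38.6 142 235 988 2574 376
A37 8.1 36.4 107 285 1787 3142 649
A38 6.6 51.1 206 286 1967 4163 402
A39 5.5 25.1 152 176 735 1654 354
A40 3.5 21.4 119 192 1294 2568 705
A41 8.6 41.3 99 525 1340 2846 277
A42 4.0 17.7 16 87 554 1939 99
A43 10.4 47.0 208 274 1325 2126 544
A44 13.5 51.6 240 354 2049 3987 714
A45 3.2 25.3 59 180 915 4074 223
A46 7.1 26.5 106 167 813 2522 219
A47 2.0 21.8 22 103 949 2697 181
A48 5.0 53.4 135 244 1861 4267 315
A49 3.1 20.1 73 162 783 2802 254
A50 5.9 18.9 41 99 625 1358 169
A51 5.3 21.9 22 243 817 3078 169
")

# 1) Keep only the numeric columns B1..B6; the labels in X become row names.
Mdata <- raw[, paste0("B", 1:6)]
rownames(Mdata) <- raw$X

# 2) Show the head of Mdata.
print(head(Mdata))

# 3) Assumed reading of "top value": drop the row with the largest B1.
#    Use Mdata[-1, ] instead if the first row is meant.
Mdata <- Mdata[-which.max(Mdata$B1), ]

# 4-6) Standardize every column (mean 0, sd 1) and show the result.
standardized <- scale(Mdata)
print(head(standardized))

# Alternative for step 5 if "same range" means min-max rescaling to [0, 1].
rescaled <- apply(Mdata, 2, function(v) (v - min(v)) / (max(v) - min(v)))

# 7) K-means with K = 3; the seed and nstart are arbitrary choices.
set.seed(123)
km <- kmeans(standardized, centers = 3, nstart = 25)

# 8) Show which rows fall in which group.
print(split(rownames(Mdata), km$cluster))

# 9) Number of rows in each group.
print(km$size)
```

Passing the numeric data frame to scale() directly replaces the undefined x, and keeping a single name (km here, a hypothetical choice) for the kmeans() result avoids the kmeans.MyData2a / kmeans.data mismatch. Because kmeans() starts from random centers, the group sizes can change from run to run unless a seed is set.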