Question

STAT 230 LAB 3

Lab 3 - Graphical and Numerical Summaries
Lectures Covered: 5-7

Tutorial

Before we start, you will want to make sure that the ggplot2 package is installed and loaded.

    library(ggplot2)

To start today, we will generate a set of data to use for analysis.

    set.seed(235)
    data1 = factor(sample(1:5, 50, replace = TRUE))
    data1 = as.data.frame(data1)
    colnames(data1) = c('Group')
    head(data1)

    ##   Group
    ## 1     5
    ## 2     4
    ## 3     4
    ## 4     1
    ## 5     2
    ## 6     3

Now that we have our data stored in data1, we can analyze it graphically.

    ggplot(data = data1, aes(x = Group)) +
      geom_bar(position = "dodge") +
      ggtitle("Bar Chart of Groups")

[Figure: bar chart titled "Bar Chart of Groups"; x-axis Group (1 to 5), y-axis count]

Note: Colours make everything better. (The original handout links to a complete list of R colours.)

In order to analyze the data numerically we will use the R command table(). To do this, you specify the data and column, data1$Group, inside the command.

    table(data1$Group)

    ##
    ##  1  2  3  4  5
    ## 13 11  6  7 13

The table() command returns a count of how many observations are in each of the groups.

Moving on to continuous data, we start again by fabricating some fake data.

    x1 = rnorm(100, 25, 3)
    x2 = rnorm(100, 20, 4)
    group = c(rep("Group 1", 100), rep("Group 2", 100))
    data2 = data.frame(x = c(x1, x2), group = group, Value = "Value")
    head(data2)

    ##      x   group Value
    ## 1 24.4 Group 1 Value
    ## 2 26.4 Group 1 Value
    ## 3 24.9 Group 1 Value
    ## 4 25.1 Group 1 Value
    ## 5 29.9 Group 1 Value
    ## 6 23.0 Group 1 Value

Histograms can also be created using the ggplot() command:

    ggplot(data2, aes(x = x)) +
      geom_histogram(binwidth = 2, lwd = 1.5) +
      labs(x = "Observed Data") +
      ggtitle("Histogram of Sample Data")

[Figure: histogram titled "Histogram of Sample Data"; x-axis "Observed Data", y-axis count]

The next type of graphical summary we will cover is the boxplot.
    ggplot(data = data2, aes(x = Value, y = x)) +
      geom_boxplot() +
      ggtitle("Boxplot of the Data")

[Figure: single boxplot titled "Boxplot of the Data"; x-axis Value, y-axis x]

We can also draw side-by-side boxplots on the same set of axes by changing the parameter x in the aes() portion.

    ggplot(data = data2, aes(x = group, y = x)) +
      geom_boxplot() +
      ggtitle("Boxplot of the Data")

[Figure: side-by-side boxplots titled "Boxplot of the Data"; x-axis group (Group 1, Group 2), y-axis x]

The final graphical representation we will cover in the lab is the empirical CDF for a dataset using qplot(). qplot() can be joined with the labs() modifier and/or the ggtitle() modifier.

    qplot(data2$x, stat = "ecdf", geom = "step") + labs(x = "data")

An annotation on the handout says to replace this call with:

    plot(ecdf(data2$x), verticals = TRUE, do.p = FALSE)

[Figure: empirical CDF step plot; x-axis data, y-axis y from 0.00 to 1.00]

Numerical summaries of continuous data are easy in R. There are functions for many of the summaries we are interested in.

    mean(data2$x)
    ## [1] 22.7

    var(data2$x)
    ## [1] 18.3

    range(data2$x)
    ## [1] 12.3 30.8

    median(data2$x)
    ## [1] 23.3

    Q1 = quantile(data2$x, 0.25)
    Q1
    ##  25%
    ## 19.3

    Q3 = quantile(data2$x, 0.75)
    Q3
    ## 75%
    ##  26

    IQR = Q3 - Q1
    IQR
    ##  75%
    ## 6.67

Calculating all of those numbers individually is quite tedious. The easiest way to get all of this information is to use the summary() command. It is the R equivalent of Nic Cage's acting skills: it will do almost anything. The only extra call you might need to make is var().

    summary(data2$x)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##    12.3    19.3    23.3    22.7    26.0    30.8

    var(data2$x)
    ## [1] 18.3

Exercises

Set your seed to 3 (the lab number).

1. Read in the MLS dataset. Create a summary table and bar chart to compare the number of players for each team.
2. Which of the methods in Question 1 would you choose? Why?
3. Provide a numerical summary of TotalSalary for the Vancouver players.
4. Choose a graphical representation to show whether the Vancouver data is skewed or symmetric. Which numerical summary would you use for this case? Hint: Use the option binwidth = 100000.
5. Load in the muscle velocity data and create a boxplot of the post stretch velocity for tendons. Hint: The code muscle$Time = as.factor(muscle$Time) will make the variable Time of the muscle dataset into a categorical or qualitative variable.
6. Create a side-by-side boxplot for pre vs post Fibre Velocity.
7. Create a numerical summary for both time groups of Fiber Velocity. Using these and the graph from above, if we were to test whether there is a difference between the groups, what would you expect for a final result?
8. Create an ECDF for the post Fiber Velocity.

STAT 230 Group Project 1

Data Description

Your dataset contains information regarding carseat sales from different stores. You are interested in the average price for a carseat. The following table contains descriptions of your individual variables.

    Variable   Description
    Price      Price charged for a carseat for each location. Variable of interest.
    Urban      Whether or not the location is urban or rural. Factor with levels No and Yes.
    Ave_Age    Average age of the local population.
    Pop        Population size (in thousands of people) in the region.
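As a minimal sketch of how the variables described above might be represented in R, the block below builds a small data frame from the first six rows of the listing that follows, reading the listing row by row as Price, Ave_Age, Urban, Pop. The object name carseats is a hypothetical choice, not something the project specifies. Urban is stored as a factor with levels No and Yes, as the description states.

```r
# Hypothetical object name; the project does not specify one.
# Values are the first six rows of the data listing, read row by row.
carseats <- data.frame(
  Price   = c(120, 83, 80, 97, 128, 72),
  Ave_Age = c(42, 65, 59, 55, 38, 78),
  Urban   = factor(c("Yes", "Yes", "Yes", "Yes", "Yes", "No"),
                   levels = c("No", "Yes")),
  Pop     = c(276, 260, 269, 466, 340, 501)
)

str(carseats)            # check that each column has the intended type
levels(carseats$Urban)   # factor levels in the order given above
table(carseats$Urban)    # counts of rural vs urban locations in these rows
```

Fixing the levels explicitly keeps No and Yes in the documented order even if a subset of the data happens to contain only one of them.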
1 Price Ave_Age Urban Pop 120 42 Yes 276 83 65 Yes 260 80 59 Yes 269 97 55 Yes 466 128 38 Yes 340 72 78 No 501 108 71 Yes 45 120 67 Yes 425 124 76 No 108 124 76 No 131 100 26 No 150 94 50 Yes 503 136 62 Yes 393 86 53 Yes 29 118 52 Yes 148 144 76 No 400 NA 63 Yes 284 131 52 Yes 251 68 46 No 408 121 69 Yes 58 131 35 Yes 367 109 62 No 239 138 42 Yes 497 109 79 Yes 292 113 42 Yes 294 82 54 No 176 131 50 No 496 107 64 Yes 19 97 55 Yes 359 102 58 Yes 226 89 30 Yes 447 131 44 Yes 241 137 64 No 236 128 50 Yes 317 128 42 Yes 406 96 44 No 29 100 60 No 270 110 54 Yes 412 102 65 Yes 454 138 38 No 144 126 73 No 18 124 58 Yes 403 24 50 Yes 25 134 59 Yes 16 95 69 Yes 325 135 44 Yes 168 70 48 No 16 108 55 Yes 173 98 69 Yes 349 149 32 Yes 51 108 80 Yes 341 108 75 Yes 150 129 39 Yes 112 119 61 Yes 39 144 76 No 25 154 61 Yes 60 84 50 Yes 54 117 75 Yes 22 103 74 Yes 188 114 80 Yes 148 123 29 Yes 469 107 26 No 358 133 77 Yes 146 101 61 Yes 170 104 32 No 184 128 55 No 197 91 56 Yes 508 115 47 Yes 152 134 60 Yes 366 99 65 Yes 339 99 74 Yes 237 150 58 No 148 116 25 Yes 432 104 31 No 54 136 64 No 125 92 36 No 480 70 64 Yes 346 89 67 No 44 145 65 Yes 139 90 41 Yes 286 79 68 Yes 353 128 70 Yes 237 139 28 Yes 325 94 56 Yes 468 121 43 No 52 112 49 No 304 134 64 Yes 432 126 54 No 272 111 62 Yes 144 119 45 No 493 103 64 No 491 107 80 Yes 267 125 29 Yes 97 104 55 Yes 67 84 148 132 129 127 107 106 118 97 96 138 97 139 108 103 90 116 151 125 127 106 129 128 119 99 128 131 87 108 155 120 49 133 116 126 147 77 94 136 97 131 120 120 118 109 94 129 131 104 159 123 117 131 119 97 87 114 103 128 150 110 69 157 90 112 70 111 160 149 106 141 191 137 93 117 77 118 55 110 128 185 122 154 94 81 116 149 91 140 102 97 107 86 96 90 55 Yes 59 Yes 73 No 33 Yes 36 No 56 No 76 No 34 Yes 65 No 78 Yes 51 Yes 61 Yes 70 No 60 Yes 65 Yes 60 No 43 Yes 43 Yes 62 Yes 33 Yes 65 Yes 42 Yes 80 No 41 Yes 62 Yes 64 Yes 63 Yes 28 Yes 75 Yes 29 No 63 Yes 43 No 59 Yes 51 Yes 55 Yes 40 No 51 Yes 77 Yes 72 Yes 62 Yes 76 Yes 36 No 31 
No 80 Yes 44 Yes 30 No 45 Yes 28 Yes 77 Yes 28 Yes 34 No 47 Yes 39 Yes 41 No 72 No 56 Yes 57 No 75 No 45 No 25 No 50 No 65 Yes 51 Yes 48 No 39 No 30 No 29 No 67 No 51 Yes 39 No 27 No 27 Yes 55 Yes 60 Yes 45 Yes 73 Yes 71 Yes 75 Yes 35 Yes 66 Yes 79 No 25 Yes 47 No 27 Yes 25 No 77 Yes 66 Yes 68 Yes 25 Yes 80 Yes 60 No 64 Yes 46 No 62 Yes 76 Yes 134 237 407 287 382 220 94 89 57 334 472 398 217 104 488 217 125 272 298 335 17 95 202 507 243 137 249 380 45 125 181 181 60 192 350 279 497 208 232 265 327 384 10 436 371 310 277 331 300 36 264 27 412 402 384 140 176 407 341 488 289 59 220 249 189 372 486 81 424 40 58 100 151 216 425 492 356 416 123 207 358 38 480 148 89 70 434 79 230 426 35 449 93 142 426 104 101 173 93 96 128 112 133 138 128 126 146 134 130 157 124 132 160 97 64 90 123 120 105 139 107 144 144 111 120 116 124 107 145 125 141 82 NA 101 163 72 114 122 105 120 129 132 108 135 133 118 121 94 135 110 100 88 90 151 101 117 156 132 117 122 129 81 144 112 81 100 101 118 132 115 NA 129 112 112 105 166 89 110 63 86 119 132 130 125 151 158 145 105 154 117 26 No 37 No 74 Yes 56 No 61 Yes 45 Yes 66 Yes 72 Yes 76 Yes 69 Yes 64 Yes 62 No 54 Yes 46 No 25 Yes 37 Yes 28 Yes 77 Yes 61 No 33 Yes 76 No 47 No 32 Yes 45 Yes 33 Yes 73 Yes 71 Yes 34 Yes 70 No 25 Yes 58 Yes 30 Yes 80 Yes 35 Yes 62 Yes 48 No 36 Yes 56 Yes 57 Yes 26 No 27 No 38 No 27 No 61 Yes 29 No 56 No 50 Yes 69 Yes 48 Yes 73 Yes 62 Yes 26 Yes 38 Yes 52 No 57 Yes 26 Yes 57 No 78 Yes 34 Yes 61 Yes 65 Yes 72 Yes 62 Yes 32 Yes 57 No 37 Yes 80 Yes 73 Yes 80 Yes 72 No 74 No 36 Yes 54 Yes 48 Yes 25 Yes 31 Yes 39 Yes 73 No 51 No 39 Yes 46 Yes 26 Yes 62 Yes 38 Yes 58 Yes 34 Yes 53 Yes 73 Yes 36 Yes 40 No 64 Yes 51 Yes 45 No 61 Yes 80 No 509 297 170 408 71 481 420 410 333 500 335 349 139 413 132 237 317 27 466 497 326 357 445 501 220 48 170 243 481 156 359 262 125 178 276 464 412 245 68 381 404 119 123 24 218 289 95 361 499 200 149 362 160 199 87 391 199 266 298 12 86 435 310 70 288 353 198 277 477 251 467 400 188 86 434 
324 402 343 473 66 438 284 504 14 244 67 210 296 326 129 376 496 303 80 112 96 131 113 72 97 156 103 89 74 89 99 137 123 104 130 96 99 87 110 99 134 132 133 120 126 80 166 132 135 54 129 171 72 136 130 129 152 98 139 103 150 104 122 104 111 89 112 134 104 147 83 110 143 102 101 126 91 93 118 121 126 149 125 112 107 NA 91 NA 122 92 145 146 NA 72 118 130 114 104 110 108 131 162 134 53 79 122 119 126 98 116 118 124 92 125 119 79 No 39 Yes 67 No 44 Yes 76 No 43 Yes 41 No 39 Yes 76 Yes 59 Yes 60 Yes 79 No 63 Yes 75 Yes 63 Yes 54 No 45 Yes 57 Yes 74 Yes 43 Yes 29 Yes 33 Yes 48 Yes 61 Yes 52 Yes 68 Yes 53 Yes 51 Yes 38 Yes 66 Yes 71 Yes 29 Yes 35 Yes 80 No 41 No 57 No 44 Yes 34 Yes 60 Yes 34 Yes 53 Yes 47 Yes 53 Yes 80 Yes 55 Yes 45 Yes 32 No 32 Yes 61 Yes 41 Yes 42 Yes 72 Yes 28 Yes 80 Yes 45 Yes 44 Yes 43 Yes 72 No 71 No 26 Yes 70 No 79 Yes 33 No 27 No 49 Yes 49 No 63 No 62 No 61 Yes 35 No 42 Yes 42 Yes 72 Yes 34 Yes 69 Yes 66 Yes 52 No 56 No 79 Yes 25 Yes 30 Yes 57 No 53 No 52 Yes 74 No 36 Yes 42 Yes 54 Yes 65 No 78 Yes 47 Yes 73 Yes 44 Yes 41 No 79 Yes 414 261 429 208 74 448 400 106 322 74 126 502 160 276 312 497 158 198 388 290 408 394 85 13 436 33 419 328 337 491 333 220 369 472 456 459 171 499 300 428 133 131 152 316 65 433 501 213 354 303 489 464 60 283 164 219 105 268 422 371 108 279 144 161 459 467 266 458 288 430 80 306 111 276 71 396 265 183 26 377 488 122 447 256 348 463 403 191 508 402 90 206 319 263 105 107 89 151 121 68 112 132 160 115 78 107 111 124 130 120 139 128 120 159 95 120 54 Yes 68 Yes 77 Yes 66 Yes 63 Yes 28 Yes 62 Yes 39 Yes 73 No 79 Yes 35 Yes 67 Yes 56 Yes 34 Yes 30 No 33 Yes 33 Yes 55 No 40 Yes 50 Yes 49 Yes 404 17 496 315 76 348 455 170 238 245 328 61 49 315 26 366 203 37 368 284 27
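Several Price entries in the listing are NA, which matters because the quantity of interest is the average price. As a hedged illustration, the sketch below uses four consecutive Price values from the first page of the listing (rows 15 to 18 when read row by row, one of which is NA). The vector name price_sample is hypothetical.

```r
# Four consecutive Price values from the listing, including one missing value.
price_sample <- c(118, 144, NA, 131)

mean(price_sample)                 # NA: missing values propagate by default
mean(price_sample, na.rm = TRUE)   # 131: average of the three observed prices
sum(is.na(price_sample))           # 1: number of missing prices
```

The same na.rm = TRUE argument is accepted by var(), median(), range() and quantile(), and summary() reports the number of NA values separately.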
