This assignment will give you a very brief introduction to R

What is R? R is a software platform and computer programming language for dealing with data, statistical analyses, and visualizations. It does all the same things as Excel, but much, much more!

Why use R? It is free (!) and used by millions of people around the world. Other scientists are constantly creating amazing packages (add-ons) that allow you to do nearly anything you would ever want or need to do with data in R. I can't overstate how awesome it is that all of this power is available for FREE. There is also a huge community of R users who are constantly discussing and troubleshooting online. This makes it relatively easy to solve problems, because whatever issue you run into, chances are someone else has already asked and answered it somewhere on the internet. This gives you the power to solve lots of problems with careful Googling and/or asking your peers. There are many free tutorials and books available online as well.

How does it work? R is based on the command line. This means you interact with R by typing, and then executing, lines of text or computer code. This can be challenging, because command line interfaces do not tolerate any typing mistakes. Any typing error (even a capital letter vs. lowercase letter mix-up) means that the line of code will NOT work.
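Here is a tiny illustration of how strict R is about capitalization. The variable name `x` is just an example; `mean()` is R's built-in function, and the capitalized `Mean()` does not exist, so calling it produces an error (caught here with `tryCatch()` so the rest of the code still runs).

```r
# R is case-sensitive: "mean" and "Mean" are different names
x <- c(2, 4, 6)

# The built-in function is spelled in lowercase, so this works
avg <- mean(x)
print(avg)

# Capitalizing the first letter breaks the call; tryCatch() shows
# the error message instead of stopping the script
mean_capitalized_works <- tryCatch(
  { Mean(x); TRUE },
  error = function(e) {
    message("Error: ", conditionMessage(e))
    FALSE
  }
)
```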

This can be frustrating at first. But don't fear the command line! A major advantage of working this way is that all of your steps can be saved into a text file called a script. The script gets saved on your computer and can be annotated with other information about what you did and why (this is called commenting). The ability to easily incorporate comments makes your script much like an electronic lab notebook. This has many benefits. For one thing, it makes your work traceable and repeatable. If you make a mistake, you can easily go back and find it, fix it, and then re-run your code. Working with scripts also makes it easy to re-make a graph many times over. And it makes it easy for other people (including your future self) to follow along with and understand what you did, because they can re-run your code and read all of your commentary. For all of these reasons, it is much better to do all of your data processing, graphs, and analysis using scripting, rather than editing raw files directly.
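As a small sketch of what a commented script might look like: in R, everything after a `#` on a line is a comment and is ignored when the code runs. The file name in the first comment is hypothetical; the code itself uses the Cars93 data from the MASS package that this assignment works with.

```r
# Script: airbag_summary.R (hypothetical file name)
# Purpose: count how many 1993 car models had each airbag configuration

# Load the MASS package, which contains the Cars93 dataset
library(MASS)

# Frequency table of the original three-level AirBags column
airbag_counts <- table(Cars93$AirBags)
print(airbag_counts)
```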

Although you interact with R using the command line, R reads and writes data files using all of the same file formats that you can use in Excel. For example, .csv data files are very common and my personal favourite. There are also R packages that let you read and write .xlsx files, and many other file formats.
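For example, base R can write and read .csv files with `write.csv()` and `read.csv()`. This is a minimal sketch using a made-up data frame (`plants`, with illustrative values only) and a temporary file from `tempfile()`; on your own computer you would give a real file path instead. Remember that Snippets does not let you load your own files.

```r
# A small example data frame (made-up values, for illustration only)
plants <- data.frame(
  plant_id = 1:3,
  height_cm = c(12.5, 15.0, 9.8)
)

# Write it to a temporary .csv file; row.names = FALSE keeps
# R's row numbers out of the file
csv_path <- tempfile(fileext = ".csv")
write.csv(plants, csv_path, row.names = FALSE)

# Read the file back into R as a new data frame
plants_again <- read.csv(csv_path)
print(plants_again)
```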

Do I have to install R for this assignment? No! You are not required to install R for BIOL 1105. For the purpose of this assignment, we are going to use an online web version of R that runs in your browser, called Snippets: https://rdrr.io/snippets/

Snippets makes it easy to run examples without installing any software. However, it doesn't allow you to load your own files, nor does it allow you to save your work. We will use it just because it's an easy introduction.

This sounds great, how do I learn more? Snippets is good for this course, but you can't use it to work with your own data. For that, you need to install the actual R software on your computer. You can install R from: https://cran.r-project.org/ Another very popular option is to install R Studio in addition to R. R Studio gives you an interface that many people find more friendly: https://rstudio.com/products/rstudio/download/ (although note that R Studio is still based on the command line). Learning to use R or R Studio is highly recommended for your other lab courses and/or Honours thesis. There are courses you can take at Carleton to learn more, such as BIOL 3604. And there are often workshops to learn more, e.g.: https://library.carleton.ca/help/r-using-r

Ok, here is the actual Assignment...

The text below with blue shading is the code that you should copy and paste into Snippets, or into R. Then use Control-Enter or Command-Enter to execute the pasted commands. If you are using Snippets, you can replace the default lines that show up automatically whenever you first load the page. If you are using a version of R that you installed on your computer, you also need the MASS package to complete this assignment. MASS is normally included with a standard R installation, so usually you only need to load it. The questions for the submitted part of the assignment are highlighted bold and yellow.
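If you are working in your own copy of R and are not sure whether MASS is available, a defensive sketch like the one below installs it only when it is missing and then loads it. `requireNamespace()` and `install.packages()` are base R functions; installing requires an internet connection.

```r
# Install MASS only if it is not already available
# (it normally ships with R, so this step is usually skipped)
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Load the package so that Cars93 can be used by name
library(MASS)
```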

We will use the dataset on cars from 1993 (Cars93), which we've seen before:

You can look at the data here: https://bit.ly/3hVwqrw

You can find the meta-data here: https://bit.ly/3luloeR

The Cars93 dataset is in the MASS package which you have to load separately in R. Here, we load the package using the "library" function. Then, we use the "print" function on the dataset named Cars93, so that we can view all of the data:

library(MASS)

print(Cars93)

Because the dataset has so many rows and columns, it's more manageable to inspect just the first and last few rows of the data. We can do that using the "head" and "tail" functions:

library(MASS)

head(Cars93)

The "head" function prints the first 6 rows of the data. If you replace "head" with "tail" in the code above, you'll print the last 6 rows instead.
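A few related functions are handy for a first look at any dataset. This sketch prints the last 6 rows with `tail()`, uses the `n` argument of `head()` to show a different number of rows, and checks the size of the data with `dim()` and the exact column names with `names()` (useful for getting capitalization such as "AirBags" right).

```r
library(MASS)

# Last 6 rows (the counterpart of head())
last_rows <- tail(Cars93)
print(last_rows)

# head() and tail() accept n to show a different number of rows
first_three <- head(Cars93, n = 3)
print(first_three)

# Number of rows (car models) and columns (variables)
print(dim(Cars93))

# Exact spelling of every column name
print(names(Cars93))
```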

Let's examine a categorical variable: whether or not a car had at least one airbag. The AirBags column in the dataset has three levels (None, Driver only, or Driver & Passenger). But we want to analyze this as a binary variable instead, with only TWO levels describing whether or not a car has any airbags. So let's make a new column labelled "AirBags_bin". The code below creates the new column "AirBags_bin" as a binary variable, and prints it for viewing. Notice how this new variable has only two levels.

library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags=="None", "No Airbag", "Airbag"))

print(Cars93$AirBags_bin)
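It is good practice to check that a recoding did what you intended. One way, sketched below, is to list the levels of the new factor and cross-tabulate the original AirBags column against the new one: every "None" car should appear under "No Airbag", and both driver categories should appear under "Airbag".

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))

# The two levels of the new factor (sorted alphabetically by default)
print(levels(Cars93$AirBags_bin))

# Rows: original three levels; columns: new binary levels
recode_check <- table(Cars93$AirBags, Cars93$AirBags_bin)
print(recode_check)
```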

We can use the "table" function to get the frequency table for this new variable:

library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags=="None", "No Airbag", "Airbag"))

table(Cars93$AirBags_bin)
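When you report results later on, proportions are often more informative than raw counts. `prop.table()` divides each count in a table by the total, as in this short sketch.

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))

# Counts per level
airbag_table <- table(Cars93$AirBags_bin)

# The same table as proportions of the total (each count / sample size)
airbag_props <- prop.table(airbag_table)
print(round(airbag_props, 3))
```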

Note that if you are using Snippets in your browser, you have to repeat the lines that load MASS and create the new variable every time you run a chunk. That is because Snippets is running in your browser and it starts fresh each time. If you were running R locally on your own computer, you would only have to run those lines once per R session. Snippets is not a good way to do serious data analysis, but it's useful as a quick demonstration for this assignment.

Part 1. Marked for Completion

Marked complete (1 point), mostly complete (0.5), or missing/incomplete (0).

1a) How many cars are there in TOTAL in the dataset (what is the sample size)? How many of the cars in this dataset have at least one airbag? Hint: use the frequency table of "AirBags_bin" that you created above to answer this question.

1b) Suppose we are interested in how common it was for a car in 1993 to have some kind of airbag. Our sample is the dataset Cars93. Let's use a binomial test to evaluate whether more than half (> 50%) of the cars in 1993 had at least one airbag. What would be the null hypothesis for this binomial test?

You can run a binomial test in R using the line of code below. Here, the function "binom.test" takes three arguments: x is the number of observations (cars) in your main category, n is the sample size, and p is the expected proportion for your main category under the null hypothesis. Run the test by copying and pasting the line into Snippets, and then replacing x and n with the correct numbers from your frequency table. Note: the test will NOT run unless you replace x and n with numbers!

binom.test(x, n, p=0.5)
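As an alternative to typing the numbers by hand, this sketch pulls x and n straight from the frequency table, which avoids copying errors. By default, `binom.test()` runs a two-sided test (`alternative = "two.sided"`); because the question asks whether MORE than half of cars had an airbag, the sketch also shows the one-sided version with `alternative = "greater"` for comparison. Which one to report depends on how you frame your hypotheses.

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))
airbag_table <- table(Cars93$AirBags_bin)

# Pull the counts from the table instead of typing them by hand
n_airbag <- as.integer(airbag_table["Airbag"])  # cars with at least one airbag
n_total  <- sum(airbag_table)                   # sample size

# Two-sided test (the default alternative)
two_sided <- binom.test(n_airbag, n_total, p = 0.5)
print(two_sided)

# One-sided test: the alternative is a proportion greater than 0.5
one_sided <- binom.test(n_airbag, n_total, p = 0.5, alternative = "greater")
print(one_sided)
```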

1c) After you run the test, the output gives you a summary of the result. We'll use the conventional alpha threshold of 0.05. Explain in your own words how you would report the results of this test, including the p-value. What does this result mean? Did you reject the null hypothesis? It's a good idea to also provide descriptive statistics on the proportion of cars with airbags when you are reporting this result.
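The printed output of `binom.test()` is a list-like object, so the numbers you need for reporting can also be extracted by name. This sketch pulls out the p-value, the observed proportion (`estimate`) and its 95% confidence interval (`conf.int`), and compares the p-value to the conventional alpha of 0.05.

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))
airbag_table <- table(Cars93$AirBags_bin)

result <- binom.test(as.integer(airbag_table["Airbag"]), sum(airbag_table), p = 0.5)

alpha <- 0.05
p_value  <- result$p.value            # p-value of the test
prop_hat <- unname(result$estimate)   # observed proportion of cars with airbags
ci       <- result$conf.int           # 95% confidence interval for the proportion

cat("Observed proportion:", round(prop_hat, 3), "\n")
cat("95% CI:", round(ci[1], 3), "to", round(ci[2], 3), "\n")
cat("p-value:", signif(p_value, 3), "\n")
cat("p < alpha?", p_value < alpha, "\n")
```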

Next, let's examine a numerical variable, a car's fuel efficiency during city driving. This is found in the MPG.city column. Let's compare the distribution of fuel efficiencies for car models with and without airbags using the binary airbag variable that we previously created. First, let's examine a graph:

library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags=="None", "No Airbag", "Airbag"))

boxplot(MPG.city ~ AirBags_bin, data=Cars93)
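If you want the actual numbers behind the boxplot, `boxplot()` can return them without drawing anything by setting `plot = FALSE`. In the returned `stats` matrix, the five rows are the lower whisker, lower hinge, median, upper hinge and upper whisker for each group (the hinges are close to, but not always identical to, the quartiles from `quantile()`), and `out` holds any points drawn individually beyond the whiskers.

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))

# Compute the boxplot statistics without drawing the plot
bp <- boxplot(MPG.city ~ AirBags_bin, data = Cars93, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats <- bp$stats
colnames(box_stats) <- bp$names
print(box_stats)

# Values plotted as separate points beyond the whiskers
print(bp$out)
```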

1d) What type of graph is produced by the code above? In your own words, briefly describe what you conclude from the graph. How did fuel efficiency differ for 1993 car models with and without airbags?

Here is another way to plot the same information:

library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags=="None", "No Airbag", "Airbag"))

stripchart(MPG.city ~ AirBags_bin, data=Cars93, vertical=TRUE)

1e) What type of graph is produced by the code above? Suggest two problems with this graph that make it difficult for a reader to interpret.

Let's make an updated version of the previous graph with a few alterations. In the code below, we first calculate the mean city fuel efficiency for each group, then we re-draw the previous graph and add line segments to show the two group means:

library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags=="None", "No Airbag", "Airbag"))

# Mean city MPG for each group; the outer parentheses also print the result
(mean.mpgs <- by(Cars93$MPG.city, Cars93$AirBags_bin, mean))

# Re-draw the strip chart: 'jitter' spreads overlapping points sideways, pch=16 draws solid dots
stripchart(MPG.city ~ AirBags_bin, data=Cars93, vertical=TRUE, method='jitter', pch=16, col='red', ylab='Fuel efficiency (mpg)')

# Draw a thick horizontal line at each group mean, centred on group positions 1 and 2
segments(x0=c(0.85,1.85), x1=c(1.15,2.15), y0=mean.mpgs, lwd=3)
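The segments line up with the right groups because `stripchart()` places the first factor level at x = 1 and the second at x = 2, and the group means come out in that same level order. This sketch computes the same means with `tapply()`, a common alternative to `by()`, and prints the level order so you can confirm the match.

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))

# Group means via tapply(); the names follow the factor's level order
group_means <- tapply(Cars93$MPG.city, Cars93$AirBags_bin, mean)
print(group_means)

# Level 1 is drawn at x = 1 and level 2 at x = 2 in the strip chart
print(levels(Cars93$AirBags_bin))
```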

Part 2. Marked for Correctness

Marked correct (1), partially correct (0.25, 0.5, or 0.75), or incorrect/incomplete (0).

2a) Now, let's test whether there is an association between these two variables, the presence of airbags and a car's fuel efficiency during city driving. What kind of statistical test would be appropriate for this question, given the types of data involved and the question we are asking? What is the null hypothesis for this test?

Run the test as follows:

library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags=="None", "No Airbag", "Airbag"))

t.test(MPG.city ~ AirBags_bin, data=Cars93, var.equal=TRUE)
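A note on `var.equal=TRUE`: it tells `t.test()` to run Student's two-sample t-test with a pooled variance, which assumes both groups have the same variance. If you leave it out, R's default is Welch's t-test, which drops that assumption and usually has fractional degrees of freedom. This sketch runs both versions and pulls out the t statistic, degrees of freedom and p-value so you can compare them.

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))

# Student's t-test with pooled variance (as in the assignment)
pooled <- t.test(MPG.city ~ AirBags_bin, data = Cars93, var.equal = TRUE)

# Welch's t-test (R's default when var.equal is not set)
welch <- t.test(MPG.city ~ AirBags_bin, data = Cars93)

# Key numbers for reporting: t statistic, degrees of freedom, p-value
print(c(t = unname(pooled$statistic), df = unname(pooled$parameter), p = pooled$p.value))
print(c(t = unname(welch$statistic), df = unname(welch$parameter), p = welch$p.value))
```

For the pooled test, the degrees of freedom equal the total sample size minus 2.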

2b) Explain in your own words the outcome of the test, including the p-value, and what it means. It's a good idea to provide descriptive statistics to go along with your claim. Did you reject the null hypothesis?
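For the descriptive statistics that go along with a t-test, the group mean, standard deviation and sample size are commonly reported. This sketch splits MPG.city by airbag group and computes all three for each group; the helper function inside `sapply()` is just one way to do it.

```r
library(MASS)

Cars93$AirBags_bin <- factor(ifelse(Cars93$AirBags == "None", "No Airbag", "Airbag"))

# Split city MPG by group, then compute mean, SD and n for each group
desc <- sapply(split(Cars93$MPG.city, Cars93$AirBags_bin),
               function(v) c(mean = mean(v), sd = sd(v), n = length(v)))

# Rows: mean, sd, n; columns: the two airbag groups
print(round(desc, 2))
```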
