
Question


Can you please help me figure out this assignment? I am not really familiar with RStudio. I have to use that program for my Statistics assignments, but some of the questions are tricky for me to understand. Please use RStudio for each answer, and include any R code used to produce graphical or numerical summaries of the data, as well as the graphs themselves. Once you have completed it, please explain each answer briefly, in complete sentences with proper grammar, and answer in context so I can understand exactly what each question was asking me to do. Here are the instructions for this assignment:

1.

A study was conducted to evaluate three treatments for Type 2 Diabetes in patients aged 10-17. A total of 699 patients participated in the study, and each one was randomly assigned to one of the three treatments: metformin (met), treatment with metformin combined with rosiglitazone (rosi), or a lifestyle intervention program (lifestyle). A primary outcome was recorded for each patient: the patient either lacked glycemic control (failure) or achieved glycemic control (success). The results of the study can be found in the data file diabetes2.csv, which can be downloaded from Canvas. Load the data into R and answer the following questions:

a. Produce an appropriate graphical display to examine the relationship between the treatment and the outcome. In your answer, include both the R command that you used and a copy of the graph itself.

Write a couple of sentences discussing what the graph reveals.

b. Within each of the three treatment groups, what percentage of patients achieved glycemic control?

Which treatment worked the best in this sample? Which one was the worst?
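
A minimal sketch of one way to approach parts (a) and (b), assuming the file has one row per patient with columns named treatment and outcome. Those column names, and every value in the toy data frame below, are assumptions made so the code runs on its own; they are not the study's data.

```r
# Toy stand-in for diabetes2.csv. The column names (treatment, outcome) and
# every value below are assumptions for illustration, not the study's data.
# With the real file you would start from: diabetes <- read.csv("diabetes2.csv")
diabetes <- data.frame(
  treatment = rep(c("met", "rosi", "lifestyle"), each = 4),
  outcome   = c("success", "failure", "success", "failure",   # met
                "success", "success", "success", "failure",   # rosi
                "success", "failure", "failure", "failure")   # lifestyle
)

# Two-way table of counts: outcome in rows, treatment in columns
counts <- table(diabetes$outcome, diabetes$treatment)

# Column percentages: within each treatment, the share of each outcome
pct <- prop.table(counts, margin = 2) * 100
print(round(pct, 1))

# Segmented bar chart of the within-treatment proportions (part a)
barplot(prop.table(counts, margin = 2),
        legend.text = TRUE,
        xlab = "Treatment",
        ylab = "Proportion of patients")

# Percentage of successes in each treatment group (part b)
success_pct <- pct["success", ]
print(success_pct)
```

Using margin = 2 conditions on the treatment (the columns), which matches the wording "within each of the three treatment groups". Comparing the success row across columns then shows which treatment did best and worst in the sample.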

2. The data file ncbirths.csv contains data about 1,000 randomly selected babies born in the state of North Carolina. Download the data, load it into R, and answer the following questions:

a. What percentage of the mothers were smokers? What percentage were nonsmokers? (This information is contained in the variable habit).

b. The weight (in pounds) of each of the babies at birth is recorded in the variable weight. Describe the distribution of the birthweights, including an appropriate graph to visualize the distribution. In your description, be sure to address shape, center, spread, and outliers (if any) and provide appropriate numerical summaries (with units!).

c. Babies whose birth weight is below 2500 grams (5 lbs. 8 oz.) are referred to as low birth weight (LBW). LBW babies have elevated risks for many health problems, both immediately after birth and later in childhood. Is there a relationship between smoking by the mother during pregnancy (the variable habit in the data) and the risk of a baby being born with low birth weight (using the variable low birth weight)? Provide appropriate numerical summaries and discuss any apparent relationship. Also, make an appropriate graph to visualize the relationship.

d. Based on these data and your answer to part (c), can we conclude that smoking by the mother during pregnancy causes an increased risk of low birth weight babies? Why or why not? Explain.

e. Does there appear to be any relationship between smoking by the mother during pregnancy and the weight of the baby at birth? In particular, do mothers who smoke during pregnancy tend to have babies who weigh less on average? (Note: For this part of the question, use the variables habit and weight. Do not use low birth weight.) Provide appropriate graphical and numerical summaries for comparing the two groups (smokers vs. nonsmokers) and discuss what those summaries reveal.

f. Is there a relationship between the length of the pregnancy (given by the variable weeks) and the birth weight of the baby (given by the variable weight, in pounds)? Make an appropriate graph to visualize the relationship and describe any relationship that you see based on the graph.

g. Is there a relationship between smoking during pregnancy (habit) and the length of the pregnancy (weeks)? Provide appropriate graphical and numerical summaries and discuss any observed relationship.
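
The parts of question 2 could be sketched roughly as follows. This is only an outline under stated assumptions: the toy data frame stands in for ncbirths.csv with invented values, and the lbw column is derived from weight with a 5.5 lb cutoff because the exact name of the low birth weight variable in the real file is not given here.

```r
# Toy stand-in for ncbirths.csv: all values are invented for illustration.
# With the real file you would start from: births <- read.csv("ncbirths.csv")
births <- data.frame(
  habit  = c(rep("nonsmoker", 5), rep("smoker", 3)),
  weight = c(7.5, 8.1, 6.9, 7.8, 5.2, 6.1, 5.0, 6.8),  # pounds
  weeks  = c(39, 40, 38, 40, 35, 38, 34, 39)
)

# (a) Percentage of mothers in each smoking category
habit_pct <- prop.table(table(births$habit)) * 100
print(habit_pct)

# (b) Distribution of birth weight: numerical summaries and a histogram
print(summary(births$weight))
print(sd(births$weight))
hist(births$weight, xlab = "Birth weight (pounds)", main = "")

# (c) Low birth weight by habit. The lbw column is derived here as an
# assumption; the real file is described as already having such a variable.
births$lbw <- ifelse(births$weight < 5.5, "low", "not low")
lbw_pct <- prop.table(table(births$lbw, births$habit), margin = 2) * 100
print(round(lbw_pct, 1))
barplot(lbw_pct, legend.text = TRUE, ylab = "Percent of babies")

# (e) Birth weight compared across smokers and nonsmokers
mean_wt <- tapply(births$weight, births$habit, mean)
print(mean_wt)
boxplot(births$weight ~ births$habit, horizontal = TRUE)

# (f) Length of pregnancy vs. birth weight: scatterplot and correlation
plot(births$weight ~ births$weeks)
r <- cor(births$weeks, births$weight)
print(r)

# (g) Length of pregnancy compared across smokers and nonsmokers
print(tapply(births$weeks, births$habit, summary))
boxplot(births$weeks ~ births$habit, horizontal = TRUE)
```

Part (d) needs no code: these are observational data, so an association found in part (c) would not by itself show that smoking causes low birth weight.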

The assignment also includes a summary of R commands. Here is the text from the screenshots of that summary:

Relationship between a categorical explanatory variable and a numerical response variable

To summarize the relationship, we can look at summary statistics for each group, comparative boxplots, histograms, or density plots.

To look at summary statistics to compare the groups defined by the explanatory variable:
    tapply(response_var, explanatory_var, command)
The argument "command" should be replaced by the command for the statistic that you want to compute, which could be summary (for the five-number summary along with the mean), mean, sd, median, IQR, etc.
Ex:
    tapply(cdc$weight, cdc$gender, summary)
    tapply(cdc$weight, cdc$gender, sd)

To make comparative boxplots:
    boxplot(response_var ~ explanatory_var, horizontal=TRUE)
Ex:
    boxplot(cdc$weight ~ cdc$gender, horizontal=TRUE)

To make comparative histograms:
    library(lattice)
    histogram(~ response_var | explanatory_var, layout=c(1, n))
In layout=c(1, n), the n needs to be replaced by the number of categories in the explanatory variable. In the example below, the explanatory variable (cdc$gender) has 2 categories, so we replace n with 2.
Ex:
    library(lattice)
    histogram(~ cdc$weight | cdc$gender, layout=c(1, 2))

To make comparative density plots:
    library(lattice)
    densityplot(~ response_var | explanatory_var, layout=c(1, n))
Ex:
    library(lattice)
    densityplot(~ cdc$weight | cdc$gender, layout=c(1, 2))

Relationship between two numerical variables

To summarize the relationship, we can compute the correlation, look at a scatterplot, or look at a least-squares regression line.

To make a scatterplot:
    plot(response_var ~ explanatory_var)
Ex:
    plot(cdc$wtdesire ~ cdc$weight)

To compute the correlation coefficient r:
    cor(variable1, variable2, use="complete.obs")
For correlation, the order of the two variables does not matter. It gives the same result either way. If there is any missing data (NA), you must include use="complete.obs". If not, this part of the command can be omitted.
Ex:
    cor(cdc$wtdesire, cdc$weight)

Linear regression: To find the line of best fit for the data, run the following commands: results
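
The note about missing data and about the order of the variables in cor can be checked with a small sketch. The two vectors below are invented toy values (not the cdc data), with one weight deliberately set to NA.

```r
# Toy vectors with one missing weight; the values are invented for illustration.
weight   <- c(150, 180, NA, 200, 130)
wtdesire <- c(140, 170, 160, 180, 130)

# Without use = "complete.obs", a single NA makes the result NA
r_default <- cor(wtdesire, weight)
print(r_default)

# With use = "complete.obs", rows containing an NA are dropped first
r_complete <- cor(wtdesire, weight, use = "complete.obs")
print(r_complete)

# Swapping the order of the two variables gives the same correlation
r_swapped <- cor(weight, wtdesire, use = "complete.obs")
print(r_swapped)
```

If the real data set has no missing values in the two variables, the first call already returns a number and the use argument can be left out.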
