Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Question 2 - 28 marks Note: Your solution should be contained in the same Jupyter notebook A researcher in social science is interested in studying

Question 2 - 28 marks

Note: Your solution should be contained in the same Jupyter notebook

A researcher in social science is interested in studying childhood obesity in the UK. She downloads some data from the government website relating to overweight children. These data were collected through the National Child Measurement Programme (NCMP), and a child is defined as obese based on their BMI measurement. The latest data that she finds are from the years 2020 and 2021. These data were classified according to the following four categorical variables:

deprivation: the index of multiple deprivation (IMD), taking the values:

- most: child lives in one of the 10% most deprived areas in the UK

- least: child lives in one of the 10% least deprived areas in the UK

gender: the gender the child identifies with, taking the coded values 0 (for male) and 1 (for female)

ageGroup: the age group the child is in, taking the coded values 1 (for 4 to 5 year olds) and 2 (for 10 to 11 year olds)

obese: whether a child is classified as being obese based on their BMI, taking the coded values 0 (for no) and 1 (for yes).

The counts in the cells of the following contingency table for these classifying variables are stored in the variable count.

(a) Using an appropriate stepwise model selection method, find a log-linear model that has lower Akaike information criterion (AIC) values than the saturated model. (Don't give up if your first attempt fails!) Show that this model obeys the hierarchy principle, and use the residual deviance of this model to show that it provides an adequate fit to the data. [8]

(b) The social scientist is primarily interested in how obesity in children is related to the other factors. Suggest what type of model she could fit, what the response variable should be and, based on your result from part (a), state and justify which main effects and interactions of the other variables you would expect there to be in the linear predictor of this model. [6]

(c) The data are also given in the data frame ncmpManipulated and stored in the file ncmpManipulated.RData. This data frame contains the same variables deprivation, age and gender, along with an extra couple of variables: countYes: the number of obese children in the corresponding category countNo: the number of children who are not obese in the corresponding category.

Explain why the ncmpManipulated data frame is suitable for the analysis suggested in part (b) whereas the ncmp data frame is not. [3]

(d) Fit the model you suggested in part (b) and point out the similarities with the log-linear model from part (a). [4]

(e) By what factor do the estimated odds of obesity differ for two children of the same age and gender, but with different levels of deprivation? [2]

(f) Now consider children with the same level of deprivation. Does one gender have higher odds of obesity? Is this the same in both age groups? Justify your answers.

[5]

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Mathematics for Economics and Business

Authors: Ian Jacques

9th edition

129219166X, 9781292191706 , 978-1292191669

More Books

Students also viewed these Mathematics questions

Question

=+a) What kind of study was this?

Answered: 1 week ago

Question

Identify the most stable compound:

Answered: 1 week ago