Question
Question 2 - 28 marks Note: Your solution should be contained in the same Jupyter notebook A researcher in social science is interested in studying
Question 2 - 28 marks
Note: Your solution should be contained in the same Jupyter notebook
A researcher in social science is interested in studying childhood obesity in the UK. She downloads some data from the government website relating to overweight children. These data were collected through the National Child Measurement Programme (NCMP), and a child is defined as obese based on their BMI measurement. The latest data that she finds are from the years 2020 and 2021. These data were classified according to the following four categorical variables:
deprivation: the index of multiple deprivation (IMD), taking the values:
- most: child lives in one of the 10% most deprived areas in the UK
- least: child lives in one of the 10% least deprived areas in the UK
gender: the gender the child identifies with, taking the coded values 0 (for male) and 1 (for female)
ageGroup: the age group the child is in, taking the coded values 1 (for 4 to 5 year olds) and 2 (for 10 to 11 year olds)
obese: whether a child is classified as being obese based on their BMI, taking the coded values 0 (for no) and 1 (for yes).
The counts in the cells of the following contingency table for these classifying variables are stored in the variable count.
(a) Using an appropriate stepwise model selection method, find a log-linear model that has lower Akaike information criterion (AIC) values than the saturated model. (Don't give up if your first attempt fails!) Show that this model obeys the hierarchy principle, and use the residual deviance of this model to show that it provides an adequate fit to the data. [8]
(b) The social scientist is primarily interested in how obesity in children is related to the other factors. Suggest what type of model she could fit, what the response variable should be and, based on your result from part (a), state and justify which main effects and interactions of the other variables you would expect there to be in the linear predictor of this model. [6]
(c) The data are also given in the data frame ncmpManipulated and stored in the file ncmpManipulated.RData. This data frame contains the same variables deprivation, age and gender, along with an extra couple of variables: countYes: the number of obese children in the corresponding category countNo: the number of children who are not obese in the corresponding category.
Explain why the ncmpManipulated data frame is suitable for the analysis suggested in part (b) whereas the ncmp data frame is not. [3]
(d) Fit the model you suggested in part (b) and point out the similarities with the log-linear model from part (a). [4]
(e) By what factor do the estimated odds of obesity differ for two children of the same age and gender, but with different levels of deprivation? [2]
(f) Now consider children with the same level of deprivation. Does one gender have higher odds of obesity? Is this the same in both age groups? Justify your answers.
[5]
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started