Description of Dataset

In Portugal, secondary education consists of 3 years of schooling, done after 9 years of basic education. In other words, secondary school corresponds to grades 10, 11 and 12.

The data was collected during the 2005-2006 school year from two public schools in the Alentejo region of Portugal.

Note: The drinking age in Portugal used to be 16 for low-alcohol beverages like beer, wine or cider, and 18 for high-alcohol beverages. In 2015, the government harmonized the drinking age to 18 across all beverage types. This dataset is from 2006 (prior to the harmonization of the age limit), so do not be surprised to see students younger than 18 reporting alcohol consumption in the data below.

The grade variable is a score on the scale of 0 to 20. This score can be interpreted as grades using the table below:

The dataset has been color coded to indicate the types of variable/data:

Grade - outcome / dependent variable

Student lifestyle variables

Support available and other variables

Family context variables

Student demographic and other variables

Description of all the variables:

Variable | Description
sex | student's sex (binary: female or male)
age | student's age (numeric: from 15 to 22)
school | student's school (binary: Gabriel Pereira or Mousinho da Silveira)
address | student's home address type (binary: urban or rural)
Pstatus | parents' cohabitation status (binary: T = living together, A = apart)
Medu | mother's education (numeric: from 0 to 4): 0 = none, 1 = primary (4th grade), 2 = 5th to 9th grade, 3 = secondary (high school), 4 = higher (university)
Mjob | mother's job (nominal)
Fedu | father's education (numeric: from 0 to 4): 0 = none, 1 = primary (4th grade), 2 = 5th to 9th grade, 3 = secondary (high school), 4 = higher (university)
Fjob | father's job (nominal)
famsize | family size (binary: ≤ 3 or > 3)
famrel | quality of family relationships (numeric: from 1 = very bad to 5 = excellent)
reason | reason to choose this school (nominal: close to home, school reputation, course preference or other)
traveltime | home to school travel time (numeric: 1 = < 15 min, 2 = 15 to 30 min, 3 = 30 min to 1 hour, 4 = > 1 hour)
studytime | weekly study time (numeric: 1 = < 2 hours, 2 = 2 to 5 hours, 3 = 5 to 10 hours, 4 = > 10 hours)
failures | number of class failures (numeric: n if 1 ≤ n < 3, else 4)
schoolsup | extra educational school support (binary: yes or no)
famsup | family educational support (binary: yes or no)
activities | extra-curricular activities (binary: yes or no)
paidclass | extra paid classes (binary: yes or no)
internet | Internet access at home (binary: yes or no)
nursery | attended nursery school (binary: yes or no)
higher | wants to take higher education (binary: yes or no)
romantic | in a romantic relationship (binary: yes or no)
freetime | free time after school (numeric: from 1 = very low to 5 = very high)
goout | going out with friends (numeric: from 1 = very low to 5 = very high)
Walc | weekend alcohol consumption (numeric: from 1 = very low to 5 = very high)
Dalc | workday alcohol consumption (numeric: from 1 = very low to 5 = very high)
health | current health status (numeric: from 1 = very bad to 5 = very good)
absences | number of school absences (numeric: from 0 to 93)
Grade Final | final grade (numeric: from 0 to 20)

Context:

The school board, the health minister, and several parent groups are worried about student performance in mathematics in the public schools. You have been hired to learn which variables explain (predict) the math grade obtained by students. You have collected data from 2 public schools and must report on your findings. There are conflicting viewpoints being put forward by different parent groups and politicians. The results of your study will most likely be used as political fodder and will be memed in Facebook groups. More importantly, your data analysis results will be taken seriously by the education minister and will inform policy decisions at the ministry.

1. Conservative parent groups blame student lifestyle choices, such as partying, going out, drinking, and having romantic relationships while in school, for low math scores. They want to see whether there is evidence that students who engage in such behaviors have lower math scores.

2. Politicians have been laying the blame on the family/home context of students and want to pass the buck onto the families for student success. They are interested in whether the green columns (family context variables) explain a lot of the math performance.

3. School administrators are wondering whether school support, paid classes outside school and other such support factors are useful.

Part 1: Descriptive Statistics and Data Visualization

1. [5 pts] Explore the data and share a summary of the data using descriptive statistics and any appropriate visualizations.

2. [5 pts] Explore how math grades vary by school (the dataset has 2 schools), gender, parents' education, and the student drinking and romantic relationship variables.

3. [5 pts] Explore how math grades vary by student health, absences, past failures and whether or not the student had nursery education.

The above work should give you some sense of which variables are important.
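As a rough sketch of the group-wise exploration asked for in Part 1, the snippet below summarizes the grade for each level of a categorical variable using only the Python standard library. The six rows are made-up values for illustration, not records from the dataset, and the short school codes "GP" and "MS" and the helper names `describe` and `grade_by` are assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical rows (made-up values, NOT taken from the real dataset).
# "GP" / "MS" are assumed short codes for the two schools.
rows = [
    {"school": "GP", "romantic": "no", "grade": 14},
    {"school": "GP", "romantic": "yes", "grade": 10},
    {"school": "GP", "romantic": "no", "grade": 12},
    {"school": "MS", "romantic": "yes", "grade": 9},
    {"school": "MS", "romantic": "no", "grade": 11},
    {"school": "MS", "romantic": "no", "grade": 13},
]


def describe(values):
    """Basic descriptive statistics for one list of grades."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def grade_by(rows, key, outcome="grade"):
    """Group the outcome by the levels of `key` and describe each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[outcome])
    return {level: describe(vals) for level, vals in sorted(groups.items())}


by_school = grade_by(rows, "school")
by_romantic = grade_by(rows, "romantic")

for level, stats in by_school.items():
    print("school", level, stats)
for level, stats in by_romantic.items():
    print("romantic", level, stats)
```

The same `grade_by` call can be repeated for sex, Medu, Fedu, Walc, Dalc, health, failures and nursery. Absences has many distinct values, so it would probably need binning or a scatter plot instead.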

Part 2: Cluster Analysis

The goal of the cluster analysis is to identify different types of students. Do not use the Grade (blue) variable in forming the clusters. Exclude the following variables from the cluster analysis: Grade Final, Mjob, Fjob, internet, famsup, schoolsup, reason, school, and traveltime.

If you find some other variables do not differentiate between clusters, you may exclude them when running the final cluster analysis.

4. [30 pts] Use k-means cluster analysis (after excluding the variables mentioned above) to find different types of students. Test solutions from 3 clusters up to a maximum of 8 clusters. What is the best clustering solution, i.e. what is the best number of clusters to use? Name your clusters and describe them in plain English.
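One possible way to compare the 3 to 8 cluster solutions is sketched below: standardize the variables, run k-means from several random starts for each k, and look at where the within-cluster sum of squares (WSS) stops dropping sharply. This is a minimal standard-library implementation of Lloyd's algorithm on synthetic points, not the assignment's prescribed tool. The function names, the three synthetic "student types" and the choice of five restarts are all assumptions, and in practice a library routine would normally be used instead.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no variable dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: divide by 1
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers, within-cluster SS)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct rows as starting centers
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [mean(col) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


# Synthetic data: three made-up student types on three 1-to-5 style scales.
rng = random.Random(1)
type_means = [(1.5, 1.5, 3.5), (4.0, 4.0, 1.5), (2.5, 3.0, 2.5)]
raw = [[rng.gauss(m, 0.5) for m in tm] for tm in type_means for _ in range(20)]
points = standardize(raw)

# Try k = 3..8, keeping the best of five random starts for each k.
for k in range(3, 9):
    best = min((kmeans(points, k, seed=s) for s in range(5)), key=lambda r: r[2])
    print(k, round(best[2], 2))
```

WSS tends to fall as k grows, so the smallest value is not automatically the best solution. A common heuristic is to pick the k after which the curve flattens, then check that the resulting clusters can be described sensibly.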

5. [10 pts] What are the average, min, max and range of the Grade variable for each cluster? Are some clusters doing better at math than others? If so, suggest possible explanations.
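The per-cluster grade profile can be computed once each student has a cluster label. The sketch below assumes the labels come from an earlier clustering step. The label and grade lists are made-up values, and `grade_summary` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def grade_summary(labels, grades):
    """Average, min, max and range of the grade within each cluster."""
    by_cluster = defaultdict(list)
    for label, grade in zip(labels, grades):
        by_cluster[label].append(grade)
    return {
        cluster: {
            "n": len(g),
            "mean": mean(g),
            "min": min(g),
            "max": max(g),
            "range": max(g) - min(g),
        }
        for cluster, g in sorted(by_cluster.items())
    }


# Hypothetical cluster labels and grades (NOT taken from the real dataset).
labels = [0, 0, 1, 1, 2, 2]
grades = [15, 13, 8, 10, 0, 12]

summary = grade_summary(labels, grades)
for cluster, stats in summary.items():
    print(cluster, stats)
```

Because the grade was left out when forming the clusters, any difference in these summaries reflects how the clustering variables relate to math performance rather than being built in by construction.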

Part 3: Regression Analysis (and perhaps Cluster analysis if required)

6. [20 pts] What is the best regression model that explains (predicts) the Grade Final (math performance) variable? Which variables are important predictors, i.e. explain variation in the grade?

7. [10 pts] The variables have been color coded into different types, e.g. family context variables, student lifestyle variables etc. How many variables of each type end up in your final model? Remove all variables of a particular type, then add them back to the regression model. How much does adjusted R² change when each group of variables is taken out and added back in? Use this to compare the explanatory power of each type/group of variables. Do these results support the conservative parent groups, the politicians and/or the school administrators?
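The group-removal comparison can be sketched as follows: fit the full model, refit it with one group of variables dropped, and record the change in adjusted R², computed as 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors. The code is a standard-library least-squares fit on synthetic data. Which variables belong to which group is an assumption made here (the color coding is not available in the text), and the data-generating formula is invented purely for illustration.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_r2(rows, y, columns):
    """OLS with intercept on the chosen columns; returns (R2, adjusted R2)."""
    X = [[1.0] + [row[c] for c in columns] for row in rows]
    n, p = len(y), len(columns)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p + 1)] for i in range(p + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p + 1)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return r2, 1 - (1 - r2) * (n - 1) / (n - p - 1)


# ASSUMED grouping of variables; the original color coding is not available.
groups = {
    "lifestyle": ["goout", "Walc"],
    "family": ["Medu", "famrel"],
    "support": ["schoolsup", "paidclass"],
}

# Synthetic students and grades (invented formula, NOT the real data).
rng = random.Random(0)
rows = [
    {"goout": rng.randint(1, 5), "Walc": rng.randint(1, 5),
     "Medu": rng.randint(0, 4), "famrel": rng.randint(1, 5),
     "schoolsup": rng.randint(0, 1), "paidclass": rng.randint(0, 1)}
    for _ in range(80)
]
grade = [10 - 0.8 * r["goout"] + 0.9 * r["Medu"] + rng.gauss(0, 2) for r in rows]

full_cols = [c for cols in groups.values() for c in cols]
r2_full, adj_full = fit_r2(rows, grade, full_cols)
for name, cols in groups.items():
    reduced = [c for c in full_cols if c not in cols]
    _, adj_reduced = fit_r2(rows, grade, reduced)
    print(f"{name}: adjusted R2 changes by {adj_reduced - adj_full:+.3f} when removed")
```

A group whose removal barely moves adjusted R² adds little explanatory power beyond the other groups. Plain R² can never rise when variables are removed, which is why the adjusted version is the one to compare.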

8. [10 pts] Use regression to answer the following questions, which are important to the stakeholders described under Context. Use variations of the model developed in the previous question as and when required.

a. Is there a significant difference in math performance between the two schools?

b. Does alcohol consumption lead to lower or higher math grades?

c. Do romantic relationships help or hinder student math grades (performance)?

d. Does the education of the mother (or father) help student math grades?

e. Do the following variables affect math grades and, if so, in which direction: goout, activities, nursery, Pstatus, and famrel?
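Several of these questions involve binary variables (school, romantic, activities, nursery, Pstatus), which need to be coded as 0/1 before they can enter a regression. The sketch below illustrates one way to read such a coefficient: in a simple regression on a single 0/1 dummy, the slope equals the difference between the two group means and the intercept equals the mean of the group coded 0. The grades are made-up values, and `slope_intercept` is a hypothetical helper name.

```python
from statistics import mean


def slope_intercept(x, y):
    """Least-squares slope and intercept for a single predictor."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, ybar - slope * xbar


# Hypothetical responses and grades (NOT taken from the real dataset).
romantic = ["yes", "no", "no", "yes", "no", "no"]
grade = [9, 12, 14, 11, 13, 13]

# Dummy coding: 1 = in a relationship, 0 = not (the reference group).
dummy = [1 if r == "yes" else 0 for r in romantic]
slope, intercept = slope_intercept(dummy, grade)

mean_yes = mean(g for g, d in zip(grade, dummy) if d == 1)
mean_no = mean(g for g, d in zip(grade, dummy) if d == 0)

print("slope:", slope, "difference in means:", mean_yes - mean_no)
print("intercept:", intercept, "mean of reference group:", mean_no)
```

The sign of the slope gives the direction of the association only. Judging significance would additionally need a standard error or p-value, and in a multiple regression the coefficient is a difference adjusted for the other predictors, not a raw difference in means. Neither establishes that the variable causes the change in grade.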
