Question
GENERAL INFORMATION
The objective of this project is to analyse the information about young
employees in Tasmania and to discuss issues related to relationships between
employees' wages, abilities and family backgrounds, etc.
Hypothesis tests must be performed at a 5% level of significance. Students are
NOT allowed to directly use the test function in MS Excel to conduct
hypothesis tests. All hypothesis tests must follow the structure below:
A. State the null and alternative hypotheses.
B. Show how to construct the test statistic and what the distribution is under
the null hypothesis.
C. Calculate the test statistic.
D. State the significance level of the test.
E. State the rejection rule.
F. State the conclusion of the test, expressed in terms of the aim of the test.
You may work either in a group or individually (no groups of more than
three members will be permitted) to complete the assignment. You will need
to complete the group assignment cover sheet (available on MyLO) and attach
to your assignment. For a group submission, each student in the group needs
to write a brief statement of his/her contribution on the cover sheet. All
students must sign this cover sheet if you are working as a group. It is NOT
acceptable for students to sign for other group members.
An electronic copy of the assignment needs to be submitted to the Dropbox on
MyLO before 4:00pm on Tuesday 6th Oct 2015.
You will be required to use the most appropriate technique or method to
evaluate or to present data. DO NOT use every technique you can think of as
this only shows that you do not understand what is required. Use the most
appropriate, although you should also remember that different
techniques/tests/graphs may provide you with different types of information.
Use your judgement carefully.
Your explanations must be clear, concise and complete.
It is recommended that you type your assignment using a computer. A
handwritten assignment can be accepted only if it is legible and easy to follow,
and it is scanned and converted to an electronic copy for submission. However,
all tables/graphs/estimation results have to be copied from Excel outputs.
All tables and diagrams should be accurately referenced and referred to in the
text. All diagrams should be fully labelled.
Ensure that you analyse data thoroughly and present results carefully. Make
sure that you interpret results in the context of the initial problem in order to
show your understanding. You can also make recommendations about further
research that should be conducted in order to provide a better answer.
As far as practicable, all students should be involved with all of the answers if
you choose to work in a group. Please note that teaching staff are not
responsible for resolving personal difficulties you may have when working
with peers.
DATA DESCRIPTION (A FICTITIOUS DATASET DESIGNED FOR THE
ASSIGNMENT ONLY)
A consulting firm randomly selected 150 young employees in Tasmania. These
selected employees answered questions and undertook a standard IQ test and a KW
test. The KW test examines respondents' knowledge about the duties in their
workplaces and their knowledge about the Australian labour market. Respondents'
answers are entered into a spreadsheet where each column represents a variable.
These variables include:
1. wage: monthly earnings in dollars
2. hours: average weekly working hours
3. IQ: IQ score
4. KW: knowledge of work score. High scores indicate that the employees have a
high level of knowledge of work.
5. educ: years of education
6. exper: years of work experience
7. tenure: years with the current employer
8. age: age in years
9. marriage: marriage status
10. gender: female or male
11. urban: =Y if the respondent lives in an urban area; =N if the respondent lives in a rural area
12. sibs: the number of siblings
13. brthord: birth order, e.g. =2 means he/she is the second child in the family.
14. meduc: mother's education
15. feduc: father's education
Missing values are shown by a "." in the cells.
QUESTIONS (TOTAL MARKS: 80)
QUESTION 1
Read the provided raw data carefully to check whether all respondents have provided
information for each variable. If some values are missing, you need to decide how to
deal with the missing observations. Two methods could be considered here. You can
either simply exclude the corresponding respondents from your analysis, or take a
sample average of the corresponding variable and replace the missing values of this
variable with the sample average.
i. Discuss the pros and cons of the two methods for dealing with missing values
mentioned above.
[5 marks]
ii. Explain what you have done to manage the missing data. For example, if you
decide to exclude the respondents, explain how many respondents you have
deleted and for what reasons you did so. If you decide to replace the missing
values with sample averages, then provide the values of the sample averages
of the variables of your concern.
[3 marks]
iii. Clearly indicate the final number of observations (respondents) you will use in
the following analysis. Submit an Excel spreadsheet of the final dataset
together with your assignment.
[2 marks]
[Total 10 marks]
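The two missing-value methods described in Question 1 can be compared on a small example. The sketch below is only an illustration in Python rather than Excel; the column `raw` is hypothetical toy data, not the assignment dataset, and "." is the missing-value marker from the data description.

```python
from statistics import mean, variance

# Hypothetical toy column (NOT the assignment data); "." marks a missing value.
raw = ["12", ".", "10", "14", "."]
values = [None if cell == "." else float(cell) for cell in raw]

# Method 1: exclude respondents whose value is missing.
complete = [v for v in values if v is not None]

# Method 2: replace each missing value with the sample average
# of the observed values.
avg = mean(complete)
imputed = [avg if v is None else v for v in values]

print("deletion:   n =", len(complete), "variance =", variance(complete))
print("imputation: n =", len(imputed), "variance =", variance(imputed))
```

On this toy column, deletion shrinks the sample from 5 to 3 observations, while mean replacement keeps all 5 but halves the sample variance, because the filled-in values sit exactly at the mean.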
QUESTION 2
Pick one discrete numerical variable and answer the following questions.
i. Use an appropriate graph to present this variable, and describe what you see from the
graph.
[5 marks]
ii. Calculate the five-number summary.
[8 marks]
iii. Is the distribution of the variable symmetric? Explain why or why not.
[2 marks]
[Total 15 marks]
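As an illustration of the five-number summary asked for in Question 2, here is a minimal Python sketch. The list `sibs` is hypothetical toy data, and the "inclusive" quartile method (interpolating at position p(n-1) in the sorted data) is an assumption; other quartile conventions can give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

# Hypothetical toy values for a discrete variable such as sibs
# (NOT the assignment data).
sibs = [2, 0, 1, 5, 1, 3, 9, 2, 4]

# Quartiles under the "inclusive" convention (an assumed choice).
q1, median, q3 = quantiles(sibs, n=4, method="inclusive")

# Five-number summary: minimum, Q1, median, Q3, maximum.
summary = (min(sibs), q1, median, q3, max(sibs))
print(summary)

# A rough symmetry check: compare the distances on each side of the median.
print("median - Q1 =", median - q1, " Q3 - median =", q3 - median)
print("median - min =", median - min(sibs), " max - median =", max(sibs) - median)
```

In this toy sample the upper distances are larger than the lower ones, which would suggest a right-skewed rather than symmetric distribution.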
QUESTION 3
Pick one numerical variable that you think may affect the KW scores, and answer the
following questions.
i. Briefly explain what relationship you expect between the variable that you have
chosen and the KW scores.
[2 marks]
ii. Use an appropriate graph to discuss the relationship between the KW scores and the
variable that you have picked.
[5 marks]
iii. Use an appropriate numerical measure to discuss the relationship between the KW
scores and the variable that you have picked.
[3 marks]
[Total 10 marks]
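One numerical measure that could be used for Question 3 (iii) is the sample correlation coefficient. The sketch below computes it from its definition; the function name `pearson_r` and the lists `educ` and `kw` are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data (NOT the assignment data).
educ = [10, 12, 12, 14, 16]
kw = [30, 34, 36, 40, 45]

r = pearson_r(educ, kw)
print(round(r, 4))
```

A value of r close to +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one; the sign gives the direction.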
QUESTION 4
You want to look at the relationship between gender and wages. However, you notice
that gender is a categorical variable and wage is a numerical variable. One way to
work on two different types of variables is to transform one variable to the type of the
other. You decide to generate one new categorical variable, "wagegroup", based on the
level of wage. Enter "high" if a respondent's wage is no less than $1000 and enter
"low" otherwise. Answer the following questions.
i. Present this new variable wagegroup using an appropriate graph and briefly discuss.
[4 marks]
ii. Present these two categorical variables together using an appropriate graph,
and briefly discuss.
[5 marks]
iii. Produce a contingency table of frequencies to present these two categorical variables.
(Hint: you may need Excel skills, e.g. use commands such as "sort" or
"countif" to count the relevant frequencies.)
[8 marks]
iv. Convert the contingency table of frequencies in part (iii) to a contingency table of
probabilities.
[2 marks]
v. Based on the sample information, compare the probability of either being a
female or getting a low wage level with the probability of either being a male
or getting a high wage level.
[3 marks]
vi. Examine whether the statement "Males tend to receive higher wages than
females" is true, false or inconclusive based on the sample information, and
explain your reasoning.
[3 marks]
[Total 25 marks]
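The steps in Question 4 (build "wagegroup", count frequencies, convert to probabilities, combine events) can be sketched as follows. This is an illustration in Python rather than Excel; `records` is hypothetical toy data and the helper names `wagegroup` and `marginal` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy records (gender, monthly wage); NOT the assignment data.
records = [
    ("female", 900), ("female", 1000), ("female", 1200), ("female", 950),
    ("male", 800), ("male", 1100), ("male", 1500), ("male", 1000),
]

def wagegroup(wage):
    # "high" if the wage is no less than $1000, "low" otherwise.
    return "high" if wage >= 1000 else "low"

# Contingency table of frequencies, keyed by (gender, wagegroup).
counts = Counter((gender, wagegroup(wage)) for gender, wage in records)

# Contingency table of (joint) probabilities.
n = len(records)
prob = {cell: count / n for cell, count in counts.items()}

def marginal(label):
    # Sum the joint probabilities of every cell that contains the label.
    return sum(p for cell, p in prob.items() if label in cell)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_female_or_low = (marginal("female") + marginal("low")
                   - prob.get(("female", "low"), 0.0))
p_male_or_high = (marginal("male") + marginal("high")
                  - prob.get(("male", "high"), 0.0))

print(dict(counts))
print(p_female_or_low, p_male_or_high)
```

For part (vi), a comparison of conditional probabilities such as P(high | male) and P(high | female) is one way to examine the statement on the sample; each is a joint probability divided by the corresponding gender marginal.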
QUESTION 5
In the previous question, you simply used $1000 as the given threshold value to
distinguish employees with lower wages from those who have higher wages. One may
argue that it is a sensible choice only if the population average wage is $1000.
I. Construct a 95% confidence interval for the population average of wage, and then
comment whether it is sensible to believe that the population average wage is
$1000.
[4 marks]
II. Conduct a hypothesis test on the null hypothesis that the population average of wage
is $1000. Discuss the choice of using $1000 to generate the categorical variable
in Question 4 based on your test result.
[6 marks]
[Total 10 marks]
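A minimal sketch of the calculations behind Question 5, following steps A to F of the required test structure. It assumes a two-sided t test for a single mean with unknown population standard deviation. The function name `t_interval_and_test` and the list `wages` are hypothetical, and the critical value is passed in by hand from a t table (2.776 is the two-tailed 5% value for 4 degrees of freedom, which fits this toy sample of 5 only).

```python
from math import sqrt
from statistics import mean, stdev

def t_interval_and_test(sample, mu0, t_crit):
    """Confidence interval and two-sided t test for a population mean.

    t_crit is the two-tailed critical value for n - 1 degrees of
    freedom, looked up in a t table by the caller.
    """
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / sqrt(n)          # estimated standard error
    ci = (xbar - t_crit * se, xbar + t_crit * se)
    # A. H0: mu = mu0 against H1: mu != mu0
    # B. t = (xbar - mu0) / (s / sqrt(n)), t-distributed with n - 1 df under H0
    # C. Calculate the test statistic
    t_stat = (xbar - mu0) / se
    # D./E. At the chosen significance level, reject H0 if |t| > t_crit
    reject = abs(t_stat) > t_crit
    # F. The conclusion is then stated in terms of the average wage
    return ci, t_stat, reject

# Hypothetical toy sample of monthly wages (NOT the assignment data).
wages = [900, 1100, 1000, 1300, 1200]
ci, t_stat, reject = t_interval_and_test(wages, mu0=1000, t_crit=2.776)
print(ci, round(t_stat, 4), reject)
```

With the same critical value, the two-sided test rejects the null hypothesis exactly when the hypothesised mean falls outside the confidence interval, so parts I and II should lead to the same conclusion.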
QUESTION 6
Use simple linear regression analysis to determine which of the numerical variables
provided is the most important determinant of Tasmanian young employees' wages.
Please also provide all of the Excel regression outputs.
[Total 10 marks]
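One possible way to approach Question 6 is to fit a separate simple linear regression of wage on each numerical variable and compare the fits, for example by R-squared. The sketch below shows the least-squares formulas in Python; the criterion (largest R-squared), the function name `simple_ols` and all data values are assumptions for illustration, and the Excel regression outputs are still required by the question.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns R-squared too."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation in simple regression
    return slope, intercept, r_squared

# Hypothetical toy data (NOT the assignment data).
wage = [900, 1100, 1000, 1300, 1200]
predictors = {
    "educ": [10, 12, 12, 14, 16],
    "hours": [40, 38, 45, 40, 42],
}

# One regression per candidate variable, then pick the largest R-squared.
fits = {name: simple_ols(x, wage) for name, x in predictors.items()}
best = max(fits, key=lambda name: fits[name][2])

for name, (slope, intercept, r2) in fits.items():
    print(f"{name}: slope={slope:.2f} intercept={intercept:.2f} R2={r2:.4f}")
print("largest R-squared:", best)
```

R-squared measures the share of the variation in wage explained by each variable on its own; the significance of each slope coefficient is another criterion that could be reported alongside it.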