Question
The objective of this exercise is to understand, through simulation, the concepts of an estimator (and estimate), the
unbiasedness or otherwise of an estimator, its standard error, and its sampling distribution (is it normal?). Among
other things, you should be able to experience the validity of the Central Limit Theorem, the appropriate formulae for
the standard errors of estimators, and the sanctity of the confidence coefficient in confidence intervals. We saw
something similar in class when the population has a known discrete distribution.
STEP 1. Consider any finite data set of your choice (interest) with at least 100 data points on a single metric. (It is
recommended that this be the same as Q1 of your Exercise 1; but, if really required, you may consider some other
dataset.) In this exercise, you would treat this as the population of interest. Identify a suitable population mean (μ),
population SD (σ) and proportion (π). [You may suitably consider the proportion to be the % of values more (less) than a
certain threshold, or you can select a categorical variable for the population units.] Because the population is known,
you would know the values of μ, σ and π; however, these values would be used for validation purposes only.
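As a rough illustration of STEP 1, the population parameters could be computed along the following lines. This is only a sketch: the population values, the threshold of 60, and the names `population`, `THRESHOLD`, `mu`, `sigma` and `pi_pop` are all assumptions made for the example, not part of the assignment.

```python
import random
import statistics

# Hypothetical population of N = 200 values; replace with your own data set.
rng = random.Random(0)
population = [round(rng.gauss(50, 10), 1) for _ in range(200)]

THRESHOLD = 60  # assumed cut-off that defines the proportion of interest

N = len(population)
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population SD (divisor N, not N - 1)
pi_pop = sum(v > THRESHOLD for v in population) / N  # share of values above the threshold

print(f"N = {N}, mu = {mu:.3f}, sigma = {sigma:.3f}, pi = {pi_pop:.3f}")
```

The divisor-N form of the SD is used here because the data set is being treated as the entire population rather than as a sample from it.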
STEP 2. Take k = 10000 + (100 x) * 20, and n = x, where x is the last 3 digits of your Roll No. Draw/generate k × n data
points from the above population by SRSWR. [You may find it handy to use the menu "Data - Data Analysis - Random
Number Generation", choose the discrete distribution, and for 'value and probability input range' use the population
data range, giving each value a probability of 1/N, where N is the number of data points in your population.]
Arrange the data points into n columns and k rows. Each row of data is to be treated as a specific sample of size n,
and you have k such samples. Every user has effectively one row of this data set and must make suitable inferences
using his/her row of data alone. For conceptual consideration, you would be considering the case of k users
simultaneously, each of whom is drawing a sample of size n by SRSWR from the same population.
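Outside a spreadsheet, the k by n array of SRSWR draws could be generated roughly as follows. The population and the values of `k` and `n` below are placeholders chosen for the sketch; the assignment's own k and n depend on the roll number.

```python
import random

rng = random.Random(1)
population = [rng.randint(1, 100) for _ in range(150)]  # hypothetical population

k, n = 1000, 30  # illustrative values only; use the k and n fixed by your roll number

# random.choices samples with replacement, so each of the N population values
# has probability 1/N on every draw (simple random sampling with replacement).
samples = [rng.choices(population, k=n) for _ in range(k)]

print(len(samples), "rows of", len(samples[0]), "values each")
```

Each inner list plays the role of one user's row of data.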
STEP 3. Calculate the estimates of μ, σ, σ² and π, namely X̄, S, S² and p, and also the estimated standard errors of
X̄ and p for every user. Corresponding to the k rows, you now have access to k realized values of each of your
estimators X̄, S, S² and p. If your PGP roll no is an odd number, you need to consider estimation and inference for the
population SD, while if your roll number is an even number, you need to consider inference for the population
variance. (The subsequent instructions need to be adjusted accordingly.) Use these to reflect on the following:
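The per-row calculations can be sketched as below, assuming the usual formulae SE(X̄) = S/√n and SE(p) = √(p(1 − p)/n), with S² computed using the divisor n − 1. The population, threshold, `k`, `n` and the helper `row_estimates` are hypothetical names and values introduced for this example.

```python
import math
import random
import statistics

rng = random.Random(2)
population = [rng.randint(1, 100) for _ in range(150)]  # hypothetical population
THRESHOLD = 70   # assumed cut-off defining the proportion
k, n = 500, 40   # illustrative values only
samples = [rng.choices(population, k=n) for _ in range(k)]

def row_estimates(row, threshold):
    """Return the estimates and estimated standard errors for one user's row."""
    size = len(row)
    xbar = statistics.fmean(row)
    s2 = statistics.variance(row)          # sample variance, divisor n - 1
    s = math.sqrt(s2)                      # sample SD
    p = sum(v > threshold for v in row) / size
    return {
        "xbar": xbar,
        "s": s,
        "s2": s2,
        "p": p,
        "se_xbar": s / math.sqrt(size),             # estimated SE of the sample mean
        "se_p": math.sqrt(p * (1 - p) / size),      # estimated SE of the sample proportion
    }

estimates = [row_estimates(row, THRESHOLD) for row in samples]
print(estimates[0])
```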
STEP 4.
a. Draw histograms and reflect on the sampling distributions of the estimators X̄, S, S² and p. Do you believe
that the CLT is in business in each case?
b. From the simulated k values, compute the expected values and standard deviations (errors) of X̄ and p
(by taking the mean and SD of the respective columns). Examine if they match the parameter values from
STEP 1. Also verify if the sample SD (or variance, as assigned to you) is an unbiased estimator of the population
SD (or variance).
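The checks in STEP 4 might be carried out along these lines. This is a sketch under assumptions: the population, threshold, `k` and `n` are made up, and `text_histogram` is a hypothetical helper that prints a crude text histogram in place of a spreadsheet chart. Under SRSWR the draws are independent, so the theoretical standard errors are σ/√n and √(π(1 − π)/n), with σ computed using the divisor N.

```python
import math
import random
import statistics

rng = random.Random(4)
population = [rng.randint(1, 100) for _ in range(120)]  # hypothetical population
THRESHOLD = 70
k, n = 2000, 25  # illustrative values only

mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)  # population variance, divisor N
sigma = math.sqrt(sigma2)
pi_pop = sum(v > THRESHOLD for v in population) / len(population)

xbars, s_values, s2_values, p_values = [], [], [], []
for _ in range(k):
    row = rng.choices(population, k=n)       # one SRSWR sample of size n
    xbars.append(statistics.fmean(row))
    s2_values.append(statistics.variance(row))
    s_values.append(statistics.stdev(row))
    p_values.append(sum(v > THRESHOLD for v in row) / n)

def text_histogram(values, bins=10, width=40):
    """Print a crude histogram and return the bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0           # guard against identical values
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.2f} | {'#' * round(width * c / peak)}")
    return counts

counts = text_histogram(xbars)

mean_xbar, sd_xbar = statistics.fmean(xbars), statistics.stdev(xbars)
mean_p, sd_p = statistics.fmean(p_values), statistics.stdev(p_values)
mean_s, mean_s2 = statistics.fmean(s_values), statistics.fmean(s2_values)

print(f"mean of Xbar {mean_xbar:.3f} vs mu {mu:.3f}")
print(f"SD of Xbar   {sd_xbar:.3f} vs sigma/sqrt(n) {sigma / math.sqrt(n):.3f}")
print(f"mean of p    {mean_p:.4f} vs pi {pi_pop:.4f}")
print(f"SD of p      {sd_p:.4f} vs sqrt(pi(1-pi)/n) {math.sqrt(pi_pop * (1 - pi_pop) / n):.4f}")
print(f"mean of S^2  {mean_s2:.3f} vs sigma^2 {sigma2:.3f}")
print(f"mean of S    {mean_s:.3f} vs sigma {sigma:.3f}")
```

Comparing the last two lines is the point of the unbiasedness question: S² with divisor n − 1 is unbiased for σ² under SRSWR, whereas S is in general slightly biased for σ, so any gap in the last line should be interpreted with the simulation noise in mind.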
STEP 5.
a. For every user (using his/her row of data alone), construct confidence intervals for μ, π, and σ (or σ²,
as assigned to you). The respective confidence coefficients should be (90+a)%, (95a)% and (90c)%,
where a = last digit of your PGP roll no, and c = 2nd last digit of your PGP roll no.
b. Now use the parameter values from STEP 1 to check if for each individual user, the intervals contain
the respective target parameter. Reflect on the aggregate for k users. Reflect on the sanctity of the
confidence coefficient. Why are the methods working or not working (in each case) according to you?
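A coverage check for the mean and the proportion could look roughly like this. The sketch assumes large-sample normal intervals (X̄ ± z·S/√n and p ± z·√(p(1 − p)/n)); a t-based interval for the mean, or a chi-square-based interval for σ², would need quantiles that the Python standard library does not provide, so they are not shown. The confidence levels, population, threshold, `k`, `n` and all function names are assumptions made for the example.

```python
import math
import random
import statistics

rng = random.Random(5)
population = [rng.randint(1, 100) for _ in range(150)]  # hypothetical population
THRESHOLD = 70
mu = statistics.fmean(population)
pi_pop = sum(v > THRESHOLD for v in population) / len(population)

k, n = 2000, 50                    # illustrative values only
CONF_MEAN, CONF_PROP = 0.95, 0.90  # assumed confidence coefficients

def z_value(conf):
    """Two-sided standard normal critical value for the given confidence level."""
    return statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)

def mean_ci(row, conf):
    """Large-sample normal interval for the population mean."""
    xbar = statistics.fmean(row)
    half = z_value(conf) * statistics.stdev(row) / math.sqrt(len(row))
    return xbar - half, xbar + half

def prop_ci(row, threshold, conf):
    """Large-sample (Wald) interval for the population proportion."""
    p = sum(v > threshold for v in row) / len(row)
    half = z_value(conf) * math.sqrt(p * (1 - p) / len(row))
    return p - half, p + half

hits_mean = hits_prop = 0
for _ in range(k):
    row = rng.choices(population, k=n)   # one user's SRSWR sample
    lo, hi = mean_ci(row, CONF_MEAN)
    hits_mean += lo <= mu <= hi          # does this user's interval contain mu?
    lo, hi = prop_ci(row, THRESHOLD, CONF_PROP)
    hits_prop += lo <= pi_pop <= hi      # does this user's interval contain pi?

coverage_mean = hits_mean / k
coverage_prop = hits_prop / k
print(f"mean: nominal {CONF_MEAN:.2f}, observed coverage {coverage_mean:.3f}")
print(f"prop: nominal {CONF_PROP:.2f}, observed coverage {coverage_prop:.3f}")
```

If the methods are working, the observed coverage across the k users should sit close to the nominal confidence coefficient, with some simulation noise; a clear shortfall would suggest that the normal approximation is poor for the chosen n or population.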
Report your results in the following format:
1st worksheet: to be titled POPULATION; should contain the population data. You should define "proportion" clearly
and report the population size and the values of μ, σ and π. Report your roll no and hence your choice of k, n and the
3 confidence coefficients.
2nd worksheet: to be titled SIMULATED DATA; should indicate your choice of k and n. It should then contain the k by n
array of simulated data. In the submitted version do NOT include this worksheet; just keep it for your reference.
3rd worksheet: to be titled EST_SE_CI. Fill in the following table, and keep the entire table for your reference.
However, for the submitted version, submit only the first 100 rows of the table; the summary and conclusions
to be reported in the fourth worksheet must take into account all k rows.
       | Mean problem                              | Proportion problem                     | SD/variance problem
       | Xbar   SE(Xbar)   LL of C.I.   UL of C.I. | p   SE(p)   LL of C.I.   UL of C.I.    | Est   LL of C.I.   UL of C.I.
Row 1  |
Row 2  |
.
.
Row k  |
4th worksheet: to be titled SUMMARY; to be prepared on the basis of the above table and possible additional
calculations. In particular, summarize your observations answering Q4a, 4b and 5b.