
Question

The objective of this exercise is to understand, through simulation, the concepts of an estimator (and an estimate), unbiasedness (or otherwise) of the estimator, standard error, and the sampling distribution (is it normal?) of the estimators. Among other things, you should be able to experience the validity of the Central Limit Theorem, the appropriate formulae for standard errors of estimators, and the sanctity of the confidence coefficient in confidence intervals. We saw something similar in class when the population had a known discrete distribution.

STEP 1. Consider any finite data set of your choice (interest) with at least 100 data points on a single metric. (It is recommended that this be the same as Q1 of your Exercise 1, but, if really required, you may consider some other dataset.) In this exercise, you will treat this data set as the population of interest. Identify a suitable population mean μ, population SD σ and proportion π. [You may take the proportion to be the % of values more (less) than a certain threshold, or you can select a categorical variable for the population units.] Because the population is known, you will know the values of μ, σ and π; however, these values are to be used for validation purposes only.
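For readers who prefer to script the exercise rather than work only in Excel, the STEP 1 parameters can be computed as below. This is a minimal sketch, assuming the proportion is defined as the share of values strictly above a threshold (one of the options STEP 1 allows); the toy population `range(1, 101)` and the threshold 80 are hypothetical placeholders for your own data. Note that the population SD σ uses divisor N (`statistics.pstdev`), because the data set is the whole population.

```python
import statistics

def population_parameters(values, threshold):
    """Return (N, mu, sigma, pi) for a finite population.

    sigma divides by N (population SD, not sample SD); pi is the share of
    units strictly above `threshold` (hypothetical definition of "proportion").
    """
    N = len(values)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    pi = sum(v > threshold for v in values) / N
    return N, mu, sigma, pi

# Hypothetical toy population; replace with your own data set of >= 100 values.
population = list(range(1, 101))
N, mu, sigma, pi = population_parameters(population, threshold=80)
print(f"N={N}, mu={mu}, sigma={sigma:.4f}, pi={pi}")
```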

STEP 2. Take k = 10000 + (100 x) * 20 and n = x, where x is the last 3 digits of your Roll No. Draw/generate k × n data points from the above population by SRSWR (simple random sampling with replacement). [You may find it handy to use the menu "Data - Data Analysis - Random Number Generation", choose the Discrete distribution, and for 'Value and Probability Input Range' use the population data range, giving each value a probability of 1/N, where N is the number of data points in your population.]

Arrange the data points into n columns and k rows. Each row is to be treated as a specific sample of size n, so you have k such samples. Every user effectively has one row of this data set and must make suitable inferences using his/her row of data alone. Conceptually, you are considering k users simultaneously, each of whom draws a sample of size n by SRSWR from the same population.
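The SRSWR draw in STEP 2 has a direct equivalent in Python's standard library: `random.choices` samples with replacement, giving each of the N population values probability 1/N on every draw, which is what the Excel discrete-distribution route achieves. The sketch below builds the k × n array with one row per user; the values of k and n shown are hypothetical placeholders, so substitute the ones your roll number gives.

```python
import random

def draw_srswr(population, k, n, seed=None):
    """Return k samples (rows) of size n drawn by SRSWR from `population`.

    Each draw is independent and picks every unit with probability 1/N.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    return [rng.choices(population, k=n) for _ in range(k)]

# Hypothetical values: use your own population and the k, n from your roll no.
population = list(range(1, 101))
k, n = 10000, 50
samples = draw_srswr(population, k, n, seed=2024)
print(len(samples), "rows x", len(samples[0]), "columns")
```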

STEP 3. For every user, calculate the estimates of μ, σ, σ² and π, namely X̄, S, S² and p̂, and also the estimated standard errors of X̄ and p̂. Corresponding to the k rows, you now have k realized values of each of the estimators X̄, S, S² and p̂. If your PGP roll no. is odd, you need to consider estimation and inference for the population SD; if it is even, you need to consider inference for the population variance. (Adjust the subsequent instructions accordingly.) Use these to reflect on the following:
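Per user, the STEP 3 quantities reduce to a few formulas. The sketch below assumes the usual plug-in standard errors SE(X̄) = S/√n and SE(p̂) = √(p̂(1 − p̂)/n); no finite-population correction is applied because sampling is with replacement. The threshold-based definition of a "success" is again a hypothetical choice, and the sample row is made-up illustration data.

```python
import math
import statistics

def row_estimates(row, threshold):
    """Estimates for one user's sample (one row of the k x n array)."""
    n = len(row)
    xbar = statistics.fmean(row)
    s2 = statistics.variance(row)          # sample variance, divisor n - 1
    s = math.sqrt(s2)                      # sample SD
    p_hat = sum(v > threshold for v in row) / n
    return {
        "xbar": xbar,
        "s": s,
        "s2": s2,
        "p_hat": p_hat,
        "se_xbar": s / math.sqrt(n),                  # estimated SE of the mean
        "se_p": math.sqrt(p_hat * (1 - p_hat) / n),   # estimated SE of p-hat
    }

# Hypothetical row of data, for illustration only.
est = row_estimates([2, 4, 4, 4, 5, 5, 7, 9], threshold=4)
print(est)
```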

STEP 4.

a. Draw histograms and reflect on the sampling distributions of the estimators X̄, S, S² and p̂. Do you believe that the CLT is in business in each case?

b. From the k simulated values, compute the expected values and standard deviations (standard errors) of X̄ and p̂ by taking the mean and SD of the respective columns. Examine whether they match the values implied by the parameters from STEP 1 (μ and σ/√n for X̄; π and √(π(1 − π)/n) for p̂). Also verify whether the sample SD (or variance, as assigned to you) is an unbiased estimator of the population SD (or variance).
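STEP 4 can also be prototyped outside Excel. The sketch below simulates k users, prints a crude text histogram of the k sample means (a roughly symmetric bell shape is the visual cue that the CLT is at work), and compares the mean and SD of each estimator's k realized values with the population targets. Everything here is an illustrative assumption: the population `range(1, 101)`, the threshold 80 for the proportion, the seed, and k = 5000, n = 5 (a deliberately small n, which makes the downward bias of S easy to see). Because SRSWR makes the draws i.i.d. from the population, S² with divisor n − 1 is unbiased for σ² computed with divisor N, whereas S tends to underestimate σ.

```python
import math
import random
import statistics

def text_histogram(values, bins=15, width=40):
    """Print a crude text histogram of `values` and return the bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:10.3f} | {'#' * round(width * c / peak)}")
    return counts

def summarize(estimates, target):
    """Mean, SD (simulated standard error) and bias of k realized estimates."""
    m = statistics.fmean(estimates)
    return m, statistics.stdev(estimates), m - target

# Hypothetical population, threshold, seed, k and n: for illustration only.
population = list(range(1, 101))
mu, sigma = statistics.fmean(population), statistics.pstdev(population)
pi = sum(v > 80 for v in population) / len(population)
k, n = 5000, 5
rng = random.Random(42)
rows = [rng.choices(population, k=n) for _ in range(k)]

xbars = [statistics.fmean(r) for r in rows]
s2s = [statistics.variance(r) for r in rows]   # divisor n - 1
ss = [math.sqrt(v) for v in s2s]
phats = [sum(v > 80 for v in r) / n for r in rows]

text_histogram(xbars)  # bell-shaped and roughly symmetric if the CLT is at work
print("Xbar :", summarize(xbars, mu), "theory SE:", sigma / math.sqrt(n))
print("p-hat:", summarize(phats, pi), "theory SE:", math.sqrt(pi * (1 - pi) / n))
print("S^2  :", summarize(s2s, sigma ** 2))
print("S    :", summarize(ss, sigma))
```

Swapping `xbars` for `phats`, `s2s` or `ss` in the `text_histogram` call gives the other three histograms STEP 4a asks for.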

STEP 5.

a. For every user (using his/her row of data alone), construct confidence intervals for μ, π and σ (or σ², as assigned to you). The respective confidence coefficients should be (90+a)%, (95−a)% and (90−c)%, where a = last digit of your PGP roll no. and c = 2nd last digit of your PGP roll no.

b. Now use the parameter values from STEP 1 to check, for each individual user, whether the intervals contain the respective target parameters. Reflect on the aggregate over the k users and on the sanctity of the confidence coefficient. Why, in your view, are the methods working or not working in each case?
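The intervals of STEP 5 and the coverage check can be sketched as follows, under assumptions worth stating plainly. The mean and proportion intervals use normal (z) critical values, X̄ ± z·S/√n and p̂ ± z·√(p̂(1 − p̂)/n); a t critical value is the textbook choice for the mean when σ is unknown, and the two are close for moderate n. The variance interval is the chi-square interval (n − 1)S²/χ²_{1−α/2} ≤ σ² ≤ (n − 1)S²/χ²_{α/2}, and taking square roots of its limits gives the SD interval. Because the Python standard library has no chi-square quantile function, the sketch substitutes the Wilson-Hilferty approximation (in Excel, `CHISQ.INV` gives the exact left-tail quantile). The population, threshold, seed, k = 2000, n = 30 and the confidence levels used are hypothetical; plug in your own three coefficients.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_crit(conf):
    """Two-sided standard normal critical value (about 1.96 for conf = 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the p-quantile of chi-square(df).

    Swapped in because the standard library has no chi-square inverse CDF;
    it is accurate for moderate df but can break down for very small df.
    """
    c = 2 / (9 * df)
    return df * (1 - c + NormalDist().inv_cdf(p) * math.sqrt(c)) ** 3

def mean_ci(row, conf):
    n, z = len(row), z_crit(conf)
    xbar, se = statistics.fmean(row), statistics.stdev(row) / math.sqrt(n)
    return xbar - z * se, xbar + z * se

def prop_ci(row, threshold, conf):
    """Wald interval; "success" = value above threshold (hypothetical definition)."""
    n, z = len(row), z_crit(conf)
    p = sum(v > threshold for v in row) / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def var_ci(row, conf):
    """Chi-square interval for sigma^2 (exact only for a normal population)."""
    n, alpha = len(row), 1 - conf
    s2 = statistics.variance(row)
    return ((n - 1) * s2 / chi2_quantile(1 - alpha / 2, n - 1),
            (n - 1) * s2 / chi2_quantile(alpha / 2, n - 1))

def sd_ci(row, conf):
    lo, hi = var_ci(row, conf)
    return math.sqrt(lo), math.sqrt(hi)

def coverage(intervals, target):
    """Fraction of users whose interval contains the true parameter."""
    return sum(lo <= target <= hi for lo, hi in intervals) / len(intervals)

# Hypothetical population, threshold, seed, k = 2000, n = 30 and confidence levels.
population = list(range(1, 101))
mu, sigma = statistics.fmean(population), statistics.pstdev(population)
pi = sum(v > 80 for v in population) / len(population)
rng = random.Random(7)
rows = [rng.choices(population, k=30) for _ in range(2000)]

print("mean coverage:", coverage([mean_ci(r, 0.95) for r in rows], mu))
print("prop coverage:", coverage([prop_ci(r, 80, 0.90) for r in rows], pi))
print("SD coverage  :", coverage([sd_ci(r, 0.92) for r in rows], sigma))
```

Comparing each coverage rate with its nominal coefficient is the aggregate check STEP 5b asks for. Two known reasons a rate can drift from nominal: the Wald interval for π tends to undercover when nπ or n(1 − π) is small, and the chi-square interval assumes a normal population, so its coverage can miss the nominal level when the population is markedly non-normal.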

Report your results in the following format:

1st worksheet: titled POPULATION, containing the population data. Define "proportion" clearly and report the population size and the values of μ, σ and π. Report your roll no. and hence your choice of k, n and the 3 confidence coefficients.

2nd worksheet: titled SIMULATED DATA, indicating your choice of k and n and then containing the k × n array of simulated data. Do NOT include this worksheet in the submitted version; just keep it for your reference.

3rd worksheet: titled EST_SE_CI. Fill in the following table and keep the entire table for your reference. In the submitted version, however, include only the first 100 rows of the table; the summary and conclusions required in the fourth worksheet must take all k rows into account.

Columns (left to right, grouped by problem):
Mean problem: Xbar, SE(Xbar), LL of CI, UL of CI
Proportion problem: p, SE(p), LL of CI, UL of CI
SD/variance problem: Est, LL of CI, UL of CI

Rows: Row 1, Row 2, ..., Row k (one row per user)
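If the EST_SE_CI table is produced in Python rather than directly in Excel, one way to export only the first 100 rows for submission is the standard `csv` module, as sketched below. The column labels follow the table above, with a problem suffix added to disambiguate the repeated "LL/UL of CI" headings; the function name `write_est_se_ci` and the dummy rows are hypothetical, and each row is assumed to already hold its 11 computed values in column order.

```python
import csv
import io

# Column labels from the EST_SE_CI table; suffixes added so each header is unique.
HEADER = [
    "Row",
    "Xbar", "SE(Xbar)", "LL of CI (mean)", "UL of CI (mean)",
    "p", "SE(p)", "LL of CI (prop)", "UL of CI (prop)",
    "Est (SD/var)", "LL of CI (SD/var)", "UL of CI (SD/var)",
]

def write_est_se_ci(rows, out, limit=100):
    """Write the header plus the first `limit` rows (default 100) as CSV to `out`.

    `rows` is a list of 11-value lists, one per user, in HEADER order (minus "Row").
    """
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for i, values in enumerate(rows[:limit], start=1):
        writer.writerow([i, *values])

# Hypothetical dummy rows standing in for the k computed rows.
buf = io.StringIO()
write_est_se_ci([[0.0] * 11 for _ in range(250)], buf)
print(buf.getvalue().splitlines()[:2])
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")`, as the `csv` module documentation recommends; keep the full k-row version separately for the SUMMARY calculations.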

4th worksheet: titled SUMMARY. On the basis of the above table and any additional calculations, summarize your observations; in particular, answer Q4a, 4b and 5b.
