Question
Description of the dataset

Once you open the dataset with RStudio, you will see 10 variables, named V1, V2, ..., V10. This dataset contains data about a sample of households of a specific country. Each row contains information about one household. The variables are described as follows:

V1: Food, alcohol and tobacco household expenditure
V2: Total expenditure
V3: Household income
V4: Number of children
V5: Number of adults
V6: Husband's age
V7: Wife's age
V8: Binary variable: 1 if the husband has a college degree, 0 otherwise
V9: Binary variable: 1 if the wife has a college degree, 0 otherwise
V10: Binary variable: 1 if the wife works, 0 otherwise

Case study framework

You work for the Government of a country and you have been commissioned to develop a model that will help the regulatory authorities to understand the determinants of household expenditure. The authorities are particularly interested in obtaining a reliable estimate of the fraction of income spent on consumption, while controlling for other relevant factors. You are asked to conduct a short pilot research project that will produce a preliminary model to demonstrate your suitability to carry out the work.

QUESTION 1. (20 marks)

What is the sample size?

Is the mean of the variable total expenditure the same as the median? To address this question, compare the difference between the mean and the median with the standard deviation. For instance, if the difference between the mean and the median is 100 and the standard deviation is 50, the difference is equal to two standard deviations.

Consider the variables food, alcohol and tobacco household expenditure (V1), total expenditure (V2), and household income (V3). Find the pair of variables with the strongest correlation.

Describe, using a boxplot, the variable total expenditure. Provide a representative measure of variability for the variable total expenditure.

Describe, using a histogram, the variable household income.

QUESTION 2. (10 marks)

Consider the variable wife's age. Calculate the first quartile, the median, the third quartile, the mean and the standard deviation. Now, using the mean and the standard deviation of that variable, calculate the values associated with the 25th, 50th and 75th percentiles of the theoretical Normal distribution. Comment.

QUESTION 3. (10 marks)

Consider the variable V8 (the husband has a college degree). Calculate a confidence interval for the true proportion of husbands with a college degree. Assume α = 0.05.

QUESTION 4. (10 marks)

Calculate a confidence interval for the average household income. Assume α = 0.01.

QUESTION 5. (10 marks)

Test whether the true proportion of husbands with a college degree is greater than 0.2. Use α = 0.1.

QUESTION 6. (10 marks)

Test whether the average number of children is less than 1. Use α = 0.05.

QUESTION 7. (20 marks)

The American Statistician described an interesting application of a probability distribution in a case involving illegal drugs. It all started with a 'bust' in a midsized Florida city. During the bust, police seized approximately 500 foil packets of a white, powdery substance, presumably cocaine. Since it is not a crime to buy or sell non-narcotic cocaine look-alikes, detectives had to prove that the packets actually contained cocaine in order to convict their suspects of drug trafficking. When the police laboratory randomly selected and chemically tested four of the packets, all four tested positive for cocaine. This finding led to the conviction of the traffickers.

After the conviction, the police decided to use the remaining foil packets in reverse sting operations. Two of these packets were randomly selected and sold by undercover officers to a buyer. Between the sale and the arrest, however, the buyer disposed of the evidence. The key question is: beyond a reasonable doubt, did the defendant really purchase cocaine?

In court, the defendant's attorney argued that his client should not be convicted because the police could not prove that the missing foil packets contained cocaine. The police contended, however, that since four of the original packets tested positive for cocaine, the two packets sold in the reverse sting were also highly likely to contain cocaine. Show how to use probability models to solve the dilemma posed by the police's reverse sting (you do not need to use the dataset to solve this question).
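For Question 2, the comparison amounts to placing the sample quartiles next to the quartiles of a Normal distribution with the same mean and standard deviation (in R, `quantile()` versus `qnorm(c(0.25, 0.5, 0.75), mean, sd)`). A minimal Python sketch, with a hypothetical function name and made-up ages rather than the V7 column: if the sample quartiles sit noticeably away from the Normal ones, or unevenly around the median, that suggests the wife's age is not well described by a Normal model.

```python
import statistics
from statistics import NormalDist


def sample_vs_normal_quartiles(values):
    """Empirical Q1/median/Q3 next to those of N(mean, sd) fitted to the same data."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    # method="inclusive" matches R's default quantile() (type 7)
    sample = tuple(statistics.quantiles(values, n=4, method="inclusive"))
    model = NormalDist(mu=mean, sigma=sd)
    normal = tuple(model.inv_cdf(p) for p in (0.25, 0.50, 0.75))
    return {"mean": mean, "sd": sd, "sample": sample, "normal": normal}


# Hypothetical ages, NOT the V7 column of the dataset.
ages = [24, 27, 29, 31, 33, 35, 38, 42, 47, 55]
summary = sample_vs_normal_quartiles(ages)
for label, s, n in zip(("Q1", "median", "Q3"), summary["sample"], summary["normal"]):
    print(f"{label}: sample {s:.2f} vs Normal {n:.2f}")
```

Under a Normal model the 50th percentile equals the mean, so a sample median well below the mean is itself a sign of right skew.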
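Questions 3 and 5 both concern the proportion of husbands with a college degree (V8). A minimal sketch, assuming the usual large-sample normal approximation: a Wald interval p_hat ± z(1 - α/2) · sqrt(p_hat(1 - p_hat)/n) for the confidence interval, and a right-tailed z test of H0: p = 0.2 against H1: p > 0.2 with the standard error evaluated at p0. The function names and the counts in the usage lines are hypothetical; with the real data, `successes` would be the number of 1s in V8 and `n` the sample size.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, alpha=0.05):
    """Wald (large-sample) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def proportion_test_greater(successes, n, p0, alpha):
    """Right-tailed z test of H0: p = p0 vs H1: p > p0 (normal approximation)."""
    p_hat = successes / n
    z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SE evaluated under H0
    p_value = 1 - NormalDist().cdf(z_stat)  # upper-tail area
    return z_stat, p_value, p_value < alpha  # last item: reject H0?


# Hypothetical counts, NOT taken from V8.
print(proportion_ci(successes=60, n=250, alpha=0.05))
print(proportion_test_greater(successes=60, n=250, p0=0.2, alpha=0.1))
```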
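Questions 4 and 6 concern population means (household income, V3, and number of children, V4). A minimal sketch, assuming the sample is large enough for the normal approximation x_bar ± z(1 - α/2) · s/sqrt(n); for a small sample the t critical value (as used by R's `t.test()`, or `scipy.stats.t.ppf` in Python) would replace z. The test is left-tailed, H0: μ = 1 against H1: μ < 1. Function names and the toy values are hypothetical, not the dataset.

```python
import math
import statistics
from statistics import NormalDist


def mean_ci(values, alpha=0.01):
    """Large-sample CI for a mean: x_bar +/- z_{1-alpha/2} * s / sqrt(n)."""
    n = len(values)
    x_bar = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576 when alpha = 0.01
    return x_bar - z * se, x_bar + z * se


def mean_test_less(values, mu0, alpha=0.05):
    """Left-tailed z test of H0: mu = mu0 vs H1: mu < mu0 (large-sample approximation)."""
    n = len(values)
    x_bar = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    z_stat = (x_bar - mu0) / se
    p_value = NormalDist().cdf(z_stat)  # lower-tail area
    return z_stat, p_value, p_value < alpha  # last item: reject H0?


# Hypothetical values, NOT the V3 / V4 columns.
children = [0, 1, 2, 0, 0, 1, 3, 0, 1, 0] * 10
print(mean_ci(children, alpha=0.01))
print(mean_test_less(children, mu0=1, alpha=0.05))
```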
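One way to frame the reverse-sting dilemma in Question 7 is with the hypergeometric distribution, since packets are drawn without replacement. Suppose M of the N packets contain cocaine. The defence's position requires both that the four tested packets were positive and that the two sold packets were not cocaine; the probability of that combined event is

P(M) = [C(M, 4) / C(N, 4)] × [C(N - M, 2) / C(N - 4, 2)],

and taking the largest value over every possible M gives the case most favourable to the defence. This is an illustrative sketch, not necessarily the analysis used in the American Statistician article: the question says only "approximately 500" packets, so N = 496 is an assumed value, and the function names are hypothetical.

```python
from math import comb


def prob_defence_scenario(n_total, m_cocaine, n_tested=4, n_sold=2):
    """P(all tested packets are cocaine AND all sold packets are not),
    when m_cocaine of n_total packets contain cocaine (draws without replacement)."""
    tested_positive = comb(m_cocaine, n_tested) / comb(n_total, n_tested)
    remaining = n_total - n_tested           # packets left after the lab test
    non_cocaine = n_total - m_cocaine        # all of these are still in the lot
    sold_negative = comb(non_cocaine, n_sold) / comb(remaining, n_sold)
    return tested_positive * sold_negative   # math.comb gives 0 when k > n


def worst_case_for_prosecution(n_total, n_tested=4, n_sold=2):
    """Return (M, probability) for the M that makes the defence scenario most likely."""
    return max(
        ((m, prob_defence_scenario(n_total, m, n_tested, n_sold)) for m in range(n_total + 1)),
        key=lambda pair: pair[1],
    )


N = 496  # assumed lot size ("approximately 500" in the question)
m_best, p_best = worst_case_for_prosecution(N)
print(f"most favourable M for the defence: {m_best}, probability {p_best:.4f}")
# expected: M = 331, probability about 0.0221
```

Even under the lot composition most favourable to the defence, the chance of four positive tests followed by two non-cocaine sales comes out at roughly 2%, which gives the police a quantitative argument; where "beyond a reasonable doubt" sits is not settled by the calculation itself.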