Question
IMDB Information for US Movies: This dataset contains information about a sample of American-produced movies, including box office, genre, number of ratings, and how positive the ratings are. For this case study, you've been hired by a movie studio as a data analyst to answer the question 'What kinds of movies are most successful?' How do positive reviews relate to box office income?
In your essay, you will answer the research question for your case study as though you were a professional statistics consultant writing a report for their client. The essay will have the following sections:
Introduction
Briefly introduce the topic you'll be discussing, and the questions you hope to answer.
Describe the dataset. How was it gathered? Are there any limitations or issues with this data?
Describe the sample size and its relationship to the population of interest.
Select and describe the variables you will be working with in this paper. You will need to select three variables, including at least two continuous and one categorical variable. You will use these variables to answer the question you've stated above, so choose carefully and after re-reading the instructions for your data set.
What is the level of measurement for each variable?
Is this a dependent or independent variable?
Descriptive Statistics
Calculate statistics related to measures of central tendency and measures of dispersion for your numeric variables. For your categorical variable, describe the proportion of the group in each category.
Calculate these statistics using Excel or Google Sheets. You will be required to submit your work.
Discuss the meaning of these statistics for understanding the data.
Remember that you're writing as a statistical consultant explaining the results to your client.
What does the relative location of mean and median tell you?
What do the standard deviation, range, and IQR tell us about the dispersion of the data?
Are there any outliers?
Is the data normally distributed? Why does this matter?
Include at least one graphical display for each variable which communicates the data's distribution.
The charts should be clearly labeled and chosen so that they're appropriate for the type of variable.
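The descriptive statistics requested above can be sketched in a few lines of Python as a cross-check on the spreadsheet work. This is only an illustration: the assignment itself asks for Excel or Google Sheets, the `box_office` values are made-up numbers and not taken from the IMDB dataset, and the 1.5 × IQR fence is one common convention for flagging outliers, assumed here rather than specified by the assignment.

```python
import statistics

# Hypothetical box office values in millions of dollars.
# These are illustrative only and are NOT from the actual dataset.
box_office = [12.0, 18.5, 25.0, 31.0, 40.0, 55.0, 72.0, 310.0]

# Measures of central tendency
mean = statistics.mean(box_office)
median = statistics.median(box_office)

# Measures of dispersion
stdev = statistics.stdev(box_office)  # sample standard deviation (n - 1)
value_range = max(box_office) - min(box_office)

# Quartiles; "inclusive" interpolates between the smallest and largest value
q1, _, q3 = statistics.quantiles(box_office, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in box_office if x < lower_fence or x > upper_fence]

print(f"mean={mean:.2f} median={median:.2f} stdev={stdev:.2f}")
print(f"range={value_range:.2f} IQR={iqr:.2f} outliers={outliers}")
```

In this made-up sample the mean sits well above the median, which is the pattern a right-skewed variable with a few very large values would produce.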
Inferential Statistics
Calculate the confidence intervals for the proportion of your categorical variable.
Explain the results to your clients, so they can understand how their sample relates to the population.
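As a rough illustration of the confidence interval step, the sketch below uses the normal-approximation (Wald) interval for a proportion. The choice of that interval, the function name `proportion_ci`, and the counts (30 movies of one genre out of 120) are all assumptions made for the example; none of them come from the assignment or the dataset.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical counts: 30 movies in one genre out of a sample of 120
low, high = proportion_ci(30, 120)
print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")
```

Read for a client, an interval like this says that the genre's share among all comparable movies plausibly lies between the two bounds, given the sample.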
Translate your questions into two formal hypotheses to test.
One hypothesis should require you to compare the mean of a numerical variable across a category.
One hypothesis will ask about correlation or regression.
For each hypothesis:
Select the appropriate statistical test
Identify your alpha level
Identify if this is a one-tailed or two-tailed test
Calculate your test statistics
You will use Excel or Google Sheets for your calculations. You will be required to submit these with your work.
Discuss your results. Can we reject the null hypothesis?
In plain language, tell your clients what this means for the answer to their research questions.
Include one chart for each hypothesis test. For your correlational or regression test, include a scatter plot.
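The two test statistics can be sketched as follows, as a check on spreadsheet results. Several things here are assumptions rather than requirements of the assignment: the category is reduced to two groups, Welch's unequal-variance t-test is used for the comparison of means, the correlation is Pearson's r, and all data values and function names are hypothetical. The sketch stops at the test statistics; the p-value would still come from a t distribution, for example through the spreadsheet.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test on samples a and b."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical box office values (millions) for two genres; not real data
action = [120.0, 95.0, 150.0, 80.0, 110.0]
drama = [40.0, 55.0, 30.0, 60.0, 45.0]
t_stat, df = welch_t(action, drama)

# Hypothetical paired ratings and box office values; not real data
ratings = [6.1, 6.8, 7.2, 7.5, 8.0, 8.4]
income = [20.0, 35.0, 30.0, 60.0, 75.0, 90.0]
r = pearson_r(ratings, income)
t_corr = r_to_t(r, len(ratings))

print(f"Welch t={t_stat:.2f}, df={df:.1f}")
print(f"Pearson r={r:.2f}, t={t_corr:.2f}")
```

If the genre variable has more than two categories, a one-way ANOVA would be the usual alternative to the two-group t-test shown here.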
Conclusions
Conclude by summarizing your findings for your clients.
What is the answer to their research question?
Step by Step Solution
Step: 1
The question is incomplete because it is a detailed ass...