Question

Final Project

Analysis techniques: ANOVA to test whether mean batting performance differs among 3 groups, and regression to determine which statistics are most influential in calculating on base percentage (OBP).

Dataset: mlb_players_18.csv

Variable              Description
name                  Name of baseball player
team                  Team of baseball player
position              Position of baseball player. This variable will be used to develop the 3 groups
games                 Games played (can be used for sample size)
AB                    Number of times player batted (can be used for sample size)
R                     Runs
H                     Hits
doubles               Doubles
triples               Triples
HR                    Home Runs
RBI                   Runs Batted In
walks                 Walks
strike_outs           Strike Outs
stolen_bases          Stolen Bases
caught_stealing_base  Caught Stealing a Base
AVG                   Average (Hits divided by At Bats)
OBP                   On Base Percentage. This variable will be used in the analysis to describe batter performance.
SLG                   Slugging percentage
OPS                   On Base plus Slugging percentage

Scenarios:

  1. You are an analyst for a baseball organization and the team management wants to know if there are real differences between the batting performance of baseball players according to these 3 positions: outfielder, infielder, and catcher (C). For this study, batting performance is equivalent to the variable OBP in the dataset.
  2. Management wants to understand which variables are statistically significant predictors of OBP. The variables to consider are average (AVG), slugging percentage (SLG), and on base plus slugging (OPS).

Important Information for Scenario 1: ANOVA

  • Variable of Interest: Batting Performance, represented by on base percentage shown in variable OBP
  • Groups: Using the position variable, develop a new variable with 3 groups according to the following definitions:
    • Outfield: If position variable = any one of the following: CF, LF, RF
    • Infield: If position variable = any one of the following: 1B, 2B, 3B, SS
    • Catcher: If position variable = C
    • Note that any other position values can be ignored for this analysis
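
The group definitions above can be sketched as a small lookup. This is a minimal illustration, not part of the assignment: the function names position_group and group_obp are hypothetical, and the rows are assumed to be dictionaries keyed by the column names position and OBP from the variable table.

```python
# Position codes taken from the group definitions in the assignment.
OUTFIELD = {"CF", "LF", "RF"}
INFIELD = {"1B", "2B", "3B", "SS"}


def position_group(position):
    """Return 'Outfield', 'Infield', 'Catcher', or None for ignored positions."""
    code = position.strip().upper()
    if code in OUTFIELD:
        return "Outfield"
    if code in INFIELD:
        return "Infield"
    if code == "C":
        return "Catcher"
    return None  # any other position value is ignored for the ANOVA


def group_obp(rows):
    """Collect OBP values per group from dict rows (hypothetical helper)."""
    groups = {"Outfield": [], "Infield": [], "Catcher": []}
    for row in rows:
        group = position_group(row["position"])
        if group is not None:
            groups[group].append(float(row["OBP"]))
    return groups
```

Rows of this shape could come from csv.DictReader applied to mlb_players_18.csv, assuming the file's header matches the variable names listed above.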

Important Information for Scenario 2: Regression

  • Variable of Interest: Batting Performance, represented by on base percentage shown in variable OBP
  • Pitchers (position = P) should be ignored because their on base percentage is not relevant
  • The explanatory variables to consider are average (AVG), slugging percentage (SLG), and on base plus slugging (OPS).
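
One point worth checking before fitting the regression: the variable table describes OPS as on base plus slugging, so OBP should equal OPS minus SLG up to rounding. If that holds in the data, a model with SLG and OPS as predictors would be expected to fit OBP almost perfectly, which affects how the p-values and adjusted R² are read. The sketch below drops pitchers and measures how far the rows deviate from that identity. The helper names are hypothetical and the sample rows are made-up illustrative values, not taken from mlb_players_18.csv.

```python
def drop_pitchers(rows):
    """Keep only rows whose position is not P (pitchers are ignored)."""
    return [r for r in rows if r["position"].strip().upper() != "P"]


def obp_identity_gap(rows):
    """Largest |OBP - (OPS - SLG)| across rows; small gaps suggest rounding only."""
    return max(abs(r["OBP"] - (r["OPS"] - r["SLG"])) for r in rows)


# Made-up illustrative rows, NOT real values from the dataset.
sample = [
    {"position": "CF", "OBP": 0.350, "SLG": 0.450, "OPS": 0.800},
    {"position": "P", "OBP": 0.100, "SLG": 0.120, "OPS": 0.220},
    {"position": "1B", "OBP": 0.320, "SLG": 0.500, "OPS": 0.820},
]

batters = drop_pitchers(sample)
gap = obp_identity_gap(batters)
print(len(batters), gap)
```

On real data the gap may be as large as about 0.001 if the published values are rounded to three decimals; that is an assumption about the file, not something the assignment states.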

Instructions

Write a report to answer the management's two questions identified in the scenarios above. Include a summary paragraph which succinctly answers the management's questions. This summary can be thought of as a conclusion that is presented either first or last. For ANOVA, your conclusion should be based on the evaluation of the null hypothesis in relation to your p-value and your chosen significance level. For Regression, your conclusion should be based on the p-values of each of the explanatory variables. (20 points, 10 for ANOVA conclusion and 10 for Regression conclusion)

  1. Problem Statements: Make sure to state the problems you were asked to solve. (10 points)
  2. Assumptions: Clearly define any assumptions you made. For instance, did you remove any outliers from your data due to small sample size (sample size can be determined by either the games or AB [at bats] variables)? (10 points, 5 points for ANOVA assumptions and 5 points for Regression assumptions)
  3. Key charts: Develop at least one chart from your dataset for each of the ANOVA and Regression questions. Each chart should be placed in the report where it accompanies any related insights. Make sure to give your chart a clear and descriptive title. (Possible idea for ANOVA: Box and Whisker Plot of OBP by each of the 3 groups to aid in outlier detection). (20 points, 10 for ANOVA chart and 10 for Regression chart)
  4. Analysis Technique:
    1. ANOVA (50 points): results should include the null hypothesis under consideration, the F-statistic, p-value, and degrees of freedom. Include your entire ANOVA table from your statistical software results (Excel, R, Python, etc.) or show your manual calculations if you did not use the ANOVA functions in any statistical software. (10 points for stating the null hypothesis, 10 points for the F-statistic, 10 points for the p-value, 10 points for the degrees of freedom, and 10 points for either the ANOVA table or your manual calculations)
    2. Regression (40 points): results should include the p-value for each of the 3 explanatory variables AVG, SLG, and OPS (10 points for each p-value), and the adjusted R² (10 points)
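
For the manual-calculation option, the ANOVA table quantities and the adjusted R² can be computed directly from their textbook definitions. The following is a minimal sketch under stated assumptions: the function names one_way_anova and adjusted_r2 are hypothetical, the input is a dictionary mapping each group label to a list of OBP values, and the p-value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Return one-way ANOVA table quantities for {label: list of values}."""
    samples = [vals for vals in groups.values() if vals]
    k = len(samples)                      # number of groups
    n = sum(len(s) for s in samples)      # total observations
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

The p-value is the upper-tail probability of the F distribution at the computed F with (df_between, df_within) degrees of freedom. If SciPy is available, scipy.stats.f.sf(F, df_between, df_within) returns it.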
