Question

Twenty teams were involved. The dependent variable, Y, is the number of league points each team achieved during the 38-game 2007-2008 season. The independent variables are 24 “pitch actions”, labelled X1 to X24. Some of these variables carry the same information, such as X1, the number of goals scored, and X2, average goals per game; or X6, percentage of goals scored inside the box, and X7, percentage of goals scored from outside the box. Therefore, 17 variables were initially retained.

  • X2, Average goals per game.
  • X3, Number of shots at goal.
  • X4, Percentage shots on target.
  • X5, Percentage of goals scored to shots at goal.
  • X7, Percentage of goals scored from outside the box.
  • X10, Number of total passes.
  • X11, Ratio of short to long passes of the ball during all games.
  • X12, Overall pass completion percentage.
  • X15, Total number of ball crosses.
  • X16, Cross completion percentage.
  • X18, Average number of goals conceded per game.
  • X19, Number of tackles.
  • X20, Percentage of tackles won.
  • X21, Number of blocks, clearances and interceptions.
  • X22, Number of fouls.
  • X23, Number of yellow cards.
  • X24, Number of red cards.

To avoid problems resulting from highly correlated variables, known as multicollinearity, the analysis should be reduced to the following set of variables.

  • X5, Percentage of goals scored to shots at goal.
  • X7, Percentage of goals scored from outside the box.
  • X11, Ratio of short to long passes of the ball during all games.
  • X15, Total number of ball crosses.
  • X18, Average number of goals conceded per game.
  • X23, Number of yellow cards.
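
One way to see why the list shrinks is to flag pairs of predictors whose correlation is large in absolute value. The R sketch below does this for the 17 retained variables. It rests on several assumptions: that the columns of the data table run Y first and then the X variables in ascending order, that the three unlabelled rows are teams I, J and N, and that 0.8 is a reasonable cutoff for "highly correlated". The cutoff and the object names (`full`, `high_pairs`) are illustrative and do not come from the case study.

```r
# Full table transcribed from the case study.
# Assumptions: columns run Y, then X2 to X24 in ascending order;
# NA marks X24 cells that have no recorded value.
full <- read.table(header = TRUE, row.names = "team", text = "
team Y X2 X3 X4 X5 X7 X10 X11 X12 X15 X16 X18 X19 X20 X21 X22 X23 X24
A 83 1.95 473 43.97 15.64 10.81 18831 8.68 83.18 883 24.01 0.82 878 74.72 2438 407 55 3
B 60 1.87 405 44.94 17.53 16.9 11438 4.77 69.76 926 25.59 1.34 885 79.77 2826 559 54 4
C 35 1.21 314 45.54 14.65 21.74 10630 3.79 63.74 769 22.5 1.63 935 76.58 2809 546 70 NA
D 58 1.32 421 43.94 11.88 20 13650 4.98 74.03 874 22.43 1.26 833 74.67 2736 583 72 6
E 37 0.95 340 43.82 10.59 11.11 11855 4.48 67.85 901 23.53 1.42 866 76.44 2859 547 76 NA
F 85 1.71 455 43.74 14.29 18.46 17250 7.01 81.16 937 20.7 0.68 917 76.23 2302 478 63 5
G 11 0.53 288 40.28 6.94 20 11733 4.42 67.3 798 21.3 2.34 898 72.72 2685 548 63 1
H 65 1.45 361 46.81 15.24 9.09 12397 4.79 69.32 868 23.27 0.87 816 77.21 2773 494 40 NA
I 36 1 367 41.96 10.35 15.79 12574 4.44 70.77 842 22.09 1.58 851 76.62 2661 493 55 6
J 76 1.76 535 40 12.52 19.4 16360 4.93 77.06 922 25.27 0.74 1089 73.74 2651 465 45 1
K 55 1.18 323 44.27 13.93 17.78 15147 5.62 76.23 691 21.27 1.39 916 77.62 2492 427 50 4
L 87 2.11 547 47.54 14.63 17.5 17417 6.21 80.5 930 24.41 0.58 978 76.81 2574 425 51 2
M 42 1.13 352 43.18 12.22 13.95 12199 4.21 71.01 946 23.68 1.39 835 75.45 2805 562 86 2
N 43 1.18 358 42.74 12.57 17.78 13947 4.76 72.64 957 19.85 1.71 876 76.14 2516 483 63 1
O 57 1.26 411 38.93 11.68 20.83 12793 4.62 73.5 654 21.87 1.05 963 75.91 2733 541 55 3
P 36 1.08 332 39.76 12.35 12.2 10431 3.74 64.35 1072 22.67 1.74 800 74.38 2369 492 59 NA
Q 39 0.95 358 38.27 10.06 11.11 12727 4.67 70.07 882 24.94 1.55 892 74.33 2831 517 65 4
R 46 1.74 424 45.99 15.57 9.09 15464 5.37 77.43 987 22.19 1.61 960 74.58 2708 410 51 1
S 49 1.11 384 40.1 19.94 11.9 13762 4.82 74.11 904 24.23 1.32 848 74.76 2546 562 63 1
T 40 0.89 363 38.84 9.37 14.71 11672 4.42 68.87 756 25 1.34 883 78.94 2504 527 59 4
")

# Correlations among the 17 predictors only (Y is the response).
predictors <- full[, setdiff(names(full), "Y")]
r <- cor(predictors, use = "pairwise.complete.obs")

cutoff <- 0.8  # illustrative threshold, not taken from the case study

# Look at each pair once (upper triangle) and keep the strongly correlated ones.
idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = round(r[idx], 2)
)
print(high_pairs)
```

Any pair listed in `high_pairs` is a candidate for dropping one of its two members; which one to drop is a judgment call that the sketch does not make.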

The aim is to study team performance, using this set of play descriptor variables. In this task, you will look at the relationship between the descriptor variables and the team performances.

The data for the football teams are provided with this case study.

In this task, you will be asked to do some exploratory analysis, including visualisation. This will be followed by multiple linear regression. Finally, ANOVA, the analysis of variance, will be used.


Part 1

Q1. Data preparation and initial investigation.

1.1) Read the file into R (using RStudio). Use the team names as the row labels. For the analyses to follow, select the variables X5, X7, X11, X15, X18, X23.

1.2) Examine all pairwise plots between the variables. Briefly describe this.

1.3) Support this visualisation of the data by determining the correlations between the variables. Accompany that with a short description.

1.4) Provide three conclusions from this investigation.
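
A minimal R sketch of steps 1.1 to 1.3 follows. So that it runs on its own, the data are typed in with `read.table(text = ...)` rather than read from a file; the values are transcribed from the case-study table on the assumption that its columns run Y first and then the X variables in ascending order, and that the unlabelled rows are teams I, J and N. The commented-out `read.csv()` line shows how a file would be read, with `football.csv` as a hypothetical file name.

```r
# With a real file, something like this would set team names as row labels:
# football <- read.csv("football.csv", row.names = 1)  # hypothetical file name

# Stand-in data: Y plus the six selected variables (assumed column order).
football <- read.table(header = TRUE, row.names = "team", text = "
team  Y    X5    X7  X11  X15  X18 X23
A    83 15.64 10.81 8.68  883 0.82  55
B    60 17.53 16.90 4.77  926 1.34  54
C    35 14.65 21.74 3.79  769 1.63  70
D    58 11.88 20.00 4.98  874 1.26  72
E    37 10.59 11.11 4.48  901 1.42  76
F    85 14.29 18.46 7.01  937 0.68  63
G    11  6.94 20.00 4.42  798 2.34  63
H    65 15.24  9.09 4.79  868 0.87  40
I    36 10.35 15.79 4.44  842 1.58  55
J    76 12.52 19.40 4.93  922 0.74  45
K    55 13.93 17.78 5.62  691 1.39  50
L    87 14.63 17.50 6.21  930 0.58  51
M    42 12.22 13.95 4.21  946 1.39  86
N    43 12.57 17.78 4.76  957 1.71  63
O    57 11.68 20.83 4.62  654 1.05  55
P    36 12.35 12.20 3.74 1072 1.74  59
Q    39 10.06 11.11 4.67  882 1.55  65
R    46 15.57  9.09 5.37  987 1.61  51
S    49 19.94 11.90 4.82  904 1.32  63
T    40  9.37 14.71 4.42  756 1.34  59
")

# 1.1) Keep only the six play descriptors.
vars <- c("X5", "X7", "X11", "X15", "X18", "X23")
play <- football[, vars]

# 1.2) Scatterplot matrix of all pairwise plots.
pairs(play, main = "Pairwise plots of the six play descriptors")

# 1.3) Pearson correlations, rounded for reading.
cor_play <- cor(play)
print(round(cor_play, 2))
```

Reading the scatterplot matrix next to the rounded correlation matrix makes it easier to tell whether a large coefficient reflects a general trend or a single outlying team.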

Part 2

Q2. Examine the influence of the six play descriptor variables (X5, X7, X11, X15, X18, X23) in relation to performance, as measured by the league points achieved by the team.

2.1) Comment on the effectiveness of multiple linear regression for this objective.

2.2) Support your comment with what the data reveals.

2.3) Describe at least two of the main assumptions for this modelling.

2.4) Write down a null hypothesis for a hypothesis test in regard to each variable's role in the team performance outcome.

2.5) Describe what the regression reveals in regard to each variable's role.
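
The regression in this part could be fitted along the following lines. This is a sketch, not a model answer: the data are transcribed from the case-study table on the assumption that its columns run Y first and then the X variables in ascending order, and the object names are illustrative. In the coefficient table printed by `summary()`, each t-test has the null hypothesis that the corresponding coefficient is zero, given the other variables in the model.

```r
# Stand-in data: Y plus the six selected variables (assumed column order).
football <- read.table(header = TRUE, row.names = "team", text = "
team  Y    X5    X7  X11  X15  X18 X23
A    83 15.64 10.81 8.68  883 0.82  55
B    60 17.53 16.90 4.77  926 1.34  54
C    35 14.65 21.74 3.79  769 1.63  70
D    58 11.88 20.00 4.98  874 1.26  72
E    37 10.59 11.11 4.48  901 1.42  76
F    85 14.29 18.46 7.01  937 0.68  63
G    11  6.94 20.00 4.42  798 2.34  63
H    65 15.24  9.09 4.79  868 0.87  40
I    36 10.35 15.79 4.44  842 1.58  55
J    76 12.52 19.40 4.93  922 0.74  45
K    55 13.93 17.78 5.62  691 1.39  50
L    87 14.63 17.50 6.21  930 0.58  51
M    42 12.22 13.95 4.21  946 1.39  86
N    43 12.57 17.78 4.76  957 1.71  63
O    57 11.68 20.83 4.62  654 1.05  55
P    36 12.35 12.20 3.74 1072 1.74  59
Q    39 10.06 11.11 4.67  882 1.55  65
R    46 15.57  9.09 5.37  987 1.61  51
S    49 19.94 11.90 4.82  904 1.32  63
T    40  9.37 14.71 4.42  756 1.34  59
")

# Multiple linear regression of league points on the six play descriptors.
fit <- lm(Y ~ X5 + X7 + X11 + X15 + X18 + X23, data = football)

# Estimates, standard errors, t values and p-values per variable,
# plus R-squared and the overall F-test.
print(summary(fit))

# Two quick checks on the usual assumptions.
par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit),
     xlab = "Fitted points", ylab = "Residuals",
     main = "Residuals vs fitted")   # looks for non-linearity or unequal variance
abline(h = 0, lty = 2)
qqnorm(resid(fit))                    # looks for departures from normality
qqline(resid(fit))
```

With 20 teams and seven estimated coefficients, only 13 residual degrees of freedom remain, so individual p-values should be read cautiously.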

Part 3

Q3. The next part of this case study examines three groups of teams. First, there are the top four teams in terms of league points, which go on to the Champions League competition. Second, there is the middle ground of the next twelve teams. Finally, there is the bottom set of four teams, which are in danger of relegation from the Premier League.

3.1) Using ANOVA, determine which variables influence the top, middle and bottom sets of teams, and how strongly.

3.2) Make a table of your results, so that it can be referred to in your discussion.

3.3) Describe at least two of the main assumptions in regard to this ANOVA assessment.

3.4) Display the data using boxplots and summarise what they indicate. It is recommended that you draw a boxplot for each variable, grouped by the top, middle and bottom sets of teams.

3.5) Taking account of all your findings, and in particular how much explanatory value the variables provide for the top, middle and bottom performances, give two recommendations for a team manager based on what emerged from your analysis.
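
The grouping, the per-variable ANOVA table and the boxplots for this part could be sketched in R as follows. The data are transcribed from the case-study table on the assumption that its columns run Y first and then the X variables in ascending order; the names `group` and `anova_table` are illustrative. Teams are ranked by league points and split 4 / 12 / 4, and a one-way ANOVA is run for each of the six variables with the group as the factor.

```r
# Stand-in data: Y plus the six selected variables (assumed column order).
football <- read.table(header = TRUE, row.names = "team", text = "
team  Y    X5    X7  X11  X15  X18 X23
A    83 15.64 10.81 8.68  883 0.82  55
B    60 17.53 16.90 4.77  926 1.34  54
C    35 14.65 21.74 3.79  769 1.63  70
D    58 11.88 20.00 4.98  874 1.26  72
E    37 10.59 11.11 4.48  901 1.42  76
F    85 14.29 18.46 7.01  937 0.68  63
G    11  6.94 20.00 4.42  798 2.34  63
H    65 15.24  9.09 4.79  868 0.87  40
I    36 10.35 15.79 4.44  842 1.58  55
J    76 12.52 19.40 4.93  922 0.74  45
K    55 13.93 17.78 5.62  691 1.39  50
L    87 14.63 17.50 6.21  930 0.58  51
M    42 12.22 13.95 4.21  946 1.39  86
N    43 12.57 17.78 4.76  957 1.71  63
O    57 11.68 20.83 4.62  654 1.05  55
P    36 12.35 12.20 3.74 1072 1.74  59
Q    39 10.06 11.11 4.67  882 1.55  65
R    46 15.57  9.09 5.37  987 1.61  51
S    49 19.94 11.90 4.82  904 1.32  63
T    40  9.37 14.71 4.42  756 1.34  59
")

# Rank 1 = most points; ranks 1-4 top, 5-16 middle, 17-20 bottom.
pos <- rank(-football$Y, ties.method = "first")
group <- cut(pos, breaks = c(0, 4, 16, 20),
             labels = c("top", "middle", "bottom"))

vars <- c("X5", "X7", "X11", "X15", "X18", "X23")

# 3.1 / 3.2) One-way ANOVA per variable, collected into one table.
anova_table <- t(sapply(vars, function(v) {
  a <- anova(lm(football[[v]] ~ group))
  c(F = a[["F value"]][1], p = a[["Pr(>F)"]][1])
}))
print(round(anova_table, 4))

# 3.4) One boxplot per variable, split by group.
par(mfrow = c(2, 3))
for (v in vars) {
  boxplot(football[[v]] ~ group, main = v, xlab = "Group", ylab = v)
}
```

With only four teams in the top and bottom groups, the equal-variance and normality assumptions are hard to check, so the F statistics are better treated as a rough ranking of the variables than as precise evidence.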



 

Data: league points (Y) and pitch-action variables for the 20 teams. No X24 value is recorded for teams C, E, H and P.

teams  Y   X2    X3   X4     X5     X7     X10    X11   X12    X15   X16    X18   X19   X20    X21   X22  X23  X24
A      83  1.95  473  43.97  15.64  10.81  18831  8.68  83.18  883   24.01  0.82  878   74.72  2438  407  55   3
B      60  1.87  405  44.94  17.53  16.9   11438  4.77  69.76  926   25.59  1.34  885   79.77  2826  559  54   4
C      35  1.21  314  45.54  14.65  21.74  10630  3.79  63.74  769   22.5   1.63  935   76.58  2809  546  70
D      58  1.32  421  43.94  11.88  20     13650  4.98  74.03  874   22.43  1.26  833   74.67  2736  583  72   6
E      37  0.95  340  43.82  10.59  11.11  11855  4.48  67.85  901   23.53  1.42  866   76.44  2859  547  76
F      85  1.71  455  43.74  14.29  18.46  17250  7.01  81.16  937   20.7   0.68  917   76.23  2302  478  63   5
G      11  0.53  288  40.28  6.94   20     11733  4.42  67.3   798   21.3   2.34  898   72.72  2685  548  63   1
H      65  1.45  361  46.81  15.24  9.09   12397  4.79  69.32  868   23.27  0.87  816   77.21  2773  494  40
I      36  1     367  41.96  10.35  15.79  12574  4.44  70.77  842   22.09  1.58  851   76.62  2661  493  55   6
J      76  1.76  535  40     12.52  19.4   16360  4.93  77.06  922   25.27  0.74  1089  73.74  2651  465  45   1
K      55  1.18  323  44.27  13.93  17.78  15147  5.62  76.23  691   21.27  1.39  916   77.62  2492  427  50   4
L      87  2.11  547  47.54  14.63  17.5   17417  6.21  80.5   930   24.41  0.58  978   76.81  2574  425  51   2
M      42  1.13  352  43.18  12.22  13.95  12199  4.21  71.01  946   23.68  1.39  835   75.45  2805  562  86   2
N      43  1.18  358  42.74  12.57  17.78  13947  4.76  72.64  957   19.85  1.71  876   76.14  2516  483  63   1
O      57  1.26  411  38.93  11.68  20.83  12793  4.62  73.5   654   21.87  1.05  963   75.91  2733  541  55   3
P      36  1.08  332  39.76  12.35  12.2   10431  3.74  64.35  1072  22.67  1.74  800   74.38  2369  492  59
Q      39  0.95  358  38.27  10.06  11.11  12727  4.67  70.07  882   24.94  1.55  892   74.33  2831  517  65   4
R      46  1.74  424  45.99  15.57  9.09   15464  5.37  77.43  987   22.19  1.61  960   74.58  2708  410  51   1
S      49  1.11  384  40.1   19.94  11.9   13762  4.82  74.11  904   24.23  1.32  848   74.76  2546  562  63   1
T      40  0.89  363  38.84  9.37   14.71  11672  4.42  68.87  756   25     1.34  883   78.94  2504  527  59   4

Step by Step Solution


Step: 1

Equation with significant independent variables

Correlation matrix (first row only; the rest is not shown):

     Y         X2        X3        X4        X5        X7         X10       ...
Y    1.000000  0.878199  0.854716  0.400976  0.525971  0.0250839  0.786004  ...