Question
BUA4004 Teamwork Assessment - Data Analysis Report

1 Introduction

In this assessment, you will work in a team of 2, 3 or 4 students including yourself. Formation of each team is left to the students' autonomy. Your instructor will NOT decide on your team members. Recall that your instructor has strongly encouraged you to interact with other students since the first workshop session in Week 1, expecting each of you to identify your potential team members. Note that you are NOT required to inform your instructor of the members of your team before the submission of the work.

If you do not work as part of a team and subsequently submit an individual work, you will not meet one of the five Subject Intended Learning Outcomes, namely SILO 5. Therefore, your submission will automatically be penalised by 20%. For example, even if the work is worth 70 out of 100, if it is an individual submission it will receive only 56.

First of all, select the team leader once a team is formed. The key task of each team leader is to submit the work through her/his LMS account on behalf of the team. Non-leaders MUST NOT submit any file, to avoid confusing the LMS system. Only the leader must submit the work. Please note that leaders are NOT expected to do more than non-leaders in terms of analysis and report preparation.

In case you have a misunderstanding about the nature of teamwork, please bear in mind that, in this subject, teamwork is NOT about dividing up tasks among students to reduce workload per student. It is not about the production of patchwork. Each student is expected to do ALL tasks. Teamwork is about improving the submission through comparisons and discussions of answers from different individuals. Research has provided empirical evidence to support the idea that two brains are better than one, three are better than two, etc. (but too many brains don't seem to work well because of coordination difficulty when there are too many people).
All team members are expected to prepare their own response to each task before each team meeting. All team members should have chances to critically examine the answers prepared by the other members of the team, typically during each meeting, but ideally before it in order to give everyone enough time to think through and make each meeting productive.

ALL members of a team will receive the SAME mark based on the leader's submission. Because of the nature of teamwork as described above, we will not accept a common complaint such as "I knew the correct answer but lost lots of marks because of the other students in my team". Students who make this type of complaint are unaware that the loss of marks is partly their own fault, because they failed to persuade the other students that they were wrong during meetings.

For the same reason, all members of a team will be equally penalised when even just one of the members commits academic misconduct and taints the team's work. Very unfortunately, we have had several such cases in past semesters. For this reason, students are advised to choose teammates carefully and actively monitor each other to check whether they are preparing their answers sincerely by themselves (easy check: Can they logically explain when you ask questions about their prepared answers during meetings? - If not, suspect plagiarism).

Another common complaint is, for example, "That student hasn't done anything, and we want to exclude him from the team, as it's unfair that he receives a mark based on our effort." A reasonable point. But it is problematic if you suddenly drop such a lazy student closer to the submission

Variable | Description
Speak other language | Whether the person speaks any other language
Can read | Whether the person can read in any language
Can write | Whether the person can write in any language
Attended school | Whether the person has ever attended school
#years attended school | Number of years the person has attended school
Highest education level completed | Highest education level the person has completed
Currently schooling | Whether the person is currently schooling
School level currently attending | School level the person is currently attending
Type of school, not yet Class 1 | Type of school if the person is attending at a level lower than Class 1
Type of school, Class 1 or above | Type of school if the person is attending at the level of Class 1 or above
Private lesson after school | Whether the person is taking any private lesson after school
Why not attending/attended school? | Reason if "No" in "Currently schooling" or "No" in "Attended school"
Total educational expenditure for the last school year (in riels) | Column 16h on page 27 of the questionnaire PDF file

The data file contains the information on 6,351 persons. Familiarise yourself with the data set before you tackle the tasks below. It will help you better understand the data set if you also read pages 1-3 and 24-27 of the questionnaire PDF file while you browse the data set.

Task 1. Preparing the data for analysis

Task 1.1
First of all, in order to focus on the children of schooling age, remove from the provided data set anyone who is either younger than 6 years old or older than 17. In this assessment, let us concentrate only on the children of the household head, because they are the majority of the children in the data set and the number of children of the other household members in the data set is too small to be representative. Therefore, also remove anyone who is not a child of the household head.

Hint: You are free to take any approach to do this task, but one relatively straightforward procedure is the following: First, create a dummy variable by using the AND function inside the IF function. Then, delete all rows with 0 in the dummy variable by using Custom Sort.
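The dummy-variable logic in the hint can also be cross-checked outside Excel. The following is only a minimal Python sketch on made-up rows, not the prescribed method: the column names follow the variable list, but the relationship label "Child" and the helper name keep_dummy are assumptions, and the real data file may code the relationship differently.

```python
# Hypothetical miniature of the data set; the column names follow the
# variable list, but the relationship label "Child" is an assumed coding.
rows = [
    {"PID": 1, "Age": 42, "Relationship to Head": "Head"},
    {"PID": 2, "Age": 5, "Relationship to Head": "Child"},
    {"PID": 3, "Age": 6, "Relationship to Head": "Child"},
    {"PID": 4, "Age": 17, "Relationship to Head": "Child"},
    {"PID": 5, "Age": 18, "Relationship to Head": "Child"},
    {"PID": 6, "Age": 10, "Relationship to Head": "Grandchild"},
]


def keep_dummy(row, child_label="Child"):
    """Return 1 if the row passes both tests, else 0 (like IF(AND(...), 1, 0))."""
    in_age_range = 6 <= row["Age"] <= 17
    is_child_of_head = row["Relationship to Head"] == child_label
    return 1 if (in_age_range and is_child_of_head) else 0


# Add the dummy variable to every row.
for row in rows:
    row["Keep"] = keep_dummy(row)

# Dropping the rows with 0 removes them for good, unlike a filter that only hides them.
reduced = [row for row in rows if row["Keep"] == 1]
kept_pids = [row["PID"] for row in reduced]
print(kept_pids)
```

Both age boundaries are inclusive here, matching "younger than 6 years old or older than 17" as the removal rule.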
(Visit the Microsoft Excel Support website, where you can find examples of using the AND function inside the IF function, if you are unsure of how to do that.) The use of Filter is not recommended, as it does not remove unwanted observations from the data set but only hides them. Note that your reduced data set should contain 1,288 children.

Task 1.2
Second, remove 6-year-old children who had not yet started schooling because of their birthday and the timing of the beginning of an academic year. Otherwise, we will overestimate the number of eligible children who did not go to school.

Hint: Use the variable "Why not attending school?" where their parents stated that they were "Too young" to go to school. Again, the use of the AND function inside the IF function will be useful. Note that your further reduced data set should now contain 1,266 children.

Task 2. Inspecting the distribution of children
The team decides to begin with an inspection of the distribution of children. The team is specifically interested in knowing whether there is any relationship between being out of school and age, and also between being out of school and sex.

Task 2.1
Construct a relative frequency table to observe the distribution of the schooling-age children by age, sex and the incidence of currently schooling. Present each relative frequency only up to 3 digits after the decimal point.

Hint: First of all, carefully examine the 3 variables "Age", "Sex" and "Currently schooling". You'll notice that there are 34 missing entries in "Currently schooling". You'll also notice that we can fill those empty cells with "No" because those children have an entry in "Why not attending/attended school?". Use Replace under Find & Select in Excel to fill all empty cells with "No" in the column "Currently schooling". This will ensure that you can retain all 1,266 children in the data set. Treat the variable "Age" as a discrete variable.
Then, create 4 categories using "Sex" and "Currently schooling", e.g., Currently schooling male, Currently not schooling male, etc., because a table is only two-dimensional but we want to disaggregate the children by three criteria.

Task 2.2
Visualise the distribution above by creating a multiple bar chart where age is on the horizontal axis. Briefly comment on the chart by pointing out any two features you notice in the chart.

Task 2.3
Assuming that the data set represents the population of schooling-age children in the country:
a) What is the probability that a randomly selected schooling-age boy is currently out of school?
b) What is the probability that a randomly selected schooling-age girl is currently out of school?
c) Considering your answers above, are girls more likely to be out of school than boys?
d) What is the probability that a randomly selected primary school-age child (age 6 to 11) is currently out of school?

As for the data set, the statistical expert team of BCG has already processed the original data set to some extent for ease of analysis by your team. Download the following Excel file, also located together with this PDF file.

From LSMS Cambodia 2019.xlsx

The following list of variables in the data file has also been provided to you.
Please note that the variables following HHID are household-level variables (that is, the information is the same for all individuals within the same household) while the variables following PID are individual-level variables (that is, the info is specific to the person in question). The data is organised such that individuals are grouped by household, e.g. 7 persons from the same household are presented from row 2 to row 8.

Variable | Description
HHID | Household identification number
Province/capital | Household location. Find out the location associated with each number by referring to https://microdata.worldbank.org/index.php/catalog/4045/data-dictionary/F1?file_name=hh_sec_1
District/city/khan | Household location. Find out the location associated with each number by referring to https://microdata.worldbank.org/index.php/catalog/4045/data-dictionary/F1?file_name=hh_sec_1
Urban/rural | 1 for urban, 2 for rural
Head's sex | Sex of the household head
Head's age | Age of the household head (number of completed years)
Head's ethnicity | Ethnicity of the household head
Head speaks Khmer | Whether the household head speaks the main language
Head speaks other language | Whether the household head speaks any other language
Head can read | Whether the household head can read in any language
Head can write | Whether the household head can write in any language
Head attended school | Whether the household head has ever attended school
#years Head attended school | Number of years the household head has attended school
Head's highest education level completed | Highest education level the household head has completed
#children under 18 | Number of children under the age of 18 years in the household
PID | Person identification number
Sex | Sex of the person
Age | Age of the person
Relationship to Head | The person's relationship to the household head
Father at home | Whether the person's father lives in the same house
Father's PID | PID of the person's father
Mother at home | Whether the person's mother lives in the same house
Mother's PID | PID of the person's mother
Marital status | The person's marital status
Spouse at home | Whether the person's spouse lives in the same house
Spouse's PID | PID of the person's spouse
Ethnicity | Ethnicity of the person
Speak Khmer | Whether the person speaks the main language

e) What is the probability that a randomly selected lower secondary school-age child (age 12 to 14) is currently out of school?
f) What is the probability that a randomly selected upper secondary school-age child (age 15 to 17) is currently out of school?
g) Considering your answers above, are older children more likely to be out of school than younger children?

Hint: Use your answer to Task 2.1 above.

Task 3. Examining the distribution of education-related expenditures
The data set provides information on the total education-related expenditure for each child during the last academic year if the child went to school during the period. The team would like to examine the expenditures across different school levels.

Task 3.1
First, the expenditure data should be disaggregated by school-age group. Follow the instructions below to create 3 new variables based on "Total educational expenditure for the last school year (in riels)" and "Age".
1) Use the 3 columns next to "Total educational expenditure for the last school year (in riels)". Label the columns. For example, "Expenditure, age 6-11", "Expenditure, age 12-14", "Expenditure, age 15-17".
2) For the first column, in the first cell under the label, type =IF(AND($R2 ...
... with actual expenditure data. Highlight the block A2:C1267. Go to Find & Select. Choose Replace. Do not type anything in "Find what". Type any letter (say, z) in "Replace with". Tick "Match entire cell contents". Click Replace All. Now you'll see all previously empty-looking cells contain the letter you used. Do not close the "Find and Replace" menu yet.
4) Now, type the letter you used in "Find what". Then, delete the letter in "Replace with". Click Replace All. Now you'll see the letter entries are gone, and the cells are looking empty again. Do the same procedure as Step 2 above to check if you've successfully removed zero-length strings. You should get TRUE this time.

This procedure exploits the fact that the Find and Replace function cannot distinguish between zero length and true blank and that no entry in "Replace with" returns true blank. Admittedly, the procedure is long-winded, but it is necessary for handling empty-looking non-blank cells (which, if not eliminated, will frustrate you during data analysis).

Task 3.2
a) Produce a table of descriptive statistics to understand the distribution of each of the 3 expenditure variables. (Hint: Excel's Data Analysis will not produce some of the useful statistics you've learned.)
b) Also produce a chart presenting 3 box-whiskers plots.
c) Discuss the distributions of the 3 expenditure variables, using the statistics and chart that you produced above. (Hint: Your discussion will become systematic if you talk about central location, symmetry and spread. Write one short paragraph about each of those three distributional features. Within each paragraph, in addition to discussing each variable's distribution, do not forget to point out differences and similarities across the 3 expenditure variables.)

Task 3.3
Does the average educational expenditure per primary school child exceed 500,000 riels per academic year in Cambodia? Support your answer by providing a step-by-step hypothesis test at the 5% level of significance.

Task 4. Exploring the causes of education-related expenditures
The team now would like to conduct preliminary analysis on the determinants of household expenditure on child education. The members discussed potential factors affecting the expenditure and, as a result, formulated the following linear regression model:

$$E_{i,h} = \beta_0 + \beta_1 A_{i,h} + \beta_2 M_{i,h} + \beta_3 K_h + \beta_4 Y_h + \beta_5 C_h + \varepsilon_{i,h}$$

with
$E_{i,h}$ denoting the total expenditure on the education of child i in household h (in riels),
$A_{i,h}$ denoting the age of child i in household h (in years),
$M_{i,h}$ denoting the indicator of child i in household h being male,
$K_h$ denoting the indicator of the head of household h (that is, a parent of child i) being Khmer,
$Y_h$ denoting the number of years the head of household h attended school,
$C_h$ denoting the total number of children under the age of 18 in household h.

Task 4.1
First of all, in order to conduct this regression analysis, create the dummy variables $M_{i,h}$ and $K_h$, as they are not available in the data set. ($E_{i,h}$ is "Total educational expenditure for the last school year (in riels)", $A_{i,h}$ is "Age", $Y_h$ is "#years Head attended school", and $C_h$ is "#children under 18".)

Task 4.2
Regression in the Data Analysis menu does not handle blank cells well. It does not run regression when there are missing observations in any variable. You can see that the outcome variable has many missing observations. Therefore, remove all children who do not have expenditure information. Use the following instruction if the team does not have any other favourite procedure.
1) Copy and paste the 6 variables of the model onto the Task 4.2a worksheet.
2) Fill the blank cells in "Total educational expenditure for the last school year (in riels)" with any letter (say, "z") using Replace. (Remember the issue of zero-length strings, raised in Task 3.1? The current procedure will avoid generating zero-length strings, instead of generating them first and deleting them later.)
3) Create a new variable "A" using both "Age" and "Total educational expenditure for the last school year (in riels)" as follows. In the cell under the label "A", type =IF($F2="z","z",A2) if "Age" is in column A and "Total educational expenditure for the last school year (in riels)" is in column F. Then, copy and paste the formula to fill the column. This procedure will copy the data in "Age" only if the expenditure info is available and enter the letter "z" otherwise.
4) Apply the same procedure to each of the remaining 4 explanatory variables.
5) Now copy the 6 variables that contain the letter "z". Then, Paste => Paste Values => Values on the Task 4.2b worksheet.
6) Select from A1 to F1267. Use Replace to delete z. Keep the data range selection.
7) Go to Find & Select again. Select "Go To Special...". Choose "Blanks". Then, right-click on any of the selected blank cells. Choose "Delete". Choose "Shift cells up" and then click OK. You have now created a data set without a missing observation, containing 999 children.

Task 4.3
Run the regression, and answer the following questions.
a) Are all slope coefficients statistically significant at the 5% level of significance? If not, which slope coefficients are statistically significant at that level?
b) Interpret each of the statistically significant slope coefficients.
c) Is there a sign of gender bias in household expenditure on child education? Briefly explain your answer.
d) Is there a sign of ethnicity bias in household expenditure on child education? Briefly explain your answer.
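The data preparation in Tasks 4.1 and 4.2 (creating the two dummy variables, then removing every child without expenditure information) can be sketched in a few lines of Python as a cross-check on the Excel steps. This is an illustration on made-up rows rather than the required procedure: the labels "Male" and "Khmer", the shortened column name "Expenditure" and the helper to_model_row are assumptions, and the two household-level regressors (years of schooling of the head, number of children under 18) are left out for brevity.

```python
# Hypothetical rows: None marks a missing expenditure entry. The sex and
# ethnicity labels ("Male", "Khmer") are assumed codings, not taken from the data file.
children = [
    {"Age": 7, "Sex": "Male", "Head's ethnicity": "Khmer", "Expenditure": 250000},
    {"Age": 9, "Sex": "Female", "Head's ethnicity": "Khmer", "Expenditure": None},
    {"Age": 13, "Sex": "Female", "Head's ethnicity": "Other", "Expenditure": 480000},
    {"Age": 16, "Sex": "Male", "Head's ethnicity": "Khmer", "Expenditure": None},
]


def to_model_row(child):
    """Build one regression row with the dummies M (male) and K (Khmer head)."""
    return {
        "E": child["Expenditure"],
        "A": child["Age"],
        "M": 1 if child["Sex"] == "Male" else 0,
        "K": 1 if child["Head's ethnicity"] == "Khmer" else 0,
    }


# Listwise deletion: keep a child only when the outcome variable is present.
complete = [to_model_row(c) for c in children if c["Expenditure"] is not None]
print(len(complete), complete)
```

Deleting whole rows, rather than blanking single cells, keeps each child's outcome aligned with that child's own explanatory variables.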