Question
The dataset Education - Post 12th Standard.csv is a dataset that contains the names of various colleges. This particular case study is based on various parameters of various institutions. You are expected to do Principal Component Analysis for this case study according to the instructions given in the following rubric. The data dictionary of the 'Education - Post 12th Standard.csv' can be found in the following file: Data Dictionary.xlsx.
Project - Advance Statistics
Submission type: Online upload
Due date: Aug 16, 11:59 PM
Total score: 60
Available from: Jul 31, 8:00 AM
Description
Dear Participants,
Please find below Advance Statistics Project instructions:
You have to submit 2 files:
1) Business Report: In this, you need to submit the answers to all the questions in a sequential manner. It should include a detailed explanation of the approach used, insights, inferences, and all code outputs such as graphs, tables, etc. Your report should not be filled with code. You will be evaluated based on the business report.
2) Jupyter Notebook file: This is a must and will be used for reference while evaluating.
Any assignment found copied or plagiarized from another person will not be graded and will be marked as zero. Please ensure timely submission, as a post-deadline assignment will not be accepted.
Problem 1:
A research laboratory was developing a new compound for the relief of severe cases of hay fever. In an experiment with 36 volunteers, the amounts of the two active ingredients (A & B) in the compound were varied at three levels each. Randomization was used in assigning four volunteers to each of the nine treatments. The data on hours of relief can be found in the following .csv file: Fever.csv
[Assume all of the ANOVA assumptions are satisfied]
1.1) State the Null and Alternate Hypothesis for conducting one-way ANOVA for both the variables A and B individually. [both statement and statistical form like Ho=mu, Ha>mu]
1.2) Perform one-way ANOVA for variable A with respect to the variable Relief. State whether the Null Hypothesis is accepted or rejected based on the ANOVA results.
1.3) Perform one-way ANOVA for variable B with respect to the variable Relief. State whether the Null Hypothesis is accepted or rejected based on the ANOVA results.
1.4) Analyse the effects of one variable on another with the help of an interaction plot. What is the interaction between the two treatments? [hint: use the pointplot function from the seaborn library]
1.5) Perform a two-way ANOVA based on the different ingredients (variable A & B along with their interaction 'A*B') with the variable 'Relief' and state your results.
1.6) Mention the business implications of performing ANOVA for this particular case study.
Problem 2:
The dataset Education - Post 12th Standard.csv is a dataset that contains the names of various colleges. This particular case study is based on various parameters of various institutions. You are expected to do Principal Component Analysis for this case study according to the instructions given in the following rubric. The data dictionary of the 'Education - Post 12th Standard.csv' can be found in the following file: Data Dictionary.xlsx.
2.1) Perform Exploratory Data Analysis [both univariate and multivariate analysis to be performed]. The inferences drawn from this should be properly documented.
2.2) Scale the variables and write the inference for using the type of scaling function for this case study.
2.3) Comment on the comparison between covariance and the correlation matrix.
2.4) Check the dataset for outliers before and after scaling. Draw your inferences from this exercise.
2.5) Build the covariance matrix, eigenvalues, and eigenvector.
2.6) Write the explicit form of the first PC (in terms of Eigen Vectors).
2.7) Discuss the cumulative values of the eigenvalues. How does it help you to decide on the optimum number of principal components? What do the eigenvectors indicate? Perform PCA and export the data of the Principal Component scores into a data frame.
2.8) Mention the business implication of using the Principal Component Analysis for this case study. [Hint: Write Interpretations of the Principal Components Obtained]
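The steps in 2.2, 2.3 and 2.5 to 2.7 can be sketched with NumPy alone. The example below uses a small synthetic matrix, not the college data, and the 90% cumulative-variance cutoff is only an illustrative threshold. It z-score scales the columns, shows that the covariance matrix of the scaled data matches the correlation matrix of the raw data, and then derives eigenvalues, eigenvectors, cumulative explained variance and component scores.

```python
import numpy as np

# SYNTHETIC data: 200 rows, 4 numeric columns on very different scales.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4)) * np.array([1000.0, 500.0, 10.0, 1.0])
X[:, 1] += 0.4 * X[:, 0]  # make two columns correlated

# z-score scaling: every column gets mean 0 and (sample) standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance of the scaled data equals the correlation of the raw data.
cov_z = np.cov(Z, rowvar=False)
corr_x = np.corrcoef(X, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix, sorted descending.
eigvals, eigvecs = np.linalg.eigh(cov_z)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of variance per component and its running total.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 90% (illustrative cutoff).
k = int(np.searchsorted(cumulative, 0.90) + 1)

# Explicit form of PC1: PC1 = sum_j pc1_loadings[j] * Z[:, j]
pc1_loadings = eigvecs[:, 0]

# Principal component scores for the retained components.
scores = Z @ eigvecs[:, :k]

print("cumulative explained variance:", np.round(cumulative, 3))
print("components kept:", k)
print("PC1 loadings:", np.round(pc1_loadings, 3))
```

The sign of an eigenvector is arbitrary, so loadings from another library may differ by a factor of -1 without changing the interpretation.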
data:
1) Names: Names of various universities and colleges
2) Apps: Number of applications received
3) Accept: Number of applications accepted
4) Enroll: Number of new students enrolled
5) Top10perc: Percentage of new students from top 10% of Higher Secondary class
6) Top25perc: Percentage of new students from top 25% of Higher Secondary class
7) F.Undergrad: Number of full-time undergraduate students
8) P.Undergrad: Number of part-time undergraduate students
9) Outstate: Number of students for whom the particular college or university is Out-of-state tuition
10) Room.Board: Cost of room and board
11) Books: Estimated book costs for a student
12) Personal: Estimated personal spending for a student
13) PhD: Percentage of faculty with Ph.D.s
14) Terminal: Percentage of faculty with a terminal degree
15) S.F.Ratio: Student/faculty ratio
16) perc.alumni: Percentage of alumni who donate
17) Expend: The instructional expenditure per student
18) Grad.Rate: Graduation rate
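For the outlier check in 2.4, one possible approach is the 1.5 x IQR rule applied before and after z-score scaling. The sketch below borrows two column names from the list above (Apps and Books) but fills them with synthetic values, and iqr_outlier_counts is a hypothetical helper written for this illustration. Because z-score scaling is a linear transformation, the number of points flagged per column is expected to stay the same; only the units change.

```python
import numpy as np
import pandas as pd

# SYNTHETIC values; only the column names come from the data dictionary.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "Apps": rng.lognormal(mean=7.0, sigma=1.0, size=300),   # right-skewed
    "Books": rng.normal(loc=550.0, scale=150.0, size=300),  # roughly symmetric
})


def iqr_outlier_counts(frame: pd.DataFrame) -> pd.Series:
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for each column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    mask = (frame < q1 - 1.5 * iqr) | (frame > q3 + 1.5 * iqr)
    return mask.sum()


# z-score scaling, column by column.
scaled = (df - df.mean()) / df.std()

before = iqr_outlier_counts(df)
after = iqr_outlier_counts(scaled)

print("outliers before scaling:")
print(before)
print("outliers after scaling:")
print(after)
```

On the real data, the Names column would need to be dropped first, since the rule only applies to numeric columns.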