Question
I need help with this assignment
DATA-51100: Statistical Programming
Programming Assignment 5 - Data Preparations and Statistics

Introduction

The file cps.csv (attached) contains school profile information for Chicago Public Schools. Your program will derive some data from it and then generate some statistical information.

Requirements

You are to create a program in Python that performs the following:

1. Loads the cps.csv file (assume it's in the current directory) and creates a DataFrame object from it.
2. Based on the data contained in the cps.csv file, generates a dataframe with the following information:
   a. School_ID
   b. Short_Name
   c. Is_High_School
   d. Zip
   e. Student_Count_Total
   f. College_Enrollment_Rate_School
   g. Lowest Grade Offered (derived from the Grades_Offered_All column)
   h. Highest Grade Offered (derived from the Grades_Offered_All column)
   i. Starting Hour (derived from the School_Hours column)
   The values for a-f are based on existing columns in the data. For g-i, you will need to generate new columns that derive information from existing ones. Replace the missing numeric values with the mean for that column. Display the first 10 rows of this dataframe.
3. Displays the following information:
   a. Mean and standard deviation of College Enrollment Rate for High Schools
   b. Mean and standard deviation of Student_Count_Total for non-High Schools
   c. Distribution of starting hours for all schools
   d. Number of schools outside of the Loop Neighborhood (i.e., outside of zip codes 60601, 60602, 60603, 60604, 60605, 60606, 60607, and 60616)

Additional Requirements

1. The name of your source code file should be DataStats.py. All your code should be within a single file.
2. You need to use the pandas DataFrame object for storing data.
3. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.
4. You need to use meaningful identifier names that conform to standard naming conventions.
5. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.

What to Turn In

You will turn in the single DataStats.py file as well as a screenshot of your output(s) using BlackBoard.

Sample Program Output

CPSC-51100, [semester] [year]
NAME: [put your name here]
PROGRAMMING ASSIGNMENT #5

School_ID  Short_Name             Is_High_School  Zip    College_Enrollment_Rate_School
609952     GREENE                 False           60609  58.084302
609869     LANGFORD               False           60636  58.084302
609896     DRUMMOND               False           60622  58.084302
610590     BRONZEVILLE CLASSICAL  False           60609  58.084302
610087     BLAIR                  False           60638  58.084302
610503     FRAZIER PROSPECTIVE    False           60624  58.084302
400164     INSTITUTO - LOZANO HS  True            60608  21.900000
610059     MAYER                  False           60614  58.084302
610206     TWAIN                  False           60638  58.084302
609872     PEREZ                  False           60608  58.084302

[The sample also has Student_Count_Total, Lowest_Grade_Offered, Highest_Grade_Offered, and School_Start_Hour columns; their values are garbled in the screenshot text and are not reproduced here.]

College Enrollment Rate for High Schools = 58.08 (sd=25.07)

Total Student Count for non-High Schools = 521.55 (sd=268.64)

Distribution of Starting Hours
8am: 408
7am: 191
9am: 39

Number of schools outside the Loop: 634

CORRECTION: Distribution of Starting Hours should be:
8am: 415
7am: 193
9am: 40
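One possible starting point for requirements 2 and 3 is sketched below. It is not a complete DataStats.py, and it rests on assumptions the assignment text does not confirm: that the CSV columns are named exactly as written here, that Grades_Offered_All is a comma-separated list such as "PK,K,1,2", and that the first digit 1-9 in School_Hours is the starting hour. The function names prepare and summarize and the constant LOOP_ZIPS are made up for illustration.

```python
import pandas as pd

# Zip codes the assignment lists as the Loop neighborhood.
LOOP_ZIPS = [60601, 60602, 60603, 60604, 60605, 60606, 60607, 60616]


def prepare(raw):
    """Build the derived frame (requirement 2) from the raw profile data."""
    kept = ["School_ID", "Short_Name", "Is_High_School", "Zip",
            "Student_Count_Total", "College_Enrollment_Rate_School"]
    schools = raw[kept].copy()

    # Assumption: grades are comma-separated, lowest first and highest last.
    grades = raw["Grades_Offered_All"].str.split(",")
    schools["Lowest_Grade_Offered"] = grades.str[0].str.strip()
    schools["Highest_Grade_Offered"] = grades.str[-1].str.strip()

    # Assumption: the first digit 1-9 in School_Hours is the starting hour.
    hour = raw["School_Hours"].str.extract(r"([1-9])", expand=False)
    schools["Starting_Hour"] = pd.to_numeric(hour)

    # Replace missing numeric values with the mean of their column.
    # Starting_Hour is deliberately left unfilled so the hour distribution
    # only counts observed hours; that is a judgment call, not a stated rule.
    numeric = ["Student_Count_Total", "College_Enrollment_Rate_School"]
    schools[numeric] = schools[numeric].fillna(schools[numeric].mean())

    return schools.set_index("School_ID")


def summarize(schools):
    """Compute the statistics listed under requirement 3."""
    # Assumption: Is_High_School was parsed as a boolean column.
    is_high = schools["Is_High_School"]
    rate = schools.loc[is_high, "College_Enrollment_Rate_School"]
    count = schools.loc[~is_high, "Student_Count_Total"]
    outside = ~schools["Zip"].isin(LOOP_ZIPS)
    return {
        "rate_mean": rate.mean(),
        "rate_sd": rate.std(),
        "count_mean": count.mean(),
        "count_sd": count.std(),
        "start_hours": schools["Starting_Hour"].value_counts(),
        "outside_loop": int(outside.sum()),
    }
```

With the real file this would be driven by something like summarize(prepare(pd.read_csv("cps.csv"))), followed by print statements formatted to match the sample output and a call to head(10) on the prepared frame. Whether the numbers match the sample exactly depends on how the hours and missing values are actually encoded in cps.csv, which is worth checking before relying on the regular expression above.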