Question
Applied
This exercise relates to the College data set, which can be found in
the file College.csv on the book website. It contains a number of
variables for different universities and colleges in the US. The
variables are
Private : Public/private indicator
Apps : Number of applications received
Accept : Number of applicants accepted
Enroll : Number of new students enrolled
Top10perc : New students from top 10% of high school class
Top25perc : New students from top 25% of high school class
F.Undergrad : Number of full-time undergraduates
P.Undergrad : Number of part-time undergraduates
Outstate : Out-of-state tuition
Room.Board : Room and board costs
Books : Estimated book costs
Personal : Estimated personal spending
PhD : Percent of faculty with PhDs
Terminal : Percent of faculty with terminal degree
S.F.Ratio : Student/faculty ratio
perc.alumni : Percent of alumni who donate
Expend : Instructional expenditure per student
Grad.Rate : Graduation rate
Before reading the data into Python, it can be viewed in Excel or a
text editor.
(a) Use the pd.read_csv() function to read the data into Python. Call
the loaded data college. Make sure that you have the directory
set to the correct location for the data.
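A minimal sketch of part (a). So that it runs without the real file, it reads a small in-memory stand-in: the three rows and the names "College A" to "College C" are made up, and only the column names come from the exercise.

```python
import io
import pandas as pd

# Made-up stand-in for College.csv. The first header cell is left empty,
# which is why pandas later labels that column "Unnamed: 0".
csv_text = """,Private,Apps,Accept,Enroll
College A,Yes,1600,1200,700
College B,No,2200,1900,500
College C,Yes,1400,1100,300
"""

# With the real file this would be pd.read_csv("College.csv"), run from the
# directory that contains the file (or given a full path).
college = pd.read_csv(io.StringIO(csv_text))

print(college.shape)
print(college.columns.tolist())
```

If pandas reports that the file cannot be found, the working directory is usually the cause; os.getcwd() shows where Python is looking.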
(b) Look at the data used in the notebook by creating and running
a new cell with just the code college in it. You should notice
that the first column is just the name of each university in a
column named something like Unnamed: 0. We don't really want
pandas to treat this as data. However, it may be handy to have
these names for later. Try the following commands and similarly
look at the resulting data frames:
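A sketch of commands of this kind, assuming the usual pandas route: either pass index_col=0 to read_csv, or rename the unnamed column with a dictionary and then call set_index. The names college2 and college3 and the three made-up rows are illustrative only.

```python
import io
import pandas as pd

# Made-up stand-in for College.csv (empty first header cell, as in the file).
csv_text = """,Private,Apps,Accept,Enroll
College A,Yes,1600,1200,700
College B,No,2200,1900,500
College C,Yes,1400,1100,300
"""
college = pd.read_csv(io.StringIO(csv_text))

# Option 1: ask read_csv to use the first column as the row index.
college2 = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: rename the unnamed column via a dictionary {old name: new name},
# then promote that column to the index.
college3 = college.rename({"Unnamed: 0": "College"}, axis=1)
college3 = college3.set_index("College")

print(college2.head())
print(college3.head())
```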
This has used the first column in the file as an index for the
data frame. This means that pandas has given each row a name
corresponding to the appropriate university. Now you should see
that the first data column is Private. Note that the names of
the colleges appear on the left of the table. We also introduced
a new Python object above: a dictionary, which is specified by
(key, value) pairs. Keep your modified version of the data with
the following:
(c) Use the describe() method of college to produce a numerical summary
of the variables in the data set.
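A short sketch of part (c) on a made-up four-row frame (the values are illustrative, not taken from College.csv). By default describe() summarises only the numeric columns; passing include="all" also covers columns such as Private.

```python
import pandas as pd

# Made-up rows using a few of the exercise's column names.
college = pd.DataFrame(
    {
        "Private": ["Yes", "No", "Yes", "No"],
        "Apps": [1000, 2000, 3000, 4000],
        "Outstate": [8000, 12000, 6000, 14000],
    },
    index=["College A", "College B", "College C", "College D"],
)

# Count, mean, std, min, quartiles and max for each numeric column.
summary = college.describe()
print(summary)

# Also summarise the non-numeric column (unique values, most frequent value).
print(college.describe(include="all"))
```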
(d) Use the pd.plotting.scatter_matrix() function to produce a
scatterplot matrix of the first columns [Top10perc, Apps, Enroll].
Recall that you can reference a list C of columns of a data frame
A using A[C].
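A sketch of part (d) on made-up data. The Agg backend is selected only so the example runs without a display, and the output file name is hypothetical; in a notebook both lines can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for the three columns named in the exercise.
college = pd.DataFrame(
    {
        "Top10perc": [23, 60, 50, 85, 12, 51],
        "Apps": [1600, 2200, 1400, 9000, 800, 3000],
        "Enroll": [700, 500, 300, 2000, 250, 900],
        "Private": ["Yes", "No", "Yes", "Yes", "No", "No"],
    }
)

# Select a list of columns with A[C], then plot every pair against each other.
cols = ["Top10perc", "Apps", "Enroll"]
axes = pd.plotting.scatter_matrix(college[cols], figsize=(6, 6))

plt.savefig("scatter_matrix.png")  # hypothetical output file name
```

The call returns a grid of matplotlib axes, one per pair of columns, with histograms on the diagonal by default.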
(e) Use the boxplot() method of college to produce side-by-side
boxplots of Outstate versus Private.
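A sketch of part (e), assuming the column and by arguments of DataFrame.boxplot are what the exercise intends. The six tuition values are made up; the group medians are printed so the plot can be checked against numbers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up out-of-state tuition figures for private and public colleges.
college = pd.DataFrame(
    {
        "Private": ["Yes", "Yes", "Yes", "No", "No", "No"],
        "Outstate": [12000, 15000, 18000, 6000, 7000, 9500],
    }
)

# One box per level of Private, drawn side by side on the same axes.
fig, ax = plt.subplots(figsize=(5, 4))
college.boxplot(column="Outstate", by="Private", ax=ax)

# The same comparison in numbers.
medians = college.groupby("Private")["Outstate"].median()
print(medians)
```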
(f) Create a new qualitative variable, called Elite, by binning the
Top10perc variable into two groups based on whether or not the
proportion of students coming from the top 10% of their high
school classes exceeds 50%.
college['Elite'] = pd.cut(college['Top10perc'],
                          [0, 50, 100],
                          labels=['No', 'Yes'])
Use the value_counts() method of college['Elite'] to see how
many elite universities there are. Finally, use the boxplot() method
again to produce side-by-side boxplots of Outstate versus Elite.
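A sketch of part (f) on made-up data. It assumes Top10perc is stored on a 0 to 100 percentage scale, so the bin edges are [0, 50, 100]. pd.cut closes intervals on the right by default, so a value of exactly 50 falls in the "No" group.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; Top10perc is assumed to be a percentage between 0 and 100.
college = pd.DataFrame(
    {
        "Top10perc": [23, 60, 50, 85, 12, 51],
        "Outstate": [7000, 15000, 9000, 19000, 6000, 12000],
    }
)

# Bins are (0, 50] -> "No" and (50, 100] -> "Yes".
college["Elite"] = pd.cut(
    college["Top10perc"], [0, 50, 100], labels=["No", "Yes"]
)

# How many colleges fall in each group.
counts = college["Elite"].value_counts()
print(counts)

# Side-by-side boxplots of tuition for the two groups.
fig, ax = plt.subplots(figsize=(5, 4))
college.boxplot(column="Outstate", by="Elite", ax=ax)
```

If every row comes out as missing, the bin edges are probably on a different scale from the data (for example 0 to 1 rather than 0 to 100).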
(g) Use the plot.hist() method of college to produce some
histograms with differing numbers of bins for a few of the
quantitative variables. The command plt.subplots(2, 2) may be
useful: it will divide the plot window into four regions so that four
plots can be made simultaneously. By changing the arguments
you can divide the screen up in other combinations.
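A sketch of part (g) using randomly generated stand-in data rather than the real file. Each histogram is drawn from a single column (a Series), and its ax argument places it in one of the four regions created by plt.subplots(2, 2).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in values, not the real College data.
rng = np.random.default_rng(0)
college = pd.DataFrame(
    {
        "Apps": rng.integers(100, 10000, size=50),
        "Outstate": rng.integers(2000, 20000, size=50),
    }
)

# A 2 x 2 grid of axes: four regions in one figure.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Same variable, different bin counts, to see how the shape changes.
college["Apps"].plot.hist(bins=5, ax=axes[0, 0], title="Apps, 5 bins")
college["Apps"].plot.hist(bins=20, ax=axes[0, 1], title="Apps, 20 bins")
college["Outstate"].plot.hist(bins=5, ax=axes[1, 0], title="Outstate, 5 bins")
college["Outstate"].plot.hist(bins=20, ax=axes[1, 1], title="Outstate, 20 bins")

fig.tight_layout()
```

Calling plt.subplots(1, 3) instead would give three regions in a single row.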
(h) Continue exploring the data, and provide a brief summary of
what you discover.