Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 26, 2024

Applied This exercise relates to the College data set, which can be found in the file College.csv on the book website. It contains a number

Applied

This exercise relates to the College data set, which can be found in

the file College.csv on the book website. It contains a number of

variables for

777

different universities and colleges in the US

.

The

variables are

Private : Public

/

private indicator

Apps : Number of applications received

Accept : Number of applicants accepted

Enroll : Number of new students enrolled

Top

10

perc : New students from top

10 %

of high school class

Top

25

perc : New students from top

25 %

of high school class

.

Undergrad : Number of full

-

time undergraduates

.

Undergrad : Number of part

-

time undergraduates

Outstate : Out

-

-

state tuition

Room.Board : Room and board costs

Books : Estimated book costs

Personal : Estimated personal spending

PhD : Percent of faculty with Ph

.

.

Terminal : Percent of faculty with terminal degree

.

.

Ratio : Student

/

faculty ratio

perc.alumni : Percent of alumni who donate

Expend : Instructional expenditure per student

Grad.Rate : Graduation rate

Before reading the data into Python, it can be viewed in Excel or a

text editor.Before reading the data into Python, it can be viewed in Excel or a

text editor.

(

)

Use the pd

.

read

_

csv

()

function to read the data into Python. Call

the loaded data college. Make sure that you have the directory

set to the correct location for the data.

(

)

Look at the data used in the notebook by creating and running

a new cell with just the code college in it

.

You should notice

that the first column is just the name of each university in a

column named something like Unnamed:

0 .

We don't really want

pandas to treat this as data. However, it may be handy to have

these names for later. Try the following commands and similarly

look at the resulting data frames:

This has used the first column in the file as an index for the

data frame. This means that pandas has given each row a name

corresponding to the appropriate university. Now you should see

that the first data column is Private. Note that the names of

the colleges appear on the left of the table. We also introduced

a new python object above: a dictionary, which is specified by

(

key

,

value

)

pairs. Keep your modified version of the data with

dictionary

the following:the following:

(

)

Use the describe

()

method of to produce a numerical summary

of the variables in the data set.

(

)

Use the pd

.

plotting.scatter

_

matrix

()

function to produce a

scatterplot matrix of the first columns

[

Top

10

perc, Apps, Enroll

] .

Recall that you can reference a list

C

of columns of a data frame

A using

A [C] .

(

)

Use the boxplot

()

method of college to produce side

-

-

side

boxplots of Outstate versus Private.

(

)

Create a new qualitative variable, called Elite, by binning the

Top

10

perc variable into two groups based on whether or not the

proportion of students coming from the top

10 %

of their high

school classes exceeds

50 % .

college

[

'Elite'

] = p d .

cut

(c o l l e g e [

'Top

10

perc'

],

[0, 0.5, 1],

labels

= [' N o',

'Yes'

])

Use the value

_

counts

()

method of college

['

Elite

']

to see how

many elite universities there are. Finally, use the boxplot

()

method

again to produce side

-

-

side boxplots of Outstate versus Elite.

(

)

Use the plot.hist

()

method of college to produce some his

-

tograms with differing numbers of bins for a few of the quanti

-

tative variables. The command plt

.

subplots

(2, 2)

may be use

-

ful: it will divide the plot window into four regions so that four

plots can be made simultaneously. By changing the arguments

you can divide the screen up in other combinations.

(

)

Continue exploring the data, and provide a brief summary of

what you discover.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Driven Web Sites

Authors: Joline Morrison, Mike Morrison

2nd Edition

★★★★★

How are continuous variables normally handled in Decision Tree Algorithms?

Answered: 1 week ago

Previous Question Next Question