Question

8. This exercise relates to the College data set, which can be found in
the file College.csv on the book website. It contains a number of
variables for 777 different universities and colleges in the US. The
variables are
Private : Public/private indicator
Apps : Number of applications received
Accept : Number of applicants accepted
Enroll : Number of new students enrolled
Top10perc : New students from top 10% of high school class
Top25perc : New students from top 25% of high school class
F.Undergrad : Number of full-time undergraduates
P.Undergrad : Number of part-time undergraduates
Outstate : Out-of-state tuition
Room.Board : Room and board costs
Books : Estimated book costs
Personal : Estimated personal spending
PhD : Percent of faculty with Ph.D.s
Terminal : Percent of faculty with terminal degree
S.F.Ratio : Student/faculty ratio
perc.alumni : Percent of alumni who donate
Expend : Instructional expenditure per student
Grad.Rate : Graduation rate
Before reading the data into R, it can be viewed in Excel or a text
editor.
(a) Use the read.csv() function to read the data into R. Call the
loaded data college. Make sure that you have the directory set
to the correct location for the data.
(b) Look at the data using the View() function. You should notice
that the first column is just the name of each university. We don't
really want R to treat this as data. However, it may be handy to
have these names for later. Try the following commands:
> rownames(college)<- college[,1]
> View(college)
You should see that there is now a row.names column with the
name of each university recorded. This means that R has given
each row a name corresponding to the appropriate university. R
will not try to perform calculations on the row names. However,
we still need to eliminate the first column in the data where the
names are stored. Try
> college <- college[,-1]
> View(college)
Now you should see that the first data column is Private. Note
that another column labeled row.names now appears before the
Private column. However, this is not a data column but rather
the name that R is giving to each row.
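
Parts (a) and (b) can be sketched as follows. This is a minimal illustration, not the book's solution: because College.csv may not be on hand, the sketch first writes a tiny stand-in file with made-up values to a temporary path (`csv_path`, a hypothetical name), and it assumes the first column has an empty header, which read.csv() renames to X. Note that since R 4.0.0, read.csv() no longer converts strings to factors by default, so `stringsAsFactors = TRUE` is passed here to make Private a factor for the plots in part (c).

```r
# Stand-in for College.csv: three made-up rows written to a temporary file
# so the sketch runs anywhere. Point csv_path at the real file instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  ",Private,Apps,Accept",
  "College A,Yes,1200,900",
  "College B,No,3400,2100",
  "College C,Yes,800,650"
), csv_path)

# (a) Read the data. stringsAsFactors = TRUE turns text columns such as
# Private into factors (the default changed to FALSE in R 4.0.0).
college <- read.csv(csv_path, stringsAsFactors = TRUE)

# (b) Use the first column (the names) as row names, then drop that column
# so it is no longer treated as data.
rownames(college) <- college[, 1]
college <- college[, -1]

print(college)
```

With the real file, only the `csv_path` assignment changes; the `writeLines()` call is dropped.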
(c) i. Use the summary() function to produce a numerical summary
of the variables in the data set.
ii. Use the pairs() function to produce a scatterplot matrix of
the first ten columns or variables of the data. Recall that
you can reference the first ten columns of a matrix A using
A[,1:10].
iii. Use the plot() function to produce side-by-side boxplots of
Outstate versus Private.
iv. Create a new qualitative variable, called Elite, by binning
the Top10perc variable. We are going to divide universities
into two groups based on whether or not the proportion
of students coming from the top 10% of their high school
classes exceeds 50%.
> Elite <- rep("No", nrow(college))
> Elite[college$Top10perc > 50] <- "Yes"
> Elite <- as.factor(Elite)
> college <- data.frame(college, Elite)
Use the summary() function to see how many elite universities
there are. Now use the plot() function to produce
side-by-side boxplots of Outstate versus Elite.
v. Use the hist() function to produce some histograms with
differing numbers of bins for a few of the quantitative
variables. You may find the command par(mfrow = c(2,2))
useful: it will divide the print window into four regions so
that four plots can be made simultaneously. Modifying the
arguments to this function will divide the screen in other
ways.
vi. Continue exploring the data, and provide a brief summary
of what you discover.
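
The steps in part (c) can be sketched in one pass. The data frame below is a toy stand-in with randomly generated, made-up values and only four columns (it is not the College data), so the printed numbers and plots mean nothing in themselves; the point is the sequence of calls. With the real data, skip the stand-in and use the `college` object built in parts (a) and (b).

```r
# Toy stand-in for the College data: made-up values, NOT the real data set.
set.seed(1)
n <- 40
college <- data.frame(
  Private   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Top10perc = sample(1:95, n, replace = TRUE),
  Outstate  = round(runif(n, 3000, 20000)),
  Books     = round(runif(n, 300, 900))
)

# i. Numerical summary of every variable.
print(summary(college))

# ii. Scatterplot matrix. The exercise uses college[, 1:10]; the toy frame
# has fewer columns, so take at most the first ten.
pairs(college[, 1:min(10, ncol(college))])

# iii. plot() with a factor as the first argument draws side-by-side boxplots.
plot(college$Private, college$Outstate,
     xlab = "Private", ylab = "Outstate")

# iv. Bin Top10perc into the qualitative variable Elite, as in the exercise.
Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)
print(summary(college$Elite))  # counts of "No" and "Yes"
plot(college$Elite, college$Outstate,
     xlab = "Elite", ylab = "Outstate")

# v. Four histograms with differing bin counts in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
hist(college$Outstate, breaks = 5)
hist(college$Outstate, breaks = 20)
hist(college$Books, breaks = 5)
hist(college$Books, breaks = 20)
par(old_par)  # restore the previous plot layout
```

The `breaks` argument to hist() is only a suggestion for the number of cells, so the bin counts actually drawn may differ slightly from the values requested.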
