8. This exercise relates to the College data set, which can be found in the file College.csv on the book website. It contains a number of variables for 777 different universities and colleges in the US. The variables are

Private : Public/private indicator
Apps : Number of applications received
Accept : Number of applicants accepted
Enroll : Number of new students enrolled
Top10perc : New students from top 10% of high school class
Top25perc : New students from top 25% of high school class
F.Undergrad : Number of full-time undergraduates
P.Undergrad : Number of part-time undergraduates
Outstate : Out-of-state tuition
Room.Board : Room and board costs
Books : Estimated book costs
Personal : Estimated personal spending
PhD : Percent of faculty with Ph.D.'s
Terminal : Percent of faculty with terminal degree
S.F.Ratio : Student/faculty ratio
perc.alumni : Percent of alumni who donate
Expend : Instructional expenditure per student
Grad.Rate : Graduation rate

Before reading the data into R, it can be viewed in Excel or a text editor.

(a) Use the read.csv() function to read the data into R. Call the loaded data college. Make sure that you have the directory set to the correct location for the data.

(b) Look at the data using the View() function. You should notice that the first column is just the name of each university. We don't really want R to treat this as data. However, it may be handy to have these names for later. Try the following commands:

    rownames(college) <- college[, 1]
    View(college)

You should see that there is now a row.names column with the name of each university recorded. This means that R has given each row a name corresponding to the appropriate university. R will not try to perform calculations on the row names. However, we still need to eliminate the first column in the data where the names are stored. Try

    college <- college[, -1]
    View(college)

Now you should see that the first data column is Private. Note that another column labeled row.names now appears before the Private column. However, this is not a data column but rather the name that R is giving to each row.

(c) i. Use the summary() function to produce a numerical summary of the variables in the data set.

ii. Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data. Recall that you can reference the first ten columns of a matrix A using A[, 1:10].

iii. Use the plot() function to produce side-by-side boxplots of Outstate versus Private.

iv. Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going to divide universities into two groups based on whether or not the proportion of students coming from the top 10% of their high school classes exceeds 50%.

    Elite <- rep("No", nrow(college))
    Elite[college$Top10perc > 50] <- "Yes"
    Elite <- as.factor(Elite)
    college <- data.frame(college, Elite)

Use the summary() function to see how many elite universities there are. Now use the plot() function to produce side-by-side boxplots of Outstate versus Elite.

v. Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables. You may find the command par(mfrow = c(2, 2)) useful: it will divide the print window into four regions so that four plots can be made simultaneously. Modifying the arguments to this function will divide the screen in other ways.

vi. Continue exploring the data, and provide a brief summary of what you discover.
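
One way parts (a) through (c) might be worked is sketched below. It is not the page's accepted answer, only an illustration under stated assumptions: that College.csv sits in the working directory and that its columns carry the names listed in the exercise. If the file is absent, the script falls back to a small made-up stand-in data frame (hypothetical values, not real college data) so that every step can still be run. Converting Private to a factor is an added step, needed in recent R versions for plot() to draw boxplots.

```r
# Send plots to a null device when run as a script, so no file is written.
if (!interactive()) pdf(NULL)

# (a) Read the data. Assumption: College.csv is in the working directory.
if (file.exists("College.csv")) {
  college <- read.csv("College.csv")
} else {
  # Made-up stand-in with the same layout; these values are NOT real data.
  set.seed(1)
  n <- 40
  college <- data.frame(
    X         = paste("College", seq_len(n)),
    Private   = sample(c("No", "Yes"), n, replace = TRUE),
    Apps      = round(runif(n, 500, 10000)),
    Top10perc = round(runif(n, 1, 95)),
    Outstate  = round(runif(n, 3000, 20000))
  )
}

# (b) Use the first column (university names) as row names, then drop it.
rownames(college) <- college[, 1]
college <- college[, -1]

# plot() draws boxplots only when the grouping variable is a factor.
college$Private <- as.factor(college$Private)

# (c) i. Numerical summary of every variable.
print(summary(college))

# (c) ii. Scatterplot matrix of the first ten columns (fewer in the stand-in).
pairs(college[, 1:min(10, ncol(college))])

# (c) iii. Side-by-side boxplots of Outstate by Private.
plot(college$Private, college$Outstate,
     xlab = "Private", ylab = "Outstate")

# (c) iv. Bin Top10perc into the qualitative variable Elite.
Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)
print(summary(college$Elite))
plot(college$Elite, college$Outstate,
     xlab = "Elite", ylab = "Outstate")

# (c) v. Four histograms in a 2 x 2 grid with differing bin counts.
par(mfrow = c(2, 2))
hist(college$Outstate, breaks = 5,  main = "Outstate, about 5 bins")
hist(college$Outstate, breaks = 20, main = "Outstate, about 20 bins")
hist(college$Apps,      breaks = 10, main = "Apps, about 10 bins")
hist(college$Top10perc, breaks = 15, main = "Top10perc, about 15 bins")
par(mfrow = c(1, 1))
```

Note that a single number passed as breaks is treated by hist() as a suggestion, so the bin counts actually drawn may differ slightly from the values requested.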


