
Question

Can anyone explain how to use the Resma 3 program? I have a quiz on the Moodle platform that has the following information: Case Study: UPR Admissions

Graduation of UPR Students:

Counts Percentages

Yes ________ _________%
No ________ _________%

Case Study: UPR Admissions

Consider the upr data set. This is the application data for all the students who applied and were accepted to UPR-Mayaguez between 2003 and 2013.

dim(upr)
## [1] 23666 16

tells us that there were 23666 applications and that for each student there are 16 pieces of information.

colnames(upr)
##  [1] "ID.Code"        "Year"           "Gender"         "Program.Code"
##  [5] "Highschool.GPA" "Aptitud.Verbal" "Aptitud.Matem"  "Aprov.Ingles"
##  [9] "Aprov.Matem"    "Aprov.Espanol"  "IGS"            "Freshmen.GPA"
## [13] "Graduated"      "Year.Grad."     "Grad..GPA"      "Class.Facultad"

shows us the variables

head(upr, 3)
##      ID.Code Year Gender Program.Code Highschool.GPA Aptitud.Verbal
## 1 00C2B4EF77 2005      M          502           3.97            647
## 2 00D66CF1BF 2003      M          502           3.80            597
## 3 00AB6118EB 2004      M         1203           4.00            567
##   Aptitud.Matem Aprov.Ingles Aprov.Matem Aprov.Espanol IGS Freshmen.GPA
## 1           621          626         672           551 342         3.67
## 2           726          618         718           575 343         2.75
## 3           691          424         616           609 342         3.62
##   Graduated Year.Grad. Grad..GPA Class.Facultad
## 1        Si       2012      3.33           INGE
## 2        No         NA        NA           INGE
## 3        No         NA        NA       CIENCIAS

shows us the first three cases.

Let's say we want to find the number of males and females. We can use the table command for that:

table(Gender)
## Error: object 'Gender' not found

What happened? Right now R does not know what Gender is because it is hidden inside the upr data set. Think of upr as a box that is currently closed, so R can't look inside and see the column names. We need to open the box first:

attach(upr)
table(Gender)
## Gender
##     F     M
## 11487 12179

Note: you need to attach a data frame only once in each session working with R.

Note: Say you first work with a data set students 2016, which has a column called Gender, and you attach it. Later (but in the same R session) you start working with a data set students 2017, which also has a column called Gender, and you attach this one as well. If you use Gender now, it will be the one from students 2017, because the most recently attached data frame is searched first.

Note: when data is transferred from Moodle with get.moodle.data(), it is automatically attached.
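The quiz table asks for counts and percentages of the Graduated column. Below is a minimal sketch of one way to get both, assuming a small made-up data frame called toy (hypothetical, not the real upr data) whose Graduated column is coded "Si"/"No" as in the head(upr, 3) output. It uses the $ notation to reach a column directly, so it does not depend on attach().

```r
# Hypothetical toy data standing in for upr (values are made up for illustration)
toy <- data.frame(
  Gender    = c("M", "F", "F", "M", "F"),
  Graduated = c("Si", "No", "Si", "Si", "No"),
  stringsAsFactors = FALSE
)

# Counts per category; toy$Graduated works without attach()
counts <- table(toy$Graduated)
counts

# Percentages: proportions times 100, rounded to one decimal place
percentages <- round(100 * prop.table(counts), 1)
percentages
```

With the real data, the same two calls would be applied to upr$Graduated instead of toy$Graduated.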

Subsetting of Data Frames

Consider the following data frame (not a real data set):

students
##    Age GPA Gender
## 1   22 3.1   Male
## 2   23 3.2   Male
## 3   20 2.1   Male
## 4   22 2.1   Male
## 5   21 2.3 Female
## 6   21 2.9   Male
## 7   18 2.3 Female
## 8   22 3.9   Male
## 9   21 2.6 Female
## 10  18 3.2 Female

Here each single piece of data is identified by its row number and its column number. So for example in row 2, column 2 we have 3.2, in row 6, column 3 we have Male.

As with vectors before, we can use the [ ] notation to access pieces of a data frame, but now we need to give both the row and the column number, separated by a comma:

students[6, 3]
## [1] "Male"

As before we can pick more than one piece:

students[1:5, 3]
## [1] "Male" "Male" "Male" "Male" "Female"
students[1:5, 1:2]
##   Age GPA
## 1  22 3.1
## 2  23 3.2
## 3  20 2.1
## 4  22 2.1
## 5  21 2.3
students[-c(1:5), 3]
## [1] "Male" "Female" "Male" "Female" "Female"
students[1, ]
##   Age GPA Gender
## 1  22 3.1   Male
students[, 2]
## [1] 3.1 3.2 2.1 2.1 2.3 2.9 2.3 3.9 2.6 3.2
students[, -3]
##    Age GPA
## 1   22 3.1
## 2   23 3.2
## 3   20 2.1
## 4   22 2.1
## 5   21 2.3
## 6   21 2.9
## 7   18 2.3
## 8   22 3.9
## 9   21 2.6
## 10  18 3.2
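To try the indexing commands above yourself, the students data frame can be rebuilt by hand. This is a sketch that copies the values from the printout; the stringsAsFactors = FALSE setting is an assumption made so that Gender stays a character column, matching the quoted output.

```r
# Recreate the students data frame from the printout in the text
students <- data.frame(
  Age    = c(22, 23, 20, 22, 21, 21, 18, 22, 21, 18),
  GPA    = c(3.1, 3.2, 2.1, 2.1, 2.3, 2.9, 2.3, 3.9, 2.6, 3.2),
  Gender = c("Male", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Female"),
  stringsAsFactors = FALSE  # keep Gender as character, not factor
)

students[6, 3]        # row 6, column 3
students[1:5, "GPA"]  # a column can also be picked by its name
dim(students)         # number of rows and columns
```

Picking a column by name instead of by number keeps working even if the column order changes.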

Vector Arithmetic

R allows us to apply mathematical functions to a whole vector at once:

x <- 1:10
2*x
## [1] 2 4 6 8 10 12 14 16 18 20
x^2
## [1] 1 4 9 16 25 36 49 64 81 100
log(x)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
sum(x)
## [1] 55
y <- 21:30
x+y
## [1] 22 24 26 28 30 32 34 36 38 40
x^2+y^2 
## [1] 442 488 538 592 650 712 778 848 922 1000
mean(x+y) 
## [1] 31
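Vector arithmetic is also what turns counts into percentages. As a small sketch, the counts below are typed in by hand from the table(Gender) output earlier in the text; the variable names counts, total and percent are just illustrative.

```r
# Counts copied from the table(Gender) output in the text
counts <- c(F = 11487, M = 12179)

total <- sum(counts)                        # adds up all elements
percent <- round(100 * counts / total, 1)   # division is applied to each element
percent
```

The total equals the 23666 rows reported by dim(upr), which is a quick check that no cases were lost.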

Subsetting

One of the most common tasks in Statistics is to select a part of a data set for further analysis. There is even a name for this: data wrangling.
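One common way to select part of a data set is to put a condition in the row position of the [ ] notation. The sketch below is an illustration under that assumption, not necessarily the approach the original material goes on to present; it rebuilds the students data frame from the earlier printout so that it runs on its own.

```r
# Recreate the students data frame from the printout in the text
students <- data.frame(
  Age    = c(22, 23, 20, 22, 21, 21, 18, 22, 21, 18),
  GPA    = c(3.1, 3.2, 2.1, 2.1, 2.3, 2.9, 2.3, 3.9, 2.6, 3.2),
  Gender = c("Male", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Female"),
  stringsAsFactors = FALSE
)

# Keep only the rows where Gender is "Female" (all columns)
females <- students[students$Gender == "Female", ]
females

# GPA values of the students who are older than 21
gpa_older <- students[students$Age > 21, "GPA"]
mean(gpa_older)
```

The condition produces one TRUE or FALSE per row, and only the rows marked TRUE are kept.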
