Question
Can anyone explain how to use the Resma 3 program? I have a quiz on the Moodle platform that has the following information: Case Study: UPR Admissions
Graduation of UPR Students:
       Counts      Percentages
Yes    ________    _________%
No     ________    _________%
Case Study: UPR Admissions
Consider the upr data set. This is the application data for all the students who applied and were accepted to UPR-Mayaguez between 2003 and 2013.
dim(upr)
## [1] 23666 16
tells us that there were 23666 applications and that for each student there are 16 pieces of information.
colnames(upr)
##  [1] "ID.Code"        "Year"           "Gender"         "Program.Code"
##  [5] "Highschool.GPA" "Aptitud.Verbal" "Aptitud.Matem"  "Aprov.Ingles"
##  [9] "Aprov.Matem"    "Aprov.Espanol"  "IGS"            "Freshmen.GPA"
## [13] "Graduated"      "Year.Grad."     "Grad..GPA"      "Class.Facultad"
shows us the variables.
head(upr, 3)
##      ID.Code Year Gender Program.Code Highschool.GPA Aptitud.Verbal
## 1 00C2B4EF77 2005      M          502           3.97            647
## 2 00D66CF1BF 2003      M          502           3.80            597
## 3 00AB6118EB 2004      M         1203           4.00            567
##   Aptitud.Matem Aprov.Ingles Aprov.Matem Aprov.Espanol IGS Freshmen.GPA
## 1           621          626         672           551 342         3.67
## 2           726          618         718           575 343         2.75
## 3           691          424         616           609 342         3.62
##   Graduated Year.Grad. Grad..GPA Class.Facultad
## 1        Si       2012      3.33           INGE
## 2        No         NA        NA           INGE
## 3        No         NA        NA       CIENCIAS
shows us the first three cases.
Let's say we want to find the number of males and females. We can use the table command for that:
table(Gender)
## Error: object 'Gender' not found
What happened? Right now R does not know what Gender is because it is hidden inside the upr data set. Think of upr as a box that is currently closed, so R can't look inside and see the column names. We need to open the box first:
attach(upr)
table(Gender)
## Gender
##     F     M
## 11487 12179
Note: you need to attach a data frame only once in each session working with R.
Note: Say you first work with a data set students 2016, which has a column called Gender, and you attach it. Later (but in the same R session) you start working with a data set students 2017, which also has a column called Gender, and you attach that one as well. If you use Gender now, it will come from students 2017.
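The masking described in this note can be reproduced with two tiny made-up data frames. The names students2016 and students2017 and their values are assumptions for illustration only. The sketch also shows that referring to a column with $ avoids the ambiguity.

```r
# Two invented data frames that share a column name (hypothetical data)
students2016 <- data.frame(Gender = c("F", "F", "M"), stringsAsFactors = FALSE)
students2017 <- data.frame(Gender = c("M", "M", "M", "F"), stringsAsFactors = FALSE)

attach(students2016)
attach(students2017)   # R reports that Gender from students2016 is masked

# Gender now resolves to the most recently attached data frame
gender_seen <- Gender
length(gender_seen)    # 4, the number of rows in students2017

# Explicit access with $ is unambiguous regardless of attach order
table(students2016$Gender)
table(students2017$Gender)

# Clean up the search path so later code is not affected
detach(students2017)
detach(students2016)
```

Using the $ form throughout is a common way to avoid surprises when several attached data sets share column names.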
Note: when the data is transferred from Moodle with get.moodle.data(), it is automatically attached.
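For a table of counts and percentages like the one in the quiz, one possible approach is table() followed by prop.table(). The sketch below uses a small made-up data frame called toy (an assumption, not the real upr data) with a Graduated column coded "Si"/"No" as in the head() output above. With the real data, the same calls would presumably be applied to the Graduated column of upr.

```r
# Toy stand-in for the upr data set (values invented for illustration)
toy <- data.frame(
  Gender = c("M", "M", "F", "F", "M"),
  Graduated = c("Si", "No", "No", "Si", "Si"),
  stringsAsFactors = FALSE
)

# Counts per category; toy$Graduated works without attach()
counts <- table(toy$Graduated)

# Percentages: proportions times 100, rounded to one decimal place
percentages <- round(100 * prop.table(counts), 1)

print(counts)
print(percentages)
```

On the toy data this gives 2 "No" and 3 "Si", that is 40% and 60%.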
Subsetting of Data Frames
Consider the following data frame (not a real data set):
students
##    Age GPA Gender
## 1   22 3.1   Male
## 2   23 3.2   Male
## 3   20 2.1   Male
## 4   22 2.1   Male
## 5   21 2.3 Female
## 6   21 2.9   Male
## 7   18 2.3 Female
## 8   22 3.9   Male
## 9   21 2.6 Female
## 10  18 3.2 Female
Here each single piece of data is identified by its row number and its column number. So for example in row 2, column 2 we have 3.2, in row 6, column 3 we have Male.
As with the vectors before we can use the [ ] notation to access pieces of a data frame, but now we need to give it both the row and the column number, separated by a ,:
students[6, 3]
## [1] "Male"
As before we can pick more than one piece:
students[1:5, 3]
## [1] "Male" "Male" "Male" "Male" "Female"
students[1:5, 1:2]
##   Age GPA
## 1  22 3.1
## 2  23 3.2
## 3  20 2.1
## 4  22 2.1
## 5  21 2.3
students[-c(1:5), 3]
## [1] "Male" "Female" "Male" "Female" "Female"
students[1, ]
##   Age GPA Gender
## 1  22 3.1   Male
students[, 2]
## [1] 3.1 3.2 2.1 2.1 2.3 2.9 2.3 3.9 2.6 3.2
students[, -3]
##    Age GPA
## 1   22 3.1
## 2   23 3.2
## 3   20 2.1
## 4   22 2.1
## 5   21 2.3
## 6   21 2.9
## 7   18 2.3
## 8   22 3.9
## 9   21 2.6
## 10  18 3.2
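To try the commands above, the students data frame can be rebuilt from its printed output. The sketch below does that and repeats a few of the [row, column] selections. The last line, selecting a column by name instead of by number, is standard R but is not part of the original text.

```r
# Rebuild the example data frame from the printed output
students <- data.frame(
  Age = c(22, 23, 20, 22, 21, 21, 18, 22, 21, 18),
  GPA = c(3.1, 3.2, 2.1, 2.1, 2.3, 2.9, 2.3, 3.9, 2.6, 3.2),
  Gender = c("Male", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Female"),
  stringsAsFactors = FALSE  # keep Gender as plain text, as in the output
)

students[6, 3]        # single cell: row 6, column 3
students[1:5, 1:2]    # rows 1 to 5, columns 1 and 2
students[-c(1:5), 3]  # column 3 for every row except the first five
students[, -3]        # all rows, every column except the third

# Columns can also be selected by name instead of number
students[1:5, "Gender"]
```

A negative index drops the listed rows or columns, and leaving a position empty means "all of them".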
Vector Arithmetic
R allows us to apply mathematical functions to a whole vector at once:
x <- 1:10
2*x
## [1] 2 4 6 8 10 12 14 16 18 20
x^2
## [1] 1 4 9 16 25 36 49 64 81 100
log(x)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
sum(x)
## [1] 55
y <- 21:30
x+y
## [1] 22 24 26 28 30 32 34 36 38 40
x^2+y^2
## [1] 442 488 538 592 650 712 778 848 922 1000
mean(x+y)
## [1] 31
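One way to see what "applied to a whole vector" means is to compare a vectorized expression with an explicit loop that does the same work one element at a time. This is a minimal sketch; the names vectorized and loop_result are illustrative only.

```r
x <- 1:10
y <- 21:30

# Vectorized: the operation is applied element by element
vectorized <- x^2 + y^2

# Equivalent explicit loop, shown only for comparison
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- x[i]^2 + y[i]^2
}

all.equal(vectorized, loop_result)  # TRUE
mean(x + y)                         # 31
```

The vectorized form is shorter and is the usual style in R.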
Subsetting
One of the most common tasks in Statistics is to select a part of a data set for further analysis. There is even a name for this: data wrangling.
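As a minimal sketch of selecting part of a data set, the [ ] notation can be combined with a logical condition. Logical conditions inside [ ] are standard R but are not shown in the text above, and the students data frame here is rebuilt from the earlier printed output.

```r
# Example data frame rebuilt from the printed output shown earlier
students <- data.frame(
  Age = c(22, 23, 20, 22, 21, 21, 18, 22, 21, 18),
  GPA = c(3.1, 3.2, 2.1, 2.1, 2.3, 2.9, 2.3, 3.9, 2.6, 3.2),
  Gender = c("Male", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Female"),
  stringsAsFactors = FALSE
)

# Rows where Gender is "Female", keeping all columns
females <- students[students$Gender == "Female", ]

# GPA values of students older than 21
gpa_over_21 <- students$GPA[students$Age > 21]

print(females)
print(gpa_over_21)
mean(females$GPA)
```

The condition produces a vector of TRUE/FALSE values, and only the rows (or elements) marked TRUE are kept.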