
Question


I need to use R for this.

Question 1

Let's take a second look at where Citi Bike stations are located. Visualise the following layered bubble charts:

1. points of all start stations. Sizes vary with the total number of pickups.
2. points of all end stations. Sizes vary with the total number of returns.

You should end up with a ggplot object named p1, with alpha = 0.5 applied to both layers.

HINTS
1. Check out the ggplot2 reference page for a handy geom_*() to quickly get this done.

    p1

[Plot p1: x axis start_station_longitude, y axis start_station_latitude, point-size legend from 10000 to 40000]

Question 2

Find the most frequently used bike's data records. You should end up with a tibble called top_bike_trips.

    top_bike_trips
    #> # A tibble: 825 x 15
    #>    tripduration starttime           stoptime
    #>  1          520 2018-01-03 13:06:21 2018-01-03 13:15:01
    #>  2          232 2018-01-03 17:01:21 2018-01-03 17:05:14
    #>  3          315 2018-01-14 15:08:14 2018-01-14 15:13:30
    #>  4          266 2018-01-23 14:57:30 2018-01-23 15:01:57
    #>  5          162 2018-01-24 17:01:10 2018-01-24 17:03:53
    #>  6          150 2018-01-25 18:26:58 2018-01-25 18:29:29
    #>  7          272 2018-01-03 08:49:11 2018-01-03 08:53:43
    #>  8          315 2018-01-20 14:06:28 2018-01-20 14:11:44
    #>  9          322 2018-01-02 15:43:42 2018-01-02 15:49:04
    #> 10          251 2018-01-10 17:48:03 2018-01-10 17:52:14
    #> # … with 815 more rows, and 12 more variables:
    #> #   start_station_id, start_station_name,
    #> #   start_station_latitude, start_station_longitude,
    #> #   end_station_id, end_station_name,
    #> #   end_station_latitude, end_station_longitude,
    #> #   bikeid, usertype, birth_year, gender

    library(lubridate)
    library(tidyverse)
    nycbikes18
    #> # A tibble: 333,687 x 15
    #>    tripduration starttime           stoptime
    #>  1          932 2018-01-01 02:06:17 2018-01-01 02:21:50
    #>  2          550 2018-01-01 12:06:18 2018-01-01 12:15:28
    #>  3          510 2018-01-01 12:06:56 2018-01-01 12:15:27
    #>  4          354 2018-01-01 14:53:10 2018-01-01 14:59:05
    #>  5          250 2018-01-01 17:34:30 2018-01-01 17:38:40
    #>  6          613 2018-01-01 22:05:05 2018-01-01 22:15:19
    #>  7          290 2018-01-02 12:13:51 2018-01-02 12:18:42
    #>  8          381 2018-01-02 12:50:03 2018-01-02 12:56:24
    #>  9          318 2018-01-02 13:55:58 2018-01-02 14:01:16
    #> 10         1852 2018-01-02 16:55:29 2018-01-02 17:26:22
    #> # … with 333,677 more rows, and 12 more variables:
    #> #   start_station_id, start_station_name,
    #> #   start_station_latitude, start_station_longitude,
    #> #   end_station_id, end_station_name,
    #> #   end_station_latitude, end_station_longitude,
    #> #   bikeid, usertype, birth_year, gender

Question 3

Plot all journeys that the most frequently used bike has travelled on. You should end up with a ggplot called p2, with alpha = 0.5.

    p2

[Plot p2: x axis start_station_longitude, y axis start_station_latitude]

Question 4

In order to study different riding behaviours by age group, you'll transform nycbikes18 into a tibble called nycbikes18_age:

1. modify tripduration so that it is expressed in minutes.
2. when birth_year is less than 1900, replace the value with NA_real_.
3. add a new column age derived from the user's birth year, based on 2018.
4. add a new column age_group that bins age into groups: 0-14, 15-24, 25-44, 45-64, 65+.

    glimpse(nycbikes18_age)
    #> Rows: 333,687
    #> Columns: 17
    #> $ tripduration            15.533333, 9.166667, 8.500000, 5.900…
    #> $ starttime               2018-01-01 02:06:17, 2018-01-01 12:…
    #> $ stoptime                2018-01-01 02:21:50, 2018-01-01 12:…
    #> $ start_station_id        3183, 3183, 3183, 3183, 3183, 3183, …
    #> $ start_station_name      "Exchange Place", "Exchange Place", …
    #> $ start_station_latitude  40.71625, 40.71625, 40.71625, 40.716…
    #> $ start_station_longitude -74.03346, -74.03346, -74.03346, -74.…
    #> $ end_station_id          3199, 3199, 3199, 3267, 3639, 3203, …
    #> $ end_station_name        "Newport Pkwy", "Newport Pkwy", "New…
    #> $ end_station_latitude    40.72874, 40.72874, 40.72874, 40.712…
    #> $ end_station_longitude   -74.03211, -74.03211, -74.03211, -74.…
    #> $ bikeid                  31929, 31845, 31708, 31697, 31861, 3…
    #> $ usertype                "Subscriber", "Subscriber", "Subscri…
    #> $ birth_year              1992, 1969, 1946, 1994, 1991, 1982, …
    #> $ gender                  1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, …
    #> $ age                     26, 49, 72, 24, 27, 36, 60, 29, 58, …
    #> $ age_group               "(24,44]", "(44,64]", "65+", "(14,24…

Question 5

Generate a side-by-side boxplot to demonstrate the differences in trip durations across age groups, coloured by user type. (NOTE: you need to modify the labels of this plot.) You should end up with a ggplot called p3, with

- the x axis labelled "Age Group"
- the y axis labelled "Trip in minutes (on log10)"

    p3

[Plot p3: boxplots for age groups (14,24], (24,44], (44,64], 65+ and NA, coloured by usertype (Customer, Subscriber); x axis "Age Group", y axis "Trip in minutes (on log10)"]

Question 6

Plot a grouped bar chart that sums up the number of trips by month of starttime and gender. The intermediate data for the plot looks as follows:

    #> # A tibble: 33 x 3
    #>    month gender  ntrips
    #>  1 Jan   unknown    428
    #>  2 Jan   male      9798
    #>  3 Jan   female    2451
    #>  4 Feb   unknown    498
    #>  5 Feb   male     11849
    #>  6 Feb   female    2757
    #>  7 Mar   unknown    794
    #>  8 Mar   male     13231
    #>  9 Mar   female    3084
    #> 10 Apr   unknown   1676
    #> # … with 23 more rows

You should end up with a ggplot called p4.

Question 7

Produce a line graph depicting the 3rd quantile of trip durations by month of starttime and age group. (NOTE: check out the legend order, which matches the last value of each line.) The intermediate data for the plot looks as follows:

    #> # A tibble: 55 x 3
    #>    month age_group qtl_tripd
    #>  1 Jan   (14,24]        9.18
    #>  2 Jan   (24,44]        7.37
    #>  3 Jan   (44,64]        8.62
    #>  4 Jan   65+            9.57
    #>  5 Jan   NA             6.69
    #>  6 Feb   (14,24]        9.76
    #>  7 Feb   (24,44]        7.23
    #>  8 Feb   (44,64]        8.38
    #>  9 Feb   65+           10.6
    #> 10 Feb   NA             7.08
    #> # … with 45 more rows

You should end up with a ggplot called p5.

HINTS

    p5

[Plot p5: x axis month (Jan to Nov), y axis qtl_tripd, legend age_group in the order 65+, (14,24], (44,64], (24,44], NA]

Question 8

Present the following pivot table that counts the number of trips by upper-tail users' types and age groups. Upper-tail users are defined as the ones who ride for longer periods than 90% of users of the same age group. (NOTE: the column headers of user_behaviours.) You should end up with a tibble called user_behaviours.

HINTS

    user_behaviours
    #> # A tibble: 5 x 3
    #>   `Age Group` Customer Subscriber
    #> 1 (14,24] 641 706
    #> 2 (24,44] 3749 19864
    #> 3 (44,64] 4760 3275
    #> 4 65+ 25 289
    #> 5 NA 13

Question 9

You're going to get nycbikes18 prepared in a form for the final question:

1. round starttime down to the nearest hour.
2. aggregate to the hourly number of trips (denoted as ntrips) by user type.
3. add new columns startdate, starthour, and startwday that contain dates, hours of the day, and weekdays respectively, extracted from starttime.

You should end up with a tibble called hourly_ntrips.

    hourly_ntrips
    #> # A tibble: 11,739 x 6
    #>    starttime           usertype   ntrips startdate  starthour startwday
    #>  1 2018-01-01 00:00:00 Subscribe…      1 2018-01-01         0 Mon
    #>  2 2018-01-01 01:00:00 Subscribe…      3 2018-01-01         1 Mon
    #>  3 2018-01-01 02:00:00 Subscribe…      3 2018-01-01         2 Mon
    #>  4 2018-01-01 03:00:00 Subscribe…      7 2018-01-01         3 Mon
    #>  5 2018-01-01 04:00:00 Subscribe…      1 2018-01-01         4 Mon
    #>  6 2018-01-01 06:00:00 Subscribe…      2 2018-01-01         6 Mon
    #>  7 2018-01-01 08:00:00 Subscribe…      5 2018-01-01         8 Mon
    #>  8 2018-01-01 09:00:00 Subscribe…      2 2018-01-01         9 Mon
    #>  9 2018-01-01 10:00:00 Subscribe…      4 2018-01-01        10 Mon
    #> 10 2018-01-01 11:00:00 Subscribe…      3 2018-01-01        11 Mon
    #> # … with 11,729 more rows

Question 10

Let's look at user behaviours between Subscriber and Customer in a temporal context. Visualise the following layered graphics, faceted by weekdays and user types:

1. grey lines indicating the hourly number of trips against time of the day from hourly_ntrips, with colour = "#bdbdbd" and alpha = 0.5.
2. superimposed lines indicating the average number of trips by hour of the day, weekday, and user type, with size = 1.

You should end up with a ggplot object named p6 with a black-and-white theme.

HINTS

    p6

[Plot p6: facet columns Mon to Sun, facet rows Customer and Subscriber; x axis starthour (0 to 20), y axis ntrips, legend startwday]
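The two data-preparation steps, Question 4 (age groups) and Question 9 (hourly counts), can be sketched on a small stand-in tibble. This is only a rough sketch under stated assumptions, not the verified solution: toy_bikes is a hypothetical seven-row substitute for nycbikes18 that reuses the first printed rows of the question, except that the last row's birth_year and usertype are made up so the NA rule and the second user type are exercised. The age_group labels are copied from the glimpse() output, with "(0,14]" assumed for the lowest bin, and cut() returns a factor rather than the character column the glimpse suggests.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for nycbikes18 with only the columns needed here.
# Rows 1-6 reuse values printed in the question; in row 7 the birth_year
# (1888) and usertype ("Customer") are invented for illustration.
toy_bikes <- tibble(
  tripduration = c(932, 550, 510, 354, 250, 613, 290),  # seconds
  starttime = ymd_hms(c(
    "2018-01-01 02:06:17", "2018-01-01 12:06:18", "2018-01-01 12:06:56",
    "2018-01-01 14:53:10", "2018-01-01 17:34:30", "2018-01-01 22:05:05",
    "2018-01-02 12:13:51"
  )),
  usertype = c(rep("Subscriber", 6), "Customer"),
  birth_year = c(1992, 1969, 1946, 1994, 1991, 1982, 1888)
)

# Question 4 sketch: minutes, implausible birth years to NA, age, age bins.
toy_age <- toy_bikes %>%
  mutate(
    tripduration = tripduration / 60,
    birth_year = if_else(birth_year < 1900, NA_real_, birth_year),
    age = 2018 - birth_year,
    # Right-closed intervals: 24 falls in (14,24], 25 in (24,44].
    age_group = cut(
      age,
      breaks = c(0, 14, 24, 44, 64, Inf),
      labels = c("(0,14]", "(14,24]", "(24,44]", "(44,64]", "65+")
    )
  )

# Question 9 sketch: floor to the hour, count trips per hour and user type,
# then derive date, hour of day, and weekday (week starting on Monday).
toy_hourly <- toy_bikes %>%
  mutate(starttime = floor_date(starttime, unit = "hour")) %>%
  count(starttime, usertype, name = "ntrips") %>%
  mutate(
    startdate = as_date(starttime),
    starthour = hour(starttime),
    startwday = wday(starttime, label = TRUE, week_start = 1)
  )

print(toy_age)
print(toy_hourly)
```

If the approach fits, replacing toy_bikes with nycbikes18 and assigning the results to nycbikes18_age and hourly_ntrips should give tibbles shaped like the ones printed in the question. Weekday labels depend on the session locale, so "Mon" may appear differently on non-English systems.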
