Question


Posted on Sep 25, 2024


### Problem 1: Vector boolean operations (1.5 points)

##### R has two kinds of Boolean operators implemented, single (`&`, `|`) and double (`&&`, `||`).

One of these operators takes advantage of something called *lazy evaluation* while the other one does not. They don't behave the same way when being applied to *vectors*.
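As a minimal sketch of what lazy (short-circuit) evaluation means in practice, the hypothetical helper `noisy()` below counts how often it is called, so you can check whether the right-hand operand was evaluated at all. The helper and variable names are illustrative assumptions, not part of the assignment.

```r
# Hypothetical helper: counts how often it is called, then returns its argument.
calls <- 0
noisy <- function(value) {
  calls <<- calls + 1
  value
}

# Double operator: the left side already decides the result,
# so the right side is never evaluated.
res_double <- TRUE || noisy(FALSE)
calls_after_double <- calls

# Single operator: both sides are always evaluated.
res_single <- TRUE | noisy(FALSE)
calls_after_single <- calls
```

Comparing `calls_after_double` with `calls_after_single` shows which operator skipped its right-hand side.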

To help you get started, try the following two examples:

```{r, eval = FALSE}
# Example: The variable y.prob2a has not been defined yet.
# (Do not define it!)
x.prob2a <- 5
(x.prob2a < 10) | (y.prob2a > 2)
(x.prob2a < 10) || (y.prob2a > 2)
```

What happened when you ran the above code chunk? (0.5 point)

```{r}
# edit me
```

```{r, eval = FALSE}
# Define vectors
x.prob2a.vec <- c(TRUE, FALSE, FALSE)
y.prob2a.vec <- c(TRUE, TRUE, FALSE)

# Apply various Boolean operations to see the output
x.prob2a.vec & y.prob2a.vec
x.prob2a.vec && y.prob2a.vec
x.prob2a.vec | y.prob2a.vec
x.prob2a.vec || y.prob2a.vec
```
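The following is a small sketch of how the two operator families treat vectors, using illustrative vectors `a` and `b` that are not part of the assignment. It assumes R 4.3.0 or later, where passing a vector longer than one to `&&` or `||` is an error; older versions used only the first element instead (with a warning in R 4.2). The `tryCatch()` wrapper is there so the sketch runs either way.

```r
a <- c(TRUE, FALSE, FALSE)
b <- c(TRUE, TRUE, FALSE)

# Single operators work element by element and return a vector.
elementwise_and <- a & b
elementwise_or  <- a | b

# Double operators expect length-one operands.
# On R >= 4.3.0 the call raises an error, which is captured here as a message.
double_and <- tryCatch(a && b, error = function(e) conditionMessage(e))

# To reduce a whole vector to a single TRUE/FALSE, use all() or any().
all_true <- all(a & b)
any_true <- any(a | b)
```

If the goal is a single logical value for use in `if ()`, `all()` and `any()` are usually a clearer choice than applying `&&` or `||` to vectors.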

Explain the difference between "&" and "|". (0.5 point)

```{r}
# edit me
```

Explain the difference between "&" and "&&". (0.5 point)

```{r}
# edit me
```

### Problem 2: Indego Bike Share Service (3.5 points + 0.5 bonus point)

All of the questions below should be answered using the data file `Indego_trips_Q4_2016.csv`, which has been posted on BBLearn. This file has been edited from the data provided by Indego at https://www.rideindego.com/about/data/

Indego is a bike share service in Philadelphia. This data file contains every rental record from every station in the fourth quarter of 2016.

(a) (0.5 point) Import the file `Indego_trips_Q4_2016.csv` and name it `Indego.df`.

```{r}
# Edit me
```
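One common way to import a CSV file is `read.csv()`. The sketch below writes a tiny stand-in file to a temporary location so that it runs anywhere; the column names and values are invented for illustration and are only an assumption about the real file's layout. For the assignment, the path to `Indego_trips_Q4_2016.csv` would be passed instead.

```r
# Stand-in data with invented values; the real file has many more rows and columns.
demo_path <- tempfile(fileext = ".csv")
writeLines(
  c("trip_id,duration,passholder_type",
    "1,600,Indego30",
    "2,7200,Walk-up"),
  demo_path
)

demo.df <- read.csv(demo_path)

str(demo.df)    # column names and types
nrow(demo.df)   # number of rows (one per trip)
```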

(b) (0.5 point) Get summary information of this data frame. What kind of information can you get from this file? How many trips are there in total? How many of these trips are one-way trips? How many of them are round trips?

```{r}
# Edit me
```

(c) (0.5 point) How many different types of users are there? (Hint: Check `passholder_type`.) What percentage of the users have an Indego30 pass? What percentage have an IndegoFlex pass? What percentage are Walk-up customers?

```{r}
# Edit me
```
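Counts and percentages per category can be obtained with `table()` and `prop.table()`. The sketch below uses an invented vector `pass` of eight hypothetical trips rather than the real `passholder_type` column, so the numbers are illustrative only. Note that each element is one trip, so the percentages are shares of trips.

```r
# Invented passholder types for eight hypothetical trips.
pass <- c("Indego30", "Indego30", "IndegoFlex", "Walk-up",
          "Indego30", "Walk-up", "Indego30", "IndegoFlex")

counts  <- table(pass)                 # trips per passholder type
percent <- 100 * prop.table(counts)    # share of each type, in percent

length(counts)   # number of distinct types
percent
```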

(d) (0.5 point) Look at trip #5. How long did this trip last? (The variable `duration` is measured in seconds.) For trip #5, what was the ID of its departing (starting) station? What was the ID of its arriving (ending) station?

```{r}
# Edit me
```

(e) (0.5 point) Take a look at the summary information of this data frame. The summary() function returns the mins, means, etc of the ID numbers of the starting stations and the ID numbers of the ending stations.

However, the mean ID number is not a meaningful quantity (why?).

```{r}
# edit me
```

Please convert the ID vectors into factors. Then, use the summary() function again. Find the top 5 most frequently used starting stations and ending stations.

```{r}
# Edit me
```
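The effect of converting numeric IDs to a factor can be sketched on a toy vector. The station IDs below are invented, and the variable names are hypothetical; the real data frame's ID columns may be named differently.

```r
# Invented station IDs for seven hypothetical trips.
station_id <- c(3010, 3021, 3010, 3045, 3021, 3010, 3057)

mean(station_id)   # computes, but the result is not a real station

station_fac <- factor(station_id)   # treat IDs as labels, not numbers
summary(station_fac)                # now reports a count per station

# Sort the counts in decreasing order and keep the two busiest stations.
top2 <- head(sort(table(station_fac), decreasing = TRUE), 2)
top2
```

Changing the `2` in `head()` to `5` would give the five most frequent levels on a larger data set.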

(f) (0.5 point) Use `help()` to learn about the `plot()` function. Try to plot latitude vs. longitude using the starting point for each trip (`start_lat` and `start_lon`).

```{r}
# Edit me
```

(g) (0.5 point) Create a new data frame called `indegomktg.df` that includes the trip durations in hours and the passholder type. Output its summary information.

```{r}
# Edit me
```
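Building a new data frame from a transformed column might look like the sketch below. The toy data frame `trips.df` and its values are invented, and `hours.df` and `duration_hours` are hypothetical names chosen for illustration; only the conversion factor of 3600 seconds per hour is fixed.

```r
# Toy data with invented values; durations are in seconds.
trips.df <- data.frame(
  duration = c(600, 7200, 1800),
  passholder_type = c("Indego30", "Walk-up", "IndegoFlex")
)

hours.df <- data.frame(
  duration_hours = trips.df$duration / 3600,   # 3600 seconds per hour
  passholder_type = trips.df$passholder_type
)

summary(hours.df)
```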

(h) (Extra credit: 0.5 point) Create a vector called `shortpass` that includes the passholder types for all the trips that lasted less than 1 hour. What percentage of these short trips were completed by Indego30 pass holders?

```{r}
# Edit me
```
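Selecting elements of one vector based on a condition on another is done with logical subsetting. The sketch below uses two invented, parallel vectors (`duration_hours` and `pass`, both hypothetical names) and relies on the fact that the mean of a logical vector is the proportion of `TRUE` values.

```r
# Invented durations (in hours) and passholder types for five hypothetical trips.
duration_hours <- c(0.2, 2.0, 0.5, 0.9, 1.5)
pass <- c("Indego30", "Walk-up", "IndegoFlex", "Indego30", "Indego30")

# Keep the passholder types of trips shorter than one hour.
short <- pass[duration_hours < 1]

# Percentage of those short trips made with an Indego30 pass.
share_indego30 <- 100 * mean(short == "Indego30")
```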