Question

Assignment is related to the module, Regression. The questions that follow are based on a dataset of New York city taxi rides stored in taxi.csv. Read in taxi.csv and assign it to an object, taxi. If the data is in your working directory, then you can use the following code to read in the data:

taxi = read.csv('taxi.csv')

About NYC Taxi Data 
This data contains a subset of NYC taxi trips for April 2022. The goal of this assignment is to examine the factors that influence the size of the tip (tip_amount) a taxi driver receives.

Variables

trip_id: Unique identifier for each trip

trip_duration: Duration of trip in minutes

trip_distance: Distance of trip in miles

passenger_count: Number of passengers

fare_amount: Fare calculated by the meter. This does not include tolls, surcharges or tips.

tolls_amount: Amount of all tolls paid in trip

tip: whether the taxi driver received a Tip or No Tip

tip_amount: tip paid

period_of_day: Time of day for pickup: morning, afternoon, evening, night

pickup_date: Date of month for pickup

period_of_month: Period of month when the trip occurred: beginning, middle, end

pickup_day: Day of week for trip: Mon, Tue, Wed, Thu, Fri, Sat, Sun

pickup_hour: Hour of day for pickup

pickup_min: Minute of day for pickup

pickup_sec: Second of day for pickup

pickup_time: Pick up date and time

dropoff_time: Drop off date and time

 

Details

You will have a maximum of three attempts for this assignment. Only those attempts registered before the due date will count towards your score.

When entering your answers, please follow these instructions unless otherwise stated. Failing to do so may cause a correct answer to be marked incorrect.

Do not round answers from R. Enter them as is.

Do not use commas to separate numbers in an answer. E.g., write 100000 NOT 100,000

Do not include units. E.g., 34.56 NOT $34.56

Wherever relevant, include the 0 before the decimal. E.g., state the answer as 0.34 NOT .34

Drop trailing 0s after the decimal. E.g., state the answer as 0.3 NOT 0.30

Academic Integrity

The responses on this assignment must be the product of your individual work. Copying and presenting the work of another as your own, or collaborating with others on this assignment, is an academic infraction punishable with a failing grade in this assignment or this course.

Question 1 (2 points)

 

Generally speaking, including a larger number of meaningful predictors will improve the quality of predictions. It is reasonable to expect the following predictors to influence tip paid: number of passengers (passenger_count), fare amount (fare_amount), hour of the day of the ride (pickup_hour), whether the trip occurred in the beginning, middle or end of the month (period_of_month), and day of the week for the trip (pickup_day). Use these variables in a multiple regression to predict tip_amount. Call this model5.

Which of the following variables are significant predictors of tip_amount? Please note, a categorical predictor variable is statistically significant if even one of the dummy variables representing it is statistically significant. Select one or more correct answers.

Question 1 options:

period_of_month

passenger_count

pickup_day

pickup_hour

fare_amount
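A minimal sketch of how model5 might be fit and inspected in R. The helper names fit_model5 and significant_terms are hypothetical, and the 0.05 cutoff is an assumption, since the assignment text shown here does not state a significance level. Whether the model should be fit on the full taxi data or on a train sample depends on instructions that are not part of this excerpt.

```r
# Sketch: linear model with the five predictors named in Question 1.
# `data` is assumed to be a data frame with the columns listed under Variables.
fit_model5 <- function(data) {
  lm(tip_amount ~ passenger_count + fare_amount + pickup_hour +
       period_of_month + pickup_day,
     data = data)
}

# Hypothetical helper: keep the coefficient rows whose p-value is below alpha.
# alpha = 0.05 is an assumed threshold, not one given in the assignment.
significant_terms <- function(model, alpha = 0.05) {
  coefs <- summary(model)$coefficients
  coefs[coefs[, "Pr(>|t|)"] < alpha, , drop = FALSE]
}

# Usage (commented out because taxi.csv is not bundled with this page):
# model5 <- fit_model5(taxi)
# summary(model5)
# significant_terms(model5)
```

Categorical predictors such as pickup_day appear as several dummy-variable rows in the coefficient table, which is why the question treats the variable as significant if any one of those rows is.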

Question 2 (2 points)

In model5, which is the strongest predictor of tip_amount?

Question 2 options:

passenger_count

pickup_day

pickup_hour

fare_amount

period_of_month

Question 3 (2 points)

What is the RMSE for model5?

Question 3 options:
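RMSE is the square root of the mean squared difference between observed and predicted tips. A small sketch follows; the function name rmse and the data frame name train are assumptions made for illustration.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Usage, assuming model5 was fit on a data frame called `train` (hypothetical name):
# rmse(train$tip_amount, predict(model5))
```

If the data contain missing values, lm() drops those rows by default, so the vector returned by predict(model5) can be shorter than the tip_amount column; it may be worth checking that the two lengths match before comparing them.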

Question 4 (2 points)

Now, let us explore non-linear relationships of fare_amount and pickup_hour by including polynomial terms. Modify model5 by replacing fare_amount with poly(fare_amount, 2) and pickup_hour with poly(pickup_hour, 2). Keep the rest of the model the same. Call this model6.

In model6, which of the following variables are significant predictors of tip_amount? Select one or more correct answers.

Question 4 options:

poly(pickup_hour,2)1

period_of_month

passenger_count

poly(fare_amount,2)2

poly(pickup_hour,2)2

pickup_day

poly(fare_amount,2)1
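The change from model5 to model6 can be sketched as below. The function name fit_model6 is hypothetical. By default poly() builds orthogonal polynomial terms, so each variable contributes two coefficients, a linear one (suffix 1) and a quadratic one (suffix 2), which is what the options above refer to.

```r
# Sketch: same predictors as model5, with second-degree polynomial terms
# for fare_amount and pickup_hour.
fit_model6 <- function(data) {
  lm(tip_amount ~ passenger_count + poly(fare_amount, 2) + poly(pickup_hour, 2) +
       period_of_month + pickup_day,
     data = data)
}

# Usage (commented out because taxi.csv is not bundled with this page):
# model6 <- fit_model6(taxi)
# summary(model6)
```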

Question 5 (2 points)

What is the RMSE for model6?

Question 5 options:

Question 6 (2 points)

Use the variables in model5 to fit a Generalized Additive Model using method="REML". Use smoothing functions for fare_amount [i.e., s(fare_amount)] and pickup_hour [i.e., s(pickup_hour)]. Leave the other variables unchanged. Call this model7.

In model7, which of the following variables are significant predictors of tip_amount? Select one or more correct answers.

Question 6 options:

s(pickup_hour)

pickup_day

s(fare_amount)

passenger_count

period_of_month
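A sketch of model7 follows, assuming the mgcv package is the intended GAM implementation. The assignment names only method="REML" and s(), which match mgcv's gam() but do not confirm it. The function name fit_model7 is hypothetical.

```r
library(mgcv)  # assumed package; provides gam() and s()

# Sketch: smooth terms for fare_amount and pickup_hour, other predictors unchanged.
fit_model7 <- function(data) {
  gam(tip_amount ~ passenger_count + s(fare_amount) + s(pickup_hour) +
        period_of_month + pickup_day,
      data = data, method = "REML")
}

# Usage (commented out because taxi.csv is not bundled with this page):
# model7 <- fit_model7(taxi)
# summary(model7)  # parametric terms and smooth terms are reported in separate tables
```

Recent versions of read.csv() load text columns as character vectors rather than factors, so converting period_of_month and pickup_day with factor() before fitting is a cautious choice.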

Question 7 (2 points)

What is the RMSE for model7?

Question 7 options:

Question 8 (2 points)

The litmus test for model performance is how it performs on data that was not used to estimate it. Model5 is the multiple regression model with linear terms. Compute the RMSE for model5 on the test sample.

Question 8 options:

Question 9 (2 points)

GAM (model7) did better than the linear model (model5) on the train sample. Let us see if the flexible GAM model outperforms the linear model in the test sample. Compute the RMSE for model7 on the test sample.
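Test-sample RMSE for either model can be sketched as follows. The helper name rmse_on and the data frame name test are assumptions; how the train and test samples were created is not shown in this excerpt, so the sketch takes the held-out data as given.

```r
# Sketch: RMSE of a fitted model on a data frame it was not estimated on.
# `outcome` is the name of the response column in `newdata`.
rmse_on <- function(model, newdata, outcome = "tip_amount") {
  pred <- as.numeric(predict(model, newdata = newdata))
  sqrt(mean((newdata[[outcome]] - pred)^2))
}

# Usage, assuming `test` holds the held-out sample (hypothetical name):
# rmse_on(model5, test)
# rmse_on(model7, test)
```

Passing newdata explicitly matters: calling predict() without it returns fitted values for the data used to estimate the model, which would reproduce the train RMSE instead.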
