
Question

Questions 1-3: Introduction

The New York City Taxi and Limousine Commission (TLC) records data for the yellow and green taxis in New York City. This includes information such as pick-up and drop-off times, fares, distance travelled, payment type, and tip. A description of the columns provided to you for this exam is listed in Table 1.

The data used in this assignment is a random sample of 100,000 yellow taxi trips from December 2018.

I did some initial checks on the data, removing nonsense values and making the data easier to work with. I recommend that you perform your own quality checks on the data before beginning any analysis. This could include checking summaries, plotting, and any other exploratory data analysis that you deem necessary (this is part of Q1a).

The original dataset is available from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page.

Field Name Description

Vendor Taxicab Technology System that provided the record

RateCode The final rate code at the end of the trip

PUborough The Borough where the pick up was made

DOborough The Borough where the drop-off was made

passenger_count The number of passengers in the vehicle

trip distance The elapsed trip distance in miles

TimeOfDay Pickup occurred during the Morning (6am-12noon); Afternoon (12noon-6pm); Evening (6pm-12midnight); Night (12midnight-6am)

trip duration The length of the trip in seconds

paymentMethod How the passenger paid for the trip

fare_amount The time-and-distance fare calculated by the meter

extra Miscellaneous extras and surcharges. Currently only includes $0.50 and $1 rush hour and overnight charges.

mta_tax $0.50 MTA tax that is automatically triggered based on the metered rate in use

tip_amount Tip amount. This field is automatically populated for credit card tips. Cash tips are not included.

tolls_amount Total amount of all tolls paid in trip

improvement-surcharge $0.30 improvement surcharge assessed on trips at the flag drop

total_amount The total amount charged to passengers. Does not include cash tips.

Table 1: Data description for Questions 1-3.
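The quality checks recommended in the introduction could start with something like the sketch below. It is only an illustration: the column names `trip_distance` and `trip_duration` assume the CSV uses underscores (Table 1 prints them with spaces), the helper names are hypothetical, and the four sample rows are made up so the code runs without the real file.

```python
import csv
import io
import statistics

def summarize_column(rows, column):
    """Return basic summary statistics for a numeric column, skipping blanks."""
    values = [float(r[column]) for r in rows if r.get(column, "") != ""]
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }

def flag_suspect_trips(rows):
    """Return rows with a non-positive distance or duration, or a negative fare."""
    suspect = []
    for row in rows:
        distance = float(row["trip_distance"])   # assumed column name
        duration = float(row["trip_duration"])   # assumed column name
        fare = float(row["fare_amount"])
        if distance <= 0 or duration <= 0 or fare < 0:
            suspect.append(row)
    return suspect

# Made-up rows for illustration only; these are not values from the dataset.
SAMPLE_CSV = """trip_distance,trip_duration,fare_amount
1.2,420,7.5
0.0,0,2.5
3.4,900,13.0
2.0,-60,9.0
"""

# With the real file, one would instead read it like this:
#   with open("nyc-taxi.csv", newline="") as f:
#       rows = list(csv.DictReader(f))
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

summary = summarize_column(rows, "trip_duration")
suspect = flag_suspect_trips(rows)
print(summary)
print(len(suspect), "suspect rows")
```

Whether flagged rows should be removed is a judgment call that Question 1(a) asks you to justify; the sketch only surfaces candidates.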

Question 1

Required file: nyc-taxi.csv

For (b), (c), and (d) below:
(i) State the test you will use;
(ii) State the null and alternative hypotheses, in symbols and in plain language;
(iii) State and check the assumptions for each test;
(iv) Perform the test (even if the assumptions are violated);
(v) Write a conclusion for the test performed (do you reject or fail to reject the null hypothesis?);
(vi) Provide a plain language interpretation of the test's conclusion.
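As a rough illustration of steps (i) to (vi) for a two-group comparison of mean trip duration, the sketch below computes a difference in means with a Welch-type (unequal variance) standard error. It is an assumption-laden sketch, not the course's prescribed method: it uses a normal approximation for the critical value and p-value, which is reasonable only for large groups such as those in a 100,000-trip sample, and the function name and the durations are made up.

```python
import math
import statistics
from statistics import NormalDist

def welch_difference(sample_a, sample_b, confidence=0.95):
    """Difference in means (a - b) with a large-sample confidence interval.

    H0: mu_a - mu_b = 0   versus   H1: mu_a - mu_b != 0
    """
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Sample variances (n - 1 denominator); no equal-variance assumption.
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))

    diff = mean_a - mean_b
    z_stat = diff / se
    # Normal approximation; with small groups a t critical value is needed.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return {
        "difference": diff,
        "se": se,
        "statistic": z_stat,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }

# Made-up durations in seconds, for illustration only.
morning = [600, 720, 540, 660, 780]
afternoon = [900, 840, 960, 1020, 780]

result = welch_difference(morning, afternoon)
print(result)
```

If the interval excludes zero, the null hypothesis of equal means is rejected at the matching significance level. In plain language, the interval gives a plausible range for how much longer or shorter the average trip is in one group than in the other.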

(a) Perform some exploratory data analysis. Does the data seem okay? Is there anything odd or interesting? Does it make sense to remove any of the data (provide an explanation of why or why not)? Provide 2 plots at most.
(b) Is there a difference between the average morning and afternoon trip durations? Calculate the 95% confidence interval for the difference and interpret this interval.
(c) Is there a difference in the average trip duration between the morning, afternoon, evening, and night?
(d) Is there an interaction between PUborough and TimeOfDay when considering average trip duration? If "yes", for which combination of PUborough and time of day would a passenger expect the longest trip duration? (Only one (1) combination of PUborough and TimeOfDay should be reported.)
(e) At what time of day should a passenger expect to spend the most time in a taxi on average?

Question 2

Required file: nyc-taxi.csv

You are an enterprising taxi driver and would like to maximize your tips. To understand the situation in which you can expect the largest average tip, you fit a model using the data provided.
(a) Choose and briefly describe a modelling approach from class (e.g., linear, stepwise, Ridge, Lasso, or logistic regression);
(b) State and check the assumptions of this model;
(c) Fit the model (even if the assumptions are violated);
(d) Provide the estimated model along with a plain language interpretation;
(e) Is this a good model? Provide justification for your answer.
(f) How would you maximize your tips?
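For Question 2, one of the listed options is linear regression. The sketch below fits a one-predictor least-squares line of tip on fare and reports R squared. It is a minimal illustration under assumptions: the choice of `fare_amount` as the single predictor, the function name, and the numbers are all made up, and a real answer would likely use several predictors. Because Table 1 notes that cash tips are not recorded, an analysis of the real data may need to restrict attention to card payments.

```python
import statistics

def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, plus R squared."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R squared: share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up fares and tips in dollars, for illustration only.
fares = [5.0, 10.0, 15.0, 20.0]
tips = [1.0, 2.0, 3.5, 4.0]

slope, intercept, r_squared = fit_simple_ols(fares, tips)
print(f"tip = {intercept:.3f} + {slope:.3f} * fare, R^2 = {r_squared:.3f}")
```

The slope reads as the expected change in tip, in dollars, for each additional dollar of fare. Checking residual plots for non-constant variance and non-normality would address part (b).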

Question 3

Required file: nyc-taxi.csv

Come up with a question that you could answer using this dataset.
(a) State the question.
(b) State which variables (columns) you think would be needed. It might make sense to formulate a scientific-style hypothesis here, as you are likely guessing which variables would be important.
(c) Determine what the statistical hypotheses would be (if applicable).
(d) What statistical analysis (modelling or test) would you use?

I am unable to attach the dataset .csv file. If you contact me by email, I will send it. All of these questions have to be answered using that file.
