Question
Questions 1-3: Introduction
The New York City Taxi and Limousine Commission (TLC) records data for the yellow and green taxis in New York City. This includes information such as pick-up and drop-off times, fares, distance travelled, payment type, tip, etc. A description of the columns provided to you for this exam is listed in Table 1.
The data used in this assignment is a random sample of 100,000 yellow taxi trips from December, 2018.
I did some initial checks on the data, removing nonsense values and making the data easier to work with. I recommend that you perform your own quality checks on the data before beginning any analysis. This could include checking summaries, plotting, and any other exploratory data analysis that you deem necessary (this is part of Q1a).
The original dataset is available from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page.
Field Name              Description
Vendor                  Taxicab Technology System that provided the record
RateCode                The final rate code at the end of the trip
PUborough               The Borough where the pick-up was made
DOborough               The Borough where the drop-off was made
passenger_count         The number of passengers in the vehicle
trip distance           The elapsed trip distance in miles
TimeOfDay               Pickup occurred during the Morning (6am-12noon); Afternoon (12noon-6pm); Evening (6pm-12midnight); Night (12midnight-6am)
trip duration           The length of the trip in seconds
paymentMethod           How the passenger paid for the trip
fare_amount             The time-and-distance fare calculated by the meter
extra                   Miscellaneous extras and surcharges. Currently only includes $0.50 and $1 rush hour and overnight charges.
mta_tax                 $0.50 MTA tax that is automatically triggered based on the metered rate in use
tip_amount              Tip amount - this field is automatically populated for credit card tips. Cash tips are not included.
tolls_amount            Total amount of all tolls paid in trip
improvement-surcharge   $0.30 improvement surcharge assessed on trips at the flag drop
total_amount            The total amount charged to passengers. Does not include cash tips.
Table 1: Data description for Questions 1-3.
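The quality checks recommended above could start with something like the following minimal sketch. It assumes the CSV header uses the names TimeOfDay, trip_duration and fare_amount (the exact spelling in nyc-taxi.csv is not confirmed here), it runs on a tiny made-up sample rather than the real file, and the 4-hour cut-off is an arbitrary illustrative threshold, not a rule from the exam.

```python
import csv
import io
import statistics

# Made-up rows standing in for nyc-taxi.csv; values are illustrative only.
# Column names are assumptions and may differ from the real file's header.
SAMPLE_CSV = """TimeOfDay,trip_duration,fare_amount
Morning,600,9.5
Afternoon,900,12.0
Evening,-30,7.0
Night,450,0
Morning,86400,52.0
"""


def load_rows(handle):
    """Read CSV rows into dicts, converting the numeric columns to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["trip_duration"] = float(row["trip_duration"])
        row["fare_amount"] = float(row["fare_amount"])
        rows.append(row)
    return rows


def summarise(values):
    """Five quick summary numbers for one numeric column."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }


def flag_suspect(rows, max_duration=4 * 3600):
    """Rows with a non-positive duration, a duration above max_duration
    seconds (assumed cut-off), or a non-positive fare."""
    return [
        r for r in rows
        if r["trip_duration"] <= 0
        or r["trip_duration"] > max_duration
        or r["fare_amount"] <= 0
    ]


# For the real data, replace io.StringIO(SAMPLE_CSV) with open("nyc-taxi.csv", newline="").
rows = load_rows(io.StringIO(SAMPLE_CSV))
summary = summarise([r["trip_duration"] for r in rows])
suspect = flag_suspect(rows)
print(summary)
print(len(suspect), "suspect rows")
```

Whether flagged rows should actually be removed is a judgment call that Q1(a) asks you to justify; the sketch only surfaces candidates.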
Question 1
Required file: nyc-taxi.csv
For (b), (c), and (d) below:
(i) State the test you will use;
(ii) State the null and alternative hypotheses - in symbols and in plain language;
(iii) State and check the assumptions for each test;
(iv) Perform the test (even if the assumptions are violated);
(v) Write a conclusion for the test performed (do you reject or fail to reject the null hypothesis?);
(vi) Provide a plain language interpretation of the test's conclusion.
(a) Perform some exploratory data analysis. Does the data seem okay? Is there anything odd or interesting? Does it make sense to remove any of the data (provide an explanation of why or why not)? Provide 2 plots at most.
(b) Is there a difference between the average morning and afternoon trip durations? Calculate the 95% confidence interval for the difference and interpret this interval.
(c) Is there a difference in the average trip duration between the morning, afternoon, evening, and night?
(d) Is there an interaction between PUborough and TimeOfDay when considering average trip duration? If "yes", for which combination of PUborough and time of day should a passenger expect the longest trip duration? (Only one (1) PUborough and TimeOfDay combination should be reported.)
(e) At what time of day should a passenger expect to spend the most time in a taxi on average?
Question 2
Required file: nyc-taxi.csv
You are an enterprising taxi driver and would like to maximize your tips. In order to understand the situation in which you can expect the largest average tip, you fit a model using the data provided.
(a) Choose and briefly describe a modelling approach from class (e.g., linear, stepwise, Ridge, Lasso, or logistic regression);
(b) State and check the assumptions of this model;
(c) Fit the model (even if the assumptions are violated);
(d) Provide the estimated model along with a plain language interpretation;
(e) Is this a good model? Provide justification for your answer.
(f) How would you maximize your tips?
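For Question 1(b), one possible approach is a two-sample comparison of means with unequal variances (Welch style). The sketch below is only an illustration under stated assumptions: the durations are synthetic, generated with a fixed seed, and the interval uses a normal quantile in place of the t quantile, which is a large-sample approximation that should be reasonable for group sizes in the thousands. The function name welch_ci is hypothetical.

```python
import math
import random
import statistics


def welch_ci(x, y, level=0.95):
    """Large-sample confidence interval for mean(x) - mean(y), allowing
    unequal variances. Uses a normal quantile instead of the t quantile
    (an approximation that assumes both samples are large)."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return diff, se, (diff - z * se, diff + z * se)


# Synthetic trip durations in seconds; NOT taken from nyc-taxi.csv.
rng = random.Random(0)
morning = [rng.gauss(800, 300) for _ in range(2000)]
afternoon = [rng.gauss(900, 350) for _ in range(2000)]

diff, se, (lo, hi) = welch_ci(morning, afternoon)

# Two-sided p-value for H0: mean(morning) == mean(afternoon),
# again using the normal approximation.
z_stat = diff / se
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z_stat)))

print(f"difference (morning - afternoon): {diff:.1f} s")
print(f"95% CI: ({lo:.1f}, {hi:.1f})")
print(f"z = {z_stat:.2f}, p = {p_value:.4g}")
```

If the interval excludes 0, the data are inconsistent with equal mean durations at the 5% level. With the real data, the two lists would be the trip durations for rows where TimeOfDay is Morning and Afternoon respectively, and the independence and skewness of the durations would still need checking as part of step (iii).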
Question 3
Required file: nyc-taxi.csv
Come up with a question that you could answer using this dataset.
(a) State the question.
(b) State which variables (columns) you think would be needed. It might make sense to formulate a scientific-style hypothesis here, as you are likely guessing which variables would be important.
(c) Determine what the statistical hypotheses would be (if applicable).
(d) What statistical analysis (modelling or test) would you use?
I am unable to attach the dataset (.csv file). If you contact me by email, I will send the file. All of these questions must be answered using that file only.