Question:
In late 2013, the taxi company Yourcabs.com in Bangalore, India, was facing a problem with the drivers using their platform—not all drivers were showing up for their scheduled calls. Drivers would cancel their acceptance of a call, and, if the cancellation did not occur with adequate notice, the customer would be delayed or even left high and dry.
Bangalore is a key tech center in India, and technology was transforming the taxi industry. Yourcabs.com featured an online booking system (though customers could phone in as well) and presented itself as a taxi booking portal.
The Uber ride sharing service started its Bangalore operations in mid-2014.
Yourcabs.com had collected data on its bookings from 2011 to 2013 and posted a contest on Kaggle, in coordination with the Indian School of Business, to see what it could learn about the problem of cab cancellations.
The data presented for this case are a randomly selected subset of the original data, with 10,000 rows, one row for each booking. There are 17 input attributes, including user (customer) ID, vehicle model, whether the booking was made online or via a mobile app, type of travel, type of booking package, geographic information, and the date and time of the scheduled trip. The target attribute of interest is the binary indicator of whether a ride was canceled. The overall cancellation rate is between 7% and 8%.
Assignment
1. How can a predictive model based on these data be used by Yourcabs.com?
2. How can a profiling model (identifying predictors that distinguish canceled/uncanceled trips) be used by Yourcabs.com?
3. Explore, prepare, and transform the data to facilitate predictive modeling.
Here are some hints:
• In exploratory modeling, it is useful to move fairly soon to at least an initial model without solving all data preparation issues. One example is the GPS information: other geographic information is available, so you could defer the challenge of how to interpret and use the GPS information.
• How will you deal with missing data, such as cases where "?" is indicated?
• Think about what useful information might be held within the date and time fields (the booking timestamp and the trip timestamp).
• Think also about the categorical attributes and how to deal with them. Should we turn them all into dummies? Use only some?
4. Fit several predictive models of your choice. Do they provide information on how the predictor attributes relate to cancellations?
5. Report the predictive performance of your model in terms of error rates (the confusion matrix). How well does the model perform? Can the model be used in practice?
6. Examine the predictive performance of your model in terms of ranking (lift). How well does the model perform? Can the model be used in practice?
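The data preparation hints in item 3 can be illustrated with a minimal sketch. It is not the book's method, and it rests on several assumptions: the column names (`booking_created`, `from_date`, `travel_type_id`), the timestamp layout, and the set of missing-value markers are all hypothetical and should be checked against the actual file. The sketch derives lead time and time-of-trip features from the two timestamps, maps missing markers to an explicit indicator, and creates dummies for only a chosen subset of category levels.

```python
from datetime import datetime

# Assumptions: adjust both to match the actual data file.
MISSING_MARKERS = {"", "?", "NULL"}
TS_FORMAT = "%m/%d/%Y %H:%M"


def clean(value):
    """Return None for a missing-value marker, otherwise the stripped string."""
    if value is None:
        return None
    value = value.strip()
    return None if value in MISSING_MARKERS else value


def time_features(booking_ts, trip_ts):
    """Derive simple predictors from the booking and trip timestamps."""
    booked = datetime.strptime(booking_ts, TS_FORMAT)
    trip = datetime.strptime(trip_ts, TS_FORMAT)
    return {
        "trip_hour": trip.hour,
        "trip_weekday": trip.weekday(),  # 0 = Monday, 6 = Sunday
        "trip_is_weekend": int(trip.weekday() >= 5),
        "lead_time_hours": (trip - booked).total_seconds() / 3600.0,
    }


def dummies(value, kept_levels, prefix):
    """Indicator columns for the kept levels plus a 'missing' indicator.

    Any other level gets all zeros, so it acts as the reference group.
    """
    out = {f"{prefix}_{level}": int(value == level) for level in kept_levels}
    out[f"{prefix}_missing"] = int(value is None)
    return out


def prepare(row):
    """Turn one raw booking record (dict of strings) into model features."""
    row = {key: clean(val) for key, val in row.items()}
    features = time_features(row["booking_created"], row["from_date"])
    features.update(dummies(row["travel_type_id"], ["1", "2"], "travel_type"))
    return features


# Hypothetical record, for illustration only.
example = {
    "booking_created": "1/1/2013 08:00",
    "from_date": "1/2/2013 09:30",
    "travel_type_id": "?",
}
print(prepare(example))
```

Keeping a separate "missing" indicator, rather than dropping incomplete rows, lets a model test whether missingness itself is related to cancellation.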
Step by Step Answer:
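As a starting point for items 5 and 6, the sketch below shows one way to compute a confusion matrix, an overall error rate, and lift in the top-scored fraction of bookings. The labels and scores are made-up illustration values, not results from the Yourcabs.com data, and the function names are hypothetical.

```python
def confusion_matrix(actual, scores, cutoff=0.5):
    """Count outcomes, treating 1 (canceled) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, score in zip(actual, scores):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def error_rate(cm):
    """Share of all records that were misclassified."""
    return (cm["fp"] + cm["fn"]) / sum(cm.values())


def lift_at(actual, scores, fraction=0.1):
    """Cancellation rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


# Made-up labels (1 = canceled) and model scores, for illustration only.
actual = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.1, 0.2, 0.05, 0.3, 0.15, 0.6, 0.1, 0.4, 0.2]

cm = confusion_matrix(actual, scores, cutoff=0.5)
print(cm, error_rate(cm), lift_at(actual, scores, fraction=0.2))
```

With a cancellation rate between 7% and 8%, a rule that never predicts a cancellation already has an error rate of only 7% to 8%, so the overall error rate alone says little. The false-negative count and the lift among the highest-scored bookings are more informative about whether the model is usable in practice.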
Machine Learning For Business Analytics
ISBN: 9781119828792
1st Edition
Authors: Galit Shmueli, Peter C. Bruce, Amit V. Deokar, Nitin R. Patel