Problem: Predicting Airfares on New Routes
Several new airports have opened in major cities, providing an opportunity for airlines to offer flights on new routes connecting these cities. A major airline has collected data on existing US air travel routes to help determine pricing strategies for the new routes.
Relevant route attributes provided in the dataset include distance, demographics of the destination city, and whether the city is a popular vacation spot. However, other factors important for pricing, such as expected travel demand, remain unknown. Most critically, it is unclear whether budget airlines like Southwest will compete on these new routes.
Southwest uses a business model very different from that of legacy carriers. It focuses on price-sensitive leisure travelers, using tactics such as point-to-point flights between major cities, secondary airport hubs, a standardized fleet to reduce costs, and low fares. Consequently, when Southwest enters a market, average fares tend to drop dramatically.
Using the route dataset as a benchmark, you will develop a model to predict estimated fares for the new routes based on distance, destination city attributes, and likely competition from budget airlines. This will allow the major airline to anticipate the impact of Southwest or other discount carrier entry on the new market opportunity.
The file CSVAirfares.csv contains real data collected for the third quarter of a year. They consist of the following predictors and the response (i.e., the target variable):
SCODE: Starting airport's code
SCITY: Starting city
ECODE: Ending airport's code
ECITY: Ending city
COUPON: Average number of coupons for that route (a one-coupon flight is a nonstop flight, a two-coupon flight is a one-stop flight, etc.)
NEW: Number of new carriers entering that route between Q and Q
VACATION: Whether the route is a vacation route (Yes) or not (No); Florida and Las Vegas routes are generally considered vacation routes
SW: Whether Southwest Airlines serves that route (Yes) or not (No)
HI: Herfindahl Index, a measure of market concentration
SINCOME: Starting city's average personal income
EINCOME: Ending city's average personal income
SPOP: Starting city's population
EPOP: Ending city's population
SLOT: Whether either endpoint airport is slot-controlled or not; this is a measure of airport congestion
GATE: Whether either endpoint airport has gate constraints or not; this is another measure of airport congestion
DISTANCE: Distance between the two endpoint airports, in miles
PAX: Number of passengers on that route during the period of data collection
FARE: (the response) Average fare on that route
Note that some cities are served by more than one airport, and in those cases the airports are distinguished by their letter code.
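HI is the Herfindahl index. A minimal sketch of how such an index is computed, assuming it is the sum of the squared market shares of the carriers on a route, with shares in percent (so a route served by a single carrier scores 10,000). How the dataset's shares were measured is not stated here, so the function name and inputs below are hypothetical and purely illustrative; the assignment itself asks for R, and this sketch uses plain Python only to show the arithmetic.

```python
def herfindahl_index(shares_percent):
    """Herfindahl index of one route: the sum of squared market shares.

    Assumes shares are percentages that add up to 100, so the result ranges
    from close to 0 (many small carriers) to 10,000 (a single carrier).
    """
    total = sum(shares_percent)
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"market shares must sum to 100, got {total}")
    return sum(s * s for s in shares_percent)


# Hypothetical carrier shares (percent) on three made-up routes.
routes = {
    "single carrier": [100.0],
    "two equal carriers": [50.0, 50.0],
    "four equal carriers": [25.0, 25.0, 25.0, 25.0],
}
for label, shares in routes.items():
    print(f"{label}: HI = {herfindahl_index(shares):.0f}")
```

This prints 10000, 5000, and 2500: the more concentrated the route, the higher HI.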
For this homework, the categorical variables Vacation, SW, Slot, and Gate have been transformed into the following dummy variables:
VACATIONYES: 1 if Vacation is YES, and 0 otherwise;
VACATIONNO: 1 if Vacation is NO, and 0 otherwise;
SWYES: 1 if SW is YES, and 0 otherwise;
SWNO: 1 if SW is NO, and 0 otherwise;
SLOTFREE: 1 if Slot is FREE, and 0 otherwise;
SLOTCTRL: 1 if Slot is CONTROLLED, and 0 otherwise;
GATEFREE: 1 if Gate is FREE, and 0 otherwise;
GATECONS: 1 if Gate is CONSTRAINED, and 0 otherwise.
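The transformation can be sketched as follows. Because each pair (for example VACATIONYES and VACATIONNO) always sums to 1, including both members alongside an intercept would make the predictors perfectly collinear. That is why only one dummy per variable enters the regression. The sketch assumes each raw column has exactly two levels and compares level names case-insensitively; how the raw file actually spells the levels is an assumption, and `add_dummies` is a hypothetical helper.

```python
def add_dummies(record):
    """Return a copy of one route record with the 0/1 dummy columns added.

    Levels are matched case-insensitively ("Yes" == "YES"); each variable is
    assumed to have exactly two levels, so the second dummy is 1 minus the first.
    """
    def is_level(value, level):
        return str(value).strip().upper() == level

    out = dict(record)
    out["VACATIONYES"] = int(is_level(record["VACATION"], "YES"))
    out["VACATIONNO"] = 1 - out["VACATIONYES"]
    out["SWYES"] = int(is_level(record["SW"], "YES"))
    out["SWNO"] = 1 - out["SWYES"]
    out["SLOTCTRL"] = int(is_level(record["SLOT"], "CONTROLLED"))
    out["SLOTFREE"] = 1 - out["SLOTCTRL"]
    out["GATECONS"] = int(is_level(record["GATE"], "CONSTRAINED"))
    out["GATEFREE"] = 1 - out["GATECONS"]
    return out


# Hypothetical raw record; only the four categorical columns are shown.
row = {"VACATION": "Yes", "SW": "No", "SLOT": "Free", "GATE": "Constrained"}
print(add_dummies(row))
```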
Complete the following tasks, writing the necessary R code:
(a) Partition the original dataset into training and validation sets. The model will be fit to the training data and evaluated on the validation set.
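A minimal sketch of a reproducible random split. The 60/40 proportion and the seed are assumptions, since the assignment does not fix them; fixing the seed matters because the full and reduced models should later be compared on the same validation records. In R the same idea is usually written with `set.seed()` and `sample()`.

```python
import random


def train_valid_split(rows, train_frac=0.6, seed=1):
    """Randomly partition records into training and validation lists.

    The same seed always yields the same partition.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # local RNG: no global state touched
    n_train = int(round(train_frac * len(rows)))
    train = [rows[i] for i in idx[:n_train]]
    valid = [rows[i] for i in idx[n_train:]]
    return train, valid


ids = list(range(10))  # stand-ins for 10 route records
train, valid = train_valid_split(ids)
print(len(train), len(valid))  # 6 4
```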
(b) Build a multiple linear regression model for predicting the average fare on a new route. Include all numerical predictors in the regression. For the four categorical variables (i.e., Vacation, SW, Slot, and Gate), do NOT use the original variables; instead, use the four dummy variables VACATIONYES, SWYES, SLOTCTRL, and GATECONS. Finally, do not use SCODE, SCITY, ECODE, or ECITY in the regression.
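The fit itself is ordinary least squares. The sketch below solves the normal equations directly so that it runs without libraries. R's `lm()` instead uses a QR decomposition, which is numerically more stable, so treat this only as an illustration of what is being estimated. `PREDICTORS` lists the nine numerical variables plus the four required dummies; the helper `design_matrix` and the dict-per-record layout are hypothetical.

```python
# Predictors required by the task: all numerical variables plus the four
# chosen dummies (SCODE, SCITY, ECODE, ECITY are excluded; FARE is the response).
PREDICTORS = ["COUPON", "NEW", "HI", "SINCOME", "EINCOME", "SPOP", "EPOP",
              "DISTANCE", "PAX", "VACATIONYES", "SWYES", "SLOTCTRL", "GATECONS"]


def design_matrix(records, names=PREDICTORS):
    """Turn dict records into rows of floats, one column per predictor name."""
    return [[float(r[n]) for n in names] for r in records]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for collinear predictors")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from all rows below the pivot.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """OLS coefficients [b0, b1, ..., bp]; an intercept column is added here."""
    Xa = [[1.0] + [float(v) for v in row] for row in X]
    p = len(Xa[0])
    xtx = [[sum(r[i] * r[j] for r in Xa) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy check on made-up data that follow y = 2 + 3*x1 - x2 exactly.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [2 + 3 * a - b for a, b in X]
print([round(c, 6) for c in fit_ols(X, y)])
```

With the airfare data, the call would be `fit_ols(design_matrix(train), [r["FARE"] for r in train])`, where `train` is the training partition.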
(c) Report the model estimation results. Based on the estimated values, write out the fitted linear regression model. Note: put the estimated coefficient values in square brackets,
e.g., FARE = [intercept value] + [coefficient] × SWYES + … + [coefficient] × DISTANCE
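Writing the fitted model in the requested bracket format is mechanical once the coefficients are available. A small hypothetical helper, assuming the coefficients arrive in the same order as the predictor names with the intercept first (the order in which R's `coef()` returns them for an `lm` fit):

```python
def format_equation(coefs, names, response="FARE", digits=4):
    """Render 'FARE = [b0] + [b1] * NAME1 + ...' with coefficients in brackets.

    coefs[0] is the intercept and coefs[k] belongs to names[k - 1]. Negative
    coefficients keep their sign inside the brackets.
    """
    if len(coefs) != len(names) + 1:
        raise ValueError("need one coefficient per predictor plus the intercept")
    terms = [f"[{coefs[0]:.{digits}f}]"]
    for b, name in zip(coefs[1:], names):
        terms.append(f"[{b:.{digits}f}] * {name}")
    return f"{response} = " + " + ".join(terms)


# Made-up coefficients, only to show the layout.
print(format_equation([100.0, -40.5, 0.07], ["SWYES", "DISTANCE"], digits=2))
# FARE = [100.00] + [-40.50] * SWYES + [0.07] * DISTANCE
```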
Also, interpret the meanings of the two model coefficients for SWYES and DISTANCE, and provide the corresponding estimation results to support your answer.
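Interpreting a coefficient in a linear model comes down to comparing two predictions that differ in one predictor only. Writing the fitted model with the remaining predictors collected in the sum over k, the two differences the question asks about are:

```latex
\begin{aligned}
\widehat{\mathrm{FARE}} &= \hat\beta_0 + \hat\beta_{\mathrm{SW}}\,\mathrm{SWYES} + \hat\beta_{\mathrm{DIST}}\,\mathrm{DISTANCE} + \sum_{k} \hat\beta_k x_k,\\
\widehat{\mathrm{FARE}}\big|_{\mathrm{SWYES}=1} - \widehat{\mathrm{FARE}}\big|_{\mathrm{SWYES}=0} &= \hat\beta_{\mathrm{SW}},\\
\widehat{\mathrm{FARE}}\big|_{\mathrm{DISTANCE}=d+1} - \widehat{\mathrm{FARE}}\big|_{\mathrm{DISTANCE}=d} &= \hat\beta_{\mathrm{DIST}}.
\end{aligned}
```

So the SWYES coefficient is the estimated difference in average fare between routes Southwest serves and routes it does not, holding the other predictors fixed (a negative value would be consistent with the fare drop described earlier), and the DISTANCE coefficient is the estimated change in average fare per additional mile, again holding the other predictors fixed.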
(d) Use backward variable selection to reduce the number of predictors. How many variables are selected? Report all the selected variables and provide the estimation results.
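Backward selection starts from the full model and repeatedly drops the predictor whose removal most improves a criterion, stopping when no single removal helps. R's `step(fit, direction = "backward")` uses AIC for this. The self-contained sketch below follows the same rule, with AIC computed up to a constant as n·log(RSS/n) + 2·(number of coefficients), but its path could differ from other tools (for example, p-value-based elimination). It repeats a small OLS solver so that it runs on its own; the function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for collinear predictors")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on columns `cols`."""
    Xa = [[1.0] + [float(row[c]) for c in cols] for row in X]
    p = len(Xa[0])
    xtx = [[sum(r[i] * r[j] for r in Xa) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(Xa, y))


def aic(X, y, cols):
    """AIC up to an additive constant: n*log(RSS/n) + 2*(len(cols) + 1)."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)


def backward_aic(X, y, names):
    """Backward elimination by AIC; returns the names of the kept predictors."""
    cols = list(range(len(names)))
    current = aic(X, y, cols)
    while cols:
        # Score every model that drops exactly one remaining predictor.
        trials = [(aic(X, y, [c for c in cols if c != d]), d) for d in cols]
        best_aic, best_drop = min(trials)
        if best_aic >= current:  # no single removal lowers AIC: stop
            break
        cols.remove(best_drop)
        current = best_aic
    return [names[c] for c in cols]


# Made-up data: y depends on x1; x2 is built to explain none of the residual.
X = [[i, s] for i, s in zip(range(1, 9), [1, 1, -1, -1, 1, 1, -1, -1])]
noise = [1, -1, -1, 1, 1, -1, -1, 1]
y = [3 + 2 * row[0] + e for row, e in zip(X, noise)]
print(backward_aic(X, y, ["x1", "x2"]))  # ['x1']
```

With the airfare data, `X` would be the training design matrix over the 13 predictors and `names` the matching list of column names.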
(e) Compare the predictive accuracy of the full model in (c) and the backward model in (d), focusing on measures such as RMSE and adjusted R². Which model performs better, and why?
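The two measures can be computed as follows. RMSE speaks to predictive accuracy when it is computed on the validation set, whereas adjusted R² describes the fit on the training data (it is the "Adjusted R-squared" that `summary()` reports for an `lm` fit) and penalizes extra predictors through the n − p − 1 denominator. A minimal sketch, with p the number of predictors excluding the intercept:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, typically computed on the validation set."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def adjusted_r2(actual, fitted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(actual)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    mean = sum(actual) / n
    sst = sum((a - mean) ** 2 for a in actual)           # total sum of squares
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))  # residual sum of squares
    r2 = 1.0 - sse / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Made-up numbers, only to show the calls.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(rmse(actual, fitted), 4), round(adjusted_r2(actual, fitted, 1), 4))
```

Computing both measures for the full and the backward model, on the same training/validation partition, gives a like-for-like comparison.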
What suggestionsins