
Question

Problem: Predicting Airfares on New Routes
Several new airports have opened in major cities, providing an opportunity for airlines to offer flights on new routes connecting these cities. A major airline has collected data on 638 existing U.S. air travel routes to help determine pricing strategies for the new routes.
Relevant route attributes provided in the dataset include distance, demographics of the destination city, and whether the city is a popular vacation spot. However, other factors important for pricing, such as expected travel demand, remain unknown. Most critically, it is unclear whether budget airlines like Southwest will compete on these new routes.
Southwest uses a business model very different from that of legacy carriers. It targets price-sensitive leisure travelers with point-to-point flights between major cities, secondary airport hubs, a standardized fleet that reduces costs, and low fares. Consequently, if Southwest enters a market, average prices tend to drop dramatically.
Using the 638-route dataset as a benchmark, you will develop a model to predict estimated fares for the new routes based on distance, destination city attributes, and likely competition from budget airlines. This will allow the major airline to anticipate the impact of Southwest or other discount carrier entry on the new market opportunity.
The file CSV_Airfares.csv contains real data that were collected for the third quarter of a year. They consist of the following predictors and responses (i.e., the target variable):
S_CODE: Starting airport's code
S_CITY: Starting city
E_CODE: Ending airport's code
E_CITY: Ending city
COUPON: The average number of coupons (a one-coupon flight is a non-stop flight, a two-coupon flight is a one-stop flight, etc.) for that route
NEW: Number of new carriers entering that route between Q3-96 and Q2-97
VACATION: Whether a vacation route (Yes) or not (No). Florida and Las Vegas routes are generally considered vacation routes.
SW: Whether Southwest Airlines serves that route (Yes) or not (No)
HI: Herfindahl index, a measure of market concentration
S_INCOME: Starting city's average personal income
E_INCOME: Ending city's average personal income
S_POP: Starting city's population
E_POP: Ending city's population
SLOT: Whether either endpoint airport is slot-controlled or not; this is a measure of airport congestion
GATE: Whether either endpoint airport has gate constraints or not; this is another measure of airport congestion
DISTANCE: Distance between the two endpoint airports in miles
PAX: Number of passengers on that route during the period of data collection
FARE: (the response) The average fare on that route
Note that some cities are served by more than one airport, and in those cases the airports are distinguished by their 3-letter code.
For this homework, the categorical variables Vacation, SW, Slot, Gate have been transformed into the following dummy variables:
VACATION_YES: =1 if Vacation is YES; and =0 otherwise;
VACATION_NO: =1 if Vacation is NO; and =0 otherwise;
SW_YES: =1 if SW is YES; and =0 otherwise;
SW_NO: =1 if SW is NO; and =0 otherwise;
SLOT_FREE: =1 if Slot is FREE; and =0 otherwise;
SLOT_CTRL: =1 if Slot is CONTROLLED; and =0 otherwise;
GATE_FREE: =1 if Gate is FREE; and =0 otherwise;
GATE_CONS: =1 if Gate is CONSTRAINED; and =0 otherwise.
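The dummy coding above can be sketched in R as follows. This is a minimal illustration on three made-up rows, not the actual file: the level spellings ("Yes"/"No", "Free"/"Controlled"/"Constrained") and the data frame name `routes` are assumptions. Since each pair of dummies sums to 1, only one dummy from each pair can enter a regression that has an intercept; otherwise the predictors are perfectly collinear.

```r
# Hypothetical toy rows; the real CSV_Airfares.csv is not read here,
# and the exact level spellings in that file are an assumption.
routes <- data.frame(
  VACATION = c("Yes", "No", "No"),
  SW       = c("No", "Yes", "No"),
  SLOT     = c("Free", "Controlled", "Free"),
  GATE     = c("Constrained", "Free", "Free"),
  stringsAsFactors = FALSE
)

# One 0/1 indicator per level, compared case-insensitively.
routes$VACATION_YES <- as.integer(toupper(routes$VACATION) == "YES")
routes$VACATION_NO  <- as.integer(toupper(routes$VACATION) == "NO")
routes$SW_YES       <- as.integer(toupper(routes$SW) == "YES")
routes$SW_NO        <- as.integer(toupper(routes$SW) == "NO")
routes$SLOT_FREE    <- as.integer(toupper(routes$SLOT) == "FREE")
routes$SLOT_CTRL    <- as.integer(toupper(routes$SLOT) == "CONTROLLED")
routes$GATE_FREE    <- as.integer(toupper(routes$GATE) == "FREE")
routes$GATE_CONS    <- as.integer(toupper(routes$GATE) == "CONSTRAINED")

# Each pair is complementary: exactly one of the two dummies is 1 per row.
pair_sums <- routes$VACATION_YES + routes$VACATION_NO
```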
Complete the following tasks (write necessary R code):
(a) Partition the original dataset into training (60%) and validation (40%) sets. The model will be fit to the training data and evaluated on the validation set.
(b) Build a multiple linear regression model for predicting the average fare on a new route. Include all numerical predictors in the regression. For the four categorical variables (i.e., Vacation, SW, Slot, Gate), do NOT use the original variables. Instead, use the four dummy variables: VACATION_YES, SW_YES, SLOT_CTRL, and GATE_CONS. Finally, do not use S_CODE, S_CITY, E_CODE, or E_CITY in the regression.
(c) Report the model estimation results. Based on the estimated values in the results, write out what the linear regression model looks like. Note: you need to put the estimated coefficient values in the square brackets,
e.g., FARE = [Intercept value] + [coefficient 1]*SW_YES + [coefficient 2]*DISTANCE + ...
Also, interpret the meanings of the two model coefficients for SW_YES and DISTANCE.
Provide the corresponding estimation results to support your answer.
(d) Use backward variable selection to reduce the number of predictors. How many variables are selected? Report all the variables selected. Provide the estimation results.
(e) Compare the predictive accuracy of the full model in (c) and the backward model in (d). Focus on measures such as RMSE and adjusted R2. Which model performs better, and why?
(f) What suggestions/ins
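
The partition, full regression, backward selection, and comparison steps could be sketched in R roughly as below. This is an illustration under stated assumptions, not the expected solution: the data frame is a synthetic stand-in with made-up values (so any numbers it prints say nothing about the real routes), the seed is arbitrary, `rmse` is a helper defined here, and backward selection is done with base R's `step()`, which drops terms by AIC. A course may expect a different selection criterion.

```r
set.seed(1)  # arbitrary seed so the random split is reproducible

# Synthetic stand-in so the sketch runs without the file; all values are made up.
# With the real data, replace this block with:
#   airfares <- read.csv("CSV_Airfares.csv")
n <- 638
airfares <- data.frame(
  COUPON = runif(n, 1, 2),
  NEW = sample(0:3, n, replace = TRUE),
  HI = runif(n, 1500, 10000),
  S_INCOME = runif(n, 15000, 40000),
  E_INCOME = runif(n, 15000, 40000),
  S_POP = runif(n, 5e5, 9e6),
  E_POP = runif(n, 5e5, 9e6),
  DISTANCE = runif(n, 100, 2800),
  PAX = runif(n, 1500, 70000),
  VACATION_YES = rbinom(n, 1, 0.3),
  SW_YES = rbinom(n, 1, 0.3),
  SLOT_CTRL = rbinom(n, 1, 0.3),
  GATE_CONS = rbinom(n, 1, 0.2)
)
airfares$FARE <- 60 + 0.07 * airfares$DISTANCE - 40 * airfares$SW_YES +
  rnorm(n, sd = 25)

# (a) 60% training / 40% validation split by random row index.
train_idx <- sample(seq_len(nrow(airfares)), size = round(0.6 * nrow(airfares)))
train <- airfares[train_idx, ]
valid <- airfares[-train_idx, ]

# (b) Full model: numerical predictors plus the four chosen dummies;
# S_CODE, S_CITY, E_CODE, E_CITY are left out by naming predictors explicitly.
predictors <- c("COUPON", "NEW", "HI", "S_INCOME", "E_INCOME", "S_POP", "E_POP",
                "DISTANCE", "PAX", "VACATION_YES", "SW_YES", "SLOT_CTRL",
                "GATE_CONS")
full_model <- lm(reformulate(predictors, response = "FARE"), data = train)

# (c) Estimation results; coef() supplies the values for the bracketed equation.
print(summary(full_model))
key_coefs <- coef(full_model)[c("SW_YES", "DISTANCE")]

# (d) Backward elimination from the full model (AIC-based in step()).
back_model <- step(full_model, direction = "backward", trace = 0)
selected <- names(coef(back_model))[-1]  # drop the intercept

# (e) Validation RMSE and training adjusted R-squared for both models.
rmse <- function(model, data) {
  sqrt(mean((data$FARE - predict(model, newdata = data))^2))
}
comparison <- data.frame(
  model = c("full", "backward"),
  n_predictors = c(length(predictors), length(selected)),
  valid_rmse = c(rmse(full_model, valid), rmse(back_model, valid)),
  adj_r2 = c(summary(full_model)$adj.r.squared,
             summary(back_model)$adj.r.squared)
)
print(comparison)
```

In a fitted model of this form, the SW_YES coefficient is the estimated difference in average fare between routes served by Southwest and routes that are not, holding the other predictors fixed, and the DISTANCE coefficient is the estimated change in average fare per additional mile.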
