2. 5. Use scatter plots to identify variables that may have a nonlinear relationship with price. Create the square of rooms, the square of beds, and the square of bathrooms. If necessary, create additional squared or logarithmic variables to analyze the potential nonlinear relationships.

Clustering analysis on reviews of property units

1) Find the hierarchical and non-hierarchical clustering models for the three variables related to the reviews (number_of_reviews, review_scores_rating, and reviews_per_month).
2) Since the cluster variable is a categorical variable, you need to create dummy variables for the clusters to use them in a regression model. Create appropriate dummy variables (with k clusters and an intercept in the model, k-1 dummies are enough; including all k makes them perfectly collinear with the intercept), and estimate the following regression model with the cluster dummy variables you created:

    proc reg data=Airbnb2;
      model pricepernight = accommodates <your cluster dummy variables>;
    run;

3) According to the regression model in 2), which cluster group(s) is (are) significant to the price per night?
4) Suppose you are a marketing manager at Airbnb. Which group (cluster) might be a target group for the highest or lowest rental prices?

From the models in 3, we want to consider whether hostclass influences the price along with accommodates. Estimate the appropriate model and explain whether hostclass is a significant variable for the price.

Machine Learning using Regression Analysis: Let's create regression models using the training data set, save the estimated models, and predict the prices for the remaining testing data. (Use the example we covered in the ppt slides.)

1) Split the Airbnb2 data into 70% training data and 30% testing (validation) data, using 123456 as the seed number. Using only the training data, estimate regression models with PricePerNight as the dependent variable and the following options:
   1. Adjusted R square
   2. Stepwise
   3. Your own model, different from 1 and 2
   Perform the out-of-sample prediction for the observations using only the testing data.
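The split-and-predict steps can be sketched in SAS as follows. This is a minimal sketch, not necessarily the method from the ppt slides: it assumes PROC SURVEYSELECT for the 70/30 split and the common trick of blanking the response on testing rows so PROC REG estimates from training rows only but still predicts the testing rows. The data set names Airbnb2_split, airbnb_cv, and pred_out and the variables new_price and yhat are hypothetical, and accommodates stands in for whatever predictors each selection method keeps.

```sas
/* Split Airbnb2 into 70% training / 30% testing with seed 123456. */
/* OUTALL keeps every row and adds a 0/1 flag named SELECTED.      */
proc surveyselect data=Airbnb2 out=Airbnb2_split
                  method=srs samprate=0.7 seed=123456 outall;
run;

/* Hypothetical response copy: keep the price on training rows only. */
/* PROC REG skips rows with a missing response when estimating, but  */
/* still writes predicted values for them to the OUTPUT data set.    */
data airbnb_cv;
  set Airbnb2_split;
  if selected = 1 then new_price = pricepernight;
  else new_price = .;
run;

/* Step 1: variable selection on the training rows (stepwise shown; */
/* use selection=adjrsq for the adjusted R-square option).          */
/* Replace accommodates with the full list of candidate predictors. */
proc reg data=airbnb_cv;
  model new_price = accommodates / selection=stepwise;
run;
quit;

/* Step 2: refit the chosen predictors and save predicted values. */
proc reg data=airbnb_cv;
  model new_price = accommodates;
  output out=pred_out p=yhat;
run;
quit;
```

Step 2 is repeated once per model (adjusted R-square, stepwise, your own), each time with that model's predictors in the MODEL statement.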
Find the following statistics and compare the results. Which model is the best in terms of these statistics?
   1. MSE (mean squared error)
   2. RMSE (root mean squared error)
   3. MPE (mean percentage error)
   4. MAE (mean absolute error)
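The four statistics can be computed on the testing rows with a short SAS sketch. It assumes a data set, hypothetically named pred_out, that holds the actual pricepernight, a predicted value yhat, and the 0/1 SELECTED flag from the split. With error e = actual - predicted: MSE is the mean of e squared, RMSE is the square root of MSE, MAE is the mean of |e|, and MPE is the mean of 100*e/actual (one common sign convention).

```sas
/* Assumes a data set PRED_OUT with the actual PRICEPERNIGHT, the */
/* predicted value YHAT, and the 0/1 split flag SELECTED.         */
data test_err;
  set pred_out;
  where selected = 0;                  /* testing rows only        */
  err     = pricepernight - yhat;      /* prediction error         */
  sq_err  = err**2;
  abs_err = abs(err);
  pct_err = 100 * err / pricepernight; /* missing if price is 0    */
run;

/* Means of the error columns give MSE, MAE, and MPE. */
proc means data=test_err noprint;
  var sq_err abs_err pct_err;
  output out=metrics mean=mse mae mpe;
run;

data metrics;
  set metrics;
  rmse = sqrt(mse);                    /* RMSE = square root of MSE */
run;

proc print data=metrics noobs;
  var mse rmse mpe mae;
run;
```

Run this once per model. Lower MSE, RMSE, and MAE indicate more accurate out-of-sample predictions, and an MPE closer to zero indicates less systematic over- or under-prediction.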