Question

Posted on Sep 02, 2024

[Image: output of the linear regression model summary; not transcribed]

The image above shows the output after fitting a linear regression model to this dataset: https://drive.google.com/file/d/1XqsEnwsLQ7QN1Q4aD30DIJePmiljr1tL/view?usp=share_link

Using R, I am required to perform exploratory data analysis and pre-process the data. I also need to define a goal that can be applied to the dataset and carry out the pre-processing it calls for, e.g. converting variables to the desired type, treating missing values, or removing irrelevant variables.
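The three pre-processing steps named above (type conversion, missing-value treatment, dropping irrelevant variables) can be sketched in base R as follows. This is only an illustration on a toy data frame: the columns `Year`, `Month`, `Passengers` and `Notes` and all their values are hypothetical stand-ins, not the real contents of the dataset.

```r
# Toy data frame standing in for the real file.
# All column names and values here are hypothetical.
raw <- data.frame(
  Year       = c("2015", "2015", "2016", "2016", "2016"),
  Month      = c("1", "2", "1", "2", "3"),
  Passengers = c("1200", NA, "1500", "1480", "1610"),
  Notes      = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Convert each variable to the desired type
raw$Year       <- as.integer(raw$Year)
raw$Month      <- factor(raw$Month, levels = as.character(1:12))
raw$Passengers <- as.numeric(raw$Passengers)

# Count missing values per column
na_counts <- colSums(is.na(raw))
print(na_counts)

# One simple treatment: drop rows where the response is missing
clean <- raw[!is.na(raw$Passengers), ]

# Remove a variable judged irrelevant to the goal
clean$Notes <- NULL

str(clean)
```

Dropping incomplete rows is only one option; imputation may be preferable when many rows are affected.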

In addition, depending on the project goal that has been defined, I need to build two machine learning models in R and apply them to the data, choosing two of the following four model types: clustering, classification, regression, or association rules analysis.
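As a rough illustration of what a clustering model from that list could look like, here is a minimal k-means sketch using `kmeans()` from base R's `stats` package. The data are simulated, and the variable names `passengers` and `flights` are assumptions made for the example, not columns confirmed to exist in the dataset.

```r
set.seed(123)

# Simulated, hypothetical data: three groups of carriers with different volumes
sim <- data.frame(
  passengers = c(rnorm(30, mean = 1000,  sd = 100),
                 rnorm(30, mean = 10000, sd = 500),
                 rnorm(30, mean = 50000, sd = 2000)),
  flights    = c(rnorm(30, mean = 10,  sd = 2),
                 rnorm(30, mean = 80,  sd = 5),
                 rnorm(30, mean = 300, sd = 15))
)

# Standardize so that neither variable dominates the distance calculation
scaled <- scale(sim)

# Fit k-means with 3 clusters; nstart repeats the fit from several random starts
km <- kmeans(scaled, centers = 3, nstart = 10)

# Cluster sizes and cluster means on the original scale
print(table(km$cluster))
print(aggregate(sim, by = list(cluster = km$cluster), FUN = mean))
```

The number of clusters is fixed at 3 here only because the simulated data were built that way; on real data it would need to be chosen, for example by comparing the total within-cluster sum of squares across several values of `centers`.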

The dataset: https://drive.google.com/file/d/1XqsEnwsLQ7QN1Q4aD30DIJePmiljr1tL/view?usp=share_link

Retrieved from: https://data.world/data-society/air-traffic-passenger-data

I then used the following code and got the output shown in the picture above. However, the model does not seem to be a good fit for the dataset. The model summary reports a coefficient of determination (R-squared) of 0.005733, meaning the model explains less than 1% of the variance in the response. Does this mean the model fits the data poorly? What can be done to fix this problem? Do I need to switch to another type of machine learning model? Please look more closely at the model, the dataset, and the variables that should be considered. I tried transforming the variable Passengers into log Passengers, but the code produces an error, as shown below.
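To make the R-squared question concrete, here is a minimal sketch on simulated data (not the real dataset) in which passenger counts depend mostly on a categorical variable rather than on the date. The column `Airline` and all values are hypothetical. The sketch uses `log1p()` instead of `log()` because `log(0)` is `-Inf` and `lm()` refuses a response containing infinite values; whether zero counts are behind the error in this question is not known.

```r
set.seed(123)
n <- 200

# Simulated, hypothetical data: volume is driven by the airline, not by the date
sim <- data.frame(
  Year    = sample(2006:2015, n, replace = TRUE),
  Month   = sample(1:12, n, replace = TRUE),
  Airline = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
base_level <- c(A = 500, B = 5000, C = 50000)
sim$Passengers <- round(
  unname(base_level[as.character(sim$Airline)]) * exp(rnorm(n, sd = 0.3))
)

# Model using only the date variables
fit_time <- lm(Passengers ~ Year + Month, data = sim)

# Model with a log-type transformation and the categorical predictor added
fit_log <- lm(log1p(Passengers) ~ Year + Month + Airline, data = sim)

# R-squared can be read directly from the summary object
r2_time <- summary(fit_time)$r.squared
r2_log  <- summary(fit_log)$r.squared
print(c(date_only = r2_time, with_airline = r2_log))
```

Note that the two R-squared values are measured on different response scales (raw counts versus logged counts), so they indicate fit within each model rather than a strict like-for-like comparison.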

Original code:

# Load data

data

# Inspect data

str(data)

head(data)

# Check for missing values

sum(is.na(data))

# Summary statistics

summary(data)

# Visualize data

library(ggplot2)

ggplot(data, aes(x = Month, y = Passengers, color = Year)) +

geom_line() +

ggtitle("Air Traffic Passenger Data")

# Splitting data into training and testing sets

library(caTools)

set.seed(123)

split

train_data

test_data

# Model 1: Linear Regression

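# lm() is part of base R's stats package, so no library() call is needed
# (there is no package named "lm"; library(lm) fails with an error)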

model1

summary(model1)

Code using the log transformation:

# Load data

data

# Create log transformation of 'Passengers' column

data$log_passengers

# Split data into training and testing sets

split

train_data

test_data

# Fit linear regression model using transformed data

model2

summary(model2)

#Checking the residuals

residuals

qqnorm(residuals)

qqline(residuals)

#Checking the normality of residuals

shapiro.test(residuals)

# Compare model performance using test data

prediction1

prediction2
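Since the right-hand sides of the assignments above did not survive, here is a self-contained sketch of how the final comparison step could be done in base R. It assumes simulated data with a single hypothetical predictor `x`, a 70/30 split done with `sample()` rather than `caTools`, and RMSE as the comparison metric; none of these choices come from the original code.

```r
set.seed(123)
n <- 300

# Simulated, hypothetical data with a multiplicative (log-linear) relationship
d <- data.frame(x = runif(n, 0, 10))
d$Passengers <- round(exp(5 + 0.3 * d$x + rnorm(n, sd = 0.4)))

# 70/30 train/test split using base R
idx        <- sample(seq_len(n), size = floor(0.7 * n))
train_data <- d[idx, ]
test_data  <- d[-idx, ]

# Model on the raw scale and model on the log scale
model1 <- lm(Passengers ~ x, data = train_data)
model2 <- lm(log1p(Passengers) ~ x, data = train_data)

# Predict on the test set; back-transform model2 so both are on the raw scale
prediction1 <- predict(model1, newdata = test_data)
prediction2 <- expm1(predict(model2, newdata = test_data))

# Root mean squared error on the original passenger scale
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse1 <- rmse(test_data$Passengers, prediction1)
rmse2 <- rmse(test_data$Passengers, prediction2)
print(c(raw_scale = rmse1, log_scale = rmse2))
```

Back-transforming with `expm1()` matters here: comparing errors on the log scale for one model and the raw scale for the other would not be meaningful.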

Error encountered when running this code:

Please fix this problem. The code above is what I am using, so please correct it in R. Or should I use a different machine learning model? If so, please provide the code for that as well. I need this ASAP! Thank you.

> # Fit linear regression model using transformed data
> model2