1 Approved Answer

Posted on Jul 26, 2024

I don't have direct access to external files or the internet, but I can provide the code and guidance for performing the tasks you mentioned using Python and common libraries. You'll need to download the "Real estate valuation data set.xlsx" file from the UCI repository and have it locally accessible.

Here's how you can accomplish the tasks you mentioned:

Task 1: Linear Regression Model

First, let's load the dataset and split it into training and testing sets. You can use the train_test_split function from scikit-learn to accomplish this:

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_excel("Real estate valuation data set.xlsx")

# Split the dataset into features and targets
X = data.iloc[:, :-1]  # Features (all columns except the last one)
y = data.iloc[:, -1]   # Target (last column)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
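One thing worth checking before the split: spreadsheets often carry a row-number column as their first column, and selecting "all columns except the last one" would keep it as a feature. The sketch below uses a tiny in-memory frame to show how such a column could be dropped first. The column names (`No`, `feature_a`, `feature_b`, `price`) are hypothetical placeholders, not the actual headers of the UCI file, so inspect `data.columns` on your own copy before adapting it.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; column names are illustrative only.
toy = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "feature_a": [10.0, 12.5, 9.0, 11.0],
    "feature_b": [3, 5, 2, 4],
    "price": [37.9, 42.2, 30.1, 40.3],
})

# Drop the row-number column (assumed here to be named "No") if it is present.
toy = toy.drop(columns=["No"], errors="ignore")

X_toy = toy.iloc[:, :-1]  # all remaining columns except the last
y_toy = toy.iloc[:, -1]   # last column is the target

print(list(X_toy.columns))
```

Whether to drop such a column is a judgment call; a running row number usually carries no predictive meaning, but that depends on how the file was assembled.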

Next, we'll train a linear regression model on the training set and evaluate its performance on the test set:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate performance metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the performance metrics
print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)
print("R-squared Score:", r2)

Explanation:

This code will calculate and print the mean squared error (MSE), mean absolute error (MAE), and R-squared score for the linear regression model.
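To make the three metrics less of a black box, here is a minimal sketch that computes them directly from their definitions in plain Python. The helper name `regression_metrics` and the demo numbers are made up for illustration; in practice the scikit-learn functions are the ones to use.

```python
def regression_metrics(y_true, y_pred):
    """Return (mse, mae, r2) computed directly from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean of squared errors
    mae = sum(abs(e) for e in errors) / n         # mean of absolute errors
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined if all targets are identical
    return mse, mae, r2

# Made-up numbers purely for illustration.
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.0, 5.0, 8.0, 9.0]
mse_demo, mae_demo, r2_demo = regression_metrics(y_true_demo, y_pred_demo)
print(mse_demo, mae_demo, r2_demo)
```

MSE penalizes large errors more heavily than MAE because of the squaring, while R-squared reports how much of the target's variance the predictions account for relative to always predicting the mean.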

Step 2

Task 2: Applying PCA on the Dataset

To apply PCA on the dataset and select the first three principal components, you can use the PCA class from scikit-learn:

from sklearn.decomposition import PCA

# Apply PCA on the dataset
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

# Split the PCA-transformed data into train and test sets
X_pca_train, X_pca_test, _, _ = train_test_split(X_pca, y, test_size=0.2, random_state=42)

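Now you have X_pca_train and X_pca_test as the PCA-transformed features. Because the split uses the same test_size and random_state as in Task 1, y_train and y_test are unchanged, so you can train a linear regression model and evaluate its performance using the same code as in Task 1, substituting X_pca_train and X_pca_test for X_train and X_test.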
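As a possible variation, PCA can be fitted on the training rows only, so that no information from the test rows leaks into the components. The sketch below does this with a scikit-learn pipeline on synthetic data from `make_regression`; it is not run on the real estate file. The `StandardScaler` step is an addition that does not appear in the answer above, included on the assumption that the features have different scales.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (6 features), not the real estate data set.
X_demo, y_demo = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Scaling and PCA are fitted on the training rows only, then applied to the test rows.
pca_reg = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pca_reg.fit(X_tr, y_tr)

y_te_pred = pca_reg.predict(X_te)
r2_demo_pca = r2_score(y_te, y_te_pred)
print("R-squared on held-out rows:", r2_demo_pca)
```

Keeping only three components discards some variance, so the score may well come out lower than for a model trained on all features; comparing the two is the point of the exercise.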

Step 3

Task 3: Logistic Regression Model with PCA on IRIS Dataset

To load the "IRIS" dataset from scikit-learn, apply PCA, split it into train and test sets, and train a logistic regression model, you can use the following code:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the IRIS dataset
iris = load_iris()

# Split the dataset into features and targets
X = iris.data
y = iris.target

# Apply PCA on the dataset
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

# Split the PCA-transformed data into train and test sets
X_pca_train, X_pca_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_pca_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_pca_test)

# Calculate performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print the performance metrics
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

This code will calculate and print the accuracy, precision, recall, and F1 score for the logistic regression model trained on the PCA-transformed IRIS dataset.
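A quick way to see how much information the three retained components keep is to look at PCA's explained variance ratios. The following is a small sketch on the same bundled IRIS data; the variable names (`iris_X`, `pca_check`, `ratios`) are chosen here for illustration and are separate from the code above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris_X = load_iris().data  # 150 samples, 4 features

pca_check = PCA(n_components=3)
iris_X_pca = pca_check.fit_transform(iris_X)

# Fraction of total variance captured by each retained component,
# ordered from the largest component to the smallest.
ratios = pca_check.explained_variance_ratio_
print("Per-component:", ratios)
print("Cumulative:", ratios.sum())
```

If the cumulative value is close to 1, reducing from four features to three loses little variance, which helps when interpreting the classification metrics.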

Step 4

Task 4: Logistic Regression Model with Regularization

To apply L1 or L2 regularization to the logistic regression model using the same train and test data as in Task 3, you can modify the logistic regression model creation code as follows:

# Create a logistic regression model with L1 regularization
model = LogisticRegression(penalty='l1', solver='saga')

# Create a logistic regression model with L2 regularization
model = LogisticRegression(penalty='l2')
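To put the two variants side by side, one option is to fit both on the same split and collect their scores. The sketch below rebuilds the Task 3 data so that it runs on its own. Raising `max_iter` and using the `saga` solver for both models are assumptions made here so the two fits are comparable and have room to converge; they are not part of the answer above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Rebuild the PCA-transformed IRIS split used in Task 3.
X_iris, y_iris = load_iris(return_X_y=True)
X_iris_pca = PCA(n_components=3).fit_transform(X_iris)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris_pca, y_iris, test_size=0.2, random_state=42)

# max_iter is raised as an assumption so that the saga solver can converge.
candidates = {
    "l1": LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
    "l2": LogisticRegression(penalty="l2", solver="saga", max_iter=5000),
}

# Fit each variant on the same training rows and score it on the same test rows.
results = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, clf.predict(X_te))

for name, acc in results.items():
    print(name, "accuracy:", acc)
```

On a dataset as small and well separated as IRIS the two penalties may score the same or nearly so; the regularization strength `C` would be the next parameter to vary.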

After modifying the model creation code, you can proceed to train the model, make predictions, and calculate the performance metrics as before. Compare the performance of this regularized model with the performance reported in Task