
Question


I do not have access to Python or Excel/Office.

I am sick with COVID and can't get to a school computer. The last time I tried, I got an error. Please help.

No matter how many times I open and run cleanData1, I always get an error on every piece of code.


--------------------------------------------------------------------------------------------

Creating a DataFrame With Pandas

We can create a Pandas DataFrame using the DataFrame() class.

Code run:

import pandas as pd
import numpy as np

data = {
    "calories": [504, 380, 396],
    "duration": [60, 40, 45]
}

# Load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)


Cleaning Rows With NaNs

  1. On your computer under Documents folder, create a new folder called CEIS310.
  2. Open Microsoft Excel and create a file named preDataset1.xlsx with the following content in CEIS310 folder.

     A    B    C
     1    2    3
     4    NaN  6
     7    8    9
     10   NaN  12
     13   14   15
     16   17   18

  3. Open this saved Excel file (preDataset1.xlsx) and now save it as preDataset1.csv.

Visually we can spot that there are a few rows with NaN (empty) fields.

An effective way to detect empty fields is to load the data set into a Pandas DataFrame and then use the isnull() function to check for null values in the DataFrame.

Open Spyder IDE and copy and paste the following code snippet, name it cleanData1.py.

Code run:

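import pandas as pd

# Load the CSV; empty or "NaN" cells are read as NaN.
df = pd.read_csv('preDataset1.csv')

# Count the missing values in each column (sum must be called and printed).
print(df.isnull().sum())
print(df)
print()
print(" Data Frame after replacing NaN with the Mean of the column ")

# Replace the NaNs in column B with the mean of column B.
df.B = df.B.fillna(df.B.mean())
print(df)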


In the above code snippet, we note that when Pandas loads a data set, it uses NaN to represent an empty field. One way to handle this is to replace all NaNs in a specific column with the mean of that column's non-missing values, as shown above. For column B, that mean is (2 + 8 + 14 + 17) / 4 = 10.25.
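For anyone without Excel, the same steps can be tried without creating any file. The following is a minimal sketch, assuming only that pandas is installed: it types the contents of preDataset1.csv into a string and reads it with io.StringIO in place of a file on disk. The variable names csv_text, missing_per_column and b_mean are made up for this illustration.

```python
import io

import pandas as pd

# The same six rows as preDataset1.csv, typed inline so no Excel file is needed.
csv_text = """A,B,C
1,2,3
4,NaN,6
7,8,9
10,NaN,12
13,14,15
16,17,18
"""

# read_csv accepts a file-like object; "NaN" cells are loaded as missing values.
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column (note the parentheses on sum()).
missing_per_column = df.isnull().sum()
print(missing_per_column)

# mean() skips NaN, so this is (2 + 8 + 14 + 17) / 4 = 10.25.
b_mean = df["B"].mean()
df["B"] = df["B"].fillna(b_mean)
print(df)
```

If reading the real preDataset1.csv raises a file-not-found error, one common cause is that the script is not running from the folder that holds the CSV; the in-memory version above sidesteps that.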

Normalizing Columns

The objective of normalization is to change the values of the numeric columns in the dataset to a common scale without distorting the relative differences between values. Normalization is needed to avoid large disparities in the scale of the columns, which may cause problems when the data set is used to train a model.

  1. Open Microsoft Excel and create a file named preNormDataset1.csv with the following content in CEIS310 folder.

     A     B    C
     1000  2    3
     400   5    6
     700   6    9
     100   11   12
     1300  14   15
     1600  17   18

One effective way to normalize the data is to load the data set into a Pandas DataFrame and then use the MinMaxScaler class to scale each column to a given range of values (0 to 1 by default).

  2. Open Spyder IDE and copy and paste the following code snippet, name it cleanData2.py.

Code run:

import pandas as pd
from sklearn import preprocessing

df = pd.read_csv('preNormDataset1.csv')
print(df)
print()
print(" Data Frame after Normalization ")

# Scale every column to the range 0 to 1.
x = df.values.astype(float)
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)

# Rebuild the DataFrame with the original column names.
df = pd.DataFrame(x_scaled, columns=df.columns)
print(df)
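
With its default range of 0 to 1, min-max scaling maps each value x in a column to (x - min) / (max - min). The sketch below applies that formula by hand with plain pandas instead of MinMaxScaler, so it is a stand-in for the scikit-learn call rather than the same code. The data is built in memory from the table above, and the names col_min, col_max and df_scaled are chosen for this illustration.

```python
import pandas as pd

# Same values as preNormDataset1.csv, built in memory for illustration.
df = pd.DataFrame({
    "A": [1000, 400, 700, 100, 1300, 1600],
    "B": [2, 5, 6, 11, 14, 17],
    "C": [3, 6, 9, 12, 15, 18],
})

# Min-max scaling per column: (x - min) / (max - min).
col_min = df.min()
col_max = df.max()
df_scaled = (df - col_min) / (col_max - col_min)
print(df_scaled)
```

For example, the first value in column A becomes (1000 - 100) / (1600 - 100) = 0.6, while the smallest value in each column maps to 0 and the largest to 1.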


Binarization

Binarization converts numeric data into binary values (0 or 1) by comparing each value against a threshold.

Open Spyder IDE and copy and paste the following code snippet, name it AIDataPrep1.py.

Code run: AIDataPrep1.py

# student_name
import numpy as np
from sklearn import preprocessing

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])
print(input_data)

# Values greater than the threshold become 1; all others become 0.
data_binarized = preprocessing.Binarizer(threshold=0.5).transform(input_data)
print("\nBinarized data after preprocessing:\n", data_binarized)
print()
print("Mean =", input_data.mean(axis=0))
print("Std Deviation =", input_data.std(axis=0))


In the above code snippet, the preprocessing.Binarizer() class binarizes data according to an imposed threshold. Values greater than the threshold map to 1, and values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1. In our case, the threshold imposed is 0.5, so values greater than 0.5 are mapped to 1, and values less than or equal to 0.5 (including the 0.5 in the third row) are mapped to 0.
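
The thresholding rule can be reproduced with plain NumPy, which may help when scikit-learn is not available. This is a sketch of the rule described above, not the library's own implementation; binarize is a hypothetical helper name chosen for this illustration.

```python
import numpy as np

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])


def binarize(values, threshold):
    """Hypothetical helper: 1.0 where value > threshold, otherwise 0.0."""
    return (values > threshold).astype(float)


binarized = binarize(input_data, threshold=0.5)
print(binarized)
```

Because the comparison is strictly greater-than, the value 0.5 in the third row maps to 0 rather than 1.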

-----------------------------------------------------------------------

Activity A:

Create a file in Excel and name it IrisSubset1.xlsx as depicted below.


  1. Open this saved Excel file (IrisSubset1.xlsx) and save it as IrisSubset1.csv.
  2. Open Spyder or any other Python IDE.
  3. Modify the code snippet cleanData1.py.
  4. Save your modified Python code as cleanData3.py.

Enclose your Python code and a screenshot of the resulting output.
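
Without Excel, the CSV file can be written from Python instead. The following is a rough sketch of one way to approach this, not an official solution: the values are copied from the IrisSubset1 spreadsheet image in the question, and the system temp directory is an assumption standing in for the CEIS310 folder. It fills every NaN with its column's mean, following the approach of cleanData1.py.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Values transcribed from the IrisSubset1 spreadsheet image in the question.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.6, 4.7, 4.9, 5.0, 5.1],
    "sepal_width": [3.5, np.nan, 3.2, 3.9, np.nan, 3.5],
    "petal_length": [1.4, 1.5, 1.3, 1.3, 1.4, 1.4],
})

# Assumption: the temp directory stands in for the CEIS310 folder.
csv_path = Path(tempfile.gettempdir()) / "IrisSubset1.csv"
iris.to_csv(csv_path, index=False, na_rep="NaN")

# Load the file back and count the missing values per column.
df = pd.read_csv(csv_path)
print(df.isnull().sum())

# Replace each column's NaNs with that column's mean.
df = df.fillna(df.mean())
print(df)
```

Only sepal_width has missing values here, so its two NaNs are replaced by the mean of the remaining four values, (3.5 + 3.2 + 3.9 + 3.5) / 4 = 3.525.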


Activity B:

  1. Open your favorite Python IDE.
  2. Modify the AIDataPrep1.py file given to you.
  3. Modify the Python code, changing the binarization threshold from 0.5 to 1.5.
  4. Save your modified code as AIDataPrep2.py.
  5. Enclose your code output and explain your findings.
  • Include a screenshot of your output and Python code for Activity A, with the date in the comments.
  • Include a screenshot of your output and Python code for Activity B, with the date in the comments.

I don't have Excel or Office because I have no access to the school computers.

Can screenshots of the output and code be provided? Thank you.

Transcribed image text (console output from the original screenshots):

cleanData1.py:

    A     B   C
0   1   2.0   3
1   4   NaN   6
2   7   8.0   9
3  10   NaN  12
4  13  14.0  15
5  16  17.0  18

Data Frame after replacing NaN with the Mean of the column

    A      B   C
0   1   2.00   3
1   4  10.25   6
2   7   8.00   9
3  10  10.25  12
4  13  14.00  15
5  16  17.00  18

cleanData2.py:

      A   B   C
0  1000   2   3
1   400   5   6
2   700   6   9
3   100  11  12
4  1300  14  15
5  1600  17  18

Data Frame after Normalization

     A         B    C
0  0.6  0.000000  0.0
1  0.2  0.200000  0.2
2  0.4  0.266667  0.4
3  0.0  0.600000  0.6
4  0.8  0.800000  0.8
5  1.0  1.000000  1.0

AIDataPrep1.py:

[[ 2.1 -1.9  5.5]
 [-1.5  2.4  3.5]
 [ 0.5 -7.9  5.6]
 [ 5.9  2.3 -5.8]]

Binarized data after preprocessing:
 [[1. 0. 1.]
 [0. 1. 1.]
 [0. 0. 1.]
 [1. 1. 0.]]

Mean = [ 1.75  -1.275  2.2  ]
Std Deviation = [2.71431391 4.20022321 4.69414529]

IrisSubset1.xlsx (Activity A):

sepal_length  sepal_width  petal_length
5.1           3.5          1.4
4.6           NaN          1.5
4.7           3.2          1.3
4.9           3.9          1.3
5             NaN          1.4
5.1           3.5          1.4
