Question
I do not have access to Python or Excel/Office.
I am sick with COVID and cannot access the school computer. Last time I kept getting an error, please help.
No matter how I access and run cleanData1, I always get an error on every piece of code.
--------------------------------------------------------------------------------------------
Creating a DataFrame With Pandas
We can create a Pandas DataFrame using the DataFrame() class.
Code run:
import pandas as pd
import numpy as np
data = {
    "calories": [504, 380, 396],
    "duration": [60, 40, 45]
}
# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Cleaning Rows With NaNs
- On your computer, under the Documents folder, create a new folder called CEIS310.
- Open Microsoft Excel and create a file named preDataset1.xlsx with the following content in the CEIS310 folder.
A | B | C |
1 | 2 | 3 |
4 | NaN | 6 |
7 | 8 | 9 |
10 | NaN | 12 |
13 | 14 | 15 |
16 | 17 | 18 |
- Open this saved Excel file (preDataset1.xlsx) and now save it as preDataset1.csv.
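Since the question mentions having no Excel, the same CSV file could also be produced from Python alone. The following is a minimal sketch, not part of the original exercise: the temporary folder is an assumption standing in for Documents/CEIS310, and `csv_path` and `check` are illustrative names.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Same values as the table above; np.nan marks the two empty cells in column B.
data = pd.DataFrame({
    "A": [1, 4, 7, 10, 13, 16],
    "B": [2, np.nan, 8, np.nan, 14, 17],
    "C": [3, 6, 9, 12, 15, 18],
})

# Assumption: a temporary folder stands in for Documents/CEIS310.
folder = Path(tempfile.mkdtemp())
csv_path = folder / "preDataset1.csv"

# index=False keeps the row labels out of the file; NaN is written as an empty field.
data.to_csv(csv_path, index=False)

# Reading the file back turns the empty fields into NaN again.
check = pd.read_csv(csv_path)
print(check)
```

To follow the exercise literally, `folder` would be replaced with the path of your own CEIS310 folder.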
Visually, we can spot a few rows with NaN (empty) fields.
An effective way to detect empty fields is to load the data set into a Pandas DataFrame and then use the isnull() function to check for null values in the DataFrame.
Open Spyder IDE, copy and paste the following code snippet, and name it cleanData1.py.
Code run:
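import pandas as pd
df = pd.read_csv('preDataset1.csv')
# count the NaN values in each column
print(df.isnull().sum())
print(df)
print()
print(" Data Frame after replacing NaN with the Mean of the column ")
# replace the NaNs in column B with the mean of that column
df.B = df.B.fillna(df.B.mean())
print(df)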
In the above code snippet, note that when Pandas loads a data set, it uses NaN to represent an empty field. One way to handle this is to replace all NaNs in a specific column with the mean of that column, as shown above.
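If the error comes from read_csv not finding preDataset1.csv (a guess, since the question does not show the error message), the cleaning step can be tried without any file at all. This sketch holds the same CSV content in memory; `csv_text`, `nan_counts`, and `col_mean` are illustrative names, and the empty fields are assumed to be how the NaN cells end up in the saved CSV.

```python
import io

import pandas as pd

# Same content as preDataset1.csv, held in memory.
# Assumption: the NaN cells are saved as empty fields.
csv_text = """A,B,C
1,2,3
4,,6
7,8,9
10,,12
13,14,15
16,17,18
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count the NaN values per column; only column B has any.
nan_counts = df.isnull().sum()
print(nan_counts)

# mean() skips NaNs, so this is (2 + 8 + 14 + 17) / 4 = 10.25.
col_mean = df["B"].mean()
df["B"] = df["B"].fillna(col_mean)
print(df)
```

The filled value 10.25 matches the console output shown in the question's screenshots.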
Normalizing Columns
The objective of normalization is to rescale the numeric columns in the data set to a common scale without distorting the differences in their ranges of values. Data needs to be normalized to avoid a huge disparity in the scale of the numbers, which may cause problems when the data set is used to train a model.
- Open Microsoft Excel and create a file named preNormDataset1.csv with the following content in the CEIS310 folder.
A | B | C |
1000 | 2 | 3 |
400 | 5 | 6 |
700 | 6 | 9 |
100 | 11 | 12 |
1300 | 14 | 15 |
1600 | 17 | 18 |
One effective way to handle normalization is to load the data set into a Pandas DataFrame and then use the MinMaxScaler class to scale each column to a particular range of values (0 to 1 by default).
- Open Spyder IDE and copy and paste the following code snippet, name it cleanData2.py.
Code run:
import pandas as pd
from sklearn import preprocessing
df = pd.read_csv('preNormDataset1.csv')
print(df)
print()
print(" Data Frame after Normalization ")
# convert the values to float before scaling
x = df.values.astype(float)
# scale each column to the range 0 to 1
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
# rebuild the DataFrame with the original column names
df = pd.DataFrame(x_scaled, columns=df.columns)
print(df)
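With its default range of 0 to 1, MinMaxScaler computes (x - min) / (max - min) for each column. The sketch below applies that formula by hand with Pandas only, as a way to check the scaler's output. The input values are taken from the cleanData2.py console output in the question's screenshots; `col_min`, `col_range`, and `scaled` are illustrative names.

```python
import pandas as pd

# Values as printed by cleanData2.py before normalization.
df = pd.DataFrame({
    "A": [1000, 400, 700, 100, 1300, 1600],
    "B": [2, 5, 6, 11, 14, 17],
    "C": [3, 6, 9, 12, 15, 18],
})

# Per-column minimum and range (max - min).
col_min = df.min()
col_range = df.max() - col_min

# Min-max formula applied column by column: (x - min) / (max - min).
scaled = (df - col_min) / col_range
print(scaled)
```

Every column ends up between 0 and 1, so the large values in column A no longer dominate the other two columns.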
Binarization
Binarization is a technique for making data binary: each value is compared with a threshold and mapped to 0 or 1.
Open Spyder IDE and copy and paste the following code snippet, name it AIDataPrep1.py.
Code run: AIDataPrep1.py
# student_name
import numpy as np
from sklearn import preprocessing
input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])
print(input_data)
# values greater than the threshold become 1, all others become 0
data_binarized = preprocessing.Binarizer(threshold=0.5).transform(input_data)
print("\nBinarized data after preprocessing:\n", data_binarized)
print()
print("Mean =", input_data.mean(axis=0))
print("Std Deviation =", input_data.std(axis=0))
In the above code snippet, the preprocessing.Binarizer() class binarizes data according to an imposed threshold. Values greater than the threshold map to 1, and values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1. In our case, the threshold imposed is 0.5, so values greater than 0.5 are mapped to 1, and values less than or equal to 0.5 are mapped to 0.
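The threshold rule can be reproduced with a plain NumPy comparison, which makes the edge case visible: the value 0.5 in the third row is not greater than the threshold, so it maps to 0. This is a sketch of the rule itself, not of how scikit-learn implements it; `threshold` and `binarized` are illustrative names.

```python
import numpy as np

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])

threshold = 0.5

# Strictly greater than the threshold -> 1.0, everything else -> 0.0.
binarized = (input_data > threshold).astype(float)
print(binarized)
```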
-----------------------------------------------------------------------
Activity A:
Create a file in Excel and name it IrisSubset1.xlsx as depicted below.
- Open this saved Excel file (IrisSubset1.xlsx) and now save it as IrisSubset1.csv.
- Open Spyder or any other Python IDE.
- Modify the code snippet cleanData1.py.
- Save your modified Python code as cleanData3.py.
Enclose your Python code and a screenshot of the resulting output.
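One possible shape for cleanData3.py is sketched here. It assumes the intended modification is to point cleanData1.py at the Iris subset and fill the NaNs in sepal_width with that column's mean; the activity does not spell this out. The data is typed in from the IrisSubset1 screenshot in the question and held in memory so that neither Excel nor a CSV file is needed; `csv_text` and `width_mean` are illustrative names.

```python
# cleanData3.py (sketch) - add your name and the date here
import io

import pandas as pd

# Rows as shown in the IrisSubset1 screenshot; empty fields stand for NaN.
csv_text = """sepal_length,sepal_width,petal_length
5.1,3.5,1.4
4.6,,1.5
4.7,3.2,1.3
4.9,3.9,1.3
5,,1.4
5.1,3.5,1.4
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count the NaN values in each column.
print(df.isnull().sum())
print(df)
print()
print(" Data Frame after replacing NaN with the Mean of the column ")

# Mean of the four known widths, roughly 3.525, replaces the two NaNs.
width_mean = df["sepal_width"].mean()
df["sepal_width"] = df["sepal_width"].fillna(width_mean)
print(df)
```

With a real IrisSubset1.csv in the working directory, the `io.StringIO(csv_text)` argument would be replaced by the file name.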
Activity B:
- Open your favorite Python IDE.
- Modify the AIDataPrep1.py file given to you.
- Modify the Python code, changing the binarization threshold from 0.5 to 1.5.
- Save your modified code as AIDataPrep2.py.
- Enclose your code output and explain your findings.
- Include a screenshot of your output and Python code for Activity A with the date in the comments.
- Include a screenshot of your output and Python code for Activity B with the date in the comments.
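A sketch of what AIDataPrep2.py might look like follows. It runs the Binarizer with both thresholds so the results can be compared directly; the comparison lines and the names `binarized_05`, `binarized_15`, and `in_between` are additions for illustration, not part of the given file.

```python
# AIDataPrep2.py (sketch) - add your name and the date here
import numpy as np
from sklearn import preprocessing

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])
print(input_data)

# Original threshold, kept only for comparison.
binarized_05 = preprocessing.Binarizer(threshold=0.5).transform(input_data)
# Threshold changed from 0.5 to 1.5 as the activity asks.
binarized_15 = preprocessing.Binarizer(threshold=1.5).transform(input_data)
print("\nBinarized data after preprocessing:\n", binarized_15)

# Values in the interval (0.5, 1.5] are the only ones that could flip from 1 to 0.
in_between = input_data[(input_data > 0.5) & (input_data <= 1.5)]
print("Values between the two thresholds:", in_between)
print("Same result as threshold 0.5:", np.array_equal(binarized_05, binarized_15))

print()
print("Mean =", input_data.mean(axis=0))
print("Std Deviation =", input_data.std(axis=0))
```

For this particular array no value lies between 0.5 and 1.5, so raising the threshold leaves the binarized matrix unchanged. The mean and standard deviation are computed on the raw input and do not depend on the threshold.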
I don't have Excel or Office because I cannot get to the school computers.
Can screenshots of the output and code be provided? Thank you.
Screenshots (text transcribed from the Spyder editor and console):

Editor: the DataFrame creation code listed above.

Console output for the NaN-cleaning code:
    A     B   C
0   1   2.0   3
1   4   NaN   6
2   7   8.0   9
3  10   NaN  12
4  13  14.0  15
5  16  17.0  18

 Data Frame after replacing NaN with the Mean of the column
    A      B   C
0   1   2.00   3
1   4  10.25   6
2   7   8.00   9
3  10  10.25  12
4  13  14.00  15
5  16  17.00  18

Console output of cleanData2.py:
      A   B   C
0  1000   2   3
1   400   5   6
2   700   6   9
3   100  11  12
4  1300  14  15
5  1600  17  18

 Data Frame after Normalization
     A         B    C
0  0.6  0.000000  0.0
1  0.2  0.200000  0.2
2  0.4  0.266667  0.4
3  0.0  0.600000  0.6
4  0.8  0.800000  0.8
5  1.0  1.000000  1.0

Editor: the AIDataPrep1.py code listed above.

Console output of AIDataPrep1.py:
[[ 2.1 -1.9  5.5]
 [-1.5  2.4  3.5]
 [ 0.5 -7.9  5.6]
 [ 5.9  2.3 -5.8]]

Binarized data after preprocessing:
 [[1. 0. 1.]
 [0. 1. 1.]
 [0. 0. 1.]
 [1. 1. 0.]]

Mean = [ 1.75  -1.275  2.2  ]
Std Deviation = [2.71431391 4.20022321 4.69414529]

Excel sheet for IrisSubset1 (Activity A):
sepal_length | sepal_width | petal_length |
5.1 | 3.5 | 1.4 |
4.6 | NaN | 1.5 |
4.7 | 3.2 | 1.3 |
4.9 | 3.9 | 1.3 |
5 | NaN | 1.4 |
5.1 | 3.5 | 1.4 |