Question
Hello, I need help. When I try to call my pipeline on new data, I get an error:
from pyspark.ml.feature import VectorAssembler, OneHotEncoder, StringIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.classification import LogisticRegression, DecisionTreeClassifier
from pyspark.ml import Pipeline

# create lists of feature names
num_features = ["age", "hypertension", "heart_disease", "avg_glucose_level", "bmi"]
cat_features = [col for col in stroke_df.columns if col not in num_features + ["stroke"]]

# create lists of integer-encoded and one-hot encoded features
ix_features = [col + "_ix" for col in cat_features]
vec_features = [col + "_vec" for col in cat_features]

# create StringIndexer, OneHotEncoder, and VectorAssembler objects
indexer = StringIndexer(inputCols=cat_features, outputCols=ix_features, handleInvalid="skip")
encoder = OneHotEncoder(inputCols=ix_features, outputCols=vec_features)
assembler = VectorAssembler(inputCols=num_features + vec_features, outputCol="features")

# create pipeline
pipeline = Pipeline(stages=[indexer, encoder, assembler])

# fit pipeline to data
train = pipeline.fit(stroke_df).transform(stroke_df)

# persist train DataFrame
train.persist()

# display first rows of features and stroke columns of train
train.select("features", "stroke").show(truncate=False)
train = train.withColumn("label", train["stroke"])

Applying the Model to New Data

data = {
    "gender": ["Female", "Female", "Male", "Male"],
    "age": [...],
    "hypertension": [...],
    "heart_disease": [...],
    "ever_married": ["No", "Yes", "Yes", "No"],
    "work_type": ["Private", "Self-employed", "Private", "Govt_job"],
    "Residence_type": ["Urban", "Rural", "Rural", "Urban"],
    "avg_glucose_level": [...],
    "bmi": [...],
    "smoking_status": ["smokes", "formerly smoked", "unknown", "never smoked"],
}
new_data = pd.DataFrame(data)
new_data

processed_new_data = pipeline.fit(new_data).transform(new_data)

AttributeError: 'DataFrame' object has no attribute '_jdf'

<ipykernel command> in <module>
----> processed_new_data = pipeline.fit(new_data).transform(new_data)

/databricks/python/lib/python.../site-packages/pandas/core/generic.py in __getattr__(self, name)
            and name not in self._accessors
            and self._info_axis._can_hold_identifiers_and_holds_name(name)
        ):
            return self[name]
        return object.__getattribute__(self, name)

Step by Step Solution
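The traceback ends inside pandas (`pandas/core/generic.py`), so the object passed to `pipeline.fit` is a pandas DataFrame, the one built with `pd.DataFrame(data)`. The `pyspark.ml` stages work on Spark DataFrames and reach the JVM through the internal `_jdf` attribute, which a pandas DataFrame does not have. A second, likely problem is that the pipeline is being refitted on the new rows. The encodings should come from the model fitted on the training data. The sketch below shows one way to handle both points. It assumes the training data is a Spark DataFrame and that an active `SparkSession` is available (Databricks notebooks predefine one as `spark`). The helper name `score_new_rows` and its parameter names are hypothetical.

```python
def score_new_rows(pipeline, train_df, new_pandas_df, spark):
    """Fit the pipeline on training data, then transform new rows.

    pipeline      -- an unfitted pyspark.ml.Pipeline
    train_df      -- Spark DataFrame used for fitting
    new_pandas_df -- pandas DataFrame holding the new rows
    spark         -- active SparkSession (assumed to exist already)
    """
    # Fit once, on the training data only. The resulting PipelineModel keeps
    # the string-to-index mappings and one-hot vector sizes learned there.
    model = pipeline.fit(train_df)

    # pyspark.ml expects a Spark DataFrame, so convert the pandas one first.
    new_spark_df = spark.createDataFrame(new_pandas_df)

    # Reuse the fitted model; do not call pipeline.fit on the new rows.
    return model.transform(new_spark_df)
```

With the names from the question, the call would be `processed_new_data = score_new_rows(pipeline, stroke_df, new_data, spark)`. The new rows need every column the pipeline reads (all numeric and categorical feature columns). Because the indexer uses `handleInvalid="skip"`, any new row containing a category that was not seen during fitting is dropped from the output rather than raising an error.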