
Question


Hello, I need help. When I try to call my pipeline on new data, I get an error:
from pyspark.ml.feature import VectorAssembler, OneHotEncoder, StringIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.classification import LogisticRegression, DecisionTreeClassifier
from pyspark.ml import Pipeline
# create lists of feature names
num_features = ["age", "hypertension", "heart_disease", "avg_glucose_level", "bmi"]
cat_features = [col for col in stroke_df.columns if col not in num_features + ["stroke"]]
# create lists of integer-encoded and one-hot encoded features
ix_features = [col + "_ix" for col in cat_features]
vec_features = [col + "_vec" for col in cat_features]
# create StringIndexer, OneHotEncoder, and VectorAssembler objects
indexer = StringIndexer(inputCols=cat_features, outputCols=ix_features, handleInvalid="skip")
encoder = OneHotEncoder(inputCols=ix_features, outputCols=vec_features)
assembler = VectorAssembler(inputCols=num_features + vec_features, outputCol="features")
# create pipeline
pipeline = Pipeline(stages=[indexer, encoder, assembler])
# fit pipeline to data
train = pipeline.fit(stroke_df).transform(stroke_df)
# persist train DataFrame
train.persist()
# display first 10 rows of features and stroke columns of train
train.select("features", "stroke").show(10, truncate=False)
train = train.withColumn("label", train["stroke"])

Applying the Model to New Data:

data = {
    "gender": ["Female", "Female", "Male", "Male"],
    "age": [42.0, 64.0, 37.0, 72.0],
    "hypertension": [1, 1, 0, 0],
    "heart_disease": [0, 1, 0, 1],
    "ever_married": ["No", "Yes", "Yes", "No"],
    "work_type": ["Private", "Self-employed", "Private", "Govt_job"],
    "Residence_type": ["Urban", "Rural", "Rural", "Urban"],
    "avg_glucose_level": [182.1, 175.5, 79.2, 125.7],
    "bmi": [26.5, 32.5, 15.4, 19.4],
    "smoking_status": ["smokes", "formerly smoked", "unknown", "never smoked"]
}
new_data = pd.DataFrame(data)
new_data
processed_new_data = pipeline.fit(new_data).transform(new_data)

AttributeError: 'DataFrame' object has no attribute '_jdf'
~/.ipykernel/16918/command-472670917524769-290429957 in ?()
----> 1 processed_new_data = pipeline.fit(new_data).transform(new_data)
/databricks/python/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, name)
   5898             and name not in self._accessors
   5899             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5900         ):
   5901             return self[name]
-> 5902         return object.__getattribute__(self, name)
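
The traceback ends inside pandas, which suggests that `new_data` is a pandas DataFrame, while `Pipeline.fit` and `transform` expect a Spark DataFrame (`_jdf` is the internal handle a Spark DataFrame carries to its JVM counterpart, and a pandas DataFrame has no such attribute). Below is a minimal sketch of one possible fix, not a definitive solution. It assumes an active `SparkSession` named `spark` (Databricks notebooks normally provide one), and the helper names `dict_to_rows` and `score_new_data` are hypothetical, introduced only for illustration. The sketch also reuses the model fitted on `stroke_df` instead of refitting on the four new rows, so that the category indices stay consistent with the training data.

```python
def dict_to_rows(data):
    """Turn a dict of equal-length column lists into (column_names, row_tuples)."""
    columns = list(data)
    rows = list(zip(*(data[name] for name in columns)))
    return columns, rows


def score_new_data(spark, fitted_model, data):
    """Build a Spark DataFrame from `data` and transform it with a fitted model.

    `spark` is assumed to be an active SparkSession and `fitted_model` the
    PipelineModel returned by `pipeline.fit(stroke_df)`. Both are assumptions
    about the notebook and are passed in rather than created here.
    """
    columns, rows = dict_to_rows(data)
    # A list of column names is accepted as the schema; types are inferred.
    new_sdf = spark.createDataFrame(rows, schema=columns)
    # Only transform here: fitting again would re-learn the encodings.
    return fitted_model.transform(new_sdf)


# Usage in the notebook (hypothetical, requires Spark and stroke_df):
#   pipeline_model = pipeline.fit(stroke_df)
#   train = pipeline_model.transform(stroke_df)
#   processed_new_data = score_new_data(spark, pipeline_model, data)
#   processed_new_data.select("features").show(truncate=False)
```

Calling `spark.createDataFrame(new_data)` on the pandas DataFrame directly should work as well. Note that with `handleInvalid="skip"`, any new row containing a category the indexer did not see during fitting is dropped from the output, so a spelling difference such as "unknown" versus the training value may silently remove a row.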
