Can anyone help me with this code? I'm trying to build a personality detection model that generates predictions in parallel, but I keep getting this error.
[89]
# Logistic Regression has n_jobs
start_time1 = time.time()
logreg = LogisticRegression(n_jobs=1)
logreg.fit(X_train, y_train)
LR_time = time.time()

[90]
LR_pred = logreg.predict(X_test)
print(classification_report(y_test, LR_pred))
report = classification_report(y_test, LR_pred, output_dict=True)

[91]
totalTime = LR_time - start_time1
print("time: %.2f seconds" % (totalTime))
MinuteTime = (totalTime // 60)
print("time: %.2f minutes" % (MinuteTime))

time: 23.33 seconds
time: 0.00 minutes

Parallelize Model Predictions - Singular Model

[94]
def predict(data, feature_cols, clf, pred_col):
    '''
    This function will generate predictions given a dataset, the associated features and a model.

    params:
        data (DataFrame) : The dataset which holds the features
        feature_cols (List String) : List of column names in data corresponding to the model features
        clf (Model) : The classification model which generates the predictions
        pred_col (String) : The name of the column you want to store the predictions under in data

    return:
        This function will add a column to the input dataset associated to the predictions generated

    example:
        >>> predict(data = df, feature_cols = lang_features, pred_col = "lang_prediction")
    '''

[95]
%%time
# normal predictions
res = predict(
    data = new_df,
    feature_cols = "posts",
    clf = logreg,
    pred_col = 'rf_P_prediction'
)
print(res)

CPU times: user 4.46 s, sys: 29.5 ms, total: 4.48 s
Wall time: 4.57 s

[96]
new_df

Predict in Parallel

from multiprocessing import Pool
from multiprocessing import cpu_count
from functools import partial

def parallel_pred(fn, data, feature_cols, clf, pred_col, n_cores):
    '''
    This function will parallelize the prediction process such that the data is split into n components (n is defined based on n_cores) and passed onto the model.

    params:
        fn (function) : The function you want to parallelize
        data (DataFrame) : The dataset holding the features for the model
        feature_cols (List String) : List of column names in data corresponding to the model features
        clf (Model) : The model which generates the predictions
        pred_col (String) : The name of the column you want to store the predictions under in data
        n_cores (Integer) : The number of cores you want to use

    returns:
        This function will return the result of the input function

    example:
        parallel_pred(
            fn = predict,
            data = df,
            feature_cols = lang_features,
            clf = lang_mdl,
            pred_col = "parallel_lang_pred",
            n_cores = 4
        )
    '''
    if cpu_count()
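For reference, the split-and-map idea described in the parallel_pred docstring can be sketched with the standard library alone. This is only a minimal sketch, not the notebook's actual code: `split_into_chunks`, `score_chunk` and `parallel_apply` are hypothetical names, plain lists stand in for the DataFrame, and `score_chunk` stands in for the real predict function.

```python
from functools import partial
from multiprocessing import Pool, cpu_count


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous parts of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def score_chunk(chunk, offset):
    """Hypothetical stand-in for the real per-chunk predict function."""
    return [x + offset for x in chunk]


def parallel_apply(fn, items, n_cores, **fn_kwargs):
    """Apply fn to each chunk of items and concatenate the results in order."""
    # Never request more workers than the machine reports.
    n_cores = max(1, min(n_cores, cpu_count()))
    chunks = split_into_chunks(items, n_cores)
    # Freeze the keyword arguments so each worker only receives its chunk.
    worker = partial(fn, **fn_kwargs)
    if n_cores == 1:
        # Serial fallback: no worker processes are started.
        results = [worker(chunk) for chunk in chunks]
    else:
        with Pool(n_cores) as pool:
            results = pool.map(worker, chunks)
    return [value for part in results for value in part]
```

A call would look like `parallel_apply(score_chunk, data, n_cores=4, offset=10)`. On platforms that start workers by spawning a fresh interpreter, a call like this needs to sit under an `if __name__ == "__main__":` guard, and anything passed to the workers (the function and its arguments) has to be picklable.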
", line 23, In predict res = clf, predict (ft) File "/usr/1ocal/1ib/python 3.8/dist-packages/sklearn/tinear_model/_base-py", 1 ine 425 , in predict scores = self.decision function (x) X - self._validate_data( X, accept_sparse-"csr", reset-false) File "/usr/local/1ib/python3.8/dist-packages/sklearn/base.py", line 585, in _validate_data self__check_n_features (X, reset-reset) File "/usr/local/1ibjpython3,8/dist-packagesfsklearn/base-py", line 40e, in _check_n_features. raise valueErrort ValueError: X has 66427 features, but Logistickegression is expecting 98555 features as input. The above exception was the direct cause of the following exceptiont ValueError Tracphack (most recent call last) (tined execs in cmodules sipython-1nput-97-a6363417e7292 In parallel_pred(fn, data, feature_cols, clf, pred_col, n_cores) \( \begin{array}{rr}33 \\ 34 & \\ 35 & \text { res = pool.map(partial( } \\ 36 & \text { fn, } \\ 37 & \text { feature_cols = feature_cols, }\end{array} \) ValueError: X has 66427 features, but Logistichegression is expecting 98555 features as input