Question
Please help: perform the following activities in Python.
The data can be found here: https://drive.google.com/drive/folders/1c70af2WNeS_TKEUdbrMvKScStIeVd8M4?usp=sharing
1. Load the data in each of the text files into a data frame, resulting in 3 data frames. Note that only the 16-channel EEG data are extracted from each text file.
2. Delete the first 125 rows (the first 1 second of recording) from each data frame.
3. Compute the alpha PSIs of the 16 channels, using 125 as the segment size, as described below, and save the alpha PSIs in a data frame of 16 columns. Repeat this for each of the 3 data frames.
4. Add the dependent variable "State" to each data frame, as described below.
5. Combine the 3 data frames into one by vertical stacking.
6. Build a correlation coefficient matrix of the alpha PSIs of the 16 channels.
7. Remove co-linearity from the data frame: among channels whose correlation coefficient is higher than 0.8 or lower than -0.8, keep only one and remove the others. If fewer than 4 variables remain, raise the correlation threshold.
8. Normalize the data in every remaining channel using min-max normalization.
9. Build a KerasClassifier with two hidden layers (see the sample code following the instructions).
10. Build a visualization of the trained model using the plot_model() function. Sample code is provided after the instructions.
11. Build a 5-fold cross validation and print out the performance data (see the sample code after the instructions).
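Steps 6 and 7 can be sketched as follows. This is a minimal sketch under assumptions: the greedy "keep the first column, drop later ones that correlate with it" rule, the 0.05 threshold step, and the function name `drop_collinear` are illustrative choices, not part of the assignment; other selection rules also satisfy the requirement.

```python
import numpy as np
import pandas as pd

def drop_collinear(X, threshold=0.8, min_keep=4, step=0.05):
    """Greedy co-linearity filter (hypothetical helper, one possible reading of step 7).

    Walk the columns in order and keep a column only if |r| <= threshold against every
    column already kept. While fewer than min_keep columns survive, raise the threshold.
    Returns (reduced frame, full correlation matrix, threshold actually used).
    """
    corr = X.corr()  # Pearson correlation matrix (step 6)
    while True:
        kept = []
        for col in X.columns:
            if all(abs(corr.loc[col, k]) <= threshold for k in kept):
                kept.append(col)
        if len(kept) >= min_keep or threshold >= 1.0:
            return X[kept], corr, threshold
        threshold = min(threshold + step, 1.0)

# Synthetic data: "b" is a near-copy of "a"; "c", "d", "e" are independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({"a": a, "b": a + 0.01 * rng.normal(size=200),
                     "c": rng.normal(size=200), "d": rng.normal(size=200),
                     "e": rng.normal(size=200)})
kept, corr, thr = drop_collinear(demo)
print(list(kept.columns), thr)  # "b" is dropped as a near-copy of "a"
```

On the real data you would pass the combined frame without its label, e.g. `drop_collinear(combined.drop(columns="State"))`, and `corr.round(2)` gives the correlation matrix to paste into the Word document.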
Show the Python source code (.py file) and a Word document containing the correlation matrix of the combined dataset, the visualization of the Keras model, and the output of the cross validation (the confusion matrix and accuracy of each fold, and the mean and standard deviation across all folds).
ANY ADDITIONAL INFORMATION YOU MIGHT NEED IS FOUND BELOW:
Each EEG dataset is a text file with the following information at the beginning:
%OpenBCI Raw EEG Data
%Number of channels = 16
%Sample Rate = 125 Hz
%Board = OpenBCI_GUI$BoardCytonSerialDaisy
Sample Index, EXG Channel 0, EXG Channel 1, EXG Channel 2, EXG Channel 3, EXG Channel 4, EXG Channel 5, EXG Channel 6, EXG Channel 7, EXG Channel 8, EXG Channel 9, EXG Channel 10, EXG Channel 11, EXG Channel 12, EXG Channel 13, EXG Channel 14, EXG Channel 15, Accel Channel 0, Accel Channel 1, Accel Channel 2, Other, Other, Other, Other, Other, Other, Other, Analog Channel 0, Analog Channel 1, Analog Channel 2, Timestamp, Timestamp (Formatted)
Copy the data link given above, paste it into a browser, and download the files.
Note that the list of column names above is wrapped across multiple lines. The EEG readings are in the lines that follow this header.
The data used in this lab are the 16-channel EEG readings in columns "EXG Channel 0" through "EXG Channel 15"; all other columns are unused. You are given 3 data files with the prefixes "Pre-XXX", "Meditation-XXX", and "Post-XXX". They contain EEG data recorded in the pre-meditation learning phase, the meditation phase, and the post-meditation learning phase, respectively. Extract the 16-channel data from each file and save them in a data frame, giving 3 data frames.
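A minimal loading sketch, assuming each file is comma-separated, starts with the `%` metadata lines shown above, and has a single header row. The helper name `load_eeg` and the synthetic demo text are hypothetical; the demo only imitates the layout with made-up values.

```python
import io
import pandas as pd

# The 16 EEG columns named in the OpenBCI header
EEG_COLS = [f"EXG Channel {i}" for i in range(16)]

def load_eeg(source):
    """Read an OpenBCI raw-data text file (path or file-like) and keep the 16 EEG channels."""
    # Lines starting with '%' are metadata; the first non-comment line is the header row
    df = pd.read_csv(source, comment="%", skipinitialspace=True)
    df.columns = df.columns.str.strip()  # header names are separated by ", "
    return df[EEG_COLS].astype(float).reset_index(drop=True)

# Synthetic stand-in for a real file: 2 metadata lines, a header, and 3 made-up rows
header = "Sample Index, " + ", ".join(EEG_COLS) + ", Timestamp"
rows = [", ".join([str(n)] + [str(float(n + c)) for c in range(16)] + ["1.0"])
        for n in range(3)]
demo_text = ("%OpenBCI Raw EEG Data\n%Sample Rate = 125 Hz\n"
             + header + "\n" + "\n".join(rows) + "\n")

df_demo = load_eeg(io.StringIO(demo_text))
print(df_demo.shape)
```

With the downloaded data you would call, for example, `load_eeg("Pre-XXX.txt")` once per file, where the path is a placeholder for your actual file name.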
After reading the 16-channel data, remove the first second of data from each data frame. Since the sampling rate is 125 samples per second, the rows to be removed are the first 125 rows.
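Dropping the first second amounts to slicing off the first 125 rows. A small sketch on synthetic data (the helper name `drop_first_second` is hypothetical):

```python
import numpy as np
import pandas as pd

FS = 125  # sampling rate in Hz, so one second = 125 rows

def drop_first_second(df, fs=FS):
    """Remove the first second (fs rows) of a recording and renumber the rows from 0."""
    return df.iloc[fs:].reset_index(drop=True)

# Synthetic example: 3 seconds of fake readings for 16 channels
demo = pd.DataFrame(np.arange(3 * FS * 16, dtype=float).reshape(3 * FS, 16))
trimmed = drop_first_second(demo)
print(trimmed.shape)
```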
Normalize every channel using the min-max normalization.
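Min-max normalization maps each column to [0, 1] via (x - min) / (max - min). A minimal sketch (the helper name `min_max_normalize` is hypothetical); scikit-learn's `MinMaxScaler` performs the same per-column scaling.

```python
import pandas as pd

def min_max_normalize(df):
    """Per-column min-max normalization: (x - min) / (max - min), mapping each column to [0, 1]."""
    col_min = df.min()
    col_max = df.max()
    # A constant column has max == min and would produce NaN (0/0); handle it separately if needed
    return (df - col_min) / (col_max - col_min)

demo = pd.DataFrame({"EXG Channel 0": [2.0, 4.0, 6.0],
                     "EXG Channel 1": [-1.0, 0.0, 3.0]})
normalized = min_max_normalize(demo)
print(normalized)
```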
We will use the package "pyEEG" to compute the power spectral intensity (PSI) used in modeling. Download the pyEEG package from https://github.com/forrestbao/pyeeg and follow the instructions on the website to install it. The function we use in this lab is bin_power(X, Band, Fs), where X is a 1-D real time series (a list of samples) and Band is a list of boundary frequencies (in Hz) of the bins. The bins can be unequal, e.g. [0.5, 4, 7, 12, 30], which are delta, theta, alpha, and beta, respectively. In this lab we only use alpha, so set Band = [7, 12]. Fs is an integer indicating the sampling rate in physical frequency, which is 125 in our case. bin_power() returns a pair (Power, Power_Ratio); with Band = [7, 12], Power has a single entry, the alpha PSI.
For more information about pyEEG and its usage, visit:
https://www.hindawi.com/journals/cin/2011/406391/
http://pyeeg.sourceforge.net/
We will compute an alpha PSI for every one-second segment, which means X should be a list of 125 EEG readings in each call to bin_power(). In other words, one alpha PSI value is computed from each second of recording for one particular channel. If the entire EEG recording lasts 10 minutes, then 10 × 60 = 600 alpha PSI values are computed for one channel.
Repeat the above process to compute alpha PSI values for each of the 16 channels. Save the results in a data frame of 16 columns, one per channel. Do the same for each of the 3 raw EEG data frames.
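The per-second, per-channel loop can be sketched as below. To keep the example runnable without pyEEG installed, `alpha_power` is a NumPy stand-in assumed to follow the same approach as pyEEG's `bin_power` (summing FFT magnitudes between the band edges); verify it against `pyeeg.bin_power(segment, [7, 12], 125)[0][0]` before relying on it. The names `alpha_power` and `alpha_psi_frame` are hypothetical, and dropping a trailing partial second is an assumption.

```python
import numpy as np
import pandas as pd

FS = 125         # sampling rate (Hz) = segment size (one second)
ALPHA = [7, 12]  # alpha band edges in Hz

def alpha_power(segment, band=ALPHA, fs=FS):
    """Stand-in for pyeeg.bin_power(segment, band, fs)[0][0] (assumed equivalent):
    sum of FFT magnitudes from bin floor(7/fs*N) up to, not including, floor(12/fs*N)."""
    x = np.asarray(segment, dtype=float)
    n = len(x)
    mag = np.abs(np.fft.fft(x))
    lo, hi = int(np.floor(band[0] / fs * n)), int(np.floor(band[1] / fs * n))
    return mag[lo:hi].sum()

def alpha_psi_frame(eeg, seg=FS):
    """One alpha PSI per channel per non-overlapping seg-sample window."""
    n_seg = len(eeg) // seg  # assumption: a trailing partial second is dropped
    rows = []
    for k in range(n_seg):
        window = eeg.iloc[k * seg:(k + 1) * seg]
        rows.append([alpha_power(window[col].to_numpy()) for col in eeg.columns])
    return pd.DataFrame(rows, columns=eeg.columns)

# Synthetic check: 2 s of a 10 Hz sine (inside alpha) and a 20 Hz sine (outside alpha)
t = np.arange(2 * FS) / FS
demo = pd.DataFrame({"ch_alpha": np.sin(2 * np.pi * 10 * t),
                     "ch_beta": np.sin(2 * np.pi * 20 * t)})
psi = alpha_psi_frame(demo)
print(psi.round(3))
```

The 10 Hz channel yields a large alpha PSI in each second, while the 20 Hz channel yields roughly zero, since its energy falls outside the 7-12 Hz bins.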
For each data frame, build a column of brain state labels. Name the column "State". The values in the "State" column are "Pre" for the data frame corresponding to the pre-meditation phase, "Med" for the data frame corresponding to the meditation phase, and "Post" for the data frame corresponding to the post-meditation phase.
Combine the 3 data frames into one by vertically stacking them (row binding).
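Labelling and stacking can be sketched with `DataFrame.assign` and `pd.concat`; the helper name `label_and_stack` and the demo values are hypothetical.

```python
import pandas as pd

def label_and_stack(frames):
    """Add a "State" column to each alpha-PSI frame and stack them vertically.

    frames: dict mapping a State label ("Pre", "Med", "Post") to its data frame.
    """
    labelled = [df.assign(State=state) for state, df in frames.items()]
    # Row binding; ignore_index renumbers rows 0..n-1 across the combined frame
    return pd.concat(labelled, axis=0, ignore_index=True)

# Synthetic alpha-PSI frames with made-up values for two channels
pre = pd.DataFrame({"EXG Channel 0": [1.0, 2.0], "EXG Channel 1": [3.0, 4.0]})
combined = label_and_stack({"Pre": pre, "Med": pre * 10, "Post": pre * 100})
print(combined)
```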
Five-fold cross validation: randomly split the dataset into 5 portions, train the classifier on 4 portions, and test it on the remaining portion. Rotate the 5 portions so that each serves once as the test set. Variable "State" is the dependent variable, and all other variables (the alpha PSIs of the channels kept after step 7) are independent variables.
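The cross-validation sample code expects `X` (features), `Y` (integer class labels, used for the confusion matrices) and `dummy_y` (one-hot labels, used by `cross_val_score`). One way to derive them from the combined frame is sketched below; the helper name `to_model_inputs` is hypothetical, and `np.unique(..., return_inverse=True)` is used here as a dependency-free equivalent of scikit-learn's `LabelEncoder` (sorted class names, integer codes).

```python
import numpy as np
import pandas as pd

def to_model_inputs(df, label_col="State"):
    """Return X (features), Y (integer labels), dummy_y (one-hot labels), and class names."""
    X = df.drop(columns=label_col).to_numpy(dtype=float)
    # Sorted unique labels + integer codes; same encoding as sklearn's LabelEncoder
    classes, Y = np.unique(df[label_col].to_numpy(), return_inverse=True)
    dummy_y = np.eye(len(classes))[Y]  # one row per sample, a single 1.0 in its class column
    return X, Y, dummy_y, classes

demo = pd.DataFrame({"EXG Channel 0": [0.1, 0.5, 0.9, 0.3],
                     "State": ["Pre", "Med", "Post", "Pre"]})
X, Y, dummy_y, classes = to_model_inputs(demo)
print(classes, Y)
```

Because the classes are sorted alphabetically, the codes are Med = 0, Post = 1, Pre = 2, which is also the row/column order of the confusion matrices.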
Sample code:
Build a KerasClassifier:
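# define baseline model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Legacy wrapper, not available in recent TensorFlow releases; there, use
# "from scikeras.wrappers import KerasClassifier" and pass model=baseline_model instead of build_fn
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def baseline_model():
    # create model; input_dim must match the number of channels kept after step 7
    model = Sequential()
    model.add(Dense(16, input_dim=16, activation='relu'))  # hidden layer 1
    model.add(Dense(8, activation='relu'))                  # hidden layer 2
    model.add(Dense(3, activation='softmax'))               # one output per State: Pre, Med, Post
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=10, batch_size=256, verbose=0)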
Plot the model:
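import tensorflow as tf

# plot_model needs a model instance (built here with baseline_model() above)
# and requires the pydot and graphviz packages to be installed
model = baseline_model()
dot_img_file = 'model.png'
tf.keras.utils.plot_model(model, to_file=dot_img_file, show_shapes=True)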
Cross validation:
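from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import confusion_matrix

# X: NumPy array of normalized alpha PSIs; Y: integer-encoded "State" labels;
# dummy_y: one-hot encoding of Y (e.g. tf.keras.utils.to_categorical(Y))
kfold = KFold(n_splits=5, random_state=42, shuffle=True)
conf_matrix_list_of_arrays = []
for train_index, test_index in kfold.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
    estimator.fit(X_train, y_train)
    conf_matrix = confusion_matrix(y_test, estimator.predict(X_test))
    print(conf_matrix)
    print(' ')
    conf_matrix_list_of_arrays.append(conf_matrix)

# Accuracy of each fold, then the mean and standard deviation across folds
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
for i in range(len(results)):
    print("Fold %d Accuracy : %.2f%%" % (i + 1, results[i] * 100))
print("Baseline: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))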