Question

1 Approved Answer

Posted on Sep 24, 2024

Dataset Generation: First, we generate the data used in our experimentation. We assume that the data consists of samples drawn from three different Gaussian distributions. Please follow these steps:

i) Take the three mean values (3, 70), (7, 150) and (13, 250) as the means of three different Gaussian distributions, and generate 100 random data samples for each mean. Use a standard deviation of 3 in each dimension for each distribution. (Hint: numpy.random.normal)

If we stack all the samples together, this should result in a 2x300 matrix, where each feature vector has dimension 2 and the total number of feature vectors is 300. (Remember: when you stack all the feature vectors together in a matrix, you already know the order in which you stacked them. In this way, you will always know which feature vector came from which distribution.)

ii) Now generate 300 samples from a Gaussian distribution with mean (0, 0) and standard deviation 1 in each dimension. This should also give you a 2x300 matrix of Gaussian noise. Add it to the feature matrix generated in step (i). The result of this addition is our data, which we are going to use for clustering.
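
The generation steps above can be sketched in NumPy as follows. This is one possible reading of the task, not a reference solution: the fixed seed, the variable names (means, features, labels, noise, data) and the use of numpy.random.default_rng (whose normal method takes the same loc, scale and size arguments as the hinted numpy.random.normal) are all assumptions made for illustration.

```python
import numpy as np

# Assumption: a fixed seed so the run is reproducible; the task does not ask for one.
rng = np.random.default_rng(0)

# The three cluster means given in the task, one row per distribution.
means = np.array([[3.0, 70.0], [7.0, 150.0], [13.0, 250.0]])
n_per_cluster = 100

# Step (i): 100 samples per mean, standard deviation 3 in each dimension.
# Each block is drawn as (100, 2) and transposed to (2, 100), so that
# stacking the blocks side by side gives the requested 2x300 layout.
blocks = [rng.normal(loc=m, scale=3.0, size=(n_per_cluster, 2)).T for m in means]
features = np.hstack(blocks)  # shape (2, 300)

# The stacking order is known, so the source distribution of every column is known:
# columns 0-99 -> mean 0, columns 100-199 -> mean 1, columns 200-299 -> mean 2.
labels = np.repeat(np.arange(len(means)), n_per_cluster)  # shape (300,)

# Noise step: zero-mean Gaussian noise with standard deviation 1, same 2x300 shape.
noise = rng.normal(loc=0.0, scale=1.0, size=features.shape)

# The sum is the data set to be used for clustering.
data = features + noise

print(data.shape)    # (2, 300)
print(labels.shape)  # (300,)
```

Because the noise is independent of the features, each coordinate of data has variance 3^2 + 1^2 = 10, that is, a standard deviation of about 3.16 around its cluster mean. The labels array is only a bookkeeping aid for checking clustering results later; it is not part of the data passed to the clustering algorithm.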