Question
Introduction
To understand how parallel computing works for data mining, we are going to imitate the work of computers in small groups, calculating simple statistical characteristics (mean and standard deviation) by acting as nodes of a distributed computer cluster (1 student = 1 node).
Directions
Download the dataset.csv file. It contains recorded data values. The goal is to calculate the average and the standard deviation of that variable as a group. In your initial post, describe the algorithm that the central node and the computing nodes will need to follow to compute the average and the standard deviation of the dataset, given that each computing node can work only with its assigned fraction of the dataset. Explain what parallelization technique you will use, and why. The first student who submits the initial post will serve as the central node, which should split the dataset and assign one portion to each of the other students in the group (no data should be assigned to the central node itself). Then each of you should perform the calculation on your fraction of the dataset and post the needed aggregated information on the discussion board. When all partial results are in, the student playing the central node should aggregate them and post the results for the whole dataset. To calculate the standard deviation, you may need to conduct two iterations.
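One possible reading of the directions is a data-parallel, two-iteration scheme: in the first iteration each computing node reports the count and sum of its portion so the central node can form the global mean, and in the second iteration the central node broadcasts that mean and each node reports its sum of squared deviations. The Python sketch below simulates this in a single process. All function names are hypothetical, the contiguous split is just one way to divide the data, and the choice between the population formula (divide by n) and the sample formula (divide by n - 1) is an assumption, since the assignment does not say which one is expected.

```python
import math


def split_dataset(values, n_nodes):
    """Central node: split values into n_nodes nearly equal contiguous chunks."""
    base, extra = divmod(len(values), n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)  # first `extra` nodes get one more value
        chunks.append(values[start:start + size])
        start += size
    return chunks


def node_iteration_1(chunk):
    """Computing node, iteration 1: report (count, sum) of its own chunk."""
    return len(chunk), sum(chunk)


def node_iteration_2(chunk, global_mean):
    """Computing node, iteration 2: report the sum of squared deviations
    from the global mean that the central node broadcast."""
    return sum((x - global_mean) ** 2 for x in chunk)


def distributed_mean_std(values, n_nodes, sample=False):
    """Simulate the whole exchange; returns (mean, standard deviation).

    sample=False divides by n (population formula, an assumption);
    sample=True divides by n - 1.
    """
    chunks = split_dataset(values, n_nodes)

    # Iteration 1: the central node aggregates counts and sums into the mean.
    partials = [node_iteration_1(chunk) for chunk in chunks]
    n = sum(count for count, _ in partials)
    mean = sum(total for _, total in partials) / n

    # Iteration 2: the central node aggregates the squared deviations.
    squared = sum(node_iteration_2(chunk, mean) for chunk in chunks)
    denominator = n - 1 if sample else n
    return mean, math.sqrt(squared / denominator)


if __name__ == "__main__":
    toy_values = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data, not the contents of dataset.csv
    print(distributed_mean_std(toy_values, n_nodes=3))
```

In this sketch each node only ever sees its own chunk plus the broadcast mean, and the central node only ever sees the small aggregated values, which is the constraint stated in the directions. The second iteration exists because the squared deviations cannot be computed until the global mean is known.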