
Question



Do Matrix Multiplication Using Spark MapReduce (20 points)

Please implement a single MapReduce job on PySpark to compute a matrix multiplication, i.e., C = A x B, where A, B, and C are all n x n matrices. The pseudocode is provided here, and you can implement it on PySpark.

The Map Function: For each element a_ij of matrix A, produce all the key-value pairs ((i, k), (A, j, a_ij)) for k = 0, 1, 2, ..., n-1. Similarly, for each element b_jk of B, produce all the key-value pairs ((i, k), (B, j, b_jk)) for i = 0, 1, 2, ..., n-1. Here A and B are just bits that tell which of the two matrices a value comes from; e.g., you can set A = 0 and B = 1, so that the bit indicates whether the value comes from the right-hand matrix.

The Reduce Function: For each key (i, k), sum up all products a_ij * b_jk taken from (A, j, a_ij) and (B, j, b_jk) for j = 0, 1, 2, ..., n-1, and output the key-value pair ((i, k), c_ik). Each key (i, k) will have an associated list with all the values (A, j, a_ij) and (B, j, b_jk), for all possible values of j. The Reduce function needs to connect the two values on the list that have the same value of j, for each j. An easy way to do this step is to sort by j the values that begin with A and sort by j the values that begin with B, in separate lists. The j-th values on each list must have their third components, a_ij and b_jk, extracted and multiplied. Then these products are summed, and the sum c_ik is paired with the key (i, k) in the output of the Reduce function.

To simplify your understanding, consider the smallest 2x2 matrix multiplication problem:

[[a00, a01], [a10, a11]] * [[b00, b01], [b10, b11]] = [[a00*b00 + a01*b10, a00*b01 + a01*b11], [a10*b00 + a11*b10, a10*b01 + a11*b11]] = [[c00, c01], [c10, c11]]

So a00 should be sent to c00 and c01, and likewise for the other elements of A. Similarly, b00 should be sent to c00 and c10, and likewise for the other elements of B.

For example, let A = [[1, 2], [7, 8]] and B = [[3, 4], [5, 6]]. The mapper will generate the key-value pairs ((0, 0), (0, 0, 1)), ((0, 1), (0, 0, 1)), ((0, 0), (0, 1, 2)), ((0, 1), (0, 1, 2)), ((1, 0), (0, 0, 7)), ((1, 1), (0, 0, 7)), ((1, 0), (0, 1, 8)), ((1, 1), (0, 1, 8)), ((0, 0), (1, 0, 3)), ((1, 0), (1, 0, 3)), ((0, 1), (1, 0, 4)), ((1, 1), (1, 0, 4)), ((0, 0), (1, 1, 5)), ((1, 0), (1, 1, 5)), ((0, 1), (1, 1, 6)), ((1, 1), (1, 1, 6)).

The system sorts the keys and assigns all values with the same key to one reducer, so the reducers are:

1. The reducer for key (0, 0) gets [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 5)] and emits ((0, 0), 1*3 + 2*5), i.e., ((0, 0), 13).
2. The reducer for key (0, 1) gets [(0, 0, 1), (0, 1, 2), (1, 0, 4), (1, 1, 6)] and emits ((0, 1), 1*4 + 2*6), i.e., ((0, 1), 16).
3. The reducer for key (1, 0) gets [(0, 0, 7), (0, 1, 8), (1, 0, 3), (1, 1, 5)] and emits ((1, 0), 7*3 + 8*5), i.e., ((1, 0), 61).
4. The reducer for key (1, 1) gets [(0, 0, 7), (0, 1, 8), (1, 0, 4), (1, 1, 6)] and emits ((1, 1), 7*4 + 8*6), i.e., ((1, 1), 76).

The input is two files, A.text and B.text, each with a fixed number of real numbers per line, and you need to find the product of the two input matrices. You need to create those input files and test your program for correctness.
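
The Map and Reduce functions described in the question could be sketched roughly as follows. This is only a minimal sketch, not the expert answer from this page. It assumes that each input file holds one matrix row per line with whitespace-separated numbers, that line i of the file is row i, and that both matrices are dense (every j appears for both A and B). The helper names parse_row, map_a, map_b, reduce_cell, multiply_locally, and multiply_with_spark are made up for illustration. multiply_locally imitates the shuffle in plain Python so the logic can be checked without a cluster, while multiply_with_spark wires the same functions into the standard RDD calls textFile, zipWithIndex, flatMap, union, groupByKey, and mapValues.

```python
from collections import defaultdict

A_TAG, B_TAG = 0, 1  # tags from the question: 0 marks a value from A, 1 from B


def parse_row(line):
    """Turn one whitespace-separated text line into a list of floats (assumed format)."""
    return [float(tok) for tok in line.split()]


def map_a(indexed_row, n):
    """Given (row values, i), emit ((i, k), (A_TAG, j, a_ij)) for every k."""
    row, i = indexed_row
    return [((i, k), (A_TAG, j, a_ij)) for j, a_ij in enumerate(row) for k in range(n)]


def map_b(indexed_row, n):
    """Given (row values, j), emit ((i, k), (B_TAG, j, b_jk)) for every i."""
    row, j = indexed_row
    return [((i, k), (B_TAG, j, b_jk)) for k, b_jk in enumerate(row) for i in range(n)]


def reduce_cell(values):
    """Split values by tag, sort each list by j, multiply pairwise, and sum."""
    a_vals = sorted((j, v) for tag, j, v in values if tag == A_TAG)
    b_vals = sorted((j, v) for tag, j, v in values if tag == B_TAG)
    # Assumes dense matrices, so the two sorted lists line up on j.
    return sum(a * b for (_, a), (_, b) in zip(a_vals, b_vals))


def multiply_locally(a_rows, b_rows):
    """Pure-Python stand-in for the shuffle, to check the logic without Spark."""
    n = len(a_rows)
    groups = defaultdict(list)
    for i, row in enumerate(a_rows):
        for key, value in map_a((row, i), n):
            groups[key].append(value)
    for j, row in enumerate(b_rows):
        for key, value in map_b((row, j), n):
            groups[key].append(value)
    return {key: reduce_cell(vals) for key, vals in groups.items()}


def multiply_with_spark(sc, a_path, b_path, n):
    """Same map and reduce on PySpark; sc is an already created SparkContext."""
    a = sc.textFile(a_path).map(parse_row).zipWithIndex().flatMap(lambda r: map_a(r, n))
    b = sc.textFile(b_path).map(parse_row).zipWithIndex().flatMap(lambda r: map_b(r, n))
    return a.union(b).groupByKey().mapValues(reduce_cell).sortByKey().collect()


if __name__ == "__main__":
    # The 2x2 example from the question.
    result = multiply_locally([[1, 2], [7, 8]], [[3, 4], [5, 6]])
    print(sorted(result.items()))
```

With a running SparkContext, a call such as multiply_with_spark(sc, "A.text", "B.text", 2) would be the counterpart of the local check. Tagging values with 0 and 1 and grouping with groupByKey follows the pseudocode closely; other formulations exist, but they would no longer mirror the single map and reduce described in the question.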


