
Question


a) Solve 9.3.1-a (normalize the ratings based on a threshold), 9.3.1-e

       a  b  c  d  e  f  g  h
    A  4  5     5  1     3  2
    B     3  4  3  1  2  1
    C  2     1  3     4  5  3

Figure 9.8: A utility matrix for exercises

Exercise 9.3.1: Figure 9.8 is a utility matrix, representing the ratings, on a 1-5 star scale, of eight items, a through h, by three users A, B, and C. Compute the following from the data of this matrix.

(a) Treating the utility matrix as boolean, compute the Jaccard distance between each pair of users.

(e) Normalize the matrix by subtracting from each nonblank entry the average value for its user.

b) Describe one strategy that is used to make a utility matrix less sparse.

2) a) Describe at least one mechanism that ensures that data is not lost in Spark when a hardware/software failure occurs.

b) From a resource (i.e., disk, memory, CPU) management perspective, which Hadoop nodes should be chosen to run Spark tasks? Specifically, which types of tasks can co-exist with Spark and which ones should not be using the same node as Spark?

c) Implement (in Python only, without actual Storm) a solution that computes the average of a streaming query over a specified window. For example, to compute a 4-value windowed average that moves 2 tuples at a time, you can use the following line (make sure that your code supports other sizes as well). You are not allowed to first read the entire input before producing output, because the input stream is infinite. Instead, you should compute and print the output as the data arrives.

    cat mydata | python storm.py 4 2

This can be tested in your own environment, without Linux, using the following code:

    import sys
    fd = open('mydata', 'r')
    sys.stdin = fd
    for line in sys.stdin:
        # Your code goes here.

Assuming that the mydata file contains (one value per line, no error checking is necessary)

    5 3 6 11 8 4 6 3 7

Your command above should output the following three averages (representing the averages of (5,3,6,11), (6,11,8,4), (8,4,6,3)):

    6.25 7.25 5.25

The last window only contains 6, 3, 7 and cannot output an average until more data arrives.
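One possible way to approach part 2c is sketched below. It is not the expert solution from this page (which is not visible), only a minimal illustration under stated assumptions: the function name `windowed_averages` and the helper `main` are hypothetical, the window size and step are taken from the two command-line arguments as in `python storm.py 4 2`, and each average is printed on its own line. The sketch keeps at most one window of values in memory and emits an average as soon as a window fills, so it never reads the whole stream first.

```python
import sys
from collections import deque


def windowed_averages(values, size, step):
    """Yield the mean of each full window of `size` values.

    The window advances `step` values at a time. `values` may be any
    iterable, including an endless one, because only the current window
    is kept in memory. (Hypothetical helper, not part of the assignment.)
    """
    window = deque()
    skip = 0  # values still to discard before the next window (only when step > size)
    for value in values:
        if skip:
            skip -= 1
            continue
        window.append(value)
        if len(window) == size:
            yield sum(window) / size
            # Slide forward: drop the oldest `step` values (at most the whole window).
            for _ in range(min(step, size)):
                window.popleft()
            skip = max(step - size, 0)


def main(argv, stream):
    # Assumption: argv[1] is the window size and argv[2] is the step.
    size, step = int(argv[1]), int(argv[2])
    numbers = (float(line) for line in stream if line.strip())
    for average in windowed_averages(numbers, size, step):
        print(average, flush=True)  # print as soon as each window is complete


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv, sys.stdin)
```

With the sample `mydata` from the question and arguments `4 2`, this sketch produces 6.25, 7.25 and 5.25, and prints nothing for the trailing partial window (6, 3, 7). A `deque` is used here because removing values from the front is cheap; a plain list with slicing would also work for small windows.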

