Write the code for WordCount.java to output the average rating and the number of users who rated each movie
Question:
Write the code for WordCount.java that outputs, for each movie in the u.data set, the average rating and the number of users who rated it.
Transcribed Image Text:
The following file is from Movielens dataset which shows user ratings for movies: http://files.grouplens.org/datasets/movielens/ml-100k/u.data

You can find more about this dataset here: https://files.grouplens.org/datasets/movielens/ml-100k-README.txt

u.data is the full u data set with 100000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. This is a tab separated list of user id | item id | rating | timestamp. The time stamps are unix seconds since 1/1/1970 UTC.

For example, the following line of the file

95 546 2 879196566

is interpreted as follows: User 95 has rated movie 546, 2/5 (rates are in the range 1-5) at time 879196566 (Monday, November 10, 1997 9:16:06 PM, GMT).

Your task is to use MapReduce programming and find the following information for each movie: the average rating and the number of users who rated this movie. Here is an example of the output:

Movie ID | Average Rating | Number of Users Rated
340 | 3.78 | 298
499 | 4.02 | 532

You can choose the output format. However, the required information must be included in the output.

Hint: You can change the WordCount program such that it ignores all tokens in a line except the third one (rating value in the file exists in the third column).
Expert Answer:
Sure, here's a basic implementation of the WordCount program in Java for finding the average rating and the number of users who rated each movie in the ...
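As a minimal sketch of the map and reduce logic the task calls for (not the original answer's code), the plain-Java example below runs without any Hadoop dependency. The class name `MovieRatingStats`, its method names, and every sample line except the first are hypothetical. It assumes each user rates a given movie at most once, so the number of ratings for a movie equals the number of users who rated it. Unlike the hint's literal wording, it keeps the second column (item id) as the key and the third column (rating) as the value, since the statistics are per movie.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class MovieRatingStats {

    // Map step: parse one line "user id, item id, rating, timestamp"
    // and emit (movieId, rating). Returns null for blank or malformed lines.
    static Map.Entry<Integer, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 3) {
            return null;
        }
        try {
            int movieId = Integer.parseInt(fields[1]);
            int rating = Integer.parseInt(fields[2]);
            return new SimpleEntry<>(movieId, rating);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Reduce step: given all ratings for one movie, produce
    // "movieId <TAB> average (2 decimals) <TAB> number of ratings".
    static String reduce(int movieId, List<Integer> ratings) {
        long sum = 0;
        for (int r : ratings) {
            sum += r;
        }
        double average = (double) sum / ratings.size();
        return String.format(Locale.ROOT, "%d\t%.2f\t%d", movieId, average, ratings.size());
    }

    // Stand-in for the framework's shuffle phase: group the mapped
    // pairs by movie ID (sorted), then call reduce once per movie.
    static List<String> run(List<String> lines) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<Integer, Integer> pair = map(line);
            if (pair != null) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            output.add(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // First line is the example from the question; the other two are made up.
        List<String> sample = Arrays.asList(
                "95\t546\t2\t879196566",
                "12\t546\t5\t879000000",
                "7\t340\t4\t879111111");
        for (String row : run(sample)) {
            System.out.println(row);
        }
    }
}
```

In a Hadoop job, the bodies of `map` and `reduce` would move into `Mapper` and `Reducer` subclasses adapted from WordCount, with the framework doing the grouping that `run` simulates here. Because an average of averages is not the overall average, the reducer in this sketch should not be reused unchanged as a combiner.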