Question

Posted on Sep 02, 2024

Problem 1 (20 points) Write Python code (plain Python, not PySpark) to answer the following question. "Social computing research at the University of Minnesota" has released movie rating data sets of different sizes on the "grouplens.org" web site. Load the MovieLens 10M dataset, which consists of 10 million movie ratings. You can download the data by going to grouplens.org and, under the "datasets" tab, downloading the "MovieLens 10M Dataset"; it is 63 MB.
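As a starting point for the loading step, here is a minimal sketch of a parser for the ratings file. It assumes the archive contains a file named ratings.dat whose lines have the form UserID::MovieID::Rating::Timestamp; check the README shipped with the download before relying on that. The names Rating, parse_ratings, and load_ratings are hypothetical helpers, and the sample lines are illustrative only.

```python
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    """One rating record (hypothetical helper type)."""
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield a Rating for each non-empty line.

    Assumes the field separator is '::' and the field order is
    UserID::MovieID::Rating::Timestamp.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating, ts = line.split("::")
        yield Rating(int(user), int(movie), float(rating), int(ts))


def load_ratings(path: str) -> list[Rating]:
    """Read the whole file into memory; `path` is wherever you unpacked the data."""
    with open(path, encoding="utf-8") as fh:
        return list(parse_ratings(fh))


# Illustrative input only, not taken from the real file.
sample = ["1::122::5::838985046", "1::185::4.5::838983525", ""]
parsed = list(parse_ratings(sample))
print(parsed)
```

Holding ten million tuples in memory is feasible on most machines but not free; iterating over parse_ratings directly avoids building the full list when only counts are needed.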

a) Divide the data into 5 files of almost equal size and use these five files in the rest of the assignment. (2 points)

b) Sort the data from the highest-rated movie to the lowest-rated one. Measure how much time sorting takes. (6 points) Don't use a built-in sort function; write the sort function yourself. Use sort function

c) Create a histogram of the movie ratings. Measure how much time it takes to create the histogram. (2 points)

d) The data contains more than 10M ratings of 10681 movies by 71567 users. Create a histogram of the number of times each movie was rated. Measure how much time it takes to create the histogram. (4 points)

e) Choose the lowest three bins of the histogram in part c and create a histogram of movie ratings for these three bins. Do the same for the top three bins of the histogram. (6 points)
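One possible way to approach parts a) through e) is sketched below on a tiny in-memory sample, so that it runs without the download. Everything here is an assumption rather than the expected solution: the helper names (split_lines, merge_sort_desc, timed) are hypothetical, rows are simplified to (user_id, movie_id, rating) tuples, merge sort is just one choice of hand-written sort, and part e) is read as "restrict the rating histogram to its three lowest and three highest rating values", which is only one interpretation of the wording.

```python
import time
from collections import Counter


def split_lines(lines, parts=5):
    """Part a: contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def merge_sort_desc(items, key):
    """Part b: hand-written merge sort, highest key first."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_desc(items[:mid], key)
    right = merge_sort_desc(items[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # '>=' takes from the left half on ties, which keeps the sort stable.
        if key(left[i]) >= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Made-up sample rows: (user_id, movie_id, rating).
rows = [(1, 10, 4.0), (1, 20, 0.5), (2, 10, 5.0), (2, 30, 3.0),
        (3, 10, 1.0), (3, 20, 4.0), (4, 30, 1.5)]

chunks = split_lines(rows, parts=5)                                   # part a
sorted_rows, sort_seconds = timed(merge_sort_desc, rows, key=lambda r: r[2])  # part b
rating_hist, hist_seconds = timed(Counter, (r[2] for r in rows))      # part c
per_movie, _ = timed(Counter, (r[1] for r in rows))                   # part d: ratings per movie
count_hist = Counter(per_movie.values())     # part d: how many movies share each count

bins = sorted(rating_hist)                   # part e, under the interpretation above
low_hist = {b: rating_hist[b] for b in bins[:3]}
high_hist = {b: rating_hist[b] for b in bins[-3:]}

print(f"sort took {sort_seconds:.6f} s, histogram took {hist_seconds:.6f} s")
```

For the real assignment, each of the five files from part a) would be written to disk and processed in turn, and the resulting Counter objects could be passed to a plotting library to draw the histograms. On ten million rows a pure-Python merge sort will take noticeably long, which is presumably the point of the timing exercise.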
