Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 11, 2024

1 Computing Degree Distributions on Networkly (5 points) Alice is a data scientist at Networkly, a large social networking website where users can follow each

image text in transcribed

1 Computing Degree Distributions on Networkly (5 points) Alice is a data scientist at Networkly, a large social networking website where users can follow each other. Alice has a deadline in three hours, and needs your help running an analysis. Suppose you are given the Networkly follow graph. Assumptions you may make: The graph has 330 million nodes and 100 billion directed edges. The most followed user on Networkly has 100 million followers. Your computing cluster has 1,000 computers, each with 64 GB of RAM. Using Spark, write a parallel program to compute the in-degree distribution of the graph. That is, for each k, your program should compute the number of users who have k followers (unless this number is zero, in which case you do not need to include that count in the final output) Your program should effectively take advantage of parallel processing and not attempt to read 100 billion directed edges on a single machine. Your input is an RDD, where each entry is an edge: (u, v) is an entry in the RDD if u follows v. Your function should return another RDD, where the entries are of the form (k, number of users who have k followers). What to submit: Please fill in the code stubs provided for you on the next page, you may choose to fill in the stub for either Python, Java, or Scala. You should write real code (rather than pseudocode). However, because you do not have access to a compiler/interpreter, it will be possible to get full credit even if your code does not compile. Specifically, the grading rubric will be (3 points) A correct pseudocode description of the algorithm, without any code. (4 points) Real code that partially works, and corresponds to a correct high-level algorithm. (5 points) Real code that completely works, minus one or two missing parentheses. 2 Python version import sys from pyspark import SparkConf, SparkContext # # Input: An RDD containing entries of the form (source node, destination node) # Output: An RDD containing entries of the form # (k, number of users who have k followers) # TODO: Write your answer code in this function. def degree_distribution(edges): return distribution if __name__ == '__main__': conf = SparkConf () SC = SparkContext(conf=conf) sc.setLogLevel("WARN") # Reads input and converts all node ids to integers data = sc.textFile(sys.argv[1]).map(lambda line: map(int, line.split())) # Computes the degree distribution distribution - degree_distribution (data) # Writes the output to a file distribution.sortByKey().saveAsTextFile(sys.argv[2]) sc.stop) 3

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Larry Ellison Database Genius Of Oracle

Larry Ellison Database Genius Of Oracle

Authors: Craig Peters

1st Edition

0766019748, 978-0766019744

More Books

Students also viewed these Databases questions

Question

★★★★★

The wall thickness of 25 glass 2-liter bottles was measured by a quality-control engineer. The sample mean was x = 4.05 millimeters, and the sample standard deviation was s = 0.08 millimeter. Find a...

Answered: 1 week ago

Question

★★★★★

For men, pulse rates (in beats per minute) are normally distributed with mean 70 and standard deviation 10. Womens pulse rates are normally distributed with mean 77 and standard deviation 12. Let X =...

Answered: 1 week ago

Question

★★★★★

69. The following table gives the number of fatal accidents of U.S. commercial airline carriers in the 16 years from 1980 to 1995. Do these data disprove, at the 5 percent level of significance, the...

Answered: 1 week ago

Question

★★★★★

Coastal Safety manufactures flotation vests in Miami, Florida. Coastal Safety's contribution margin income statement for the most recent month contains the following data: Suppose Dazzle Cruiselines...

Answered: 1 week ago

Question

★★★★★

1 Computing Degree Distributions on Networkly (5 points) Alice is a data scientist at Networkly, a large social networking website where users can follow each other. Alice has a deadline in three...

Answered: 1 week ago

Question

★★★★★

On January 1, 2024 West Coast Boats paid $ 1,850,000 for its 30% investment in East Coast Autos. After the purchase, West Coast Boats has no significant influence over East Coast Autos. East Coast...

Answered: 1 week ago

Question

★★★★★

(20 points) The following data shows the total spend on Research and development by institutions of higher learning in the US, in billions of dollars for the period 2003 - 2011. Find the best fit...

Answered: 1 week ago

Question

★★★★★

Question 1 (13 pts) Read the following and use it to answer the following questions: James owns a factory that produces fuel from bio waste and sells the fuel to the agricultural industry. He holds a...

Answered: 1 week ago

Question

★★★★★

1)Breakeven point: Algebraic and graphicalFine Leather Enterprises sells its single product for $129.00 per unit. The firm's fixed operating costs are $473,000 annually, and its variable operating...

Answered: 1 week ago

Question

★★★★★

13. Jose is 35 years old, and makes $40,000 per year. If he dies, how much would the beneficiaries of his life insurance policy receive if they can get by on 75% of his income?

Answered: 1 week ago

Question

★★★★★

Cookie Business In this project, you will be opening your own specialty cookie company to see how product costing methods and changes in production affect business decisions. You will be creating a...

Answered: 1 week ago

Question

★★★★★

Compare the different types of employee separation actions.

Answered: 1 week ago

Question

★★★★★

Assess alternative dispute resolution methods.

Answered: 1 week ago

Question

★★★★★

Distinguish between intrinsic and extrinsic rewards.

Answered: 1 week ago

Previous Question Next Question