Posted on Sep 25, 2024

Problem 8. Suppose we have the following workers information.

name	age	gender	occupation
Mike	19	Male	Computer Scientist
Paul	26	Male	Computer Scientist
Bob	25	Male	Computer Scientist
Olivia	30	Female	Accountant
Rob	32	Male	Computer Scientist
Susan	36	Female	Computer Scientist
David	35	Male	Accountant
Emma	44	Female	Accountant
Lisa	32	Female	Accountant

The data is stored in a JSON file, /home/rob/exam2/workers_spark.json. The data file is uploaded into iCollege.

We want to compute the number of workers above age 20 for each combination of gender and occupation. That is, we want to derive the following table from the one above.

gender	occupation	count
Male	Computer Scientist	3
Female	Computer Scientist	1
Male	Accountant	1
Female	Accountant	3

Note that Mike is 19, which is not above 20. Therefore, he is filtered out and not counted.
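The expected counts can be checked by hand with a few lines of plain Python. This is only a sanity-check sketch, not the Spark or Pig answer the problem asks for: the `workers` list is typed in from the table above, and the helper name `count_over_age` is made up for illustration.

```python
from collections import Counter

# Rows copied from the workers table: (name, age, gender, occupation).
workers = [
    ("Mike", 19, "Male", "Computer Scientist"),
    ("Paul", 26, "Male", "Computer Scientist"),
    ("Bob", 25, "Male", "Computer Scientist"),
    ("Olivia", 30, "Female", "Accountant"),
    ("Rob", 32, "Male", "Computer Scientist"),
    ("Susan", 36, "Female", "Computer Scientist"),
    ("David", 35, "Male", "Accountant"),
    ("Emma", 44, "Female", "Accountant"),
    ("Lisa", 32, "Female", "Accountant"),
]


def count_over_age(rows, min_age=20):
    """Count rows with age strictly above min_age, keyed by (gender, occupation)."""
    return Counter(
        (gender, occupation)
        for _name, age, gender, occupation in rows
        if age > min_age
    )


counts = count_over_age(workers)
for (gender, occupation), n in sorted(counts.items()):
    print(gender, occupation, n, sep="\t")
```

The filter is strict (`age > min_age`), so a worker aged exactly 20 would also be excluded.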

From the result table above, we can see that computer scientists are mostly male and accountants are mostly female. This is what we learn from the original workers information table.

Now design a Python Spark algorithm that implements this computation. You are required to use the Spark DataFrame APIs. You may want to run your program against the data file and debug it to make sure that your answers are correct. The last line should show the results in the terminal.

Answer: (Only show the key lines of the source code. You do not need to include the setup code for the Spark Context and Spark SQL Context.)

1:

2:

3:

4:

We can also use Spark SQL to implement the same function. Spark SQL lets you operate on the dataset with SQL statements such as SELECT * FROM. In this method, you need to call the createOrReplaceTempView() and spark.sql() functions to perform the filter and group-by operations. Please provide the source code. The last line should show the results in the terminal.

Answer: (Only show the key lines of the source code. You do not need to include the setup code for the Spark Context and Spark SQL Context.)

1:

2:

3:

4:

5:

6:

If we want to use Pig to achieve the same goal, what is the source code for doing that?

The data is stored in a different JSON file, /home/rob/exam2/workers_pig.json. Its format differs slightly from that of the previous workers_spark.json file because Spark and Pig use different JSON parsers. This data file is also uploaded into iCollege (in the Exam 2 folder). You may want to run your program against the data to make sure that your answers have no bugs. The last line of the code should write the results into the folder PigOutput on disk.

Answer: (Please provide the entire source code for Pig)

1:

2:

3:

4:

5:
