Question

1. Make a new HDFS directory called /loudacre/weblogs.

2. Move all of the files in the Linux directory /home/training/training_materials/data/weblogs/ to the HDFS directory created in step 1.

3. Create a val logfiles that refers to the weblogs HDFS directory and use this in subsequent commands to save some typing.

4. Create a val input that is an RDD containing all of the records in the logfiles directory.

5. Create a val inputJPG that contains only the records from input that contain jpg requests.

6. View the first 10 records using take.

7. Combine the functions of steps 4 and 5 in one line of code, adding a count of the records too. What is that count?

8. Use the map function to get the length of each inputJPG record, and show the first 10 results.

9. Use the map function, splitting on space, to show the individual fields in each inputJPG record, and show the first 10 results.

10. Starting with inputJPG, show just the IP address in each record and show the first 10 results.

11. Use the .foreach(println) technique to make the results in step 10 easier to read.

12. Save the results from step 10 to the HDFS directory /loudacre/iplist.

13. Use the -ls and -cat command options (and optionally the Hue browser) to show the HDFS directory /loudacre/iplist as well as the contents of the part file containing the IP addresses.

14. Write a program to display just the IP addresses and timestamps for all the records that contain jpg requests, in the format IPAddress/Timestamp, and show the first 10 results.

15. Turn in the lines of code you used for each step, and the output for each step (or a subset of the output if it is too big).
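The record-level logic behind steps 5 and 7 through 14 can be tried out in plain Scala (for example pasted into a Scala REPL) before running it against HDFS. This is a minimal sketch under stated assumptions, not a reference solution: a local Seq stands in for the RDD, the three sample records are made up (they use documentation IP ranges) and assume an Apache-style access-log layout, and the helper names isJpgRequest, fields, ipAddress and ipAndTimestamp are hypothetical. In spark-shell the same helpers could be passed to the RDD methods instead, e.g. sc.textFile(logfiles).filter(isJpgRequest).

```scala
// Hypothetical sample records standing in for the weblog lines.
// Assumption: Apache-style access-log layout; the IPs are documentation ranges.
// In spark-shell these would come from sc.textFile(logfiles) instead.
val input: Seq[String] = Seq(
  "203.0.113.7 - 42 [15/Sep/2013:23:59:53 +0100] \"GET /index.html HTTP/1.0\" 200 1388",
  "198.51.100.23 - 17 [15/Sep/2013:23:58:45 +0100] \"GET /images/phone1.jpg HTTP/1.0\" 200 9914",
  "192.0.2.101 - 8 [15/Sep/2013:23:57:02 +0100] \"GET /images/phone2.jpg HTTP/1.0\" 200 5120"
)

// Step 5 (hypothetical helper): keep records whose text mentions "jpg".
def isJpgRequest(line: String): Boolean = line.contains("jpg")

// Step 9 (hypothetical helper): split a record on single spaces.
def fields(line: String): Array[String] = line.split(' ')

// Step 10 (hypothetical helper): the IP address is assumed to be the first field.
def ipAddress(line: String): String = fields(line)(0)

// Step 14 (hypothetical helper): field 3 holds "[dd/Mon/yyyy:hh:mm:ss";
// the leading bracket is stripped. The time-zone offset lands in field 4
// because of the space inside the brackets, so it is omitted here.
def ipAndTimestamp(line: String): String = {
  val f = fields(line)
  f(0) + "/" + f(3).stripPrefix("[")
}

// Step 5: in Spark, input.filter(...) returns a new RDD.
val inputJPG = input.filter(isJpgRequest)

// Step 7: filter and count in one expression.
// In Spark this would be sc.textFile(logfiles).filter(isJpgRequest).count().
val jpgCount = input.filter(isJpgRequest).size
println(s"jpg records: $jpgCount")

// Step 8: length of each record, first 10 results.
inputJPG.map(_.length).take(10).foreach(println)

// Step 9: individual fields of each record, joined only to make them readable.
inputJPG.map(fields).take(10).foreach(f => println(f.mkString(" | ")))

// Steps 10 and 11: IP addresses, printed one per line with foreach(println).
inputJPG.map(ipAddress).take(10).foreach(println)

// Step 14: IPAddress/Timestamp pairs.
inputJPG.map(ipAndTimestamp).take(10).foreach(println)
```

Filtering with contains("jpg") is the simplest reading of step 5, but it also matches records where "jpg" appears outside the requested path (for example in a referrer field); checking only the request field would be stricter. For step 12, calling saveAsTextFile("/loudacre/iplist") on an RDD writes a directory of part files (part-00000 and so on) rather than a single file, which is why step 13 inspects a part file.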
