Question
1. Make a new HDFS directory called /loudacre/weblogs.
2. Move all of the files in the Linux directory /home/training/training_materials/data/weblogs/
to the HDFS directory created in step 1.
3. Create a val logfiles that refers to the weblogs HDFS directory and use this in subsequent commands to save some typing.
4. Create a val input that is an RDD containing all of the records in the logfiles directory.
5. Create a val inputJPG that contains only the records from input that contain jpg requests.
6. View the first 10 records using take.
7. Combine the functions of steps 4 and 5 in one line of code, adding a count of the records
too. What is that count?
8. Use the map function to get the length of each inputJPG record, and show the first 10
results.
9. Use the map function, splitting on space, to show the individual fields in each inputJPG
record, and show the first 10 results.
10. Starting with inputJPG, show just the IP address in each record and show the first 10
results.
11. Use the .foreach(println) technique to make the results in step 10 easier to read.
12. Save the results from step 10 in the HDFS file /loudacre/iplist.
13. Use the -ls and -cat command options (and optionally the Hue browser) to show the HDFS
directory /loudacre/iplist as well as the contents of the part file containing the IP addresses.
14. Write a program to display just the IP addresses and timestamps for all the records that contain jpg requests in the format IPAddress/Timestamp, and show the first 10 results.
15. Turn in the lines of code you used for each step, and the output for each step (or a subset
of the output if it is too big).
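Steps 3 through 7 map onto a few RDD calls. In the Spark shell, sc.textFile(logfiles) returns an RDD[String] with one element per line, filter keeps the lines that match a predicate, take(10) returns up to the first ten elements as an Array, and count() returns the number of elements. The sketch below runs without a cluster: it applies the same predicate to a local Seq standing in for the RDD, with the corresponding spark-shell calls in comments. The object name, the sample log lines, and the rule that a "jpg request" is any line containing ".jpg" are all assumptions for illustration, not Loudacre data.

```scala
// Hypothetical sketch: a local Seq stands in for the RDD so the logic runs without Spark.
object JpgFilterSketch {
  // Step 3 in spark-shell: val logfiles = "/loudacre/weblogs"
  // Step 4 in spark-shell: val input = sc.textFile(logfiles)
  // Here, hypothetical sample lines stand in for the files in that directory.
  val input: Seq[String] = Seq(
    "10.0.0.1 - 111 [15/Sep/2013:23:58:36 +0100] \"GET /theme.css HTTP/1.0\" 200 1024",
    "10.0.0.2 - 222 [15/Sep/2013:23:59:01 +0100] \"GET /images/phone.jpg HTTP/1.0\" 200 4096",
    "10.0.0.3 - 333 [16/Sep/2013:00:01:12 +0100] \"GET /images/logo.jpg HTTP/1.0\" 200 2048"
  )

  // Assumed definition of a "jpg request": the record contains ".jpg".
  def isJpg(line: String): Boolean = line.contains(".jpg")

  // Step 5 in spark-shell: val inputJPG = input.filter(_.contains(".jpg"))
  val inputJPG: Seq[String] = input.filter(isJpg)

  // Step 7 in spark-shell: sc.textFile(logfiles).filter(_.contains(".jpg")).count()
  // RDD.count() takes no arguments and returns a Long; a local Seq uses .size instead,
  // because Seq.count expects a predicate.
  val jpgCount: Int = input.filter(isJpg).size

  def main(args: Array[String]): Unit = {
    // Step 6: take(10) returns at most the first 10 records.
    inputJPG.take(10).foreach(println)
    println(s"jpg records: $jpgCount")
  }
}
```

Against the real directory, the count printed by the one-line step 7 command is the answer the question asks for; the value here only reflects the three hypothetical lines.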
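Steps 8 through 12 are all map transformations on inputJPG followed by an action. The following is a minimal local sketch, assuming the IP address is the first field when a record is split on a single space. As before, a Seq of hypothetical jpg records stands in for the RDD, the object name is illustrative, and the spark-shell forms appear in comments.

```scala
// Hypothetical sketch: a local Seq stands in for the inputJPG RDD.
object JpgFieldsSketch {
  // Hypothetical sample jpg records; real weblog lines may differ.
  val inputJPG: Seq[String] = Seq(
    "10.0.0.2 - 222 [15/Sep/2013:23:59:01 +0100] \"GET /images/phone.jpg HTTP/1.0\" 200 4096",
    "10.0.0.3 - 333 [16/Sep/2013:00:01:12 +0100] \"GET /images/logo.jpg HTTP/1.0\" 200 2048"
  )

  // Step 8 in spark-shell: inputJPG.map(_.length).take(10)
  val lengths: Seq[Int] = inputJPG.map(_.length).take(10)

  // Step 9 in spark-shell: inputJPG.map(_.split(' ')).take(10)
  // Each element is an Array[String] holding one record's space-separated fields.
  val fields: Seq[Array[String]] = inputJPG.map(_.split(' ')).take(10)

  // Step 10 in spark-shell: inputJPG.map(_.split(' ')(0)).take(10)
  // Assumes the IP address is the first space-separated field.
  val ips: Seq[String] = inputJPG.map(_.split(' ')(0)).take(10)

  // Step 12 in spark-shell (writes part files under that HDFS directory):
  //   inputJPG.map(_.split(' ')(0)).saveAsTextFile("/loudacre/iplist")

  def main(args: Array[String]): Unit = {
    // Step 11 in spark-shell: inputJPG.map(_.split(' ')(0)).take(10).foreach(println)
    // Calling foreach after take prints the local Array one value per line.
    ips.foreach(println)
    // Arrays do not print their contents directly, so mkString joins the fields for display.
    fields.foreach(f => println(f.mkString(" | ")))
    lengths.foreach(println)
  }
}
```

For step 13, the saved output is a directory rather than a single file, so listing /loudacre/iplist with hdfs dfs -ls shows the part-NNNNN files, and -cat on one of them prints the IP addresses.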
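Step 14 combines the split from step 9 with string concatenation. One possible sketch assumes an Apache-style layout in which, after splitting on a space, field 0 is the IP address and field 3 holds the bracketed date and time (for example "[15/Sep/2013:23:59:01"). It drops the leading bracket and ignores the time-zone offset in field 4. The helper ipAndTimestamp, the object name, and the sample lines are hypothetical; check the field positions against the real weblog files before relying on them.

```scala
// Hypothetical sketch of step 14 on a local Seq standing in for inputJPG.
object IpTimestampSketch {
  // Hypothetical sample jpg records; real weblog lines may differ.
  val inputJPG: Seq[String] = Seq(
    "10.0.0.2 - 222 [15/Sep/2013:23:59:01 +0100] \"GET /images/phone.jpg HTTP/1.0\" 200 4096",
    "10.0.0.3 - 333 [16/Sep/2013:00:01:12 +0100] \"GET /images/logo.jpg HTTP/1.0\" 200 2048"
  )

  // Assumes the IP is field 0 and the timestamp starts field 3 when splitting on a space;
  // the leading '[' is dropped and the time-zone offset in field 4 is ignored.
  def ipAndTimestamp(line: String): String = {
    val fields = line.split(' ')
    fields(0) + "/" + fields(3).stripPrefix("[")
  }

  // spark-shell equivalent, written as a lambda:
  //   inputJPG.map { line =>
  //     val f = line.split(' ')
  //     f(0) + "/" + f(3).stripPrefix("[")
  //   }.take(10).foreach(println)
  val firstTen: Seq[String] = inputJPG.map(ipAndTimestamp).take(10)

  def main(args: Array[String]): Unit = firstTen.foreach(println)
}
```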