Question

I need help with my Pig Latin Homework with Hadoop. Please write the commands you used for each step and screenshots for when you dumped the table!

Using Pig Latin to Analyze Log with Hadoop

Write Pig Latin statements to analyze an application log file, and run various queries on the data to generate output. First, you will use Pig Latin in interactive mode (Grunt shell) to analyze a single log file, and then you will use Pig in batch mode (script) to perform the same task.

The input file is a semi-structured log4j file of over 1,000 lines, in the following format:

. . . . . . . . . .

2012-02-03 20:26:41 SampleClass3 [TRACE] verbose detail for id 1527353937

java.lang.Exception: 2012-02-03 20:26:41 SampleClass9 [ERROR] incorrect format for id 324411615

at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)

2012-02-03 20:26:41 SampleClass2 [TRACE] verbose detail for id 191364434

2012-02-03 20:26:41 SampleClass1 [DEBUG] detail for id 903114158

2012-02-03 20:26:41 SampleClass8 [TRACE] verbose detail for id 1331132178

2012-02-03 20:26:41 SampleClass8 [INFO] everything normal for id 1490351510

2012-02-03 20:32:47 SampleClass8 [TRACE] verbose detail for id 1700820764

2012-02-03 20:32:47 SampleClass2 [DEBUG] detail for id 364472047

2012-02-03 20:32:47 SampleClass7 [TRACE] verbose detail for id 1006511432

2012-02-03 20:32:47 SampleClass4 [TRACE] verbose detail for id 1252673849

2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 881008264

2012-02-03 20:32:47 SampleClass0 [TRACE] verbose detail for id 1104034268

2012-02-03 20:32:47 SampleClass6 [TRACE] verbose detail for id 1527612691

java.lang.Exception: 2012-02-03 20:32:47 SampleClass7 [WARN] problem finding id 484546105

at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)

2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 2521054

2012-02-03 21:05:21 SampleClass6 [FATAL] system problem at id 1620503499

. . . . . . . . . . . . . .

The output will be written to a file listing each log4j log level along with its frequency of occurrence in the input file. A sample of these metrics is displayed below:

[TRACE] 8

[DEBUG] 4

[INFO] 1

[WARN] 1

[ERROR] 1

[FATAL] 1
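
As a local sanity check on what the Pig job should eventually report, the excerpt above can be counted with standard Unix tools. This is only a sketch and not part of the assignment: the file names excerpt.log and excerpt_counts.txt are hypothetical, and the data is just the 18 lines shown above, not the full sample.log.

```shell
#!/bin/sh
# Save the log excerpt shown above to a local file (hypothetical name).
cat > excerpt.log <<'EOF'
2012-02-03 20:26:41 SampleClass3 [TRACE] verbose detail for id 1527353937
java.lang.Exception: 2012-02-03 20:26:41 SampleClass9 [ERROR] incorrect format for id 324411615
at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)
2012-02-03 20:26:41 SampleClass2 [TRACE] verbose detail for id 191364434
2012-02-03 20:26:41 SampleClass1 [DEBUG] detail for id 903114158
2012-02-03 20:26:41 SampleClass8 [TRACE] verbose detail for id 1331132178
2012-02-03 20:26:41 SampleClass8 [INFO] everything normal for id 1490351510
2012-02-03 20:32:47 SampleClass8 [TRACE] verbose detail for id 1700820764
2012-02-03 20:32:47 SampleClass2 [DEBUG] detail for id 364472047
2012-02-03 20:32:47 SampleClass7 [TRACE] verbose detail for id 1006511432
2012-02-03 20:32:47 SampleClass4 [TRACE] verbose detail for id 1252673849
2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 881008264
2012-02-03 20:32:47 SampleClass0 [TRACE] verbose detail for id 1104034268
2012-02-03 20:32:47 SampleClass6 [TRACE] verbose detail for id 1527612691
java.lang.Exception: 2012-02-03 20:32:47 SampleClass7 [WARN] problem finding id 484546105
at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)
2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 2521054
2012-02-03 21:05:21 SampleClass6 [FATAL] system problem at id 1620503499
EOF

# Extract the bracketed log level from each line that has one,
# count identical levels, and sort by count in descending order.
grep -o -E '\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\]' excerpt.log \
  | sort | uniq -c | sort -k1,1nr > excerpt_counts.txt

cat excerpt_counts.txt
```

For this excerpt the counts come out as 8 for [TRACE], 4 for [DEBUG], and 1 each for the other four levels, which agrees with the sample metrics. Lines without a level (the stack-trace "at ..." lines) produce no match, which is the same role the NULL filter plays in the Pig version.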

Task 1: Use Pig in Interactive Mode

Step 1: Create a new directory called piglab in /home/training

Step 2: Download the sample.log file and put it in /home/training/piglab.

Step 3A: Copy sample.log into HDFS.

Step 3B: Enter Pig interactive command mode.

Step 4A: Load the sample.log file that you want to manipulate and give it the alias LOGS

Step 4B: Dump the result.

Step 5A: Go through each line and find a match on the 6 log levels and give it the alias LEVELS.

Step 5B: Dump LEVELS.

Step 6A: Filter out rows that do not have a match (for example, empty rows) and give it the alias FILTERED_LEVELS.

Step 6B: Dump FILTERED_LEVELS, the alias that holds the matched word from each record (TRACE, DEBUG, INFO, WARN, ERROR, or FATAL) with NULL values removed.

Step 7A: Group all of the log levels into their own row and give it the alias GROUPED_LEVELS.

Step 7B: Now if you dump the GROUPED_LEVELS alias, all of the log levels are grouped together.

Step 8A: For each group, count the occurrences of log levels (these will be the frequencies of each log level) and give it the alias FREQUENCIES.

Step 8B: Dump FREQUENCIES alias to see the count of the occurrences of each word in a group.

Step 9A: Sort the frequencies in descending order and give it the alias RESULTS

Step 9B: Display job results. This step will take a few minutes.
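
The steps of Task 1 could be sketched roughly as follows. This is a sketch under stated assumptions, not a verified solution: it assumes the login user's home directory is /home/training, that sample.log has already been downloaded into the current directory, and that the hadoop and pig commands are on the PATH. The regular expression, the field names line, LOGLEVEL, and FREQ, and the notes file task1_statements.pig are illustrative choices that the assignment does not specify. The Pig statements are saved to the notes file only so they are easy to type one at a time at the grunt> prompt.

```shell
#!/bin/sh
# Steps 1-2: create the lab directory and place the log file in it.
# LAB_DIR is assumed to resolve to /home/training/piglab for the training user.
LAB_DIR="${LAB_DIR:-$HOME/piglab}"
mkdir -p "$LAB_DIR"
if [ -f sample.log ] && [ ! -f "$LAB_DIR/sample.log" ]; then
    cp sample.log "$LAB_DIR/"
fi
cd "$LAB_DIR" || exit 1

# Step 3A: copy sample.log into the user's HDFS home directory.
# Skipped when hadoop or the file is not available.
if command -v hadoop >/dev/null 2>&1 && [ -f sample.log ]; then
    hadoop fs -put sample.log sample.log
fi

# Steps 4A-9B: Pig Latin statements to enter at the grunt> prompt.
cat > task1_statements.pig <<'EOF'
-- Step 4A/4B: load each log line as a single chararray field.
LOGS = LOAD 'sample.log' USING TextLoader() AS (line:chararray);
DUMP LOGS;
-- Step 5A/5B: extract the first log level found on each line (NULL if none).
LEVELS = FOREACH LOGS GENERATE REGEX_EXTRACT(line, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) AS LOGLEVEL;
DUMP LEVELS;
-- Step 6A/6B: drop the rows where no level was matched.
FILTERED_LEVELS = FILTER LEVELS BY LOGLEVEL IS NOT NULL;
DUMP FILTERED_LEVELS;
-- Step 7A/7B: one row per distinct log level.
GROUPED_LEVELS = GROUP FILTERED_LEVELS BY LOGLEVEL;
DUMP GROUPED_LEVELS;
-- Step 8A/8B: count the rows in each group.
FREQUENCIES = FOREACH GROUPED_LEVELS GENERATE group AS LOGLEVEL, COUNT(FILTERED_LEVELS) AS FREQ;
DUMP FREQUENCIES;
-- Step 9A/9B: sort by frequency, highest first, and display.
RESULTS = ORDER FREQUENCIES BY FREQ DESC;
DUMP RESULTS;
EOF

# Step 3B: start the Grunt shell by running `pig` with no arguments,
# then type the statements from task1_statements.pig one at a time.
echo "Statements saved to $LAB_DIR/task1_statements.pig"
```

Note that this regular expression captures the bare level name (for example TRACE), so the dumped tuples would not include the square brackets shown in the sample metrics.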

Task 2: Use Pig in Batch Mode

Use Pig in batch mode by creating a Pig script made up of the same Pig commands you used in the last task.

Step 1: Exit Pig interactive mode.

Step 2: Verify that you are still in the piglab directory that contains the sample.log input file.

Step 3: Create the Pig script by creating a logscript.pig file (use the nano, emacs, or vi editor).

Step 4: Copy and paste the Pig commands from Task 1 into the logscript.pig file, and save.

Step 5: Run the newly-created Pig script.

Step 6: View the output file. The results are the same as interactive mode.

After each command, you can display the resulting output to the screen to view changes in the data.
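
A minimal sketch of Task 2 might look like the following. It makes the same assumptions as before (home directory /home/training, sample.log already in HDFS, hadoop and pig on the PATH), and it writes the script with a heredoc rather than an editor purely for convenience. The output directory name piglab_results and the field names are hypothetical; the assignment does not name them. The intermediate DUMP statements are left out here, and a STORE statement is added so the job writes an output file.

```shell
#!/bin/sh
# Step 2: work from the piglab directory that holds sample.log.
LAB_DIR="${LAB_DIR:-$HOME/piglab}"
mkdir -p "$LAB_DIR"
cd "$LAB_DIR" || exit 1
pwd

# Steps 3-4: create logscript.pig containing the statements from Task 1.
cat > logscript.pig <<'EOF'
LOGS = LOAD 'sample.log' USING TextLoader() AS (line:chararray);
LEVELS = FOREACH LOGS GENERATE REGEX_EXTRACT(line, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) AS LOGLEVEL;
FILTERED_LEVELS = FILTER LEVELS BY LOGLEVEL IS NOT NULL;
GROUPED_LEVELS = GROUP FILTERED_LEVELS BY LOGLEVEL;
FREQUENCIES = FOREACH GROUPED_LEVELS GENERATE group AS LOGLEVEL, COUNT(FILTERED_LEVELS) AS FREQ;
RESULTS = ORDER FREQUENCIES BY FREQ DESC;
-- Write the sorted frequencies to HDFS (hypothetical output directory).
STORE RESULTS INTO 'piglab_results';
EOF

# Step 5: run the script in batch mode (skipped when pig is not installed).
if command -v pig >/dev/null 2>&1; then
    pig logscript.pig
fi

# Step 6: view the output files that the job wrote to HDFS.
# The glob is quoted so that HDFS, not the local shell, expands it.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -cat 'piglab_results/part-*'
fi
```

STORE fails if the output directory already exists, so a second run would need a different directory name or the old directory removed first.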
