Question
I need help with my Pig Latin homework on Hadoop. Please write the commands you used for each step and include screenshots of the output each time you dumped the table!
Using Pig Latin to Analyze a Log File with Hadoop
Write Pig Latin statements to analyze an application log file, and run various queries on the data to generate output. First, you will use Pig Latin in interactive mode (Grunt shell) to analyze a single log file, and then you will use Pig in batch mode (script) to perform the same task.
The input is a semi-structured log4j file in the format shown below (the entire dataset is over 1,000 lines):
. . . . . . . . . .
2012-02-03 20:26:41 SampleClass3 [TRACE] verbose detail for id 1527353937
java.lang.Exception: 2012-02-03 20:26:41 SampleClass9 [ERROR] incorrect format for id 324411615
at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)
2012-02-03 20:26:41 SampleClass2 [TRACE] verbose detail for id 191364434
2012-02-03 20:26:41 SampleClass1 [DEBUG] detail for id 903114158
2012-02-03 20:26:41 SampleClass8 [TRACE] verbose detail for id 1331132178
2012-02-03 20:26:41 SampleClass8 [INFO] everything normal for id 1490351510
2012-02-03 20:32:47 SampleClass8 [TRACE] verbose detail for id 1700820764
2012-02-03 20:32:47 SampleClass2 [DEBUG] detail for id 364472047
2012-02-03 20:32:47 SampleClass7 [TRACE] verbose detail for id 1006511432
2012-02-03 20:32:47 SampleClass4 [TRACE] verbose detail for id 1252673849
2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 881008264
2012-02-03 20:32:47 SampleClass0 [TRACE] verbose detail for id 1104034268
2012-02-03 20:32:47 SampleClass6 [TRACE] verbose detail for id 1527612691
java.lang.Exception: 2012-02-03 20:32:47 SampleClass7 [WARN] problem finding id 484546105
at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)
2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 2521054
2012-02-03 21:05:21 SampleClass6 [FATAL] system problem at id 1620503499
. . . . . . . . . . . . . .
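Not every line in the excerpt carries a log level: the "at com.osa.mocklogger..." lines are stack-trace continuations with no level, while the "java.lang.Exception:" lines still contain one. The Python sketch below illustrates how a regular-expression match classifies these line types. It assumes the Pig side will use REGEX_EXTRACT with the pattern shown in the comment; that pattern and the helper name extract_level are assumptions for illustration, not something the assignment specifies.

```python
import re

# Assumed pattern, mirroring a Pig call such as
#   REGEX_EXTRACT(line, '.*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL).*', 1)
# Lines with no level word produce None, which corresponds to NULL in Pig.
LEVEL_PATTERN = re.compile(r".*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL).*")


def extract_level(line):
    """Return the log level word contained in the line, or None (hypothetical helper)."""
    match = LEVEL_PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "2012-02-03 20:26:41 SampleClass3 [TRACE] verbose detail for id 1527353937",
        "java.lang.Exception: 2012-02-03 20:26:41 SampleClass9 [ERROR] incorrect format for id 324411615",
        "at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)",
        "",  # an empty row, as mentioned in Step 6A
    ]
    for sample in samples:
        # Regular entries and exception lines yield a level; the rest yield None.
        print(repr(extract_level(sample)))
```

The None results are exactly the rows that the filter in Step 6A is meant to drop.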
The output will be written to a file listing each log4j log level along with how often it occurs in the input file. A sample of these metrics is shown below (they match the counts in the excerpt above):
[TRACE] 8
[DEBUG] 4
[INFO] 1
[WARN] 1
[ERROR] 1
[FATAL] 1
Task 1: Use Pig in Interactive Mode
Step 1: Create a new directory called piglab in /home/training.
Step 2: Download the sample.log file and put it in /home/training/piglab.
Step 3A: Copy sample.log into HDFS.
Step 3B: Enter Pig interactive mode (the Grunt shell).
Step 4A: Load the sample.log file and give it the alias LOGS.
Step 4B: Dump the result.
Step 5A: Go through each line, extract whichever of the six log levels it contains, and give the result the alias LEVELS.
Step 5B: Dump LEVELS.
Step 6A: Filter out rows that do not have a match (for example, empty rows) and give the result the alias FILTERED_LEVELS.
Step 6B: Dump FILTERED_LEVELS. This alias holds the level word (TRACE, DEBUG, INFO, WARN, ERROR or FATAL) from each record, with the NULL entries removed.
Step 7A: Group the records by log level so that each level gets its own row, and give the result the alias GROUPED_LEVELS.
Step 7B: Dump GROUPED_LEVELS; all of the records for each log level are now grouped together.
Step 8A: For each group, count the occurrences of the log level (these are the frequencies of each log level) and give the result the alias FREQUENCIES.
Step 8B: Dump FREQUENCIES to see how many times each log level occurs.
Step 9A: Sort the frequencies in descending order and give the result the alias RESULTS.
Step 9B: Display the job results. This step may take a few minutes.
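To check what each alias in Task 1 should contain, here is a minimal Python model of the dataflow, assuming the whole log line is loaded as a single field. The names run_pipeline and SAMPLE_EXCERPT are hypothetical. The Pig Latin statement in each comment uses real built-ins (TextLoader, REGEX_EXTRACT, FILTER, GROUP, COUNT, ORDER) but is only one plausible way to write that step, not the official solution; each "B" step would be DUMP followed by the alias name.

```python
import re
from collections import defaultdict

# Assumed pattern; the greedy leading .* means the last level word would win
# if a line ever contained two of them.
LEVEL_REGEX = re.compile(r".*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL).*")

# The log excerpt from the assignment, used as sample input.
SAMPLE_EXCERPT = [
    "2012-02-03 20:26:41 SampleClass3 [TRACE] verbose detail for id 1527353937",
    "java.lang.Exception: 2012-02-03 20:26:41 SampleClass9 [ERROR] incorrect format for id 324411615",
    "at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)",
    "2012-02-03 20:26:41 SampleClass2 [TRACE] verbose detail for id 191364434",
    "2012-02-03 20:26:41 SampleClass1 [DEBUG] detail for id 903114158",
    "2012-02-03 20:26:41 SampleClass8 [TRACE] verbose detail for id 1331132178",
    "2012-02-03 20:26:41 SampleClass8 [INFO] everything normal for id 1490351510",
    "2012-02-03 20:32:47 SampleClass8 [TRACE] verbose detail for id 1700820764",
    "2012-02-03 20:32:47 SampleClass2 [DEBUG] detail for id 364472047",
    "2012-02-03 20:32:47 SampleClass7 [TRACE] verbose detail for id 1006511432",
    "2012-02-03 20:32:47 SampleClass4 [TRACE] verbose detail for id 1252673849",
    "2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 881008264",
    "2012-02-03 20:32:47 SampleClass0 [TRACE] verbose detail for id 1104034268",
    "2012-02-03 20:32:47 SampleClass6 [TRACE] verbose detail for id 1527612691",
    "java.lang.Exception: 2012-02-03 20:32:47 SampleClass7 [WARN] problem finding id 484546105",
    "at com.osa.mocklogger.MockLogger$2.run(MockLogger.java:83)",
    "2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 2521054",
    "2012-02-03 21:05:21 SampleClass6 [FATAL] system problem at id 1620503499",
]


def run_pipeline(lines):
    """Model Steps 4A to 9A; returns (level, count) pairs sorted by count, descending."""
    # Step 4A: LOGS = LOAD 'sample.log' USING TextLoader() AS (line:chararray);
    logs = list(lines)

    # Step 5A: LEVELS = FOREACH LOGS GENERATE
    #   REGEX_EXTRACT(line, '.*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL).*', 1) AS LOGLEVEL;
    levels = []
    for line in logs:
        match = LEVEL_REGEX.search(line)
        levels.append(match.group(1) if match else None)

    # Step 6A: FILTERED_LEVELS = FILTER LEVELS BY LOGLEVEL IS NOT NULL;
    filtered_levels = [level for level in levels if level is not None]

    # Step 7A: GROUPED_LEVELS = GROUP FILTERED_LEVELS BY LOGLEVEL;
    # Each list plays the role of the bag Pig builds for one group.
    grouped_levels = defaultdict(list)
    for level in filtered_levels:
        grouped_levels[level].append(level)

    # Step 8A: FREQUENCIES = FOREACH GROUPED_LEVELS GENERATE
    #   group AS LOGLEVEL, COUNT(FILTERED_LEVELS) AS LEVEL_COUNT;
    frequencies = [(level, len(bag)) for level, bag in grouped_levels.items()]

    # Step 9A: RESULTS = ORDER FREQUENCIES BY LEVEL_COUNT DESC;
    return sorted(frequencies, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Step 9B: print the sorted pairs in the style of the sample metrics.
    for level, count in run_pipeline(SAMPLE_EXCERPT):
        print(f"[{level}] {count}")
```

On the excerpt this reproduces the sample metrics: TRACE 8, DEBUG 4, and 1 each for INFO, WARN, ERROR and FATAL. Python's sort keeps ties in first-seen order, but the order of tied counts in real Pig output should not be relied on.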
Task 2: Use Pig in Batch Mode
Use Pig in batch mode by creating a Pig script made up of the same Pig commands you used in the last task.
Step 1: Exit Pig interactive mode.
Step 2: Verify that you are still in the piglab directory that contains the sample.log input file.
Step 3: Create the Pig script as a file named logscript.pig (use the nano, emacs, or vi editor).
Step 4: Copy the Pig commands from Task 1 into logscript.pig and save the file.
Step 5: Run the newly created Pig script.
Step 6: View the output file. The results should be the same as in interactive mode.
After each command, you can display the resulting output to the screen to view changes in the data.
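In batch mode the script has to write its result somewhere, for example with STORE RESULTS INTO 'output'; (the directory name output is an assumption). With the default PigStorage, Pig writes tab-delimited records into part-* files inside that directory, which can be copied out of HDFS with hadoop fs -get output . before viewing them. The Python sketch below, built around the hypothetical helpers read_pig_output and format_report, reads those part files and prints them in the "[LEVEL] count" style of the sample metrics.

```python
from pathlib import Path


def read_pig_output(output_dir):
    """Collect (level, count) pairs from PigStorage part files in output_dir.

    Assumes the default tab delimiter and one LOGLEVEL<TAB>LEVEL_COUNT
    record per line.
    """
    rows = []
    # Files are read in name order (part-r-00000, part-r-00001, ...);
    # marker files such as _SUCCESS do not match the part-* pattern.
    for part in sorted(Path(output_dir).glob("part-*")):
        for line in part.read_text().splitlines():
            if not line.strip():
                continue  # skip blank lines, if any
            level, count = line.split("\t")
            rows.append((level, int(count)))
    return rows


def format_report(rows):
    """Render rows in the '[LEVEL] count' style of the sample metrics."""
    return "\n".join(f"[{level}] {count}" for level, count in rows)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python show_output.py output
    target = sys.argv[1] if len(sys.argv) > 1 else "output"
    if Path(target).is_dir():
        print(format_report(read_pig_output(target)))
    else:
        print(f"No output directory found at {target!r}")
```

For a quick look without any script, printing the part files from HDFS with hadoop fs -cat works just as well; the sketch is mainly useful for reformatting the tab-separated records.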