
Question

1 Approved Answer

Big Data Analytics using Hadoop and Spark (UEL-CN-7031 Big Data Analytics): Understanding Dataset UNSW-NB15

Big Data Analytics using Hadoop and Spark
UEL-CN-7031 Big Data Analytics
Understanding Dataset: UNSW-NB15
a) The dataset consists of 49 features together with the class label. The features are flow-based network traffic attributes such as source IP, destination IP, source port, destination port, protocol, and timestamps. They characterize the network traffic and help identify potential attacks.
b) The dataset consists of nine types of attacks: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. Each attack type has several subcategories that further classify the attack behavior and characteristics.
c) To explore the dataset and understand its features, you can import it into Hadoop HDFS and use a Hive query to display the first few records.
Big Data Query & Analysis by Apache Hive [30 marks]
Analyze the dataset and create at least four Hive queries (example queries are sketched after this list).
Use visualization tools to present your findings both numerically and graphically.
Interpret your findings and take screenshots of the results (tables and plots) along with the scripts/queries for the report.
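For instance, four queries along these lines would meet the requirement. The column names (attack_cat, label, dur, sbytes, proto, srcip) assume the table sketched in section 1c, so adjust them to your actual schema.

    -- 1. Record count per traffic category (normal plus each attack type)
    SELECT attack_cat, COUNT(*) AS flows
    FROM unsw_nb15
    GROUP BY attack_cat
    ORDER BY flows DESC;

    -- 2. Average duration and bytes sent, split by class label (0 = normal, 1 = attack)
    SELECT label, AVG(dur) AS avg_duration, AVG(sbytes) AS avg_src_bytes
    FROM unsw_nb15
    GROUP BY label;

    -- 3. Protocols most involved in attack traffic
    SELECT proto, COUNT(*) AS flows
    FROM unsw_nb15
    WHERE label = 1
    GROUP BY proto
    ORDER BY flows DESC
    LIMIT 10;

    -- 4. Source IPs generating the most attack flows
    SELECT srcip, COUNT(*) AS flows
    FROM unsw_nb15
    WHERE label = 1
    GROUP BY srcip
    ORDER BY flows DESC
    LIMIT 10;

The query results can be exported (for example with INSERT OVERWRITE DIRECTORY, or by copying the console output) and charted in a tool such as Excel or matplotlib for the graphical presentation.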
Advanced Analytics using PySpark [50 marks]
3.1. Analyze and Interpret Big Data (15 marks)
Conduct analysis using at least four analytical methods (descriptive statistics, correlation, hypothesis testing, density estimation, etc.); a PySpark sketch follows this list.
Present your work numerically and graphically with tooltips, legends, titles, and X-Y labels to aid end-users.
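As a minimal sketch of two of these methods (descriptive statistics and a Pearson correlation matrix) in PySpark, assuming the training CSV is already in HDFS and that the listed numeric columns exist in your copy of the dataset:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.stat import Correlation

    spark = SparkSession.builder.appName("unsw-nb15-analytics").getOrCreate()

    # Path and column names are assumptions; adjust them to your environment.
    df = spark.read.csv("hdfs:///user/hive/warehouse/unsw_nb15/UNSW_NB15_training-set.csv",
                        header=True, inferSchema=True)

    # Descriptive statistics (count, mean, stddev, min, max) for a few numeric features.
    num_cols = ["dur", "sbytes", "dbytes", "spkts", "dpkts"]
    df.select(num_cols).describe().show()

    # Pearson correlation matrix over the same features.
    assembled = VectorAssembler(inputCols=num_cols, outputCol="features").transform(df)
    corr_matrix = Correlation.corr(assembled, "features").head()[0].toArray()
    print(corr_matrix)

The describe() table and the correlation matrix can be collected to pandas and plotted (for example as a heatmap) with legends, titles, and axis labels; hypothesis tests and density estimates follow the same pattern using pyspark.ml.stat.ChiSquareTest and pyspark.mllib.stat.KernelDensity.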
3.2. Design and Build a Classifier (35 marks)
a) Build a binary classifier for the dataset, explaining the algorithm and its configuration. Present your findings both numerically and graphically. Evaluate the performance and verify the accuracy and effectiveness of your model (see the sketch after this list). [15 marks]
b) Apply a multi-class classifier to classify data into ten classes: one normal and nine attacks. Briefly explain your model with supportive statements on its parameters, accuracy, and effectiveness. [20 marks]
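A minimal sketch of part (a) using Spark MLlib logistic regression is shown below; the path, the small feature subset, and the hyperparameters are assumptions rather than a recommended configuration.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler, StandardScaler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("unsw-nb15-classifier").getOrCreate()

    # Assumed path and columns; 'label' is 0 for normal traffic and 1 for attacks.
    df = spark.read.csv("hdfs:///user/hive/warehouse/unsw_nb15/UNSW_NB15_training-set.csv",
                        header=True, inferSchema=True)
    feature_cols = ["dur", "sbytes", "dbytes", "spkts", "dpkts"]  # illustrative subset

    train, test = df.randomSplit([0.7, 0.3], seed=42)

    pipeline = Pipeline(stages=[
        VectorAssembler(inputCols=feature_cols, outputCol="raw_features"),
        StandardScaler(inputCol="raw_features", outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="label", maxIter=50),
    ])

    model = pipeline.fit(train)
    predictions = model.transform(test)

    # Area under the ROC curve on the held-out split.
    auc = BinaryClassificationEvaluator(labelCol="label").evaluate(predictions)
    print(f"Test AUC: {auc:.3f}")

For part (b), the same pipeline structure applies, with a StringIndexer on attack_cat producing the ten-class label (normal plus nine attacks), a multinomial model such as RandomForestClassifier in place of the binary stage, and MulticlassClassificationEvaluator reporting accuracy and F1.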
Individual Assessment [10 marks]
Discuss alternative technologies for tasks 2 and 3 and their differences (using academic references).
Reflect on what new thinking the tasks evoked and on anything you may have neglected.
Documentation [10 marks]
Document all your work, following the format of the final submission section.
Demonstrate appropriate understanding of academic writing and integrity.
Explanation:
Here's a brief explanation of all the steps in the Big Data Analytics assignment:
Understanding Dataset: UNSW-NB15
Familiarize yourself with the dataset's features, attack types, and sub-categories.
Import the dataset into Hadoop HDFS, and use a Hive query to display the first few records for better understanding.
Big Data Query & Analysis by Apache Hive
Analyze the dataset and create at least four Hive queries that provide useful insights.
Use visualization tools to present your findings numerically and graphically.
Interpret your findings and include screenshots of the results, along with the scripts/queries, in your report.
Advanced Analytics using PySpark
3.1. Analyze and Interpret Big Data
Apply at least four analytical methods to understand the dataset (descriptive statistics, correlation, hypothesis testing, density estimation, etc.).
Present your work numerically and graphically, using tooltips, legends, titles, and X-Y labels.
3.2. Design and Build a Classifier
a) Build a binary classifier, explaining the algorithm and its configuration. Evaluate the performance and verify the model's accuracy and effectiveness.
b) Apply a multi-class classifier to classify data into ten classes, including one normal and nine attacks. Explain your model, its parameters, and its accuracy and effectiveness.
Individual Assessment
Discuss alternative technologies for tasks 2 and 3 and their differences, using academic references.
Reflect on what new thinking the tasks evoked and on anything you may have neglected.
Documentation
Document all your work, following the format of the final submission section.
Ensure your report demonstrates an appropriate understanding of academic writing and integrity.
By following these steps, you will complete the Big Data Analytics assignment, which includes understanding the dataset, conducting analysis and modeling using Apache Hive and PySpark, evaluating alternative technologies, and documenting your work in a clear, structured, and professional manner.
