
Question



For this assessment, you are required to use Weka 3.8.3 (or a later version, available at https://www.cs.waikato.ac.nz/ml/weka/downloading.html); you will use it throughout this subject. You will also need a text editor such as Notepad on Windows or TextEdit on Mac.

Task 1: Create and explore ARFF data for Weka [30 marks]

In this task you are expected to convert a text file into an ARFF file for Weka. The text file contains a sample of real-life data on parking fines in Australia. You will then explore the data using Weka. The specific task requirements are listed below.

  1. Download the text file called ParkingFines.csv and open it using a text editor such as Notepad (Windows) or TextEdit (Mac). The file ParkingFines.csv has been partially formatted as an ARFF file. Identify any errors in the file and complete the formatting to obtain a valid ARFF file saved as ParkingFines.arff. As part of your submission, itemise the errors you identified and include a screenshot of your corrected ARFF file to support them. [20 marks]
  2. Explore the ParkingFines.arff file you just created using Weka Explorer and answer the following questions. Make sure to include screenshots of the visualisations to support your answers.
     a. What proportion of people who committed the offence "Contravene No Stopping" actually paid their fine? [5 marks]
     b. What proportion of people who were fined $50 were exempted from paying the fine? [5 marks]
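
As a cross-check outside Weka, the two parts of Task 1 can be sketched in Python. An ARFF file consists of an @relation line, one @attribute line per column (nominal values listed in braces, or a type such as numeric or string), then @data followed by comma-separated rows; values containing spaces or commas must be quoted, and ? marks a missing value. The helpers below (quote, to_arff, conditional_proportion) are hypothetical names, and the attribute names, nominal values and rows in the demo are invented placeholders, not the actual contents of ParkingFines.csv. The quoting rule is a conservative simplification rather than Weka's exact escaping logic.

```python
from fractions import Fraction


def quote(value):
    """Quote an ARFF value if it contains characters that would break parsing."""
    text = str(value)
    special = set(" \t,'\"{}%")
    if text == "" or text == "?" or any(ch in special for ch in text):
        # Escape backslashes first, then embedded single quotes.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def to_arff(relation, attributes, rows):
    """Build ARFF text from (name, spec) attribute pairs and data rows.

    spec is "numeric", "string", or a list of nominal values.
    None in a row is written as ARFF's missing-value marker '?'.
    """
    lines = [f"@relation {quote(relation)}", ""]
    for name, spec in attributes:
        if isinstance(spec, (list, tuple)):
            spec = "{" + ",".join(quote(v) for v in spec) + "}"
        lines.append(f"@attribute {quote(name)} {spec}")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(row)}: {row}")
        lines.append(",".join("?" if v is None else quote(v) for v in row))
    return "\n".join(lines) + "\n"


def conditional_proportion(rows, attributes, given, target):
    """Proportion of rows matching `target` among rows matching `given`.

    given and target are (attribute_name, value) pairs. Rows whose target
    value is missing (None) still count in the denominator.
    """
    names = [name for name, _ in attributes]
    g_idx, t_idx = names.index(given[0]), names.index(target[0])
    subset = [row for row in rows if row[g_idx] == given[1]]
    if not subset:
        raise ValueError(f"no rows with {given[0]} == {given[1]!r}")
    hits = sum(1 for row in subset if row[t_idx] == target[1])
    return Fraction(hits, len(subset))


if __name__ == "__main__":
    # Hypothetical attributes and rows, NOT the real ParkingFines.csv contents.
    attrs = [
        ("offence", ["Contravene No Stopping", "Expired Meter"]),
        ("amount", "numeric"),
        ("status", ["Paid", "Exempt", "Unpaid"]),
    ]
    rows = [
        ("Contravene No Stopping", 110, "Paid"),
        ("Contravene No Stopping", 110, "Unpaid"),
        ("Expired Meter", 50, "Exempt"),
        ("Expired Meter", 50, None),
    ]
    print(to_arff("parking_fines_demo", attrs, rows))
    print(conditional_proportion(rows, attrs, ("offence", "Contravene No Stopping"), ("status", "Paid")))
    print(conditional_proportion(rows, attrs, ("amount", 50), ("status", "Exempt")))
```

Whether rows with a missing status should be counted in the denominator is a judgement call; the sketch keeps them, so state explicitly how missing values were treated when reporting a proportion.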

Task 2: Explore and Analyse adult.arff data using Weka [35 marks]

In this task you will explore the adult.arff dataset using Weka Explorer. The adult dataset, which comes as part of the Weka installation, contains various attributes of individuals obtained through a census of people living in the US. The dataset was curated for building a model that predicts whether or not an individual earns more than $50K based on his/her other attribute values. Load the adult.arff data file available in Weka and answer the following questions with justifications and screenshots.

  1. With the aid of a visualisation, identify the most populous age bracket. [5 marks]
  2. With the aid of visualisations, compare the distribution of the female population in this dataset (adult.arff) to Australia's female population distribution in 2019, as shown in the image below obtained from the Australian Bureau of Statistics (ABS). The distribution shown in the image reflects the entire population distribution of both females and males regardless of their income. Briefly discuss any two similarities/differences between the age distribution of females in the adult.arff dataset and the 2019 distribution of females in Australia. [15 marks]
  3. With the aid of visualisations, justify whether you agree or disagree with the following statement: "From the adult.arff dataset, there are more men who earn less than 50K than there are women who earn less than 50K." [15 marks]
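
The counting behind questions 1 and 3 can be sketched in Python to double-check what a Weka visualisation shows. The 10-year bracket width, the function names, and the in-memory records below are all assumptions for illustration; the records are invented, and the labels "Male", "Female", "<=50K" and ">50K" are assumed, so check the actual @attribute lines in the adult.arff header before relying on them. Note that the statement in question 3 is about absolute counts, which a cross-tabulation gives directly; proportions within each sex would answer a different question.

```python
from collections import Counter


def age_bracket_counts(ages, width=10):
    """Count ages per fixed-width bracket, e.g. width=10 gives 20-29, 30-39, ..."""
    counts = Counter((int(age) // width) * width for age in ages)
    return {f"{lo}-{lo + width - 1}": n for lo, n in sorted(counts.items())}


def most_populous(bracket_counts):
    """Return the bracket label with the largest count (earliest bracket on ties)."""
    return max(bracket_counts, key=bracket_counts.get)


def crosstab(records, row_attr, col_attr):
    """Count records for every (row value, column value) pair."""
    return Counter((r[row_attr], r[col_attr]) for r in records)


if __name__ == "__main__":
    # Hypothetical records, NOT taken from adult.arff.
    records = [
        {"age": 17, "sex": "Male", "class": "<=50K"},
        {"age": 23, "sex": "Female", "class": "<=50K"},
        {"age": 25, "sex": "Male", "class": "<=50K"},
        {"age": 29, "sex": "Female", "class": ">50K"},
        {"age": 34, "sex": "Male", "class": ">50K"},
        {"age": 38, "sex": "Female", "class": "<=50K"},
        {"age": 45, "sex": "Male", "class": "<=50K"},
    ]
    brackets = age_bracket_counts(r["age"] for r in records)
    print(brackets, most_populous(brackets))
    table = crosstab(records, "sex", "class")
    print(table[("Male", "<=50K")], table[("Female", "<=50K")])
```

The bracket width is a free choice; using the same bin edges as the visualisation you screenshot keeps the answer consistent with the figure.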

Task 3: Decision Tree Analysis [35 marks]

  1. The table below shows a dataset for a binary class problem. Using information gain, justify with calculations which attribute (A or B) the decision tree algorithm will choose to split on. [20 marks]
  2. Explain whether or not you think gain ratio would be a better metric for this example. [15 marks]
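
The calculations in Task 3 follow the standard definitions: entropy H(S) = -sum of p_i * log2(p_i) over the classes; information gain Gain(S, A) = H(S) minus the sum over values v of (|S_v|/|S|) * H(S_v); split information SplitInfo(S, A) = -sum over v of (|S_v|/|S|) * log2(|S_v|/|S|); and gain ratio = Gain(S, A) / SplitInfo(S, A). A minimal Python sketch of these formulas follows. The table from the question is not reproduced here, so the branch counts in the demo are hypothetical, and returning 0.0 when split information is zero is an assumed convention, since gain ratio is undefined in that case.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(branches):
    """branches: one class-count list per attribute value, e.g. [[pos, neg], ...]."""
    parent = [sum(col) for col in zip(*branches)]  # class counts before the split
    total = sum(parent)
    remainder = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(parent) - remainder


def split_info(branches):
    """Entropy of the branch sizes themselves (the gain-ratio denominator)."""
    return entropy([sum(b) for b in branches])


def gain_ratio(branches):
    si = split_info(branches)
    # Assumed convention: report 0.0 when every row falls into a single branch.
    return information_gain(branches) / si if si > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical counts, NOT the table from the question.
    two_way = [[4, 0], [0, 4]]                   # two pure branches
    four_way = [[2, 0], [2, 0], [0, 2], [0, 2]]  # four pure branches
    for name, b in [("two-way", two_way), ("four-way", four_way)]:
        print(name, information_gain(b), split_info(b), gain_ratio(b))
```

In this invented example both splits have an information gain of 1 bit, but the four-way split has split information 2.0 and therefore a gain ratio of 0.5 against 1.0 for the two-way split. This is the sense in which gain ratio penalises attributes that fragment the data into many small branches.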

RATIONALE


This assessment task will assess the following learning outcome/s:

  • be able to identify and analyse business requirements for the identification of patterns and trends in data sets.
  • be able to appraise the different approaches and categories of data mining problems.
  • be able to compare and evaluate output patterns.
  • be able to explore and critically analyse data sets and evaluate their data quality, integrity and security requirements.
  • be able to compare and evaluate appropriate techniques for detecting and evaluating patterns in a given data set.
