Question

Data Mining Lab Task.

Your task for this assignment is to identify and characterize a dataset, practice some data preprocessing techniques, and explore the visualization and data preprocessing functionalities of Weka, an open-source data mining toolkit written in Java.

1. Identify and characterize a dataset
a. You can select a dataset from the list of publicly available datasets at the UCI Machine Learning Repository or from the datasets section of the Weka website. You are also welcome to explore datasets from other sources on your own.
b. Briefly describe what the dataset is about and the size of the dataset (e.g., number of tables, instances, and attributes).

2. Data exploration and preprocessing
a. Select one attribute and discuss appropriate measures of central tendency and dispersion for that attribute. Using a subset of the attribute's values (of your own choice) from the dataset, compute the mean, median, mode, range, quartiles, and variance.
b. Discuss data quality issues in the dataset. Are there (potential) problems with certain attributes? What would be appropriate responses to these quality issues?
c. Discuss one or two data preprocessing techniques that are likely required for the dataset.

3. Weka exploration
a. Load the dataset into Weka.
b. Explore the visualization and preprocessing functionalities (visualize, preprocess, and select attributes) using your dataset. Perform at least four experiments using different test options on two different types of classifier (call them classifier A and classifier B). You can choose A and B however you like: they can be different classifiers (e.g., nearest-neighbor vs. decision trees) or the same classifier with different settings (e.g., different numbers of nearest neighbors). Discuss the new insights you gained from visualizing the data, the techniques you tested, and the results you obtained.
c. Note: Weka uses a data format called ARFF. Most of the datasets in the UCI repository can be downloaded in ARFF format from the datasets section of the Weka website. Other datasets that contain raw problem-domain data must first be translated into ARFF format.
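The summary statistics requested in item 2a can be sketched with Python's standard `statistics` module. This is only an illustration under stated assumptions: the `values` list is a made-up sample rather than data from any particular dataset, the helper name `summarize` is hypothetical, and the quartiles use the "inclusive" interpolation method, which is one of several common conventions (Weka or a textbook may use another).

```python
import statistics

# Hypothetical sample: a small subset of values for one numeric attribute.
values = [23, 25, 25, 29, 31, 34, 34, 34, 40, 47]


def summarize(xs):
    """Return central-tendency and dispersion measures for a list of numbers."""
    xs = sorted(xs)
    # Quartile convention is an assumption; other tools may interpolate differently.
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "modes": statistics.multimode(xs),  # a list, since ties are possible
        "range": xs[-1] - xs[0],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "sample_variance": statistics.variance(xs),       # divides by n - 1
        "population_variance": statistics.pvariance(xs),  # divides by n
    }


summary = summarize(values)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Both variance definitions are reported because the assignment does not say which one is expected; when the values are a subset drawn from a larger dataset, the sample variance is usually the one to discuss.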

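The translation into ARFF mentioned in item 3c can be sketched for the common case of a CSV file with a header row. This is a minimal sketch, not a complete converter: the function names and the sample data are hypothetical, every row is assumed to have as many cells as the header, attribute names and values are assumed to contain no spaces, commas, or quotes (which would need quoting in ARFF), and only numeric and nominal attributes are produced. Weka can also load CSV files itself, so hand conversion is mainly needed for less regular raw data.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    # ARFF marks missing values with "?"; empty CSV cells are treated as missing.
    data = [[cell.strip() or "?" for cell in row] for row in data]

    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        present = [row[col] for row in data if row[col] != "?"]
        if all(is_number(v) for v in present):
            attr_type = "numeric"
        else:
            # Nominal attribute: list the distinct observed values.
            attr_type = "{" + ",".join(sorted(set(present))) + "}"
        lines.append(f"@attribute {name.strip()} {attr_type}")

    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample data, used only to show the output shape.
sample_csv = "age,outlook,play\n25,sunny,yes\n,rainy,no\n31,sunny,yes\n"
arff_text = csv_to_arff(sample_csv, "weather")
print(arff_text)
```

Inferring a nominal attribute from the observed values is a design shortcut: a value that never appears in the file will be missing from the declared set, so the `@attribute` lines are worth checking by hand before loading the file into Weka.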