
Question


Task 1: Problem Formulation, Data Acquisition and Preparation, Exploration and Modelling (50%)

The Canvas assignment page will specify datasets that can be used for this task. Decide which one you are going to use; this will be based on what you have used in the past (dataset usage cannot be repeated) and the domains you are interested in.

1.1 Problem Formulation, Data Acquisition and Preparation

You need to load the data and perform the necessary and appropriate data preparation operations to facilitate the subsequent data analysis and modelling. Note: If multiple data files exist in the Data Folder, you may choose the one you believe is the most appropriate to work on. Furthermore, feature engineering might need to be performed during data preparation. You must describe your workflow (including the key components involved) for completing this task, present key observations and analyses, justify any choices you have made, and discuss any issues you encountered (including how you addressed them) in the video required in Task 2.

1.2 Exploration

Pose 3 meaningful questions to be answered with the dataset chosen. These can involve single or multiple features/variables. You should document your analysis steps and answers to the questions in the notebook file. (There is no separate report file for this assignment.) It is up to you to clearly present each question you are answering and then your answer. For example (suggestion), the image below shows a markdown cell with a level 3 heading used to state one of the questions, followed by a Python cell with a code comment explaining (if needed) what will be done and then code to (try to) get the answer. Clearly state your answer so a reader can easily identify it (as part of the code output is preferred).
You must state the question, describe the approach you used to find its answer, report key observations based upon numeric metrics (e.g., descriptive/inferential statistics) and/or graphical visualisations, and present any interesting takeaways in the video required in Task 2.

1.3 Modelling

Create an 80:20 split for training and testing, using a set random_state value so that it is repeatable. Select, train and evaluate (test) different models and present a comparison of the results:

1. Select 2 Learning Algorithms (L): Select two (2) different learning algorithms from the scikit-learn package that might be appropriate for modelling the dataset you have selected. For example, you might select K-Nearest Neighbours (K-NN) and Artificial Neural Networks (ANNs). Both have various configuration options that could be adjusted.
2. Determine hyperparameter values: Use cross validation or a selection of parameter values for training, and measure validation performance using appropriate metrics. Present the results in a table or chart so that it is easy to identify which model worked best.
3. Select the best and evaluate: Select the algorithm + configuration approach that worked the best based on the data you collected. Assess the trained model using the test data that the model has not seen before. You will need metrics to compare
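The modelling steps in 1.3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the assignment's expected solution: the scikit-learn wine dataset stands in for whichever Canvas dataset is chosen, the task is assumed to be classification, and the hyperparameter grids, the `RANDOM_STATE` value, the stratified split, and the accuracy/macro-F1 metrics are arbitrary choices made for the example.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42  # assumption: any fixed value makes the split repeatable

# Stand-in data; replace with the dataset chosen from the Canvas page.
X, y = load_wine(return_X_y=True)

# 80:20 train/test split. Stratifying on y is an extra choice, not required by the brief.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE, stratify=y
)

# Two learning algorithms, each with a small (illustrative) hyperparameter grid.
# Scaling lives inside the pipeline so it is fitted on training folds only.
candidates = {
    "knn": (
        Pipeline([("scale", StandardScaler()), ("model", KNeighborsClassifier())]),
        {"model__n_neighbors": [3, 5, 7], "model__weights": ["uniform", "distance"]},
    ),
    "mlp": (
        Pipeline([
            ("scale", StandardScaler()),
            ("model", MLPClassifier(max_iter=2000, random_state=RANDOM_STATE)),
        ]),
        {"model__hidden_layer_sizes": [(16,), (32,)], "model__alpha": [1e-4, 1e-2]},
    ),
}

# 5-fold cross validation on the training data only; the test set stays untouched.
searches = {}
results = []
for name, (pipeline, grid) in candidates.items():
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    searches[name] = search
    results.append((name, search.best_score_, search.best_params_))

# Simple comparison table, best validation score first.
results.sort(key=lambda row: row[1], reverse=True)
print(f"{'algorithm':<10} {'cv_accuracy':<12} best_params")
for name, score, params in results:
    print(f"{name:<10} {score:<12.3f} {params}")

# Evaluate only the winning configuration on the held-out test data.
best_name = results[0][0]
best_model = searches[best_name].best_estimator_  # refitted on all training data
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_macro_f1 = f1_score(y_test, y_pred, average="macro")
print(f"Best: {best_name}, test accuracy={test_accuracy:.3f}, macro F1={test_macro_f1:.3f}")
```

Keeping the test set out of the grid search means the final figures reflect data the selected model has not seen, which is what step 3 asks for. For a regression dataset, the estimators and metrics would need to be swapped for regression equivalents.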

