This coursework is the only piece for the module. It takes the form of a data mining project.
The project aims to give you some experience of data mining practice and, more importantly, to use that experience to build knowledge and a better understanding of the various techniques, algorithms and solutions used in data mining and machine learning. The coursework is meant to meet intended learning outcomes 2, 3, 4, 5 and 6 (please refer to the module specification document in the main folder for this module on Teams).
The project consists of three stages: project preparation, project execution and, finally, presentation of the project deliverables. It is assessed very much on individual contribution. It can be done either individually or as a team of two or three. Ideally, a team works better than an individual because it creates an opportunity for critical evaluation of each other's work and for debating the best way forward. The scope of the project should reflect the amount of work expected from the number of people in the team. Normally, each person is expected to conduct a piece of data mining from beginning to end using at least two data mining and machine learning methods. Within the same application domain and problem context, each member of a team either addresses one specific business objective or uses a completely different approach to address the same objective.
Warning: please control the scope of the project carefully; you will have a proper opportunity to do a real-life individual project with a much larger and more realistic data set later in your programme. Remember that the module is worth 15 units of credit. The total number of study hours (including attending lessons) will be about 150 hours. The total number of hours on the project is about 45-50.
The data set, software tools, libraries, languages and platforms for the project are entirely your own choice. For ease of communication, members of the same team should try to use the same tools and environment. If you are comfortable with programming, please consider using Python/scikit-learn on Jupyter Notebook. If you want to use an interactive tool, please consider Weka, RapidMiner or others.
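As a rough illustration of what "at least two data mining and machine learning methods" might look like in Python/scikit-learn, the sketch below compares two classifiers with 5-fold cross-validation. Everything specific here is an assumption made for illustration, not part of the brief: the built-in breast cancer data set stands in for whatever data set you choose, and logistic regression and a random forest are simply two example methods.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a small built-in data set stands in for your own project data.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: two illustrative methods; the brief does not prescribe which to use.
models = {
    # Scaling is placed inside the pipeline so it is refitted on each training fold.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each method on the same data.
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy {scores[name]:.3f} (std {fold_scores.std():.3f})")
```

Evaluating both methods with the same folds and the same metric keeps the comparison like for like, which is the kind of evidence a critical evaluation in the report can build on.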
1. Project Preparation
The first two main tasks are: (a) to form a team and (b) to select a suitable data set. Since you need time to form your team, you should perform both tasks simultaneously. You need to decide almost immediately whether you will do the project alone or with someone else. Please do not wait. Talk to your classmates now.
ATDMML/Module/Project/2022 Hongbo Du
It is perfectly understandable that you may want to do the project with real-life data from your own organization or contacts, but this is NOT recommended because using real-life data requires a proper project proposal and ethical approval, which will take time. Since we only have about 8 weeks (part-time) to finish the project, it is safer to use a public domain data set of a controlled size. Please see the Reference section for some suggested data sources [1][2], but please ensure that you read and comply with the terms and conditions for using the data.
There is NO perfect data set for the project. Since we are undertaking a mini project, please be realistic and not overambitious. A data set with thousands of rows (not hundreds of thousands) and tens of columns would be suitable. Of course, this only serves as a rough guide. If you have more people in your team, the data set can be larger in terms of rows and columns.
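A quick way to check a candidate CSV file against this rough guide is sketched below using only the Python standard library. The helper names and the numeric thresholds are assumptions for illustration: "thousands of rows" is read here as at least 1,000 and fewer than 100,000, and "tens of columns" as at least 10 and fewer than 100. The demo uses a synthetic in-memory table rather than a real data set.

```python
import csv
import io


def dataset_shape(handle):
    """Return (data rows, columns) for a CSV file object with a header row."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def within_rough_guide(n_rows, n_cols,
                       min_rows=1_000, max_rows=100_000,
                       min_cols=10, max_cols=100):
    """Assumed reading of the guide: thousands of rows, tens of columns."""
    return min_rows <= n_rows < max_rows and min_cols <= n_cols < max_cols


# Synthetic stand-in for a real file: 2,500 data rows and 12 columns.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow([f"col{i}" for i in range(12)])
for r in range(2500):
    writer.writerow([r] * 12)
sample.seek(0)

rows, cols = dataset_shape(sample)
print(rows, cols, within_rough_guide(rows, cols))
```

For a real file, the same function can be called on `open(path, newline="")`. Since the guide is only approximate, a larger team could pass wider limits through the keyword arguments.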
Please complete the two tasks within one week after the coursework is given out. You need to inform me of your team (with a nice team name?) and the data set you will use.
You should then start preparing for the project by undertaking the following activities:
a) To study the project business background and understand the application domain where the data set comes from.
b) To familiarise yourself with the process of data mining as well as the CRISP-DM methodology. We will cover the methodologies in our lectures, but you need to read into the topic yourself.
c) To investigate the relevant literature and gain an understanding of which data mining and machine learning methods have been used in the application domain, and what work has been done in this field. You should also identify one specific case relevant to the domain. This investigation will lead to a brief but comprehensive literature review in your project report.
d) To draw up a project plan that makes sense over the working weeks up to the completion date.
While you are performing the tasks above, please use the skills you have learnt from other modules such as Research Methods, or similar skills you already possess. Try your best to take notes in preparation for writing the Background and Literature Review sections of your project report (see later). Forming a good team and working out an effective way of communication are very impo
