Question
Major tasks required for the project:

Step 1: Obtaining a dataset

The first step is to find your own domain-specific dataset for your statistical analysis project. There are no restrictions on the dataset; what matters is that the data is good and useful. A good dataset typically contains various types of data (numerical, nominal, ordinal, Boolean, etc.) with some errors (missing or dirty values, etc.). The dataset could be text data, tabular data, georeferenced data, etc. It could be your own data or data obtained from public sites. In short, the data could be:

- your own data (obtained or created by yourself);
- public data from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) or the Kaggle repository (https://www.kaggle.com/datasets).

Step 2: Setting up a business scenario

Once you have obtained a dataset, set up a realistic business scenario that describes what kinds of patterns you want to find in it. For instance, if you have chosen a set of crime incidents in a town, you might be interested in finding which crimes occur together, which particular crime frequently occurs near pubs after midnight, which crimes tend to follow a certain crime, and so on. Your business scenario as a police officer would be to find crime hot spots, sequential crimes occurring one after another, or periodic crime occurrences.

As another example, if you have chosen a retail sales dataset, then you, as a sales manager, would be interested in finding associative patterns between age group and certain items (e.g. young students tend to buy jeans), or correlative patterns between geographical location and certain items (for instance, more hats are sold in Smithfield whilst more shoes are sold in Douglas). Your business scenario in this case would be that, as a sales manager, you would like to find associative or correlative patterns that could be used to boost sales or to optimise stock management.

Depending on the business scenario (goal), you can focus on certain types of patterns to achieve your business goal.

Step 3: Planning statistical analysis

Explore (browse) the dataset further to decide what patterns you would like to focus on, what statistical techniques you plan to use, and what preprocessing is required to use those techniques. This is a core part of the analysis: you have to use the right technique to find the right pattern, and you have to apply proper preprocessing before you use the chosen statistical techniques.

Note: The order of the above three steps can be changed. For example, you may find an interesting business scenario first and then find a dataset that suits the analysis of that scenario. Or you could decide on the pattern first (for instance, crime hot spots) and then find a dataset with which to set up a reasonable business scenario.

Step 4: Pre-processing stage

In this stage, you transform and pre-process your data to make it suitable for statistical analysis. Please note that you may select as many statistical techniques as you need (you are not confined to one) to achieve the goal set in your business scenario. PLEASE note: use techniques covered in the class, and do not use those not covered in the class.

Step 5: Analysis and discussion stage

Carry out various statistical analyses on the dataset by applying different statistical approaches and/or preprocessing approaches. Critically analyse the patterns found, and discuss the findings. You could also compare and contrast existing algorithms to see which one performs better with your dataset. For instance, for your dataset (with a certain preprocessing approach), you could compare and contrast the performance of various generalised linear approaches. These methods include those taught in class:

- Linear regression models
- Generalised linear models
- Poisson models
- Survival models
- Time series models
- Zero-inflated models

List all the interesting patterns you find that answer your business goals.

Step 6: Writing a statistical analysis report

You (your group) need to write a scientific report on your project, 10-15 pages in length (around 3000-5000 words) excluding references, that summarises your (implemented or chosen) algorithm and findings. The report must follow the generally accepted format consisting of title, introduction, data description, business scenario (goal), preprocessing and statistical analysis used, confidence levels, discussion and comparison of findings, and a conclusion including possible future work, followed by a list of references (APA referencing style). You may add more sections if needed. PLEASE do not explain the algorithms you use at excessive length (a brief explanation of one short paragraph is fine); focus instead on your methodological, structural and analytical approaches, and on the interesting patterns you find.

Useful links for datasets:

- http://www.kdnuggets.com/
- http://www.cs.waikato.ac.nz/ml/weka/
- http://mlearn.ics.uci.edu/MLRepository.html
- http://kdd.ics.uci.edu/
- http://www.sigkdd.org/
- https://www.kaggle.com/
- RapidMiner (https://rapidminer.com/)
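
As an illustration of the pre-processing in Step 4, the following is a minimal sketch in Python (standard library only) of cleaning a small tabular dataset that mixes numerical, nominal and Boolean columns and contains missing and dirty values. The sample data, the column names (`age`, `region`, `item`, `units_sold`, `member`) and the cleaning rules (median imputation, treating negative counts as dirty) are all assumptions made up for this example; the brief does not prescribe them, and a real project should pick rules that suit its own dataset and the techniques covered in class.

```python
import csv
import io
import statistics

# Hypothetical raw extract: a tiny retail-sales table with missing and dirty cells.
RAW_CSV = """age,region,item,units_sold,member
23,Smithfield,hat,3,yes
,Douglas,shoes,5,no
41,smithfield ,hat,-1,YES
35,Douglas,shoes,abc,no
29,Douglas,jeans,2,
"""

# Strings treated as "missing" (an assumption; adjust to the dataset at hand).
MISSING = {"", "na", "n/a", "null", "?"}


def parse_number(text):
    """Return a float, or None when the cell is missing or not numeric."""
    cleaned = text.strip().lower()
    if cleaned in MISSING:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_bool(text):
    """Map yes/no style strings to True/False; anything else becomes None."""
    cleaned = text.strip().lower()
    if cleaned in {"yes", "y", "true", "1"}:
        return True
    if cleaned in {"no", "n", "false", "0"}:
        return False
    return None


def preprocess(raw_text):
    """Coerce types, normalise labels, and impute missing numeric cells."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        units = parse_number(record["units_sold"])
        if units is not None and units < 0:
            units = None  # assumption: a negative count is a dirty value
        rows.append({
            "age": parse_number(record["age"]),
            "region": record["region"].strip().title(),  # normalise nominal labels
            "item": record["item"].strip().lower(),
            "units_sold": units,
            "member": parse_bool(record["member"]),  # missing Booleans stay None
        })

    # Impute missing numeric cells with the median of the observed values.
    for column in ("age", "units_sold"):
        observed = [r[column] for r in rows if r[column] is not None]
        fill = statistics.median(observed)
        for r in rows:
            if r[column] is None:
                r[column] = fill
    return rows


clean_rows = preprocess(RAW_CSV)
for row in clean_rows:
    print(row)
```

Median imputation is used here only because it is simple and robust to outliers; dropping incomplete rows or modelling the missingness are equally valid choices, and whichever rule is used should be reported and justified in the pre-processing section of the report.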
