
CIS 372 Programming for Analytics

Final Project

In this project, you will study a dataset of your choice. Once you have selected your dataset, take time to understand the data and decide on the main question that your analysis will investigate. Then email me so that I can confirm that the dataset and the research question are appropriate.

The project is an individual one. The project grade will be based on the quality of each component of your work. Evaluation of the reports is based on the following criteria: technical soundness, organization, and clarity.

Note: Every figure and table in your report should be described and discussed. To earn full credit, you must describe what each table/figure shows and discuss its key takeaways. In other words, it is not sufficient to simply display R output; you must also provide a thoughtful discussion of it. Otherwise, you will receive at most half credit.

Important dates:

Dataset submission: 5th November

Project report submission: 28th November

Project presentation: 29th November / 4th December

Project requirements

Your end product for the project will be a report that contains at least the following sections:

1. Data summary (30 points)

1.1. Summarize your data

You should begin by describing the data you have available. View your data and examine its structure, variable names, and data types; check whether any values are missing; and so on.

1.2. Tidy your data

Try to obtain a clean, analyzable dataset by using the functions you have learned over the semester. You will want to display tabular summaries where appropriate. You are expected to make all the arrangements that your main question requires.
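The course itself works in R (for example `str()`, `summary()`, and `colSums(is.na(df))` for inspection). Purely as a language-neutral sketch of the summarize-and-tidy steps above, the Python below uses only the standard library to count missing values per column, drop incomplete rows, and convert columns to numeric types. The inline data and the column names `age`, `income`, and `region` are hypothetical, and dropping incomplete rows (listwise deletion) is only one possible choice, not a required approach.

```python
import csv
import io

# Hypothetical raw data; in a real project this would come from your chosen dataset.
RAW_CSV = """age,income,region
34,52000,North
,61000,South
45,,East
29,48000,
51,75000,West
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, mapping empty strings to None (missing)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]

def missing_counts(rows):
    """Count missing (None) values in each column."""
    columns = rows[0].keys() if rows else []
    return {col: sum(1 for r in rows if r[col] is None) for col in columns}

def drop_incomplete(rows):
    """Keep only rows with no missing values (listwise deletion)."""
    return [r for r in rows if all(v is not None for v in r.values())]

def convert_types(rows):
    """Convert the numeric columns from strings so they can be analyzed."""
    return [
        {"age": int(r["age"]), "income": float(r["income"]), "region": r["region"]}
        for r in rows
    ]

rows = load_rows(RAW_CSV)
print(missing_counts(rows))  # {'age': 1, 'income': 1, 'region': 1}
clean = convert_types(drop_incomplete(rows))
print(len(rows), "->", len(clean), "rows after dropping incomplete records")
```

Here three of the five records are dropped, which illustrates why the Methodology section asks how your handling of missing values affects the interpretation and generalizability of the results.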

1.3. Visualize your data

You are expected to provide insightful graphical summaries of data with proper labeling of axes.

Your score for this section will be based on the following criteria:

Meaningful variable names

Insightful graphical and tabular summaries of the data

Proper labelling of figure axes and table columns

Discussion of the graphical and tabular summaries

2. Data mining (30 points)

As part of your analysis, you are expected to perform data mining using regression and to assess/validate your model. When running regressions, you will need to verify the model's quality by running diagnostic checks and discussing the diagnostic plots. Estimate model accuracy, use k-fold cross-validation, and discuss the results.

Depending on your research question and sub-questions, you may also need to perform other analyses (classification, etc.). In each case, you should provide the necessary information and main results, and validate your model.
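In R, calling `plot()` on an `lm` fit draws the standard regression diagnostic plots, and `boot::cv.glm()` or `caret::trainControl(method = "cv")` can run k-fold cross-validation. To show what k-fold cross-validation actually computes, here is a language-neutral sketch in Python (standard library only) for a one-predictor least-squares regression. The synthetic data, `k = 5`, and the choice of RMSE as the accuracy metric are illustrative assumptions, not requirements of the assignment.

```python
import math
import random

def fit_ols(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares; return (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def k_fold_rmse(xs, ys, k=5, seed=0):
    """Estimate out-of-sample RMSE with k-fold cross-validation.

    Indices are shuffled once and split into k roughly equal folds. Each fold
    is held out in turn while the model is fit on the other k-1 folds.
    Returns the mean of the per-fold RMSEs and the list of per-fold RMSEs.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_rmse = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        b0, b1 = fit_ols([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        sq_err = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        fold_rmse.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_rmse) / k, fold_rmse

# Hypothetical data: y = 3 + 2x plus Gaussian noise with standard deviation 1.
rng = random.Random(42)
xs = [float(x) for x in range(1, 41)]
ys = [3 + 2 * x + rng.gauss(0, 1) for x in xs]

b0, b1 = fit_ols(xs, ys)
cv_rmse, per_fold = k_fold_rmse(xs, ys, k=5)
print(f"full-data fit: intercept={b0:.2f}, slope={b1:.2f}")
print(f"5-fold CV RMSE: {cv_rmse:.3f}; per fold: {[round(r, 3) for r in per_fold]}")
```

Because every observation is used for testing exactly once, the cross-validated RMSE estimates error on unseen data. Here it should land near the noise level of 1; a value much larger than the in-sample residual error would be a sign of overfitting worth discussing in the report.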

Your score for this section will be based on the following criteria:

Using appropriate techniques

Insightful graphical and tabular summaries of the analysis

Discussion of the graphical and tabular summaries

Providing validation

3. Methodology (20 points)

In this section you should provide an overview of the approach you took to exploring and analyzing the data. This is where you tell the story of how you arrived at your main findings and explain the various types of analyses that you tried. You should address at least the following questions:

How did you deal with the variables? What impact does your approach have on the interpretation or generalizability of the resulting analysis?

How did you deal with missing values? What impact does your approach have on the interpretation or generalizability of the resulting analysis?

Did you produce any tables or plots that you thought would reveal interesting trends but didn't?

What's the analysis that you finally settled on? Which factors related to your main question do you investigate in the final analysis?

4. Findings (15 points)

In this section you give a careful presentation of your main findings concerning the main problem. You should provide, where appropriate:

Tabular summaries (with carefully labelled column headers)

Graphical summaries (with carefully labelled axes, titles, and legends)

Regression output + interpretation of output + interpretation of coefficients

Assessments of statistical significance (output of tests, models, and corresponding p-values)

Other data mining analysis results

Validation of the models
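As a reference point for interpreting the significance assessments listed above, consider a simple linear regression $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, assuming independent, normally distributed errors with constant variance. The test of $H_0\colon \beta_1 = 0$ uses the quantities below, and its p-value is $p = 2\,P(T_{n-2} \ge |t|)$, where $T_{n-2}$ follows a $t$ distribution with $n-2$ degrees of freedom. These correspond to the `t value` and `Pr(>|t|)` columns that R's `summary()` prints for an `lm` fit.

```latex
\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2,
\qquad
\operatorname{SE}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}},
\qquad
t = \frac{\hat\beta_1}{\operatorname{SE}(\hat\beta_1)}
```

A small p-value indicates that the estimated slope would be unlikely if the true slope were zero; it says nothing about the size or practical importance of the effect, which is why the coefficient itself also needs interpretation.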

5. Discussion (5 points)

In this section you should summarize your main conclusions. You should also discuss potential limitations of your analysis and findings. You should also address the following questions: How much confidence do you have in your analysis? Do you believe your results? Are you confident enough in your analysis and findings to present them to policy makers?
