Question
CIS 372 Programming for Analytics
Final Project
In this project, you will study a dataset of your choice. Once you have selected your dataset, explore it and decide on the main question that your analysis will address. Then email me so that I can confirm that the dataset and the research question are appropriate.
The project is an individual one. The project grade will be based on the quality of each component of your work. Evaluation of the reports is based on the following criteria: technical soundness, organization, and clarity.
Note: Figures and tables presented in your report must be accompanied by a description and a discussion. To earn full credit, you must describe what each table/figure shows and discuss any key takeaways. In other words, it is not sufficient to simply display R output; you must also provide thoughtful discussion of the output. Otherwise, you will receive at most half credit.
Important dates:
Dataset submission: 5th November
Project report submission: 28th November
Project presentation: 29th November / 4th December
Project requirements
Your end-product for the project will be a report that contains at least the following sections:
1. Data summary (30 points)
1.1. Summarize your data
You should begin by describing the data you have available. View your data, look at the structure, variable names, and data types, check whether there are any missing values, etc.
1.2. Tidy your data
Try to obtain a clean, analyzable dataset by using the functions you have learnt over the semester. Display tabular summaries where appropriate. Make all the data arrangements that your main question requires.
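The inspection and tidying steps in 1.1 and 1.2 could be sketched roughly as follows. This is only an illustration in base R: the built-in `airquality` data set stands in for whatever dataset you select, the new variable names are hypothetical, and dropping incomplete rows is just one possible way to handle missing values.

```r
# Stand-in data (an assumption): the built-in airquality data set.
raw <- airquality

# 1.1 Summarize: structure, variable names, data types, missing values.
str(raw)
names(raw)
sapply(raw, class)
missing_per_column <- colSums(is.na(raw))
print(missing_per_column)

# 1.2 Tidy: give variables meaningful names (hypothetical names).
tidy <- raw
names(tidy) <- c("ozone_ppb", "solar_radiation_ly", "wind_mph",
                 "temperature_f", "month", "day")

# Recode the numeric month as a labelled factor.
tidy$month <- factor(tidy$month, levels = 5:9,
                     labels = c("May", "Jun", "Jul", "Aug", "Sep"))

# Keep only complete rows. This is one choice among several; it shrinks
# the sample and should be discussed in the Methodology section.
tidy <- tidy[complete.cases(tidy), ]

# Tabular summary with labelled columns: mean ozone by month.
ozone_by_month <- aggregate(ozone_ppb ~ month, data = tidy, FUN = mean)
names(ozone_by_month) <- c("Month", "Mean ozone (ppb)")
print(ozone_by_month)
```

Whichever treatment of missing values you choose, report how many rows it removes or alters, since that affects how far the results generalize.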
1.3. Visualize your data
You are expected to provide insightful graphical summaries of data with proper labeling of axes.
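As a minimal sketch of a labelled graphical summary, the following base R code draws a scatter plot with explicit axis labels and a title. The `airquality` data set and the output file name are assumptions made for illustration; the figure is written to a temporary PDF so that the sketch also runs without a display.

```r
# Stand-in data (an assumption): complete rows of the built-in airquality data.
aq <- na.omit(airquality)

# Hypothetical output file in the session's temporary directory.
plot_file <- file.path(tempdir(), "ozone_vs_temperature.pdf")

pdf(plot_file, width = 7, height = 5)
plot(aq$Temp, aq$Ozone,
     pch = 19, col = "steelblue",
     xlab = "Temperature (degrees F)",
     ylab = "Ozone (ppb)",
     main = "Ozone concentration versus temperature")
# Add a least-squares line to make the overall trend easier to discuss.
abline(lm(Ozone ~ Temp, data = aq), col = "darkorange", lwd = 2)
dev.off()
```

In the report, a figure like this would still need a sentence or two saying what it shows and what the takeaway is.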
Your score for this section will be based on the following criteria:
Meaningful variable names
Insightful graphical and tabular summaries of the data
Proper labelling of figure axes and table columns
Discussion of the graphical and tabular summaries.
2. Data mining (30 points)
As part of your analysis, you are expected to perform a data mining analysis using regression and to assess/validate your model. When running regressions, you will need to verify the model's quality by running diagnostic checks and discussing the diagnostic plots. Estimate model accuracy, use k-fold cross-validation, and discuss the results.
Depending on your research question and sub-questions, you may also need to carry out other analyses (classification, etc.). In that case, you should likewise provide the necessary information and main results, and validate your model.
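The regression, diagnostic, and cross-validation steps described above might look something like the sketch below. It uses only base R; the `airquality` data, the choice of predictors, the seed, and k = 5 are all assumptions made for illustration, and the hand-written loop is one way to do k-fold cross-validation rather than the required one.

```r
# Stand-in data (an assumption): complete rows of the built-in airquality data.
aq <- na.omit(airquality)

# Linear regression of ozone on temperature, wind, and solar radiation.
fit <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage (written to a temporary PDF).
diag_file <- file.path(tempdir(), "diagnostics.pdf")
pdf(diag_file)
par(mfrow = c(2, 2))
plot(fit)
dev.off()

# k-fold cross-validation written out by hand.
set.seed(372)                      # arbitrary seed, for reproducibility
k <- 5                             # arbitrary number of folds
fold <- sample(rep(seq_len(k), length.out = nrow(aq)))
rmse <- numeric(k)
for (i in seq_len(k)) {
  train_set <- aq[fold != i, ]     # fit on the other k - 1 folds
  test_set  <- aq[fold == i, ]     # evaluate on the held-out fold
  fit_i <- lm(Ozone ~ Temp + Wind + Solar.R, data = train_set)
  pred  <- predict(fit_i, newdata = test_set)
  rmse[i] <- sqrt(mean((test_set$Ozone - pred)^2))
}
cv_rmse <- mean(rmse)              # average out-of-sample error
print(round(rmse, 2))
print(round(cv_rmse, 2))
```

Comparing the cross-validated RMSE with the in-sample residual standard error gives a rough sense of how much the model's accuracy drops on data it has not seen.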
Your score for this section will be based on the following criteria:
Using appropriate techniques
Insightful graphical and tabular summaries of the analysis
Discussion of the graphical and tabular summaries
Providing validation
3. Methodology (20 points)
In this section you should provide an overview of the approach you took to exploring and analyzing the data. This is where you tell the story of how you got to your main findings and explain the various types of analyses that you tried. You should address at least the following questions:
How did you deal with the variables? What impact does your approach have on the interpretation or generalizability of the resulting analysis?
How did you deal with missing values? What impact does your approach have on the interpretation or generalizability of the resulting analysis?
Did you produce any tables or plots that you thought would reveal interesting trends but didn't?
What analysis did you finally settle on? Which factors related to your main question do you investigate in the final analysis?
4. Findings (15 points)
In this section you give a careful presentation of your main findings concerning the main problem. You should provide, where appropriate:
Tabular summaries (with carefully labelled column headers)
Graphical summaries (with carefully labelled axes, titles, and legends)
Regression output + interpretation of output + interpretation of coefficients
Assessments of statistical significance (output of tests, models, and corresponding p-values)
Other data mining analysis results
Validation of the models
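One possible way to turn raw regression output into a labelled table for this section is sketched below. The `airquality` data and the model formula are assumptions used only for illustration, and the column labels are hypothetical.

```r
# Stand-in data and model (assumptions): see the lead-in above.
aq <- na.omit(airquality)
fit <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

# Coefficient table: estimates, standard errors, t statistics, p-values.
coef_table <- as.data.frame(summary(fit)$coefficients)
names(coef_table) <- c("Estimate", "Std. error", "t value", "p-value")

# Append 95% confidence intervals for each coefficient.
coef_table <- cbind(coef_table, confint(fit, level = 0.95))
names(coef_table)[5:6] <- c("95% CI lower", "95% CI upper")
print(round(coef_table, 4))

# Overall fit: proportion of variance in the response explained by the model.
r_squared <- summary(fit)$r.squared
print(round(r_squared, 3))
```

Each slope estimate is read as the expected change in the response for a one-unit increase in that predictor with the other predictors held fixed, and the p-value column supports the discussion of statistical significance.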
5. Discussion (5 points)
In this section you should summarize your main conclusions. You should also discuss potential limitations of your analysis and findings, and address the following questions: How much confidence do you have in your analysis? Do you believe your results? Are you confident enough in your analysis and findings to present them to policy makers?