
Question


1. Overview The Final Project is the capstone "hands-on/applied" deliverable for this course. It represents a synthesis of semester learnings. This project will allow you to apply what you have learned within the class to a real-world data problem. For the project, you will work in teams of either 2 or 3 students on a problem of your choosing that is interesting, significant, and relevant to Data Science. Your team's task for the project is to understand the domain and the data available to provide insight into analytical results. At the end of the semester, you will present your group's work together and submit a project report. The project you submit must represent the original work completed this semester.

2. Selection of Topic and Data Choose a topic of interest to your group and carry out a cohesive, complete project based around it. The range of possible topics that you can choose from is broad. However, the project you pick should incorporate a wide range of data science techniques. The final project focus/subject area/data set(s) should be reviewed and approved by the instructor. In order for you to have the greatest chance of success with this project, it is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple relationships can be explored or a target variable can be predicted with predictor variables. As such, your dataset must have at least 200 observations and at least six variables (exceptions can be made, but you must speak with me first). It would be even better if the variables included different types of variables such as categorical, quantitative, and character variables. All analyses must be done in RStudio.
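The size and variable-type requirements above can be checked quickly in R before committing to a dataset. The following is a minimal sketch; `check_dataset` is a hypothetical helper name introduced here for illustration, and the built-in `iris` data is used only so the example runs without any file.

```r
# Hypothetical helper (not part of the assignment): reports whether a data
# frame meets the stated minimums of 200 observations and six variables,
# and tabulates the variable types it contains.
check_dataset <- function(df, min_rows = 200, min_cols = 6) {
  list(
    n_rows     = nrow(df),
    n_cols     = ncol(df),
    meets_size = nrow(df) >= min_rows && ncol(df) >= min_cols,
    types      = table(sapply(df, function(x) class(x)[1]))
  )
}

# Example with a built-in dataset; swap in your own data frame.
res <- check_dataset(iris)
print(res)
```

Here `iris` falls short on both counts (150 rows, 5 columns), so `res$meets_size` is `FALSE`; a dataset like this would need the instructor's approval as an exception.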

Some publicly available data sources:
https://opendata.cityofnewyork.us/
http://www.data.gov
https://www.kaggle.com/datasets
https://github.com/awesomedata/awesome-public-datasets/blob/master/README.rst
https://dev.socrata.com/data/
https://datasetsearch.research.google.com/
https://www.pewresearch.org/download-datasets/
https://github.com/rfordatascience/tidytuesday
https://www.gapminder.org/data/
https://dataverse.harvard.edu/
https://datahub.io/

4. Analysis The goal is to apply appropriate data science tools to obtain insight and provide business decisions. The goal is not to do an exhaustive data analysis, i.e., do not calculate every statistic and procedure you have learned for every variable or visualize all the data at once, but rather let me know that you are proficient at using R, and that you are proficient at interpreting and presenting the results. Focus on methods that help you begin to answer your research questions. Make sure to consider all stages of a typical modeling process, from gathering, cleaning, and preprocessing relevant data over algorithm and model selection to model deployment, assessment, and possibly revision. You are free to go beyond the standard methods taught in class to do a deep analysis. The project is very open-ended; remember, there are many different packages that you can utilize in R.

5. Project Report Content

Business question or questions to be answered/addressed
  o What is your topic and why is it relevant/important to you?

Data preparation
  o Acquisition: How did you obtain the raw dataset?
  o Variables: Define the variables and their data types
  o Cleanse & Transform: How did you clean and transform your data for the analysis?

Descriptive analysis
  o Describe your data with summary statistics and visualizations

Perform only one additional analysis of your choice:
  o Statistical tests (t-test, ANOVA, etc.)
  o Linear Regression
  o Mapping, geocoding, and ggplot2 visualizations with aesthetics
  o Logistic regression
  o Text mining
  o Association Rule Mining

Conclusion
  o Interpretation of results relative to addressing the business question(s).
  o Recommended business actions based on your assessment
  o Discussions and future work: A summary of what you have learned, a critique of your own methods and suggestions to improve your analysis, what you would do differently if you were able to start over with the project, or what you would do next if you were going to continue work on the project, etc.

Supplementary files
  o Dataset as a CSV file
  o R code script file associated with the whole project
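As a rough illustration of how the report outline above might map onto an R script for the linear regression option, here is a minimal sketch. It uses a synthetic stand-in data frame so that it runs on its own. The column names (`age`, `sex`, `bmi`, `children`, `smoker`, `region`, `charges`), the file name `insurance.csv`, and the simulated effects are all assumptions made for illustration; check them against the actual file before relying on any of this.

```r
# --- Data preparation: acquisition -------------------------------------
# With real data, replace the synthetic block below with something like:
#   insurance <- read.csv("insurance.csv", stringsAsFactors = TRUE)
# The file name and all column names here are assumptions.
set.seed(42)
n <- 300
insurance <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  sex      = factor(sample(c("female", "male"), n, replace = TRUE)),
  bmi      = round(rnorm(n, mean = 30, sd = 6), 1),
  children = sample(0:5, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE,
                           prob = c(0.8, 0.2))),
  region   = factor(sample(c("northeast", "northwest",
                             "southeast", "southwest"), n, replace = TRUE))
)
# Simulated response; these coefficients are invented for the sketch.
insurance$charges <- 2000 + 250 * insurance$age + 300 * insurance$bmi +
  20000 * (insurance$smoker == "yes") + rnorm(n, sd = 4000)

# --- Data preparation: variables, cleanse & transform ------------------
str(insurance)                   # variable names and data types
colSums(is.na(insurance))        # missing values per column
insurance <- na.omit(insurance)  # drop incomplete rows, if any

# --- Descriptive analysis ----------------------------------------------
summary(insurance)
hist(insurance$charges, main = "Distribution of charges", xlab = "charges")
boxplot(charges ~ smoker, data = insurance,
        main = "Charges by smoking status")

# --- Linear regression --------------------------------------------------
fit <- lm(charges ~ age + sex + bmi + children + smoker + region,
          data = insurance)
summary(fit)        # coefficients, p-values, R-squared
confint(fit)        # 95% confidence intervals for the coefficients

# --- Model assessment ---------------------------------------------------
plot(fit, which = 1)                       # residuals vs fitted values
preds <- predict(fit, newdata = insurance[1:3, ])
print(preds)
```

For the conclusion section, the coefficient table from `summary(fit)` is the natural starting point: each estimate is the expected change in `charges` for a one-unit change in that predictor with the others held fixed, which is the kind of statement a business recommendation can be built on.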

1. The topic I would like help with is Linear Regression.

2. The dataset I found is "Healthcare Insurance"; it is available at https://www.kaggle.com/datasets.

Please answer using the requirements above.
