Question
The Data Set (this dataset is available on Kaggle.com)
The data for the project comes from the Global Health Observatory (GHO) data repository, which the World Health Organization maintains. It gives data on life expectancy and associated health factors across 193 countries for the years 2000-2015. The data frame's dimensions are 1649 by 17. (PS: I'm not sure how to attach the dataset file here.)
Please provide R code and a detailed discussion for each of the following bullet points:
- Checking for missing values in the data and, if any are found, describing and implementing a plan for dealing with them
- Examining the distribution shapes for the numerical variables graphically and numerically
- Investigating pairwise relationships between variables. The ggpairs function in the GGally library provides a very nice graphical display for this. It requires the ggplot2 library for its functionality
- After your data investigation split the data into two portions, one for training, the other for testing. Choose your own percentages for the split. You'll use these two sets for all model creation and testing to come
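As a starting point, here is a minimal sketch of the four tasks above in R. Since the dataset file is not attached, it runs on a small simulated data frame. The column names (`life_expectancy`, `adult_mortality`, `bmi`, `schooling`), the median imputation, and the 80/20 split are all assumptions made for illustration, not properties of the actual Kaggle file. Replace the simulated frame with your own `read.csv()` call and adjust the names accordingly.

```r
# Stand-in data: the real file is not attached, so simulate a small frame.
# Column names are hypothetical; with the real data you would instead load
# the CSV, e.g. life <- read.csv("your_file.csv").
set.seed(42)
n <- 200
life <- data.frame(
  life_expectancy = rnorm(n, mean = 70, sd = 8),
  adult_mortality = rgamma(n, shape = 2, scale = 80),
  bmi             = rnorm(n, mean = 38, sd = 15),
  schooling       = rnorm(n, mean = 12, sd = 3)
)
# Inject a few missing values so the imputation step has something to do.
life$bmi[sample(n, 10)] <- NA
life$schooling[sample(n, 5)] <- NA

# 1. Missing values: count NAs per column, then fill numeric columns
#    with the column median (one possible plan; na.omit() is another).
na_counts <- colSums(is.na(life))
print(na_counts)
num_cols <- names(life)[sapply(life, is.numeric)]
for (col in num_cols) {
  med <- median(life[[col]], na.rm = TRUE)
  life[[col]][is.na(life[[col]])] <- med
}

# 2. Distribution shapes: numerical summaries, a simple moment-based
#    skewness estimate, and one histogram per numeric variable.
print(summary(life[num_cols]))
skewness <- sapply(life[num_cols], function(x) {
  mean((x - mean(x))^3) / sd(x)^3
})
print(round(skewness, 2))
old_par <- par(mfrow = c(2, 2))
for (col in num_cols) hist(life[[col]], main = col, xlab = col)
par(old_par)

# 3. Pairwise relationships: correlation matrix, then ggpairs if GGally
#    is installed, falling back to base pairs() otherwise.
print(round(cor(life[num_cols]), 2))
if (requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(life[num_cols]))
} else {
  pairs(life[num_cols])
}

# 4. Train/test split: 80% of rows for training, the rest for testing.
train_idx <- sample(seq_len(nrow(life)), size = floor(0.8 * nrow(life)))
train <- life[train_idx, ]
test  <- life[-train_idx, ]
```

Two design choices are worth discussing in your write-up. Median imputation is robust to skewed variables but shrinks their variance, and if only a few rows have missing values, dropping them with `na.omit()` may be simpler. Also, imputing before the split lets information from the test rows influence the training data, so a stricter approach would compute the medians on `train` only and apply them to both sets.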