Question:
Answer the following questions based on statistical analysis of the data provided in cancer.csv. (https://www.kaggle.com/datasets/thedevastator/cancer-patients-and-air-pollution-a-new-link)
Use the R language.
1- Derive descriptive statistics regarding this dataset, including measures of central tendency for the following fields.
a. Age
b. Air Pollution
c. Alcohol use
d. Smoking
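A minimal R sketch for question 1. It assumes the file is saved locally as cancer.csv and that read.csv() converts the field names to syntactic names such as Air.Pollution and Alcohol.use. Base R has no function for the statistical mode, so stat_mode() below is a hypothetical helper that returns the first most frequent value on ties.

```r
# Sketch for Q1: descriptive statistics and measures of central tendency.
# Assumption: read.csv() renames "Air Pollution" to Air.Pollution, etc.

# Base R has no statistical mode; return the most frequent value (first on ties).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  values <- unique(x)
  values[which.max(tabulate(match(x, values)))]
}

# One row of summary statistics per requested column.
describe_columns <- function(df, cols) {
  rows <- lapply(cols, function(col) {
    x <- df[[col]]
    data.frame(
      field  = col,
      n      = sum(!is.na(x)),
      mean   = mean(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      mode   = stat_mode(x),
      sd     = sd(x, na.rm = TRUE),
      min    = min(x, na.rm = TRUE),
      max    = max(x, na.rm = TRUE)
    )
  })
  do.call(rbind, rows)
}

fields <- c("Age", "Air.Pollution", "Alcohol.use", "Smoking")
if (file.exists("cancer.csv")) {
  cancer <- read.csv("cancer.csv")
  print(describe_columns(cancer, fields))
  print(summary(cancer[fields]))  # adds quartiles to the picture
}
```

Comparing mean, median and mode for each field gives a quick sense of skew: when the mean sits well above the median, the distribution has a long right tail.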
2- The cancer.csv file contains a few missing entries. Find out how many rows have missing values. How would you deal with these missing values in your analysis?
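A sketch for question 2. It counts rows with at least one missing value (treating empty cells as missing through na.strings) and shows two common ways to handle them: listwise deletion, which is reasonable when only a few rows are affected, and median imputation of numeric columns, which keeps every row. The helper names are hypothetical.

```r
# Sketch for Q2: count and handle missing values.
missing_summary <- function(df) {
  list(
    rows_with_missing  = sum(!complete.cases(df)),  # rows with >= 1 NA
    missing_per_column = colSums(is.na(df))
  )
}

# Option A: listwise deletion (drop every incomplete row).
drop_incomplete <- function(df) df[complete.cases(df), , drop = FALSE]

# Option B: replace NAs in numeric columns with that column's median.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

if (file.exists("cancer.csv")) {
  # Treat blank cells as NA as well as literal "NA".
  cancer <- read.csv("cancer.csv", na.strings = c("", "NA"))
  ms <- missing_summary(cancer)
  cat("Rows with at least one missing value:", ms$rows_with_missing, "\n")
  print(ms$missing_per_column[ms$missing_per_column > 0])
}
```

If the count is a small fraction of the rows, deletion is the simpler choice; imputation avoids losing rows but can understate variability.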
3- Find the effect of the following factors on the cancer Level and justify your answer. Represent the correlations graphically.
Alcohol use
Genetic Risk
Balanced Diet
Smoking
Shortness of Breath
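Because Level is ordinal (assumed here to take the values Low, Medium and High), one reasonable approach is to encode it as 1 to 3 and use Spearman's rank correlation, which does not assume a linear relationship. The sketch assumes the column names Alcohol.use, Genetic.Risk, Balanced.Diet, Smoking and Shortness.of.Breath. A positive rho means higher factor scores go with a higher cancer Level; the p-value tests whether rho differs from zero.

```r
# Sketch for Q3: rank correlation between each factor and the ordinal Level.
# Assumptions: Level holds "Low", "Medium", "High"; column names follow read.csv().
level_factors <- c("Alcohol.use", "Genetic.Risk", "Balanced.Diet",
                   "Smoking", "Shortness.of.Breath")

# Encode the ordinal Level as 1 (Low), 2 (Medium), 3 (High); other values become NA.
level_to_score <- function(level) {
  as.integer(factor(level, levels = c("Low", "Medium", "High")))
}

# Spearman rho and p-value for each column against the Level score.
level_correlations <- function(df, cols) {
  y <- level_to_score(df$Level)
  res <- t(sapply(cols, function(col) {
    # Ties are common in 1-8 scores, so R warns that exact p-values are unavailable.
    ct <- suppressWarnings(cor.test(df[[col]], y, method = "spearman"))
    c(rho = unname(ct$estimate), p_value = ct$p.value)
  }))
  as.data.frame(res)
}

# Bar chart of rho: height shows strength, sign shows direction.
plot_level_correlations <- function(cors) {
  op <- par(mar = c(10, 4, 4, 2))
  on.exit(par(op))
  barplot(cors$rho, names.arg = rownames(cors), las = 2, ylim = c(-1, 1),
          ylab = "Spearman rho with Level",
          main = "Association of each factor with cancer Level")
  abline(h = 0)
}

if (file.exists("cancer.csv")) {
  cancer <- read.csv("cancer.csv")
  cors <- level_correlations(cancer, level_factors)
  print(cors)
  plot_level_correlations(cors)
}
```

Correlation shows association, not causation, so the justification should describe the direction and strength of each relationship rather than claim an effect outright.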
4- What is the probability of having cancer Level = High when:
- The level of passive smoking is greater than or equal to 5.
- The level of dust allergy is less than 5.
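The conditional probability can be estimated from counts: P(High | condition) = (rows that meet the condition and have Level "High") / (rows that meet the condition). The question can be read as two separate conditions or as both holding together, so the sketch computes all three. The column names Passive.Smoker and Dust.Allergy are assumptions about how the dataset labels these fields.

```r
# Sketch for Q4: P(Level == "High" | condition) estimated from row counts.
# Assumption: the fields are named Passive.Smoker and Dust.Allergy after read.csv().
conditional_prob_high <- function(level, condition) {
  # Keep rows where the condition and Level are both known and the condition holds.
  keep <- !is.na(condition) & !is.na(level) & condition
  sum(level[keep] == "High") / sum(keep)
}

if (file.exists("cancer.csv")) {
  cancer <- read.csv("cancer.csv")
  passive <- cancer$Passive.Smoker >= 5
  dust    <- cancer$Dust.Allergy < 5
  cat("P(High | passive smoking >= 5):", conditional_prob_high(cancer$Level, passive), "\n")
  cat("P(High | dust allergy < 5):", conditional_prob_high(cancer$Level, dust), "\n")
  cat("P(High | both conditions):", conditional_prob_high(cancer$Level, passive & dust), "\n")
}
```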
5- Is there any association between Alcohol use and Smoking? Justify your answer.
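One way to test for association between two ordinal scores is a chi-squared test of independence on their contingency table, with Cramér's V as an effect size and Spearman's rho to show direction. This is a sketch: with sparse cells the chi-squared approximation can be unreliable, which is why R's warning is suppressed here and why the Spearman result is reported alongside it. The column names Alcohol.use and Smoking are assumed.

```r
# Sketch for Q5: association between two ordinal scores.
association_summary <- function(x, y) {
  tab <- table(x, y)  # contingency table; pairs with NA are dropped
  # Small expected counts trigger an approximation warning; interpret with care.
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE))
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  # Cramer's V rescales chi-squared to [0, 1]: 0 = independent, 1 = perfect association.
  cramers_v <- sqrt(unname(chi$statistic) / (n * (k - 1)))
  rho <- suppressWarnings(cor.test(x, y, method = "spearman"))
  list(chi_sq = unname(chi$statistic), chi_p = chi$p.value,
       cramers_v = cramers_v,
       spearman_rho = unname(rho$estimate), spearman_p = rho$p.value)
}

if (file.exists("cancer.csv")) {
  cancer <- read.csv("cancer.csv")
  str(association_summary(cancer$Alcohol.use, cancer$Smoking))
  # Visual check of the joint distribution.
  mosaicplot(table(cancer$Alcohol.use, cancer$Smoking),
             xlab = "Alcohol use", ylab = "Smoking", main = "Alcohol use vs Smoking")
}
```

A small chi-squared p-value suggests the two variables are not independent; Cramér's V and rho then say how strong the link is and which way it runs.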
6- Are there any outliers in the following fields?
- Weight Loss
- Obesity
Justify your answer systematically.
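A common systematic check is Tukey's IQR rule, which flags values below Q1 - 1.5 IQR or above Q3 + 1.5 IQR; these are the kind of points a boxplot draws individually beyond its whiskers. The sketch uses quantile()'s default method, so its fences can differ slightly from boxplot.stats(), which uses Tukey's hinges. The column names Weight.Loss and Obesity are assumed.

```r
# Sketch for Q6: Tukey's IQR rule for outliers.
iqr_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr  # lower fence
  upper <- q[2] + k * iqr  # upper fence
  list(lower = lower, upper = upper, outliers = x[x < lower | x > upper])
}

if (file.exists("cancer.csv")) {
  cancer <- read.csv("cancer.csv")
  for (col in c("Weight.Loss", "Obesity")) {
    r <- iqr_outliers(cancer[[col]])
    cat(col, ": fences [", r$lower, ",", r$upper, "], outliers found:",
        length(r$outliers), "\n")
  }
  # Side-by-side boxplots; points beyond the whiskers are drawn individually.
  boxplot(cancer[c("Weight.Loss", "Obesity")], main = "Weight Loss and Obesity",
          ylab = "Score")
}
```

Reporting the fences together with the count of flagged values makes the justification reproducible: anyone rerunning the rule gets the same answer.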
7- Analyze Frequent Cold and Dry Cough for males and females. Show graphically which category (Male or Female) is more prone to these symptoms.
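A sketch that compares mean Frequent Cold and Dry Cough scores by gender and plots them as grouped bars. It assumes Gender is coded 1 = Male and 2 = Female, and that the columns are named Frequent.Cold and Dry.Cough; the gender mapping in particular should be verified against the dataset description. A Wilcoxon rank-sum test is included as one way to check whether the gap between the groups is more than noise.

```r
# Sketch for Q7: symptom scores by gender.
# Assumption: Gender is coded 1 = Male, 2 = Female (verify against the dataset notes).
symptom_by_gender <- function(df, symptoms = c("Frequent.Cold", "Dry.Cough")) {
  gender <- factor(df$Gender, levels = c(1, 2), labels = c("Male", "Female"))
  # Rows: Male/Female; columns: one per symptom; cells: mean score.
  sapply(symptoms, function(s) tapply(df[[s]], gender, mean, na.rm = TRUE))
}

# Wilcoxon rank-sum p-value per symptom (does the Male/Female gap exceed noise?).
gender_p_values <- function(df, symptoms = c("Frequent.Cold", "Dry.Cough")) {
  sapply(symptoms, function(s) {
    male <- df[[s]][df$Gender == 1]
    female <- df[[s]][df$Gender == 2]
    suppressWarnings(wilcox.test(male, female)$p.value)
  })
}

if (file.exists("cancer.csv")) {
  cancer <- read.csv("cancer.csv")
  means <- symptom_by_gender(cancer)
  print(means)
  print(gender_p_values(cancer))
  # Grouped bars: one group per symptom, one bar per gender.
  barplot(means, beside = TRUE, legend.text = TRUE,
          ylab = "Mean symptom score",
          main = "Frequent Cold and Dry Cough by gender")
}
```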
8- Show graphically which age group snores more. You can define your own age groups.
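A sketch for question 8 using hypothetical age groups (under 30, 30-44, 45-59, 60 and over); change breaks and labels to whatever grouping you choose. It assumes the columns are named Age and Snoring, compares the mean Snoring score per group, and draws a boxplot of the full distribution next to it.

```r
# Sketch for Q8: snoring by age group.
# The breaks and labels are hypothetical; any grouping can be substituted.
add_age_group <- function(df, breaks = c(-Inf, 29, 44, 59, Inf),
                          labels = c("<30", "30-44", "45-59", "60+")) {
  # right = TRUE gives intervals (-Inf, 29], (29, 44], (44, 59], (59, Inf].
  df$AgeGroup <- cut(df$Age, breaks = breaks, labels = labels, right = TRUE)
  df
}

# Mean Snoring score per age group (NA for groups with no rows).
snoring_by_age_group <- function(df) {
  df <- add_age_group(df)
  tapply(df$Snoring, df$AgeGroup, mean, na.rm = TRUE)
}

plot_snoring <- function(df) {
  means <- snoring_by_age_group(df)
  df <- add_age_group(df)
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  barplot(means, xlab = "Age group", ylab = "Mean Snoring score",
          main = "Mean snoring by age group")
  boxplot(Snoring ~ AgeGroup, data = df, xlab = "Age group",
          ylab = "Snoring score", main = "Snoring distribution")
}

if (file.exists("cancer.csv")) {
  cancer <- read.csv("cancer.csv")
  print(snoring_by_age_group(cancer))
  plot_snoring(cancer)
}
```

The bar chart answers "which group snores more on average", while the boxplot shows whether that difference holds across the distribution or is driven by a few high scores.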
