Question
Data Analysis using R: Employee Attrition Dataset and Acme Dataset
Location: Course Content - Datasets - EmployeeAttrition.csv and Acme.csv
Tool: You need to use RStudio inside the VirtualBox VM for this assignment
- Open RStudio by clicking the blue circle R icon on the left-side launch bar. This opens the RStudio main screen.
- On the top menu, click File > Open File and locate the Assignment-5-R-Data-Analytics-Source-File.R file provided with the assignment.
- Make sure to add the complete path of the EmployeeAttrition.csv file inside the source file. For example, if you store EmployeeAttrition.csv in the Documents folder, then your complete path should be:
~/Documents/EmployeeAttrition.csv
- Press the Source button (green arrow icon) at the top of the RStudio window. This runs the entire source file.
- If you want to run a single line/statement of your code, press the Run button (green arrow icon) at the top of the RStudio window.
- If everything goes well, you will see output in blue or black color. If your code has any error, it will be written in red color.
- Write your remaining code in the same source file to answer all the questions below.
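The path step above can be sketched as follows. This is only an illustration: the variable names `csv_path`, `full_path`, and `employees` are hypothetical, the path is the example path from the instructions, and the actual source file provided with the assignment may load the data differently.

```r
# Example path from the instructions; change it to wherever the CSV is stored.
csv_path <- "~/Documents/EmployeeAttrition.csv"

# Expand "~" to the home directory so the full path is visible in messages.
full_path <- path.expand(csv_path)

if (file.exists(full_path)) {
  # read.csv() returns a data frame with one column per CSV header.
  employees <- read.csv(full_path, stringsAsFactors = FALSE)
  print(head(employees))
} else {
  # A wrong path is a common cause of red error text when sourcing the file.
  message("File not found: ", full_path)
}
```

Checking that the file exists before reading it gives a clearer message than the error `read.csv()` would raise for a wrong path.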
ANSWER ALL QUESTIONS BELOW:
NOTES: Submit one R source code file and one output document (Word or PDF) containing all the outputs. Make sure to print the output for every question in your R source code file so that when the class TA runs your code in RStudio, it shows the output for each question.
Please rename your source code file to FullName-GNumber and submit it to Blackboard.
Part 1
- Use EmployeeAttrition.csv and write R code to find the following:
- Find the number of rows and columns in the dataset
- Find the maximum Age in the dataset
- Find the minimum DailyRate in the dataset
- Find the average/mean MonthlyIncome in the dataset
- How many employees rated WorkLifeBalance as 1
- What percent of total employees have TotalWorkingYears less than or equal to 5? Also calculate the percentage for TotalWorkingYears greater than 5
- Print EmployeeNumber, Department and MaritalStatus for those employees whose Attrition is Yes and RelationshipSatisfaction is 1 and YearsSinceLastPromotion is greater than 3
- Find the mean, median, mode, standard deviation and frequency distribution of EnvironmentSatisfaction for males and females separately. (Hint: For frequency distribution use the table() function)
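One possible way to approach the Part 1 tasks is sketched below on a small made-up data frame rather than the real file. The values are invented, the `Gender` column name and its "Male"/"Female" coding are assumptions (the assignment does not name that column), and `stat_mode` is a hypothetical helper, since base R has no built-in function for the statistical mode.

```r
# Toy stand-in for EmployeeAttrition.csv; every value here is invented.
employees <- data.frame(
  EmployeeNumber = 1:8,
  Age = c(41, 49, 37, 33, 27, 32, 59, 30),
  DailyRate = c(1102, 279, 1373, 1392, 591, 1005, 1324, 1358),
  MonthlyIncome = c(5993, 5130, 2090, 2909, 3468, 3068, 2670, 2693),
  WorkLifeBalance = c(1, 3, 3, 3, 3, 2, 2, 1),
  TotalWorkingYears = c(8, 10, 3, 8, 5, 8, 12, 1),
  Attrition = c("Yes", "No", "Yes", "No", "No", "No", "No", "Yes"),
  RelationshipSatisfaction = c(1, 4, 2, 3, 4, 3, 1, 1),
  YearsSinceLastPromotion = c(5, 1, 0, 3, 2, 3, 0, 4),
  Department = c("Sales", "Research", "Research", "Sales",
                 "Research", "Sales", "Research", "Sales"),
  MaritalStatus = c("Single", "Married", "Single", "Married",
                    "Married", "Single", "Divorced", "Single"),
  Gender = c("Female", "Male", "Male", "Female",
             "Male", "Female", "Female", "Male"),  # assumed column name
  EnvironmentSatisfaction = c(2, 3, 4, 4, 1, 4, 3, 4),
  stringsAsFactors = FALSE
)

# Rows and columns
n_rows <- nrow(employees)
n_cols <- ncol(employees)

# Simple column statistics
max_age <- max(employees$Age)
min_daily_rate <- min(employees$DailyRate)
mean_income <- mean(employees$MonthlyIncome)

# Counting rows that satisfy a condition: TRUE counts as 1 in sum()
n_wlb_1 <- sum(employees$WorkLifeBalance == 1)

# The mean of a logical vector is the share of TRUE values
pct_le5 <- mean(employees$TotalWorkingYears <= 5) * 100
pct_gt5 <- mean(employees$TotalWorkingYears > 5) * 100

# Filter rows on three conditions and keep three columns
flagged <- subset(
  employees,
  Attrition == "Yes" & RelationshipSatisfaction == 1 & YearsSinceLastPromotion > 3,
  select = c(EmployeeNumber, Department, MaritalStatus)
)

# Hypothetical helper: most frequent value(s); ties return every tied value
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Statistics of EnvironmentSatisfaction computed separately for each gender
by_gender <- split(employees$EnvironmentSatisfaction, employees$Gender)
gender_stats <- lapply(by_gender, function(x) {
  list(mean = mean(x), median = median(x), mode = stat_mode(x),
       sd = sd(x), freq = table(x))
})

print(c(rows = n_rows, columns = n_cols))
print(c(max_age = max_age, min_daily_rate = min_daily_rate, mean_income = mean_income))
print(n_wlb_1)
print(c(pct_le5 = pct_le5, pct_gt5 = pct_gt5))
print(flagged)
print(gender_stats)
```

With the real file, the `data.frame(...)` call would be replaced by the data frame loaded from EmployeeAttrition.csv, and the column names should be checked against `names()` of that data frame first.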
Part 2
Acme Corporation is accused of gender bias in setting starting salaries for newly hired workers. The accompanying synthetic dataset lists recent salary data of new hires, showing college degree (BS, MS, PhD), gender (M, F), years of previous experience and starting salary (thousands of dollars).
- Use Acme.csv and write R code to find the following:
- Identify data types for each attribute in the dataset
- Produce summary statistics for each attribute in the dataset
- Produce visualizations for each attribute (Hint: use hist() function)
- Display the relationship between
- Years of Experience and Starting Salary for all employees
- Years of Experience and Starting Salary for each gender
- Years of Experience and Starting Salary for each degree
(Hint: use Scatter Plots)
- Find the correlation between Starting Salary and Years of Experience
- Is the correlation different for each gender?
- Is the correlation different for each degree?
- What can you conclude about Acme with respect to gender bias after your overall analysis?
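The Part 2 steps could be sketched along the following lines. The data frame below is invented for illustration, and the column names `Degree`, `Gender`, `Experience`, and `Salary` are assumptions, since the assignment does not give the headers of Acme.csv. Note that hist() only accepts numeric input, so the sketch uses barplot(table(...)) for the two categorical attributes.

```r
# Toy stand-in for Acme.csv; column names and values are invented.
# Salary is in thousands of dollars, as described in the assignment.
acme <- data.frame(
  Degree     = c("BS", "BS", "MS", "MS", "PhD", "PhD", "BS", "MS", "PhD", "BS"),
  Gender     = c("M", "F", "M", "F", "M", "F", "F", "M", "F", "M"),
  Experience = c(0, 2, 3, 5, 6, 8, 1, 4, 10, 7),
  Salary     = c(40, 44, 52, 57, 66, 70, 42, 55, 78, 56),
  stringsAsFactors = FALSE
)

# Data type of each attribute, then summary statistics
col_types <- sapply(acme, class)
print(col_types)
print(summary(acme))

# One plot per attribute: histograms for numeric, bar plots for categorical
hist(acme$Experience, main = "Years of experience", xlab = "Years")
hist(acme$Salary, main = "Starting salary", xlab = "Thousands of dollars")
barplot(table(acme$Degree), main = "Degree")
barplot(table(acme$Gender), main = "Gender")

# Scatter plot for all employees
plot(acme$Experience, acme$Salary, pch = 19,
     xlab = "Years of experience", ylab = "Starting salary (thousands)")

# Hypothetical helper: the same scatter plot, colored by a grouping column
plot_by_group <- function(group, title) {
  g <- factor(group)
  plot(acme$Experience, acme$Salary, col = as.integer(g), pch = 19, main = title,
       xlab = "Years of experience", ylab = "Starting salary (thousands)")
  legend("topleft", legend = levels(g), col = seq_along(levels(g)), pch = 19)
}
plot_by_group(acme$Gender, "By gender")
plot_by_group(acme$Degree, "By degree")

# Pearson correlation overall, then within each group
overall_cor <- cor(acme$Experience, acme$Salary)
group_cor <- function(group) {
  sapply(split(acme, group), function(d) cor(d$Experience, d$Salary))
}
cor_by_gender <- group_cor(acme$Gender)
cor_by_degree <- group_cor(acme$Degree)

print(overall_cor)
print(cor_by_gender)
print(cor_by_degree)
```

A correlation measures how tightly salary tracks experience within a group, not whether one group is paid more than another, so the per-gender scatter plots and summary statistics are worth reading alongside the correlations before drawing a conclusion.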