Question
Often when you get a dataset, it is not in the exact format you want/need. So you have to refine the dataset into something more useful; this is often called data munging. In this lab exercise, you will read in a dataset and work on it in a dataframe. Then you will explore the distribution within the dataset.
An interactive tutorial on basic to advanced R features is also available within R itself. It is called Statistics With Interactive R Learning (SWIRL). More information about installing and using this feature can be found at http://swirlstats.com/students.html
Review the Readings/Resources in the unit. Complete the following steps:
Create a function named readStates to read a CSV file into R.
You need to read from a URL, not from a file local to your computer.
The file is a dataset on state populations within the United States.
The URL is: https:wwwcensus.govprogramssurveyspopesttablesstatetotalsnstestcsv Note that you might need to use https: rather than http:
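The reading step above can be sketched as follows. This is a minimal sketch, not the assignment's solution: the helper name readRawCsv and the example.com address in the usage comment are hypothetical, and it assumes the file is a plain comma-separated file that read.csv can parse directly.

```r
# Hypothetical helper (name is an assumption): read a CSV from a URL
# into a data frame, without touching the local file system.
# stringsAsFactors = FALSE keeps text columns as character vectors,
# which makes later cleaning (gsub, as.numeric) simpler.
readRawCsv <- function(urlToRead) {
  read.csv(url(urlToRead), stringsAsFactors = FALSE)
}

# Usage (needs network access; substitute the dataset URL from the lab):
# rawStates <- readRawCsv("https://example.com/states.csv")
# str(rawStates)   # inspect the raw rows and columns before cleaning
```

Inspecting the raw result with str() or head() before cleaning is usually the quickest way to see which rows and columns need to go.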
Clean the dataframe.
Note the issues that need to be fixed (removing columns, removing rows, changing column names).
Within your function, make sure there is one row per state, plus the District of Columbia.
Make sure there are only columns with the following names: stateName, Census, Estimates, Pop, Pop
Make sure the last four columns are numbers (i.e., not strings).
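The cleaning steps above might look like the sketch below. It runs on a small made-up data frame, not the real file. The junk rows, the empty column, the leading dots, and the comma-formatted numbers are assumptions about what a raw import could contain. The names PopFirst and PopSecond are placeholders, because the last two column names are incomplete in the text above.

```r
# Toy stand-in for a raw import (all values invented for illustration).
raw <- data.frame(
  V1 = c("Table title", ".StateA", ".StateB", "Footnote"),
  V2 = c("", "1,000", "2,500", ""),
  V3 = c("", "1,001", "2,501", ""),
  V4 = c("", "1,010", "2,510", ""),
  V5 = c("", "1,020", "2,520", ""),
  V6 = c(NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical cleaner: which rows and columns to drop is an assumption
# that must be checked against the actual file.
cleanToy <- function(df) {
  df <- df[-c(1, nrow(df)), ]   # remove the junk first and last rows
  df <- df[, 1:5]               # keep only the first five columns
  names(df) <- c("stateName", "Census", "Estimates", "PopFirst", "PopSecond")
  df$stateName <- gsub("^\\.", "", df$stateName)  # strip a leading dot
  numericCols <- names(df)[-1]  # every column except stateName
  # Remove thousands separators, then convert the strings to numbers.
  df[numericCols] <- lapply(df[numericCols],
                            function(x) as.numeric(gsub(",", "", x)))
  rownames(df) <- NULL          # renumber rows from 1
  df
}

cleaned <- cleanToy(raw)
str(cleaned)  # two rows, five columns, last four numeric
```

Calling as.numeric directly on a string such as "1,000" yields NA with a warning, which is why the commas are removed first.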
Store and explore the dataset.
Store the dataset in a dataframe called dfStates.
Test your dataframe by calculating the mean for the data, by doing: mean(dfStates$Pop)
Find the state with the highest population.
Based on the data, what is the population of the state with the highest population? What is the name of that state?
Sort the data, in increasing order, based on the data.
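The exploration steps above can be sketched with base R functions. The data frame dfToy, its values, and the column name PopSecond are invented for illustration; with the real data you would use dfStates and the population column named in the lab.

```r
# Toy data frame (values invented for illustration).
dfToy <- data.frame(
  stateName = c("StateA", "StateB", "StateC"),
  PopSecond = c(2500, 1000, 4000),
  stringsAsFactors = FALSE
)

# Mean of the population column.
meanPop <- mean(dfToy$PopSecond)

# which.max returns the index of the largest value, so the matching
# row gives both the state's name and its population.
largest <- dfToy[which.max(dfToy$PopSecond), ]

# order returns the row indices that sort the column in increasing order.
dfSorted <- dfToy[order(dfToy$PopSecond), ]

print(meanPop)
print(largest)
print(dfSorted)
```

Passing decreasing = TRUE to order would reverse the sort if a largest-first listing is wanted.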
Explore the distribution of the states.
Write a function that takes two parameters. The first is a vector and the second is a number.
The function will return the percentage of elements within the vector that are less than that number (i.e., the cumulative distribution below the value provided). For example, if the vector had elements with being the number passed into the function, the function would return since of the numbers were below
Test the function with the vector dfStates$Pop and the mean of dfStates$Pop.
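One way to write such a function is sketched below. The name percentBelow and the example values are assumptions. The sketch returns a fraction between 0 and 1, so multiply by 100 if a percentage on the 0 to 100 scale is required.

```r
# Hypothetical function: share of elements strictly below a threshold.
# values < threshold gives a logical vector; its mean is the fraction of
# TRUE entries. na.rm = TRUE ignores missing values.
percentBelow <- function(values, threshold) {
  mean(values < threshold, na.rm = TRUE)
}

# Usage on invented values: two of the four elements are below 25.
exampleValues <- c(10, 20, 30, 40)
percentBelow(exampleValues, 25)                    # 0.5
percentBelow(exampleValues, mean(exampleValues))   # 0.5 (mean is 25)
```

Because the comparison is strict, elements equal to the threshold are not counted as below it.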