Often when you get a dataset, it is not in the exact format you want/need. So, you have to refine the dataset into something more useful; this is often called data munging. In this lab exercise, you will read in a dataset and work on it (in a dataframe). Then, you will explore the distribution within the dataset.

An interactive tutorial on basic to advanced R features is also available within R itself. It is called Statistics With Interactive R Learning (SWIRL). More information about installing and using this feature can be found at http://swirlstats.com/students.html

Review the Readings/Resources in the unit. Complete the following steps.

1. Create a function (named readStates) to read a CSV file into R.
   You need to read a URL, not a file local to your computer.
   The file is a dataset on state populations (within the United States). The URL is https://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv (note that you might need to use https:// rather than http://).

2. Clean the dataframe.
   Note the issues that need to be fixed (removing columns, removing rows, changing column names).
   Within your function, make sure there are 51 rows (one per state + the District of Columbia).
   Make sure there are only 5 columns, with the following names: stateName, Census, Estimates, Pop2010, Pop2011.
   Make sure the last four columns are numbers (i.e. not strings).

3. Store and explore the dataset.
   Store the dataset into a dataframe called dfStates.
   Test your dataframe by calculating the mean for the 2011 data, by doing mean(dfStates$Pop2011).

4. Find the state with the highest population.
   Based on the 2011 data, what is the population of the state with the highest population? What is the name of that state?
   Sort the data, in increasing order, based on the 2011 data.

5. Explore the distribution of the states.
   Write a function that takes two parameters. The first is a vector and the second is a number.
   The function will return the fraction of elements within the vector that are less than that number (i.e. the cumulative distribution below the value provided).
   For example, if the vector had 5 elements (1, 2, 3, 4, 5), with 2 being the number passed into the function, the function would return 0.2 (since 20% of the numbers were below 2).
   Test the function with the vector dfStates$Pop2011 and the mean of dfStates$Pop2011.
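The steps above could be approached roughly as follows. This is only a sketch, not the accepted answer: the helper names cleanStates and percentBelow are hypothetical (only readStates and dfStates come from the assignment), and the row filter assumes that state rows in the raw Census CSV are the ones whose first column starts with a period (for example ".Alabama"). That layout should be checked against the actual file before relying on it; readStates stops with an error if the result is not 51 rows by 5 columns.

```r
# Keep the five requested columns and only the state rows.
# ASSUMPTION: state rows are those whose first column begins with "."
# (e.g. ".Alabama"); header, region, and footnote rows do not.
cleanStates <- function(raw) {
  df <- raw[, 1:5]
  names(df) <- c("stateName", "Census", "Estimates", "Pop2010", "Pop2011")
  df <- df[grepl("^\\.", df$stateName), ]
  df$stateName <- sub("^\\.", "", df$stateName)   # drop the leading period
  for (col in names(df)[-1]) {
    # Remove thousands separators, then convert the strings to numbers
    df[[col]] <- as.numeric(gsub(",", "", df[[col]]))
  }
  rownames(df) <- NULL
  df
}

# Read the CSV from a URL and clean it; fail loudly if the shape is wrong.
readStates <- function(urlToRead) {
  raw <- read.csv(urlToRead, header = FALSE, stringsAsFactors = FALSE)
  dfStates <- cleanStates(raw)
  stopifnot(nrow(dfStates) == 51, ncol(dfStates) == 5)
  dfStates
}

# Fraction of elements in v that are strictly less than n.
# (hypothetical name; the assignment does not name this function)
percentBelow <- function(v, n) {
  mean(v < n)
}

percentBelow(c(1, 2, 3, 4, 5), 2)   # 0.2

# Usage against the real file (needs network access, so left commented out):
# dfStates <- readStates("https://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv")
# mean(dfStates$Pop2011)
# dfStates[which.max(dfStates$Pop2011), c("stateName", "Pop2011")]  # largest state
# dfStates[order(dfStates$Pop2011), ]                               # increasing order
# percentBelow(dfStates$Pop2011, mean(dfStates$Pop2011))
```

Splitting the cleaning out of readStates keeps the network call separate from the data munging, so the cleaning logic can be tried on a small hand-built dataframe first.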





