Question
Stat 604
Assignment 04R
Scope: This assignment reinforces the concepts covered prior to the graphics material in Lecture R06.
NOTE: You may need to combine several functions together in a single command to accomplish some of the objectives of this assignment.
Specific Instructions for this Assignment:
Perform in R each of the exercises listed below.
Include a comment line in your script above the section for each step so that each is clearly identified. After you have finished debugging your code, you must save your script and restart R before running the script for the final submission.
Because of the modifications to the data frames made by the script, you cannot run the script multiple times in the same R session without causing problems for your code.
There is a file named TX2020-02.RData included with this assignment on Canvas. This is an R workspace containing daily summaries of climate data for the state of Texas for the month of February 2020.
There is also a file named 2021Feb_Precip.txt that contains essentially the same data for 2021. These data files were obtained from the website https://www.ncdc.noaa.gov/cdo-web/datasets.
Download these two files from Canvas and move them to an easily accessible folder on your computer where you will be storing your homework data files.
These are the climate data columns:
PRCP = Precipitation (inches)
SNOW = Snowfall (inches)
SNWD = Snow depth (inches)
TAVG = Average hourly temperature (Fahrenheit)
TMAX = Maximum temperature (Fahrenheit)
TMIN = Minimum temperature (Fahrenheit)
TOBS = Temperature at the time of observation (Fahrenheit)
As you go through the assignment, you will need to be alert for the answers to the following questions which you will paste into a comment section at the bottom of your script:
A. In what ways are the structure and formatting of the data from the two downloaded files unexpectedly different for data that is supposed to be "essentially" the same?
B. How does the number of rows compare when the two data frames are merged with all rows versus only matching rows?
C. How do the two results of the search functions compare in step 5?
D. How does TMAX21 compare with TMAX20 in step 6?
E. What is the smallest snowfall value reported in the "Top 100" list?
1. Prepare the header and perform housekeeping steps as described in HW 2.
2. Load into R the TX2020-02 workspace that you downloaded from Canvas. You may use the R menu to load the workspace initially, but your script must contain a line of code that will load the workspace the next time you run the script.
Some versions of R will make an entry in the console log showing the command that loaded the workspace. If you get this line, you may copy it into your script. Otherwise, you will need to find the command syntax in the Lecture Notes and write the command yourself.
a. Show the contents of your workspace after loading the additional workspace.
b. Display the structure of the object loaded from the workspace.
c. Display a summary of the object loaded from the workspace.
d. The last 7 columns of the object (PRCP through TOBS) will be referred to as the climate data columns. Later, we will be combining the data from two different years into a single data frame. Append 20 to the names of each of the climate data columns to designate that they are from 2020. For example, PRCP will be renamed as PRCP20.
Because these are column names, there can be no separator between the original name and the number. A programmer's trick for designating "nothing" is two quotes together without anything inside the quotes. (Remember you are not actually changing anything unless you use an assignment statement.) You should be able to do this as a single expression with nested commands that takes care of all values at once.
e. Create a new DAY column that contains the last two characters of the DATE value.
f. Display the first 10 rows and all columns of the modified object.
3. The cardinal rule for data analysis is "Know thy data." Open the 2021Feb_Precip.txt file with text editing software and look carefully at its contents so you can get an idea of what information is available and how clean it is, etc. before analyzing it with R.
For Windows users, Notepad++ is a more robust editor and will show more information about the data than the Notepad program that is delivered with Windows.
You can download Notepad++ for free from cnet.com. If you are using a Mac, you can get Text Wrangler from the App Store for similar functionality.
DO NOT save the file when you exit or you risk altering the file so that it does not produce the results you want in R.
Import the 2021Feb_Precip.txt file into an R data frame using the appropriate function.
The creators of the file used the number -9999 to designate missing data.
These numbers will prevent accurate analysis unless they are dealt with.
Add an option to the function that will convert -9999 to a missing R value.
a. Show the structure of the new data frame.
b. Display a summary of the new data frame.
c. Remove the prefix from the STATION values so they are consistent with the data loaded from the workspace in the previous step. One way to remove something is to replace it with "nothing".
d. Add 21 to the end of the climate data column names like you did in the previous step.
e. Create a new DAY column that contains the last two characters of the DATE value.
f. Display the first 15 rows and all columns of the new data frame.
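The import-and-clean pattern in step 3 can be sketched roughly as follows. This is an illustration on a small made-up sample, not the real file: the sample values, the whitespace delimiter, the "GHCND:" station prefix, and the object name wx21 are all assumptions, so check the actual file in your text editor before choosing arguments.

```r
# Toy stand-in for the real file; contents, delimiter, and prefix are assumed.
raw <- "STATION DATE PRCP SNOW
GHCND:US1TXAB0001 20210201 0.10 -9999
GHCND:US1TXAB0002 20210214 -9999 1.5"

# na.strings converts the -9999 sentinel to NA while the data is read.
wx21 <- read.table(text = raw, header = TRUE, na.strings = "-9999",
                   stringsAsFactors = FALSE)

# Remove the (assumed) prefix by replacing it with an empty string.
wx21$STATION <- sub("GHCND:", "", wx21$STATION, fixed = TRUE)

# Append the year to the climate columns; sep = "" means no separator.
names(wx21)[3:4] <- paste(names(wx21)[3:4], 21, sep = "")

# Last two characters of DATE, computed from the string length.
d <- as.character(wx21$DATE)
wx21$DAY <- substr(d, nchar(d) - 1, nchar(d))

str(wx21)
```

With the real file you would pass the file path instead of text = raw, and the column positions in the renaming line would cover all seven climate data columns.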
4. Create two new data frames (more details below) by merging the two data frames that you have been working with in the previous steps. When I merge tables, I usually list the oldest table first. When you reference the data frame from 2020 in your expression to combine the data frames, do so in such a way that the NAME, LATITUDE, LONGITUDE, and DATE columns are not a part of the new data frame.
Similarly, do not bring in STATE, COUNTRY, or DATE from the 2021 data frame. At least one of your references must utilize negative subscripts.
a. The first combined data frame must contain all rows from both years, matching and non-matching. This will be referred to later in the instructions as the All Combined Data Frame.
b. The second data frame will contain only those rows with a match in both tables. This will be referred to as the Matches Data Frame.
c. Display the structure of both new data frames and compare the number of rows in each. (If either has over 90,000 rows, there is a problem.)
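The difference between the two merges in step 4 can be seen on a pair of toy data frames. This is only a sketch: the values are made up, the object names are hypothetical, and the choice of STATION and DAY as the matching keys is an assumption about how the real data frames line up.

```r
# Toy data frames; column names mirror the assignment, values are made up.
wx20 <- data.frame(STATION = c("A1", "A2", "A3"),
                   NAME = c("ALPHA", "BETA", "GAMMA"),
                   DAY = c("01", "01", "02"),
                   TMIN20 = c(40, 38, 45),
                   stringsAsFactors = FALSE)
wx21 <- data.frame(STATION = c("A1", "A3", "A4"),
                   STATE = "TX",
                   DAY = c("01", "02", "01"),
                   TMIN21 = c(20, 15, 18),
                   stringsAsFactors = FALSE)

# A negative subscript drops column 2 (NAME); a name test drops STATE.
left  <- wx20[, -2]
right <- wx21[, names(wx21) != "STATE"]

# all = TRUE keeps unmatched rows from both sides (a full outer join).
all_rows <- merge(left, right, by = c("STATION", "DAY"), all = TRUE)

# The default keeps only rows whose keys appear in both (an inner join).
matches <- merge(left, right, by = c("STATION", "DAY"))

nrow(all_rows)
nrow(matches)
```

Rows that exist in only one year appear in the all-rows result with NA in the other year's columns, which is why that result can never have fewer rows than the matches-only result.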
5. Perform the steps below using the Matches Data Frame:
a. Execute the search function to show which packages, etc. are loaded into the R environment. Execute a function that will make the columns of the data frame available to R directly by column name. Execute the search function again and compare the output with the previous instance.
b. Create a new data frame column named LowTempDiff that contains the value of TMIN20 minus TMIN21 (Minimum temperatures for each day).
c. Display the summary statistics for the new column.
d. Display the structure of the updated data frame and its first 10 rows.
e. Execute a function so that the column names of the data frame are no longer available in the R search path.
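The search-path behavior that step 5 asks you to observe can be sketched on a small made-up data frame. The object name temps and its values are hypothetical, and attach and detach are used here as one common way to add and remove a data frame from the search path.

```r
# Toy data frame with made-up minimum temperatures.
temps <- data.frame(TMIN20 = c(40, 38), TMIN21 = c(20, 41))

before <- search()
attach(temps)           # puts a copy of the columns on the search path
during <- search()

# While attached, TMIN20 and TMIN21 resolve without the temps$ prefix.
# Assigning through temps$ updates the data frame itself, not the attached copy.
temps$LowTempDiff <- TMIN20 - TMIN21
summary(temps$LowTempDiff)

detach(temps)           # removes the entry from the search path
after <- search()

setdiff(during, before) # the entry that attach() added
```

Comparing the stored search() results shows exactly one extra entry while the data frame is attached and none after it is detached.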
6. In a single expression, use the apply function on the Matches Data Frame to show the maximum values from each of the climate data columns from both years. (This should produce 14 values.)
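A column-wise apply call of the kind step 6 describes might look like the sketch below. The data frame m and its values are made up, and only two of the fourteen climate columns are shown. Whether you need na.rm = TRUE depends on whether the real columns contain missing values.

```r
# Toy data frame; values are made up.
m <- data.frame(STATION = c("A1", "A2", "A3"),
                TMAX20 = c(70, NA, 82),
                TMAX21 = c(55, 61, NA))

# MARGIN = 2 applies the function to each column; na.rm = TRUE is passed on to max.
# Only numeric columns are selected, because apply converts its input to a matrix
# and a character column would turn every value into text.
col_max <- apply(m[, c("TMAX20", "TMAX21")], 2, max, na.rm = TRUE)
col_max
```

The result is a named numeric vector with one maximum per selected column.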
7. Create a new data frame that is a subset of the All Combined Data Frame created above. Use a logical test to subset the rows to only those where the value of SNOW21 is greater than 0 and not missing. Display the structure of the new data frame.
8. Using the last data frame created, display a "top 100" list of Station names, Days and SNOW21 values, in descending order, starting with the highest SNOW21 value.
9. Using the same data frame as the previous step, display all data for rows where the Station name begins with the word COLLEGE. (We are looking for data from College Station.)
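The row selection, ordering, and pattern matching described in steps 7 through 9 can be sketched on a toy data frame. The station names, values, and object names below are made up for illustration.

```r
# Toy data frame; names and values are made up.
wx <- data.frame(NAME = c("COLLEGE STATION 2.1 N", "AUSTIN 3 S",
                          "WACO 1 E", "DALLAS COLLEGE 4 W"),
                 DAY = c("14", "15", "14", "16"),
                 SNOW21 = c(2.0, NA, 0, 4.5),
                 stringsAsFactors = FALSE)

# A comparison with NA yields NA, so the is.na() test excludes those rows explicitly.
snowy <- wx[!is.na(wx$SNOW21) & wx$SNOW21 > 0, ]

# Sort by SNOW21, largest first, then keep at most the first 100 rows.
top <- head(snowy[order(snowy$SNOW21, decreasing = TRUE),
                  c("NAME", "DAY", "SNOW21")], 100)

# "^" anchors the pattern to the start of the string,
# so "DALLAS COLLEGE 4 W" does not match.
college <- snowy[grepl("^COLLEGE", snowy$NAME), ]

top
college
```

Because head returns all rows when fewer than 100 are available, the same expression works for both small and large data frames.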
10. Display the contents of the workspace.
11. Remove from the workspace the first object that was loaded from TX2020-02.RData.
12. Save your workspace in case you need to use it in the next assignment. Name it HW04.RData. You may save it initially using the R GUI but your script must contain code to save the workspace when you submit the script again.
13. After you have run your debugged script for submission, place the answers to the questions A through E in comment lines at the bottom of your script.
14. Convert your script and console to PDF and submit them to Canvas.