Question
AusWeather.csv: This dataset describes Australian weather from 2014 to 2017.
Variable: Description
- Date: The date of observation.
- Location: Names of cities in Australia. (The longest name has 16 characters.)
- MinTemp: The minimum temperature in degrees Celsius.
- MaxTemp: The maximum temperature in degrees Celsius.
- Rainfall: The amount of rainfall recorded for the day in mm.
- WindGustDir: The direction of the strongest wind gust in the 24 hours to midnight.
- WindGustSpeed: The speed (km/h) of the strongest wind gust in the 24 hours to midnight.
- Humidity: Humidity (percent) at 3pm.
- Pressure: Atmospheric pressure (hPa) reduced to mean sea level at 3pm.
- RainToday: 1 if precipitation (mm) in the 24 hours to 9am exceeds 1mm, otherwise 0.
1. Create a new folder called "Project1", download the data file into this folder, and use a LIBNAME statement to assign the folder as a library named "Project1". (2 pts)
2. Read the data file without INFORMATS but with the correct OPTIONS on the INFILE statement (2 pts), then answer the following questions:
   a) Use the Explorer pane to preview the dataset. Which variables were not read in correctly? (2 pts) (Hint: It may help to open the raw data set and compare.)
   b) Use INFORMATS to correct the misspecified variables. (4 pts) Note: You'll need to use INFORMATS for three variables (not including $).
   c) In a comment: What is the purpose of an informat? (1 pt)
3. Use the following code to sort the AusWeather data and save it in a new dataset. (1 pt)
      PROC SORT DATA=AusWeather OUT=sortedAusWeather;
         BY RainToday;
      RUN;
   a) Which library is the new dataset stored in? (1 pt) (Hint: You may want to look at the OUT= option.)
   b) What is the difference between a temporary and permanent library? (1 pt)
4. Use one PROC step to answer the following questions:
   a) What are the Humidity means for rainy days and days without rain? Only include output for the Humidity variable. (2 pts)
   b) Report in a comment the 99% confidence intervals for the mean Humidity for these two categories. (2 pts) (Hint: The default confidence level is 95%. You'll need an option to change that.)
5. Use the unsorted dataset AusWeather and create two contingency tables that allow you to answer the following questions (one table for each):
   a) Which city has the highest rain probability? (3 pts)
   b) Which wind direction occurred most frequently? (3 pts)
   c) What is the purpose of creating contingency tables? (1 pt)
6. Use only one PROC step to plot a vertical boxplot of Pressure for the different groups of RainToday. (Tip: You will need an option here.) (2 pts) Then answer the following question:
   a) Based on the boxplot, which group has higher pressure on average? What did you use from your plot to determine this? (2 pts)
7. Make a correlation matrix of MinTemp, MaxTemp, Rainfall, WindGustSpeed, Humidity, and Pressure. Which variable has the strongest correlation with Rainfall (other than Rainfall itself)? (3 pts)
Can you provide the SAS code for this?
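The tasks above can be sketched in SAS roughly as follows. This is a minimal, untested outline, not a graded solution. The folder path is hypothetical, and the file layout is assumed (comma-delimited, a header in row 1, columns in the order of the variable list). Which variables need informats, and which date informat fits, can only be confirmed by opening the raw file, so the INFORMAT choices below are assumptions. No output is shown because the code has not been run against the data.

```sas
/* 1. Hypothetical path: point this at your own Project1 folder. */
libname Project1 "C:\Project1";

/* 2. Assumed layout: comma-delimited, header row, columns in the   */
/*    order of the variable list. DSD handles comma delimiters and  */
/*    consecutive commas; FIRSTOBS=2 skips the header row.          */
data AusWeather;
    infile "C:\Project1\AusWeather.csv" dsd firstobs=2 truncover;
    /* Assumed informats: check the raw file and adjust. ANYDTDTE.  */
    /* is a guess at the date layout; $16. fits the longest city.   */
    input Date          :anydtdte10.
          Location      :$16.
          MinTemp
          MaxTemp
          Rainfall
          WindGustDir   :$3.
          WindGustSpeed
          Humidity
          Pressure
          RainToday;
    format Date date9.;
run;

/* 3. A one-level name in OUT= is stored in the temporary WORK library. */
proc sort data=AusWeather out=sortedAusWeather;
    by RainToday;
run;

/* 4. Mean and 99% confidence limits of Humidity by RainToday.     */
/*    ALPHA=0.01 changes the default 95% level to 99%.             */
proc means data=AusWeather mean clm alpha=0.01;
    class RainToday;
    var Humidity;
run;

/* 5. Two tables in one PROC FREQ: row percentages of RainToday    */
/*    within each Location, and frequencies of WindGustDir.        */
proc freq data=AusWeather;
    tables Location*RainToday / nocol nopercent;
    tables WindGustDir;
run;

/* 6. Vertical boxplot of Pressure, one box per RainToday group.   */
proc sgplot data=AusWeather;
    vbox Pressure / category=RainToday;
run;

/* 7. Pearson correlation matrix of the six numeric variables.     */
proc corr data=AusWeather;
    var MinTemp MaxTemp Rainfall WindGustSpeed Humidity Pressure;
run;
```

The interpretive answers (which variables were misread, which city has the highest rain probability, which group has higher pressure, which variable correlates most strongly with Rainfall) have to be read from the output these steps produce. For the boxplot, PROC SGPLOT marks the mean of each group with a symbol inside the box by default, which is one way to compare the group averages.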