
Question


https://catalog.data.gov/dataset/crime-data-from-2020-to-present

1. Find a dataset and describe your motivation (10pt)
   a. Find a dataset online and provide a URL to the data (5pt)
      i. This dataset should have at least 4 variables with at least 1 numeric variable and 1 character variable
      ii. Some places to find datasets:
         1. The UCI machine learning repository: https://archive.ics.uci.edu/
         2. Kaggle: https://www.kaggle.com
         3. data.gov
         4. data.europa.eu
   b. Explain why you chose this data (what interest it holds for you personally or professionally) (5pt)
2. Import the data into SAS OnDemand (20pt)
   a. It is recommended that you not try to read the data in from the URL and instead download the file to your computer and then upload it to your SAS OnDemand workspace.
   b. Be sure you know what type of delimiting your data uses to read it in correctly
   c. Use the REPLACE option in your PROC IMPORT so that old versions will be overwritten if your first PROC IMPORT does not work as intended
   d. Save this dataset to a permanent library
3. Do the following tasks (70pt):
   a. Create a new variable in your dataset using the existing variables (10pt)
      i. For example, you may add two columns together, add a Yes/No version of one column, or anything else!
   b. Give labels to variables which you are interested in (does not have to be all variables!) (10pt)
   c. Sort your data by a variable of your choosing and output the sorted data to a file in your working directory (10pt)
   d. Generate a contingency table for at least one of your character variables displaying also the cumulative column percentages (look at SAS documentation if you are unsure how to do this), comment on the results (15pt)
   e. Get the mean and standard deviation or the median and the interquartile range for at least one of your numeric variables across different values of one of your character variables and comment on the results (15pt)
   f. Create at least one (1) appropriate plot and explain what information it provides (10pt)
4. BONUS POINTS (15pt)
   a. Subset your data to only the variables which you use in your script. If you use all of your variables, you may write a statement that will keep all of the variables in your dataset (5pt)
   b. Change the names of some of your variables to names which are more descriptive or easier to type (5pt)
   c. For part 3(d) or 3(e), do this with only a subset of your data (5pt)
5. Save your output as a .PDF file and submit it to Moodle alongside your .SAS file. You're all done.
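
As a rough illustration of how the import, new-variable, labelling, contingency-table, and summary-statistics tasks above might fit together, here is a minimal SAS sketch. It is not a model answer. The folder /home/your_user_id/project, the file name crime_data.csv, the library name mylib, and the new variable minor are all hypothetical. The column names Vict_Age, Vict_Sex, and AREA_NAME are assumptions about what PROC IMPORT produces from the CSV headers of the crime dataset; check them with PROC CONTENTS before relying on them.

```sas
/* All paths and names below are placeholders: replace them with your own. */
options validvarname=v7;  /* blanks in CSV headers become underscores */

/* Permanent library: a folder in your SAS OnDemand home directory (assumed path). */
libname mylib "/home/your_user_id/project";

/* Import the uploaded CSV. REPLACE overwrites the output of an earlier attempt. */
proc import datafile="/home/your_user_id/project/crime_data.csv"
    out=mylib.crime
    dbms=csv
    replace;
    getnames=yes;
    guessingrows=1000;  /* scan more rows before deciding column types */
run;

/* Confirm the actual variable names and types before using them. */
proc contents data=mylib.crime;
run;

/* New Yes/No variable plus labels. Vict_Age is an assumed column name. */
data mylib.crime2;
    set mylib.crime;
    length minor $3;
    /* SAS treats a missing numeric as smaller than any number, so test it first. */
    if missing(Vict_Age) then minor = " ";
    else if Vict_Age < 18 then minor = "Yes";
    else minor = "No";
    label Vict_Age = "Victim age (years)"
          minor    = "Victim under 18";
run;

/* Two-way contingency table; CUMCOL adds cumulative column percentages. */
proc freq data=mylib.crime2;
    tables AREA_NAME*minor / cumcol;
run;

/* Mean, standard deviation, median, and interquartile range of age by sex. */
proc means data=mylib.crime2 mean std median qrange;
    class Vict_Sex;
    var Vict_Age;
run;
```

To produce the PDF asked for in step 5, the reporting procedures could be wrapped between an `ods pdf file="...";` statement and `ods pdf close;`, again with a path of your own choosing.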
