Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Need help with this on R studio! A screen shot of the steps would be very helpful! One of the basic data-building blocks in R

Need help with this on R studio! A screen shot of the steps would be very helpful!

One of the basic data-building blocks in R is the dataframe, and this object bears a very strong resemblance to the concept of a table in SQL. In fact, there is a package in R called sqldf that allows for the manipulation of a dataframe as an SQL table. This feat is accomplished thanks to SQLite (www.sqlite.org (Links to an external site.)Links to an external site.), a fantastic, lightweight, open source implementation of SQL. Working with sqldf and SQLite is so convenient that under normal circumstances you can do your work completely within R, with no software installations needed on your computer (other than running the install.packages() command in R).

The overall goal of this activity is to use SQL to produce a subset of the built-in airquality R dataset that contains only those records where the concentration of ozone is higher than the mean level of ozone. These are the conceptual steps you will need to follow:

Review online documentation for sqldf so that you are familiar with the basic concepts and usage of the package and its commands.

Install and activate (library()) the sqldf package in RStudio. With any new package it is possible to run into installation issues depending on your platform and the versions of software you are running, so monitor your diagnostic messages carefully.

Make sure the built-in airquality dataset is available for use in subsequent commands. It would be wise to reveal the first few records of air quality with head() to make sure that air quality is available. This will also show you the names of the columns of the air quality dataframe which you will need to use in SQL commands.

Using sqldf(), run an SQL select command that calculates the average level of ozone across all records. Assign the resulting value into a variable and print it out in the console.

Again using sqldf(), run another SQL command that selects all of the records from air quality where the value of ozone is higher than the average. Note that it is possible to combine steps 4 and 5 into a single SQL command those who are familiar with SQL syntax and usage should attempt to do so.

Refine step 5 to write the result table into a new R data object called newAQ. Then run a command to reveal what type of object newAQ is, another command to show what its dimensions are (i.e., how many rows and columns), and a head() command to show the first few rows.

Repeat steps 4, 5 and 6 using tapply, which is a more R like way to do the analysis.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Making Databases Work The Pragmatic Wisdom Of Michael Stonebraker

Authors: Michael L. Brodie

1st Edition

1947487167, 978-1947487161

More Books

Students also viewed these Databases questions

Question

3. What are potential solutions?

Answered: 1 week ago