Question

You will be working with data on Airbnb listings in Edinburgh, Scotland. The data has the following variables:

id - the listing's ID number

price - the price, in GBP, for one night stay

neighbourhood - the neighbourhood the listing is located in

accommodates - number of people the listing accommodates

bathrooms - the listing's number of bathrooms

bedrooms - the listing's number of bedrooms

beds - the listing's number of beds (which can be different from the number of bedrooms)

review_scores_rating - the average customer rating of the property (ranges from 0 to 100)

number_of_reviews - the number of reviews included in the rating

listing_url - the URL for the listing

Part 1: Load the data and packages

Part 2: Explore the data

Explore the data using five functions: dim(), str(), colnames(), head(), and tail(). Change the data types as appropriate.
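A minimal sketch of Parts 1 and 2, assuming the dplyr and ggplot2 packages are installed. The assignment does not name the data file, so the read.csv() call below uses a hypothetical file name and is commented out; a small made-up data frame with the same columns stands in so the code runs. Storing id as character and neighbourhood as a factor is one reasonable choice of types, not the only one.

```r
library(dplyr)
library(ggplot2)

# The real file name is not given; "edinburgh_airbnb.csv" is a hypothetical
# placeholder, so the read is left commented out.
# airbnb <- read.csv("edinburgh_airbnb.csv", stringsAsFactors = FALSE)

# Made-up toy rows with the same columns, used only so this sketch runs.
airbnb <- data.frame(
  id = c(101, 102, 103, 104),
  price = c(80, 120, 65, 200),
  neighbourhood = c("Old Town", "Leith", NA, "Old Town"),
  accommodates = c(2, 4, 2, 8),
  bathrooms = c(1, 1, 1, 2),
  bedrooms = c(1, 2, 1, 4),
  beds = c(1, 2, 1, 5),
  review_scores_rating = c(95, 88, NA, 97),
  number_of_reviews = c(30, 12, 0, 45),
  listing_url = paste0("https://example.com/listing/", 101:104),
  stringsAsFactors = FALSE
)

dim(airbnb)       # number of rows and columns
str(airbnb)       # type of each column
colnames(airbnb)  # variable names
head(airbnb)      # first six rows
tail(airbnb)      # last six rows

# id is a label rather than a quantity, so store it as character;
# neighbourhood is categorical, so store it as a factor.
airbnb <- airbnb %>%
  mutate(id = as.character(id),
         neighbourhood = factor(neighbourhood))
```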

Part 3: Handling missing values

Identify missing values

Check the data for missing values. Hint: Pipe the data frame into the function summarise_all(~sum(is.na(.))).

You should set missing values of neighbourhood to "Unknown" and delete all records with missing values in any of the other variables. Write code to accomplish this.

Hint: Use ifelse() to assign new values to neighbourhood and na.omit() to remove all rows with missing values in any variable.
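The hinted steps can be sketched as follows, again on made-up toy rows. The order matters: neighbourhood is filled in first so that na.omit() does not drop those rows. One detail worth knowing: if neighbourhood has already been converted to a factor, ifelse() returns the factor's underlying integer codes rather than its labels, so the sketch wraps it in as.character() before rebuilding the factor.

```r
library(dplyr)

# Made-up toy rows for illustration only.
airbnb <- data.frame(
  price = c(80, 120, 65, NA, 95),
  neighbourhood = c("Old Town", NA, "Leith", "Leith", NA),
  beds = c(1, 2, NA, 3, 1),
  stringsAsFactors = FALSE
)

# Count missing values in every column.
airbnb %>% summarise_all(~sum(is.na(.)))

airbnb_clean <- airbnb %>%
  # Replace missing neighbourhoods with "Unknown" (as.character() guards
  # against ifelse() returning factor codes), then store as a factor.
  mutate(neighbourhood = factor(ifelse(is.na(neighbourhood), "Unknown",
                                       as.character(neighbourhood)))) %>%
  # Drop any row that still has a missing value in another column.
  na.omit()

airbnb_clean
```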

Question 1

For this question, you will explore the price of Airbnb listings, including how price varies across neighbourhoods.

Part 1

Use dplyr to compute the mean, median, standard deviation, maximum, and minimum of price.
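A sketch using dplyr's summarise() on made-up prices; on the real data, the toy data frame would be replaced by the cleaned data. If any prices were still missing, each function would need na.rm = TRUE.

```r
library(dplyr)

# Made-up toy prices (GBP per night) for illustration only.
airbnb <- data.frame(price = c(80, 120, 65, 200, 95))

price_summary <- airbnb %>%
  summarise(mean_price = mean(price),
            median_price = median(price),
            sd_price = sd(price),
            max_price = max(price),
            min_price = min(price))

price_summary
```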

Part 2

Explore how price differs across neighbourhood by repeating Part 1 but grouping by neighbourhood.
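Grouping only adds group_by() in front of the same summarise(). The n = n() column and the sort by median are optional extras beyond what the assignment asks for, but they make it easier to spot neighbourhoods whose statistics rest on very few listings. Note that sd() of a single listing is NA. The data are made up.

```r
library(dplyr)

# Made-up toy rows for illustration only.
airbnb <- data.frame(
  price = c(80, 120, 65, 200, 95, 150),
  neighbourhood = c("Old Town", "Leith", "Leith", "Old Town", "Southside", "Leith")
)

by_neighbourhood <- airbnb %>%
  group_by(neighbourhood) %>%
  summarise(n = n(),                  # listings per neighbourhood
            mean_price = mean(price),
            median_price = median(price),
            sd_price = sd(price),     # NA when a group has one listing
            max_price = max(price),
            min_price = min(price)) %>%
  arrange(desc(median_price))         # most expensive (by median) first

by_neighbourhood
```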

Part 3

Use ggplot2 to create data visualizations showing how price differs across neighbourhood. Hint: use coord_flip() to make the plot more readable.
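One possible visualization is a box plot of price per neighbourhood, flipped with coord_flip() so the neighbourhood names are readable. Ordering the boxes by median price with reorder() is an optional choice. The data are made up.

```r
library(ggplot2)

# Made-up toy rows for illustration only.
airbnb <- data.frame(
  price = c(80, 120, 65, 200, 95, 150, 110, 70),
  neighbourhood = c("Old Town", "Leith", "Leith", "Old Town",
                    "Southside", "Leith", "Southside", "Old Town")
)

price_plot <- ggplot(airbnb,
                     aes(x = reorder(neighbourhood, price, FUN = median),
                         y = price)) +
  geom_boxplot() +
  coord_flip() +  # puts neighbourhood names on the vertical axis
  labs(x = "Neighbourhood", y = "Price per night (GBP)")

# Print the object (price_plot or print(price_plot)) to draw it.
```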

Part 4

Comment on your findings from Parts 2 and 3: what did you learn about how price differs across neighbourhood?

Question 2

For this question, you will explore how the price of Airbnb listings relates to how many people the listing accommodates.

Part 1

Create a histogram for accommodates to understand the variation in the data. Be sure to experiment with the number of bins.
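Because accommodates takes whole-number values, binwidth = 1 gives one bar per value; the sketch also builds versions with a few different bin counts for comparison. The bin counts and toy values are illustrative assumptions.

```r
library(ggplot2)

# Made-up toy values for illustration only.
airbnb <- data.frame(accommodates = c(2, 2, 4, 3, 2, 6, 4, 2, 8, 1, 4, 2, 5, 16))

# One bar per whole-number value of accommodates.
hist_unit <- ggplot(airbnb, aes(x = accommodates)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Accommodates (people)", y = "Number of listings")

# The same histogram with different numbers of bins, for comparison.
hist_by_bins <- lapply(c(5, 10, 20), function(b) {
  ggplot(airbnb, aes(x = accommodates)) +
    geom_histogram(bins = b) +
    labs(title = paste(b, "bins"), x = "Accommodates (people)",
         y = "Number of listings")
})

# Print any element, e.g. hist_by_bins[[2]], to draw it.
```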

Part 2

Comment on your findings from Part 1: what did you learn about the distribution of the number of people a listing can accommodate?

Part 3

Compute the correlation between price and accommodates.
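cor() computes the Pearson correlation by default. The sketch also computes a Spearman rank correlation as an optional extra check, since it is less sensitive to a handful of very expensive listings. The toy values are made up.

```r
library(dplyr)

# Made-up toy values for illustration only.
airbnb <- data.frame(price = c(50, 80, 110, 130, 200),
                     accommodates = c(1, 2, 3, 4, 6))

cor_summary <- airbnb %>%
  summarise(pearson = cor(price, accommodates),                        # linear association
            spearman = cor(price, accommodates, method = "spearman"))  # rank-based

cor_summary
```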

Part 4

Create a visualization showing the relation between price and accommodates.
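A scatter plot is a natural choice. The sketch jitters points horizontally because many listings share the same accommodates value, and adds a least-squares line with geom_smooth(method = "lm"); both the jitter and the trend line are optional design choices. The data are made up.

```r
library(ggplot2)

# Made-up toy values for illustration only.
airbnb <- data.frame(
  price = c(50, 80, 75, 110, 130, 95, 200, 180),
  accommodates = c(1, 2, 2, 3, 4, 4, 6, 6)
)

price_vs_size <- ggplot(airbnb, aes(x = accommodates, y = price)) +
  # accommodates is whole-numbered, so jitter sideways to reduce overplotting.
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +  # straight-line trend
  labs(x = "Accommodates (people)", y = "Price per night (GBP)")
```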

Part 5

Comment on your findings from Parts 3 and 4: what did you learn about how a listing's price changes with the number of people the listing accommodates?

Question 3

For this question, you will explore how the number of people a listing accommodates is related to the listing's number of beds and its price.

Part 1

Create a visualization showing the relation between accommodates and beds. Add a reference line to the graphic using geom_abline().
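A sketch with the reference line beds = accommodates (intercept 0, slope 1), which marks one bed per guest; points below it have fewer beds than guests. The small jitter separates overlapping points but also nudges them slightly off the line, so it is kept small. The data are made up.

```r
library(ggplot2)

# Made-up toy values for illustration only.
airbnb <- data.frame(accommodates = c(2, 2, 4, 4, 6, 3, 8, 5),
                     beds = c(1, 2, 2, 4, 3, 3, 4, 5))

beds_plot <- ggplot(airbnb, aes(x = accommodates, y = beds)) +
  geom_jitter(width = 0.15, height = 0.15, alpha = 0.5) +
  # Reference line beds = accommodates (one bed per guest);
  # points below it have fewer beds than guests.
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(x = "Accommodates (people)", y = "Number of beds")
```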

Part 2

Comment on your findings from Part 1: are there always enough beds for the number of people the listing claims to accommodate?

Part 3

Create a new variable called short_on_beds that equals "Not enough beds" when accommodates > beds and "Enough beds" otherwise.
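The definition maps directly onto mutate() with ifelse(). Note that a listing with exactly as many beds as guests counts as "Enough beds", because the condition is strictly accommodates > beds. The data are made up.

```r
library(dplyr)

# Made-up toy values for illustration only.
airbnb <- data.frame(accommodates = c(2, 4, 6, 3),
                     beds = c(1, 4, 2, 3))

airbnb <- airbnb %>%
  mutate(short_on_beds = ifelse(accommodates > beds,
                                "Not enough beds", "Enough beds"))

table(airbnb$short_on_beds)  # how many listings fall in each group
```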

Part 4

Create a visualization to show how the relation between price and accommodates changes with the variable short_on_beds. Be sure to include a trend line.
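Mapping colour to short_on_beds makes geom_smooth() fit a separate trend line for each group, so the two slopes can be compared on one panel; facet_wrap(~ short_on_beds) would be an alternative layout. In the made-up data below, short_on_beds is typed in directly rather than derived from beds.

```r
library(ggplot2)

# Made-up toy values for illustration only.
airbnb <- data.frame(
  price = c(60, 90, 120, 150, 70, 85, 100, 160),
  accommodates = c(2, 4, 6, 8, 2, 4, 6, 8),
  short_on_beds = rep(c("Enough beds", "Not enough beds"), each = 4)
)

beds_trend_plot <- ggplot(airbnb,
                          aes(x = accommodates, y = price, colour = short_on_beds)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  # Because colour is mapped to short_on_beds, one line is fitted per group.
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Accommodates (people)", y = "Price per night (GBP)",
       colour = "Beds")
```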

Part 5

Comment on your findings from Part 4: how does the relationship between accommodates and price change when there are not enough beds to sleep all guests (i.e., short_on_beds equals "Not enough beds")?

Question 4

For this question, you will explore the relation between a listing's price per night and its average rating (i.e., review_scores_rating), including how the relation changes according to the size of the listing (you will create a size variable using the accommodates variable).

Part 1

Compute the correlation between price and review_scores_rating.
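Base R's cor() works directly on the two columns. The use = "complete.obs" argument would only be needed if missing ratings had not already been removed in the missing-value step. The toy values are made up.

```r
# Made-up toy values for illustration only.
airbnb <- data.frame(
  price = c(60, 90, 150, 75, 300, 110),
  review_scores_rating = c(92, 85, 97, 60, 99, 100)
)

# Pearson correlation; add use = "complete.obs" if NAs remain.
r_price_rating <- cor(airbnb$price, airbnb$review_scores_rating)
r_price_rating
```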

Part 2

Create a graphic showing the relationship between price and review_scores_rating.
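A scatter plot with a linear trend line is one option. The log scale on price is an optional assumption that can help if prices turn out to be right-skewed, as nightly prices often are; drop scale_y_log10() to plot raw prices. The data are made up.

```r
library(ggplot2)

# Made-up toy values for illustration only.
airbnb <- data.frame(
  price = c(60, 90, 150, 75, 300, 110),
  review_scores_rating = c(92, 85, 97, 60, 99, 100)
)

rating_plot <- ggplot(airbnb, aes(x = review_scores_rating, y = price)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +  # linear trend (fitted on log scale here)
  scale_y_log10() +                         # optional: spreads out skewed prices
  labs(x = "Average rating (0 to 100)", y = "Price per night (GBP, log scale)")
```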

Part 3

Comment on your findings from Parts 1 and 2: based on your preliminary analysis, is there a relation between price and review_scores_rating?

Part 4

You remember that there is a variable number_of_reviews that gives the number of reviews included in the listing's average rating (i.e., review_scores_rating). You wonder whether your preliminary analysis would change after removing listings with fewer than 20 ratings, whose average ratings might be unreliable.

Compute the correlation between price and review_scores_rating after removing listings with fewer than 20 ratings.
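Removing listings with fewer than 20 ratings means keeping number_of_reviews >= 20, so a listing with exactly 20 reviews stays in. The sketch reports the number of remaining listings next to the correlation, since a much smaller sample makes the estimate noisier. The data are made up.

```r
library(dplyr)

# Made-up toy values for illustration only.
airbnb <- data.frame(
  price = c(60, 90, 150, 75, 300, 110),
  review_scores_rating = c(92, 85, 97, 60, 99, 100),
  number_of_reviews = c(3, 25, 40, 1, 60, 20)
)

# "Fewer than 20" are removed, so >= 20 (not > 20) is kept.
reliable <- airbnb %>% filter(number_of_reviews >= 20)

r_reliable <- reliable %>%
  summarise(n = n(), r = cor(price, review_scores_rating))
r_reliable
```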

Part 5

Create a graphic showing the relation between price and review_scores_rating after removing listings with fewer than 20 ratings.
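The filtered plot can be built by piping the filtered data straight into ggplot(); because %>% binds more tightly than +, the layers are added to the plot rather than to the data. The data are made up.

```r
library(dplyr)
library(ggplot2)

# Made-up toy values for illustration only.
airbnb <- data.frame(
  price = c(60, 90, 150, 75, 300, 110),
  review_scores_rating = c(92, 85, 97, 60, 99, 100),
  number_of_reviews = c(3, 25, 40, 1, 60, 20)
)

reliable_plot <- airbnb %>%
  filter(number_of_reviews >= 20) %>%  # keep listings with at least 20 ratings
  ggplot(aes(x = review_scores_rating, y = price)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Average rating (0 to 100)", y = "Price per night (GBP)",
       title = "Listings with at least 20 reviews")
```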

Part 6

Comment on your findings from Parts 4 and 5.

Part 7

You are not ready to draw any final conclusions about the relation between price and review_scores_rating. You wonder if the size of the listing changes the relation.

First, add a variable big that equals "Big" if accommodates > 7 and "Small" otherwise. Then, add a graphic showing how the relationship between price and review_scores_rating changes according to big. Be sure to remove listings with fewer than 20 ratings.
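A sketch that filters first, then adds big, then draws one panel per size group with facet_wrap(). Free y scales are an assumption that keeps small listings' prices readable next to big ones; colouring by big on a single panel would be another reasonable layout. Note that accommodates = 7 counts as "Small" under the strict > 7 rule. The data are made up.

```r
library(dplyr)
library(ggplot2)

# Made-up toy values for illustration only.
airbnb <- data.frame(
  price = c(70, 260, 300, 95, 140, 380),
  review_scores_rating = c(92, 88, 97, 95, 90, 99),
  accommodates = c(2, 8, 10, 4, 7, 12),
  number_of_reviews = c(25, 30, 5, 40, 22, 50)
)

reliable <- airbnb %>%
  filter(number_of_reviews >= 20) %>%                       # drop unreliable ratings
  mutate(big = ifelse(accommodates > 7, "Big", "Small"))    # size groups

big_plot <- ggplot(reliable, aes(x = review_scores_rating, y = price)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +  # one trend line per panel
  facet_wrap(~ big, scales = "free_y") +
  labs(x = "Average rating (0 to 100)", y = "Price per night (GBP)")
```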

Part 8

Comment on your findings from Part 7.

Part 9

You wonder if your findings from Part 7 are robust. Experiment with different ways of defining a "Big" listing based on accommodates.
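One way to check robustness is to loop over several cut-offs k (big means accommodates > k) and compute the within-group correlation and group size for each. The thresholds below are illustrative assumptions; on the real data they should leave enough listings in both groups, because a correlation from a handful of listings is unstable. The toy rows are made up and assumed to be already filtered to at least 20 reviews. The Part 7 plot could likewise be redrawn for each threshold.

```r
library(dplyr)

# Made-up toy rows, assumed already filtered to listings with >= 20 reviews.
reliable <- data.frame(
  price = c(60, 75, 90, 110, 130, 150, 180, 210, 250, 320),
  review_scores_rating = c(90, 95, 88, 97, 92, 85, 99, 94, 89, 96),
  accommodates = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 12)
)

# Candidate cut-offs: a listing is "Big" when accommodates > k.
thresholds <- c(4, 6, 8)

robustness <- bind_rows(lapply(thresholds, function(k) {
  reliable %>%
    mutate(big = ifelse(accommodates > k, "Big", "Small")) %>%
    group_by(big) %>%
    summarise(n = n(),                                # listings in the group
              r = cor(price, review_scores_rating),   # within-group correlation
              .groups = "drop") %>%
    mutate(threshold = k)
}))

robustness
```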

Part 10

Is the pattern you identified in Parts 6 and 7 robust?

Question 5

What would be interesting to learn or explore in this data set? For this question, you are required to write your own question and to try to answer it by writing R code.

Part 1

Write your question.

Part 2

Write code to answer the question you asked in Part 1.

Part 3

Interpret your findings from Part 2: what is the answer to your question from Part 1?
