
Question


  1. EXECUTIVE SUMMARY

The dataset I chose is the New York City Airbnb Listings dataset, which can be found on Kaggle. I selected it because it offers a thorough picture of Airbnb activity in New York City, one of the most visited cities in the world. It can be used to predict listing prices, understand the factors that influence price, examine the geographical distribution of listings, and address many related questions. A thorough examination of this information may yield important insights into the variables influencing the cost of and demand for Airbnb listings, as well as the influence of geography and home type on user preferences.

  2. INTRODUCTION

The dataset is derived from the official Airbnb website and is updated on a daily basis by Inside Airbnb, an independent effort that attempts to illustrate how Airbnb affects residential neighborhoods.

3. DATA SELECTION APPROACH

DATASET INFO: The dataset has more than 100 rows and 16 fields. The fields include:

  1. id: The listing ID (Numeric)
  2. name: The name of the listing (Categorical)
  3. host_id: The host ID (Numeric)
  4. host_name: The name of the host (Categorical)
  5. neighbourhood_group: The group of the neighborhood (Categorical)
  6. neighbourhood: The neighborhood (Categorical)
  7. latitude: The latitude of the listing (Numeric)
  8. longitude: The longitude of the listing (Numeric)
  9. room_type: The type of room (Categorical)
  10. price: The price of the listing (Numeric)
  11. minimum_nights: The minimum number of nights (Numeric)
  12. number_of_reviews: The number of reviews (Numeric)
  13. last_review: The date of the last review (Date)
  14. reviews_per_month: The number of reviews per month (Numeric)
  15. calculated_host_listings_count: The number of listings per host (Numeric)
  16. availability_365: The number of days when the listing is available for booking (Numeric)

4. METHODS USED

To print the first few rows in Python, you would normally import the dataset using the pandas module and then use the head() function.

import pandas as pd

# Load the dataset
df = pd.read_csv('AB_NYC_2019.csv')

# Print the first few rows
print(df.head())

HELP ME ANSWER THESE QUESTIONS BELOW

5.6 Data Reduction

The purpose of this section is to outline whether any invalid data (null values) were found in the dataset and whether the affected records (rows) were removed from the dataset. This section also identifies whether any extraneous features (columns) were removed as a result of identified duplication.
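One possible way to carry out this check is sketched below. It assumes pandas and uses a small made-up frame that borrows a few column names from the field list; the values are invented for illustration and do not come from the real file.

```python
import pandas as pd

# Hypothetical sample using a few of the dataset's column names; values are invented.
df = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "name": ["Cozy room", None, "Loft", "Loft"],
    "price": [100, 80, 250, 250],
    "reviews_per_month": [0.5, None, 1.2, 1.2],
})

# Count null values in each column.
null_counts = df.isnull().sum()

# Drop exact duplicate rows, then drop rows that still contain a null.
reduced = df.drop_duplicates().dropna()

print(null_counts)
print(len(df), "rows before,", len(reduced), "rows after")
```

Dropping rows is only one option; filling nulls (for example with `fillna`) may be preferable when a column such as reviews_per_month is empty simply because a listing has no reviews.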


5.7 Data Visualization and Meaning

Discuss in this section the distribution of your dependent and independent variables. What graphs (plots) did you use to identify any outliers in your dataset? Include any graphs used (distribution or heatmap graphs).
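As a rough illustration of how a box plot flags outliers, the sketch below computes the quartiles and the 1.5 × IQR fences that a standard box plot uses for its whisker limits. The price values are hypothetical, and the last one is deliberately extreme.

```python
import pandas as pd

# Hypothetical prices; the last value is an intentional outlier.
prices = pd.Series([60, 75, 80, 90, 100, 110, 120, 150, 2000], name="price")

# Quartiles and the 1.5 * IQR fences used by a standard box plot.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are the points a box plot would draw individually.
outliers = prices[(prices < lower) | (prices > upper)]
print(outliers.tolist())
```

With matplotlib installed, `prices.plot.box()` or `prices.plot.hist()` would draw the corresponding plots from the same series.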

5.8 Sample Size (Training and Test Data)

The purpose of this section is to identify the chosen split between the test and training data during your model building. What is the rationale of your percentage split of the dataset?

This section would include a copy of the code used to train the model, showing the data split between the training and test subsets.
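A minimal sketch of such a split is shown here, assuming an 80/20 ratio. That ratio is a common default and is an assumption of this example, not something stated in the question. The frame is a made-up stand-in for the cleaned dataset.

```python
import pandas as pd

# Hypothetical frame standing in for the cleaned dataset; values are invented.
df = pd.DataFrame({
    "minimum_nights": range(1, 11),
    "price": [50, 60, 70, 80, 90, 100, 110, 120, 130, 140],
})

# Randomly sample 80% of the rows for training; the remaining 20% form the test set.
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

print(len(train), len(test))
```

Fixing `random_state` makes the split reproducible. scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=42)` is a commonly used equivalent.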

5.9 Analysis Bias

The purpose of this section is to identify any bias in the dataset. Outline the different kinds of bias that can be introduced into a dataset.
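One simple check for sampling bias is to look at how evenly the rows are spread across a categorical field. The sketch below does this for neighbourhood_group; the counts are invented and only illustrate the idea.

```python
import pandas as pd

# Hypothetical borough labels; the counts are invented for illustration.
groups = pd.Series(
    ["Manhattan"] * 6 + ["Brooklyn"] * 3 + ["Bronx"] * 1,
    name="neighbourhood_group",
)

# Share of rows per group; a heavily skewed share can indicate sampling bias.
shares = groups.value_counts(normalize=True)
print(shares)
```

A model trained on data this skewed would mostly learn the pricing patterns of the dominant group.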

6. CONCLUSION

The purpose of this section is to outline conclusions drawn from the analysis of the chosen dataset. Does the independent variable impact the dependent variables? If so, provide some examples from the data.

6.1 Model Summary

The purpose of this section is to identify the model approach used and any other conclusions pertinent to the analysis.

6.1.1 Actual vs. Predicted Values

The purpose of this section is to outline/list the actual vs. predicted values of your model. Are the differences between the actual and predicted values within acceptable range(s), or too far apart? What does the difference tell us?
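A side-by-side table of actual values, predicted values, and their differences (residuals) is one way to present this. The numbers below are hypothetical and are not the output of any real model.

```python
import pandas as pd

# Hypothetical actual and predicted prices; not the output of a real model.
comparison = pd.DataFrame({
    "actual": [100, 150, 200, 80],
    "predicted": [110, 140, 230, 75],
})

# Residual = actual - predicted; a positive value means the model under-predicted.
comparison["residual"] = comparison["actual"] - comparison["predicted"]
print(comparison)
```

Residuals that are small and centered on zero suggest a reasonable fit; consistently large or one-sided residuals suggest the model is missing something.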

6.2 Coefficient Interpretation

The purpose of this section is to list the analysis coefficient and what it means in relation to the dataset variables (dependent and independent variables).
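To make the idea concrete, the sketch below fits a one-variable linear regression with ordinary least squares. Linear regression is an assumption of this example, since the question does not name the model, and the data are constructed so that y = 40 + 5x exactly.

```python
import numpy as np

# Hypothetical data constructed so that y = 40 + 5 * x exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # independent variable
y = 40.0 + 5.0 * x                        # dependent variable

# Design matrix with an intercept column, solved by ordinary least squares.
A = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(A, y, rcond=None)

print(intercept, slope)
```

Here the slope means that each one-unit increase in x is associated with a 5-unit increase in y, and the intercept is the predicted y when x is zero.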

6.3 Evaluation Metrics

The purpose of this section is to list the different evaluation metrics that can be used in the analysis. What does each metric mean?
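The four metrics named in the subsections that follow can be computed directly from the actual and predicted values. The sketch below uses NumPy and hypothetical numbers, not results from a real model.

```python
import numpy as np

# Hypothetical actual and predicted values; not results from a real model.
actual = np.array([100.0, 150.0, 200.0, 80.0])
predicted = np.array([110.0, 140.0, 230.0, 75.0])

errors = actual - predicted

mae = np.mean(np.abs(errors))   # Mean Absolute Error: average error size
mse = np.mean(errors ** 2)      # Mean Square Error: penalizes large errors more
rmse = np.sqrt(mse)             # Root Mean Square Error: MSE in the original units
# R-Square: share of the variance in the actual values explained by the model.
r2 = 1 - np.sum(errors ** 2) / np.sum((actual - actual.mean()) ** 2)

print(mae, mse, rmse, r2)
```

MAE and RMSE are in the same units as the target (here, price), which makes them easier to interpret than MSE. R-Square is unitless and equals 1 for a perfect fit.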

6.3.1 Mean Absolute Error Result

The purpose of this section is to list the Mean Absolute Error of your analysis.

6.3.2 Mean Square Error Result

The purpose of this section is to list the Mean Square Error of your analysis.

6.3.3 Root Mean Square Error Result

The purpose of this section is to list the Root Mean Square Error of your analysis.

6.3.4 R-Square Score Result

The purpose of this section is to list the R-Square Score of your analysis.
