Question
Introduction
Wine shoppers use critics' ratings and descriptions to help them select a wine. These ratings and descriptions can be posted in the description of a wine featured on the web, or on a shelf tag hung next to the bottles in brick-and-mortar stores. This information is provided to increase sales in the fiercely competitive, roughly $435B wine industry.
Data Description
The file Wine.csv contains the data. The following is a description of the variables included in the dataset. You will use this data dictionary to choose the correct column (variable) when answering questions for your homework assignment.
country - The country that the wine is from
description - Description of the wine
points - Wine Enthusiast rating 1-100 (but only ratings 80 or higher are reported)
price - Price for a bottle of the wine
province - The province or state the wine is from
region - The wine growing area in a province or state
sub region - A more specific region specified within a wine growing area
taster name - The name of the taster
taster twitter handle - The taster's Twitter handle
title - The review title
variety - The type of grapes used to make the wine (e.g., Pinot Noir)
winery - The winery that made the wine
designation - The vineyard within the winery for the grapes that made the wine
Task
You work for a large chain wine retailer and your boss has tasked you with looking at the wine data for certain insights.
# Q0 Use the read.csv() function to read in the Wine.csv file. Don't forget to use the strings=T argument (R partially matches this to stringsAsFactors=TRUE, so text columns are read in as factors).
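A minimal sketch of the read step, assuming nothing about the real file beyond what the data dictionary says. Wine.csv is not available here, so the example writes a tiny stand-in CSV with made-up rows to a temporary file and reads that instead; for the assignment, the temporary path would be replaced by "Wine.csv".

```r
# Hypothetical stand-in for Wine.csv: three made-up rows in a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,points,price,variety",
             "US,92,45,Chardonnay",
             "France,88,,Pinot Noir",
             "US,95,120,Cabernet Sauvignon"), tmp)

# stringsAsFactors = TRUE turns text columns into factors;
# strings = T is the abbreviated spelling of the same argument
wine <- read.csv(tmp, stringsAsFactors = TRUE)

# Inspect the result: column types and the first few rows
str(wine)
head(wine)
```

Note that the blank price in the second row is read as NA because price is a numeric column.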
# Q1a How many observations (rows) are available to you? Use the nrow function.
# Q1b How many variables (columns) are collected on each rated wine? Use the ncol function.
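As an illustration of nrow() and ncol(), here is a small sketch on a made-up data frame. The rows and the three columns are hypothetical; the real wine dataframe has many more of both.

```r
# Hypothetical miniature of the wine data
wine <- data.frame(
  country = c("US", "France", "Italy"),
  points  = c(92, 88, 90),
  price   = c(45, NA, 30)
)

n_obs  <- nrow(wine)  # number of observations (rows)
n_vars <- ncol(wine)  # number of variables (columns)

# dim() reports both at once as c(rows, columns)
dim(wine)
```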
# It is important to find the exact spellings of column names and entries within columns in the wine dataframe.
# Q2a Use the names() or str() function on the wine dataframe to view the exact spellings of the column names.
# Q2b Use the summary() function on the taster twitter handle column to view the exact spellings of the tasters' twitter handles.
# You will use this strategy of "looking" in a column to get exact spellings to answer questions on this quiz.
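The "looking" strategy might be sketched as follows. The column name taster_twitter_handle and the handles themselves are assumptions made for illustration; names(wine) on the real data shows the actual spelling.

```r
# Hypothetical miniature; the handles below are invented
wine <- data.frame(
  points = c(92, 88, 90, 95),
  taster_twitter_handle = c("@tasterA", "@tasterB", "@tasterA", "@tasterB"),
  stringsAsFactors = TRUE
)

names(wine)  # exact spellings of the column names
str(wine)    # column names plus types and a preview of values

# For a factor column, summary() lists each distinct entry with its count,
# which reveals the exact spelling of every entry
handle_counts <- summary(wine$taster_twitter_handle)
handle_counts
```

This only works as shown when the column is a factor, which is why reading the file with stringsAsFactors=TRUE matters.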
# It is important to know whether there are missing values (NAs) in your data since NAs need to be addressed for certain functions to work correctly.
# Q3a Use the sum() and is.na() functions to find the number of missing values in the wine dataframe.
# Q3b Use the sum() and is.na() functions to count the number of missing values in the points column.
# Q3c Use the sum() and is.na() functions to count the number of missing values in the price column.
# Q3d Use the sum() and is.na() functions to count the number of observations that have a price. Hint: Since the ! means "not", putting an ! in front of is.na() counts the non-missing values.
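The counting pattern in Q3 relies on TRUE being treated as 1 when summed. A small sketch on made-up data (all values are hypothetical):

```r
# Hypothetical miniature with a few missing values
wine <- data.frame(
  points = c(92, 88, 90, 95),
  price  = c(45, NA, 30, NA),
  region = c("Napa", NA, "Sonoma", "Napa")
)

total_na   <- sum(is.na(wine))          # NAs anywhere in the data frame
points_na  <- sum(is.na(wine$points))   # NAs in one column
price_na   <- sum(is.na(wine$price))
price_seen <- sum(!is.na(wine$price))   # ! flips TRUE/FALSE, so this counts non-missing prices
```

By default read.csv() reads blank text fields as empty strings rather than NA, so counts on text columns in the real file may differ from what a visual scan of the CSV suggests.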
# Q4a Use the mean function to calculate the average points for the wines in the dataset.
# Q4b Use the mean function to calculate the average price for the wines in the dataset.
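One detail worth illustrating for Q4: mean() returns NA when the column contains any missing value, unless na.rm = TRUE is supplied. The numbers below are made up.

```r
# Hypothetical miniature: price has one missing value
wine <- data.frame(
  points = c(92, 88, 90, 94),
  price  = c(45, NA, 30, 60)
)

avg_points    <- mean(wine$points)               # no NAs, so this works directly
avg_price_raw <- mean(wine$price)                # NA, because one price is missing
avg_price     <- mean(wine$price, na.rm = TRUE)  # drops the NA before averaging
```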
# Q5a Use the mean function to calculate the average points for the Napa Sonoma sub-region. Your code should be a single line and will use the square brackets for subsetting.
# Q5b Use the mean function to calculate the average price for the Napa Sonoma sub-region, again in a single line of code.
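A sketch of single-line subsetting with square brackets. The column name sub_region and the label "Napa-Sonoma" are assumptions; the exact spellings in the real data should be checked with names() and summary() first.

```r
# Hypothetical miniature; sub_region and its labels are assumed spellings
wine <- data.frame(
  points     = c(92, 88, 90, 94),
  price      = c(50, 20, NA, 70),
  sub_region = c("Napa-Sonoma", "Central Coast", "Napa-Sonoma", "Napa-Sonoma")
)

# The logical vector inside [ ] keeps only the matching rows of the column
napa_points <- mean(wine$points[wine$sub_region == "Napa-Sonoma"])
napa_price  <- mean(wine$price[wine$sub_region == "Napa-Sonoma"], na.rm = TRUE)
```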
# Q6 Now we'll practice our counting skills again.
# The rating scale being used by these tasters originally had 91 as the highest possible score and now goes to 100 for exceedingly delicious wines.
# Use the sum() function on a logical vector that checks for whether points are greater than 91 or not to count the number of wine ratings with more than 91 points.
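Counting with a logical vector can be sketched like this, using made-up ratings:

```r
# Hypothetical ratings
wine <- data.frame(points = c(92, 88, 91, 95, 100))

# wine$points > 91 is TRUE/FALSE per row; sum() counts the TRUEs
above_91 <- sum(wine$points > 91)
```

The comparison is strict, so a wine rated exactly 91 is not counted.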
# Q7a Now we'll practice subsetting on more than one criterion.
# Use the mean function to compute the average points for wines that are chardonnay wines and from the central coast sub-region.
# Q7b More practice subsetting on more than one criterion.
# Use the mean function to compute the average price for wines that are either chardonnay wines or are from the finger lakes sub-region.
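Combining criteria uses & for "and" and | for "or". The sketch below runs on made-up rows; the column name sub_region and the labels "Chardonnay", "Central Coast", and "Finger Lakes" are assumed spellings that should be verified against the real data.

```r
# Hypothetical miniature; labels are assumed spellings
wine <- data.frame(
  points     = c(90, 94, 88, 86, 92),
  price      = c(20, 40, 30, 15, NA),
  variety    = c("Chardonnay", "Chardonnay", "Riesling", "Riesling", "Chardonnay"),
  sub_region = c("Central Coast", "Central Coast", "Finger Lakes",
                 "Napa-Sonoma", "Finger Lakes")
)

# & keeps rows where BOTH conditions are TRUE
avg_points_both <- mean(wine$points[wine$variety == "Chardonnay" &
                                    wine$sub_region == "Central Coast"])

# | keeps rows where AT LEAST ONE condition is TRUE
avg_price_either <- mean(wine$price[wine$variety == "Chardonnay" |
                                    wine$sub_region == "Finger Lakes"],
                         na.rm = TRUE)
```

A row that satisfies both conditions of an | is still counted only once.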
# Q8 Use the median function to find the median price for a Cabernet Sauvignon that is rated higher than 91 points.
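The same subsetting pattern works with median(). A sketch on made-up rows, assuming the variety is spelled "Cabernet Sauvignon" in the data:

```r
# Hypothetical miniature
wine <- data.frame(
  points  = c(93, 95, 90, 97, 94),
  price   = c(60, 100, 40, NA, 80),
  variety = c("Cabernet Sauvignon", "Cabernet Sauvignon", "Cabernet Sauvignon",
              "Cabernet Sauvignon", "Merlot")
)

# Rows that are Cabernet Sauvignon AND rated above 91; na.rm drops missing prices
median_price <- median(wine$price[wine$variety == "Cabernet Sauvignon" &
                                  wine$points > 91],
                       na.rm = TRUE)
```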
PLEASE ANSWER ALL QUESTIONS