Question
1. To load the data using read_csv from the attached data file (tips.txt or tips.csv), save the file and use your own path, including the directory where the file is saved, as in the example paths below:
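In: import pandas as pd
In: tips = pd.read_csv(r'c:\Users\mowaf\Data\tips.txt', sep=...)  # ... marks the delimiter used in tips.txt
In: tips = pd.read_csv(r'c:\Users\mowaf\Data\tips.csv')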
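As a minimal sketch of how the sep argument works, the snippet below reads a small in-memory stand-in for tips.txt. The tab delimiter, the variable names raw and tips_demo, and all values are assumptions made for illustration; they are not taken from the attached files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a tab-separated tips.txt; the values are invented.
raw = "totbill\ttip\tsex\n10.0\t2.0\tF\n20.0\t3.0\tM\n"

# sep tells read_csv which character separates the fields on each line.
tips_demo = pd.read_csv(io.StringIO(raw), sep="\t")
print(tips_demo)
```

For a comma-separated file such as tips.csv, sep can be left out because a comma is the default delimiter.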
To view the first rows of the dataset:
In: tips.head()
obs  totbill  tip  sex  smoker  day  time   size
                   F    No      Sun  Night
                   M    No      Sun  Night
                   M    No      Sun  Night
                   M    No      Sun  Night
                   F    No      Sun  Night
To make a stacked bar plot showing the percentage of data points for each party size on each day, first make a cross-tabulation by day and party size:
In: partycount = pd.crosstab(tips['day'], tips['size'])
In: partycount
Out:
size
day
Fri
Sat
Sun
Thu
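A minimal sketch of what the cross-tabulation produces, using a handful of invented rows (the demo frame and its values are assumptions, not the assignment's data). The size column is selected with brackets because demo.size would return the DataFrame's element count instead of the column.

```python
import pandas as pd

# Invented rows for illustration only; not the assignment's data.
demo = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat", "Sat"],
    "size": [2, 3, 2, 2, 4],
})

# Rows are days, columns are party sizes, and each cell counts matching rows.
counts = pd.crosstab(demo["day"], demo["size"])
print(counts)
```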
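Before normalizing so that each row sums to 1 and making the plot, select the party-size columns to keep:
In: partycounts = partycount.loc[:, ...:...]  # ... marks the first and last party size to keep
In: partycounts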
size
day
Fri
Sat
Sun
Thu
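The column selection in this step uses label-based slicing with .loc, where both ends of the slice are included. A small sketch, assuming an invented one-row table and slice bounds of 2 and 3 chosen only for illustration:

```python
import pandas as pd

# Invented counts with party sizes 1 to 4 as column labels.
counts = pd.DataFrame([[1, 2, 3, 4]], index=["Fri"], columns=[1, 2, 3, 4])

# .loc slices by label, so 2:3 keeps the columns labelled 2 and 3 (both ends).
middle = counts.loc[:, 2:3]
print(middle)
```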
Normalize so that each row sums to 1, then make a bar plot of the normalized party size counts over the weekdays:
# Normalize to sum to 1
In: partypcts = partycounts.div(partycounts.sum(axis=1), axis=0)
In: partypcts
size
day
Fri
Sat
Sun
Thu
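The normalization step can be sketched on a tiny invented table (the counts below are assumptions for illustration). Summing with axis=1 gives one total per row, and div with axis=0 matches those totals to the rows by index label.

```python
import pandas as pd

# Invented counts: rows are days, columns are party sizes.
counts = pd.DataFrame({2: [1, 3], 3: [1, 1]}, index=["Fri", "Sat"])

row_totals = counts.sum(axis=1)        # one total per day
pcts = counts.div(row_totals, axis=0)  # divide each row by its own total
print(pcts)
```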
Draw a bar plot of partypcts from the previous step using partypcts.plot(kind='bar').
Show your plot output.
To find the tip percentage for each bill, create a column tippct:
In: tips['tippct'] = tips['tip'] / tips['totbill']
tips.head()
Similarly, to list the first five rows of the dataset:
tips[:5]
Show your output.
Draw a histogram of the tip percentages of the total bill, tips['tippct'], from the step above. A histogram shows the frequency of the data points split into discrete, evenly spaced bins, with the number of data points plotted for each bin:
tips['tippct'].plot.hist(bins=...)  # ... marks the number of bins
Show your output.
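To see the binning behind a histogram without drawing it, the counts can be computed directly. This is a sketch on invented bills and tips; numpy.histogram is swapped in here for the plotting call, and the choice of two bins is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Invented bills and tips; not the assignment's data.
demo = pd.DataFrame({
    "totbill": [10.0, 20.0, 40.0, 50.0],
    "tip": [1.0, 4.0, 4.0, 10.0],
})
demo["tippct"] = demo["tip"] / demo["totbill"]

# Split the value range into 2 evenly spaced bins and count points per bin.
counts, edges = np.histogram(demo["tippct"], bins=2)
print(counts, edges)
```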
To show summary statistics of the tip percentage (tippct) for the day, smoker, sex, and time columns, call describe on a groupby object, for example by smoker:
tips.groupby('smoker')['tippct'].describe()
Show your output.
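A minimal sketch of describe on a grouped column, using four invented rows (the demo frame is an assumption for illustration). The result has one row per group and one column per summary statistic.

```python
import pandas as pd

# Invented rows for illustration only.
demo = pd.DataFrame({
    "smoker": ["No", "No", "Yes", "Yes"],
    "tippct": [0.10, 0.20, 0.15, 0.25],
})

# One row per smoker group; columns are count, mean, std, min, quartiles, max.
summary = demo.groupby("smoker")["tippct"].describe()
print(summary[["count", "mean", "min", "max"]])
```

The same pattern applies to the other grouping columns by changing the name passed to groupby.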
Similarly, show the tippct aggregated on sex, day, and time.
Show your output.
Aggregating a Series or all of the columns of a DataFrame is a matter of using aggregate with the desired function or calling a method like mean or std. However, we may want to aggregate using a different function depending on the column, or multiple functions at once.
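One way to apply a different function to each column is to pass agg a dictionary that maps column names to function names. The sketch below uses invented data, and the pairing of max with tip and sum with size is an arbitrary choice for illustration.

```python
import pandas as pd

# Invented rows for illustration only.
demo = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "tip": [1.0, 3.0, 2.0, 6.0],
    "size": [2, 4, 3, 5],
})

# Each key is a column; each value is the aggregation applied to that column.
result = demo.groupby("day").agg({"tip": "max", "size": "sum"})
print(result)
```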
To group the tips by day and smoker:
grouped = tips.groupby(['day', 'smoker'])
grouped
Out:
To check and view the summary of the grouped data:
grouped.describe()
Show your output.
To calculate the average tippct of the grouped data (groupedpct) on day and smoker:
groupedpct = grouped['tippct']
groupedpct.agg('mean')
day smoker
Fri No
Yes
Sat No
Yes
Sun No
Yes
Thu No
Yes
Name: tippct, dtype: float64
Alternatively, if you pass a list of functions or function names instead, you get back a DataFrame with column names taken from the functions:
groupedpct.agg(['mean', 'std'])
mean std
day smoker
Fri No
Yes
Sat No
Yes
Sun No
Yes
Thu No
Yes
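A small sketch of passing a list of function names, on invented rows (the demo frame is an assumption for illustration). Groups that contain a single row get NaN for std, because the sample standard deviation needs at least two values.

```python
import pandas as pd

# Invented rows for illustration only.
demo = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "No"],
    "tippct": [0.10, 0.20, 0.12, 0.18],
})

# The result is indexed by (day, smoker) with one column per function.
stats = demo.groupby(["day", "smoker"])["tippct"].agg(["mean", "std"])
print(stats)
```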
Similarly, calculate the average of tippct aggregated on day and sex.
Show your code and output.
With a DataFrame, we can also specify a list of functions to apply to all of the columns or different functions per column. To compute the same three statistics (count, mean, and max) for the tippct and totbill columns:
grouped[['tippct', 'totbill']].agg(['count', 'mean', 'max'])
tippct totbill
count mean max count mean max
day smoker
Fri No
Yes
Sat No
Yes
Sun No
Yes
Thu No
Yes
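When a list of functions is applied to several columns, the result has two levels of column labels: the original column on top and the function name below it. A minimal sketch on invented data, showing how a single cell can be picked out with a (column, function) tuple:

```python
import pandas as pd

# Invented rows for illustration only.
demo = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat"],
    "tippct": [0.10, 0.20, 0.15],
    "totbill": [10.0, 30.0, 20.0],
})

wide = demo.groupby("day")[["tippct", "totbill"]].agg(["count", "mean", "max"])

# Select one statistic of one column with a (column, function) tuple.
fri_max_bill = wide.loc["Fri", ("totbill", "max")]
print(wide)
print(fri_max_bill)
```

Indexing with just the top-level name, as in wide["tippct"], returns the three statistics for that column only.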
Suppose we wanted to select the top five tippct values by group. First, write a function that selects the rows with the largest values in a particular column:
In: def top(df, n=5, column='tippct'):
        return df.sort_values(by=column)[-n:]
top(tips, n=...)  # ... marks the number of rows to show
Show your output.
Then group by smoker and call apply with the top function.
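The grouping step can be sketched as follows. The data is invented, and two details are assumptions made for this sketch: the tippct column is selected before apply so that the helper receives only that column, and n=2 is chosen to keep the output short.

```python
import pandas as pd


def top(df, n=5, column="tippct"):
    """Return the n rows of df with the largest values in column."""
    return df.sort_values(by=column)[-n:]


# Invented rows for illustration only.
demo = pd.DataFrame({
    "smoker": ["No", "No", "No", "Yes", "Yes"],
    "tippct": [0.10, 0.30, 0.20, 0.25, 0.15],
})

# apply calls top once per smoker group and stacks the pieces together;
# extra keyword arguments such as n are forwarded to top.
top2 = demo.groupby("smoker")[["tippct"]].apply(top, n=2)
print(top2)
```

The result is indexed by the group key and the original row label, so each row can be traced back to the input.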