Question

For this assignment, we look into a restaurant's tipping dataset, which consists of the following fields:
obs: ID number of a restaurant bill
totbill: total amount of the bill
tip: tip amount
sex: M/F
smoker: Yes/No
day: Mon through Sun
time: Day/Night
size: number of people in the party
1. To load the data, save the attached data file (tips.txt or tips.csv) to a directory of your own, then pass the full path of the saved file to read_csv, as in the examples below:
In: import pandas as pd
In: tips = pd.read_csv('c:/Users/mowaf/Data/tips.txt', sep=',')
In: tips = pd.read_csv('c:/Users/mowaf/Data/tips.csv')
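As a minimal sketch of what read_csv does with this file, the snippet below parses the first five rows shown in the assignment from an in-memory string instead of a path on disk. The comma-separated layout follows the sep=',' call above; the name tips_sample is hypothetical and stands in for the full tips table.

```python
import io

import pandas as pd

# First five rows as listed in the assignment; this stands in for the saved file.
csv_text = """obs,totbill,tip,sex,smoker,day,time,size
1,16.99,1.01,F,No,Sun,Night,2
2,10.34,1.66,M,No,Sun,Night,3
3,21.01,3.50,M,No,Sun,Night,3
4,23.68,3.31,M,No,Sun,Night,2
5,24.59,3.61,F,No,Sun,Night,4
"""

# read_csv accepts a file path or any file-like object; sep=',' is its default.
tips_sample = pd.read_csv(io.StringIO(csv_text), sep=",")

print(tips_sample.shape)   # (rows, columns)
print(tips_sample.dtypes)  # numeric columns are inferred automatically
```

With the real file, only the first argument changes: pass the path where tips.csv or tips.txt was saved.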
2. To view the file dataset (first 5 rows):
In: tips.head()
   obs  totbill   tip sex smoker  day   time  size
0    1    16.99  1.01   F     No  Sun  Night     2
1    2    10.34  1.66   M     No  Sun  Night     3
2    3    21.01  3.50   M     No  Sun  Night     3
3    4    23.68  3.31   M     No  Sun  Night     2
4    5    24.59  3.61   F     No  Sun  Night     4
3. To make a stacked bar plot showing the percentage of data points for each party size on each day, first make a cross-tabulation by day and party size:
In[]: party_count=pd.crosstab(tips['day'],tips['size'])
In []: party_count
Out:
size  1   2   3   4  5  6
day
Fri   1  16   1   1  0  0
Sat   2  53  18  13  1  0
Sun   0  39  15  18  3  1
Thu   1  48   4   5  1  3
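To see how crosstab arrives at such a table, here is a small sketch on a made-up six-bill dataset. The bills frame and its values are hypothetical and chosen only to keep the counts easy to check by hand.

```python
import pandas as pd

# Hypothetical mini-dataset: one row per bill.
bills = pd.DataFrame({
    "day":  ["Fri", "Fri", "Sat", "Sat", "Sat", "Sun"],
    "size": [2, 2, 2, 3, 4, 2],
})

# Rows are days, columns are party sizes, and each cell counts the bills
# that have that (day, size) combination. Missing combinations become 0.
counts = pd.crosstab(bills["day"], bills["size"])
print(counts)
```

Each row of the result sums to the number of bills recorded on that day.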
4. There are few parties of size 1 or 6, so keep only party sizes 2 through 5 before normalizing:
In: party_counts = party_count.loc[:,2:5]
In: party_counts
size   2   3   4  5
day
Fri   16   1   1  0
Sat   53  18  13  1
Sun   39  15  18  3
Thu   48   4   5  1
5. Normalize the counts so that each row sums to 1, in preparation for a bar plot of party-size shares by day:
# Normalize to sum to 1
In: party_pcts = party_counts.div(party_counts.sum(1), axis=0)
In: party_pcts
size         2         3         4         5
day
Fri   0.888889  0.055556  0.055556  0.000000
Sat   0.623529  0.211765  0.152941  0.011765
Sun   0.520000  0.200000  0.240000  0.040000
Thu   0.827586  0.068966  0.086207  0.017241
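The row normalization can be checked directly from the counts listed in step 4. The sketch below rebuilds that table by hand (so it runs without the data file) and divides each row by its own total; axis=0 tells div to align the totals with the row index rather than the columns.

```python
import pandas as pd

# Counts for party sizes 2 through 5, copied from the step 4 table.
party_counts = pd.DataFrame(
    {2: [16, 53, 39, 48], 3: [1, 18, 15, 4], 4: [1, 13, 18, 5], 5: [0, 1, 3, 1]},
    index=pd.Index(["Fri", "Sat", "Sun", "Thu"], name="day"),
)
party_counts.columns.name = "size"

# One total per day, then divide every row by its own total.
row_totals = party_counts.sum(axis=1)
party_pcts = party_counts.div(row_totals, axis=0)

print(party_pcts.round(6))
print(party_pcts.sum(axis=1))  # each row should now sum to 1
```

For example, Friday has 16 + 1 + 1 + 0 = 18 bills, so its size-2 share is 16/18.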
6. Draw plot(kind='bar') of the party_pcts from step 5.
Show your plot output.
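One possible way to produce the plot is sketched below, using the step 5 values typed in by hand. The Agg backend and the output file name party_pcts_bar.png are assumptions made so the sketch runs without a display; in a notebook the plot call alone is enough.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed

import pandas as pd

# Row-normalized shares from step 5, typed in by hand.
party_pcts = pd.DataFrame(
    {
        2: [0.888889, 0.623529, 0.520000, 0.827586],
        3: [0.055556, 0.211765, 0.200000, 0.068966],
        4: [0.055556, 0.152941, 0.240000, 0.086207],
        5: [0.000000, 0.011765, 0.040000, 0.017241],
    },
    index=pd.Index(["Fri", "Sat", "Sun", "Thu"], name="day"),
)
party_pcts.columns.name = "size"

# One group of bars per day, one bar per party size.
ax = party_pcts.plot(kind="bar")
ax.set_ylabel("share of parties")
ax.figure.savefig("party_pcts_bar.png")  # hypothetical output file name
```

Passing stacked=True to plot would stack the four shares into a single bar per day, which matches the stacked bar plot mentioned in step 3.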
7. To find the tip as a fraction of each bill, create a column tip_pct:
In: tips['tip_pct'] = tips['tip'] / tips['totbill']
In: tips.head(5)
Slicing works similarly; note that tips[:6] lists the first six rows, not five:
In: tips[:6]
Show your output.
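A short sketch of the tip_pct calculation on the five rows shown in step 2 follows. The frame tips_sample is a hypothetical stand-in for the full table, restricted to the two columns the calculation needs.

```python
import pandas as pd

# totbill and tip for the five rows shown in step 2.
tips_sample = pd.DataFrame({
    "totbill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Element-wise division: each tip is divided by the bill on the same row.
tips_sample["tip_pct"] = tips_sample["tip"] / tips_sample["totbill"]
print(tips_sample.round(4))

# A slice [:n] returns the first n rows, the same rows as head(n).
print(len(tips_sample[:3]))
```

The result is a fraction between 0 and 1; multiply by 100 if an actual percentage is wanted.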
8. Draw a histogram (the data points are split into discrete, evenly spaced bins, and the number of data points in each bin is plotted) of the tip percentages from step 7, tips['tip_pct']:
In: tips['tip_pct'].plot.hist(bins=50)
Show your output.
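To make the binning concrete, the sketch below computes the bins and counts numerically with numpy.histogram instead of drawing them. It uses the five rows from step 2 and 4 bins rather than 50; both choices are assumptions made to keep the output small.

```python
import numpy as np
import pandas as pd

# tip / totbill for the five rows shown in step 2.
tip_pct = pd.Series(
    [1.01 / 16.99, 1.66 / 10.34, 3.50 / 21.01, 3.31 / 23.68, 3.61 / 24.59],
    name="tip_pct",
)

# bins=4 splits the range [min, max] into 4 equal-width intervals.
counts, edges = np.histogram(tip_pct, bins=4)

print(edges)   # 5 edges delimit 4 bins
print(counts)  # number of values falling in each bin
```

The plot.hist call in step 8 draws one bar per bin, with each bar's height equal to that bin's count.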
9. To show summary statistics of the tip percentage tip_pct for the day, smoker, sex, and time columns, call describe on a groupby object:
In: tips.groupby('smoker')['tip_pct'].describe()
Show your output.
Similarly, show tip_pct summarized by sex, day, and time.
Show your output.
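As an illustration of the shape of the result, the sketch below groups the five rows from step 2 by sex. The frame tips_sample is a hypothetical stand-in for the full table; with the real data, the same call works for day, smoker, and time by changing the groupby key.

```python
import pandas as pd

# The five rows shown in step 2, restricted to the columns used here.
tips_sample = pd.DataFrame({
    "totbill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["F", "M", "M", "M", "F"],
})
tips_sample["tip_pct"] = tips_sample["tip"] / tips_sample["totbill"]

# One row per group; columns are count, mean, std, min, quartiles, and max.
summary = tips_sample.groupby("sex")["tip_pct"].describe()
print(summary)
```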
10. Aggregating a Series or all of the columns of a DataFrame is a matter of using aggregate with the
desired function or calling a method like mean or std. However, we may want to aggregate using
a different function depending on the column, or multiple functions at once.
To group the tips by day and smoker:
grouped = tips.groupby(['day', 'smoker'])
grouped
Out[130]:
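Evaluating grouped by itself only displays a GroupBy object; nothing is computed until an aggregation is requested. The sketch below shows the two cases described in step 10, several functions on one column and a different function per column, on a small made-up frame. The bills data is hypothetical and chosen so the results are easy to verify by hand.

```python
import pandas as pd

# Hypothetical bills; values are invented for illustration only.
bills = pd.DataFrame({
    "day":    ["Fri", "Fri", "Sat", "Sat", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "No", "Yes", "Yes"],
    "tip":    [3.0, 2.0, 4.0, 2.0, 1.0, 3.0],
    "size":   [2, 2, 3, 2, 2, 4],
})

grouped = bills.groupby(["day", "smoker"])

# Several functions at once on a single column: one output column per function.
multi = grouped["tip"].agg(["mean", "max"])
print(multi)

# A different function per column: pass a dict of column name to function.
per_col = grouped.agg({"tip": "max", "size": "sum"})
print(per_col)
```

Both results are indexed by the (day, smoker) pairs that occur in the data.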
