Question

Posted on Jul 30, 2024

1. To load the data with read_csv from the attached data files (tips.txt or tips.csv), save the files and use a path of your own that includes the directory where the files are saved, as in the example paths below:

In: import pandas as pd

In: tips = pd.read_csv('/Users/mowaf/Data/tips.txt', sep=',')

In: tips = pd.read_csv('/Users/mowaf/Data/tips.csv')
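As a minimal sketch of what read_csv does with a comma-separated file, the example below parses an in-memory stand-in for tips.txt instead of a file on disk. The five rows are the ones shown by tips.head() in step 2; the use of io.StringIO, the assumed comma-separated layout, and the name sample_text are assumptions made only so the snippet runs without the attached files.

```python
import io

import pandas as pd

# Stand-in for the contents of tips.txt: a header line plus the five rows
# shown in step 2. The exact file layout is an assumption.
sample_text = """obs,totbill,tip,sex,smoker,day,time,size
1,16.99,1.01,F,No,Sun,Night,2
2,10.34,1.66,M,No,Sun,Night,3
3,21.01,3.50,M,No,Sun,Night,3
4,23.68,3.31,M,No,Sun,Night,2
5,24.59,3.61,F,No,Sun,Night,4
"""

# read_csv accepts a path or any file-like object. sep=',' is already the
# default, so passing it explicitly does not change the result.
tips = pd.read_csv(io.StringIO(sample_text), sep=',')

print(tips.shape)
print(tips.dtypes)
```

With a real file, the io.StringIO(...) argument would simply be replaced by your own path string.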

2. To view the dataset (first 5 rows):

In: tips.head()

   obs  totbill   tip sex smoker  day   time  size
0    1    16.99  1.01   F     No  Sun  Night     2
1    2    10.34  1.66   M     No  Sun  Night     3
2    3    21.01  3.50   M     No  Sun  Night     3
3    4    23.68  3.31   M     No  Sun  Night     2
4    5    24.59  3.61   F     No  Sun  Night     4

3. To make a stacked bar plot showing the percentage of data points for each party size on each day, first make a cross-tabulation by day and party size:

In: party_count = pd.crosstab(tips['day'], tips['size'])

In: party_count

Out:
size  1   2   3   4  5  6
day
Fri   1  16   1   1  0  0
Sat   2  53  18  13  1  0
Sun   0  39  15  18  3  1
Thu   1  48   4   5  1  3
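To illustrate how crosstab counts combinations of two columns, here is a small sketch built from the five rows shown in step 2. Because all five of those rows fall on Sunday, the sketch cross-tabulates sex against size rather than day against size, purely so the table has more than one row; that substitution is an assumption for illustration and not part of the exercise.

```python
import pandas as pd

# The sex and size values of the five rows from the step 2 output.
sample = pd.DataFrame({
    'sex': ['F', 'M', 'M', 'M', 'F'],
    'size': [2, 3, 3, 2, 4],
})

# Each cell counts how many rows have that (sex, size) combination.
# Combinations that never occur are filled with 0.
sample_count = pd.crosstab(sample['sex'], sample['size'])
print(sample_count)
```

The same call with tips['day'] as the first argument produces the day-by-size table of this step.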

4. Before normalizing the data so that each row sums to 1 and making the plot, keep only party sizes 2 through 5:

In: party_counts = party_count.loc[:, 2:5]

In: party_counts

size   2   3   4  5
day
Fri   16   1   1  0
Sat   53  18  13  1
Sun   39  15  18  3
Thu   48   4   5  1

5. Normalize so that each row sums to 1, in order to make a bar plot of the party sizes (counts) over the weekdays:

# Normalize to sum to 1
In: party_pcts = party_counts.div(party_counts.sum(1), axis=0)

In: party_pcts

size         2         3         4         5
day
Fri   0.888889  0.055556  0.055556  0.000000
Sat   0.623529  0.211765  0.152941  0.011765
Sun   0.520000  0.200000  0.240000  0.040000
Thu   0.827586  0.068966  0.086207  0.017241
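The row normalization can be checked on its own without the data file. The sketch below rebuilds the counts for party sizes 2 through 5 from the table in step 4 and divides each row by its total; the name row_totals is introduced here only for readability.

```python
import pandas as pd

# Counts for party sizes 2 to 5, copied from the table in step 4.
party_counts = pd.DataFrame(
    [[16, 1, 1, 0], [53, 18, 13, 1], [39, 15, 18, 3], [48, 4, 5, 1]],
    index=pd.Index(['Fri', 'Sat', 'Sun', 'Thu'], name='day'),
    columns=pd.Index([2, 3, 4, 5], name='size'),
)

# sum(axis=1) adds across the columns, giving one total per day.
row_totals = party_counts.sum(axis=1)

# axis=0 tells div to match row_totals against the row index (the days),
# so every entry is divided by the total of its own row.
party_pcts = party_counts.div(row_totals, axis=0)

print(party_pcts.round(6))
print(party_pcts.sum(axis=1))
```

Without axis=0, div would try to align the day labels of row_totals with the size column labels, which do not match.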

6. Draw a plot (kind='bar') of party_pcts from step 5. Show your plot output.
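One possible way to produce this plot is sketched below. It assumes matplotlib is installed (pandas plotting depends on it), selects the non-interactive Agg backend so the script also runs without a display, and passes stacked=True because step 3 asks for a stacked bar plot. The counts are the ones from step 4; the axis label text is an arbitrary choice.

```python
import matplotlib

matplotlib.use('Agg')  # non-interactive backend; omit this in a notebook

import pandas as pd

# Counts for party sizes 2 to 5, copied from the table in step 4.
party_counts = pd.DataFrame(
    [[16, 1, 1, 0], [53, 18, 13, 1], [39, 15, 18, 3], [48, 4, 5, 1]],
    index=pd.Index(['Fri', 'Sat', 'Sun', 'Thu'], name='day'),
    columns=pd.Index([2, 3, 4, 5], name='size'),
)
party_pcts = party_counts.div(party_counts.sum(axis=1), axis=0)

# One bar per day; stacked=True piles the four party sizes on top of each
# other, so every bar reaches a total height of 1.
ax = party_pcts.plot(kind='bar', stacked=True)
ax.set_ylabel('share of parties')
```

In an interactive session the figure appears directly; in a script, ax.figure.savefig with a file name of your choice writes it to disk.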

7. To find the tip as a percentage of each bill, create a column tip_pct:

In: tips['tip_pct'] = tips['tip'] / tips['totbill']

tips.head(5)

Similarly, to list the first rows of the dataset with a slice (tips[:6] returns rows 0 through 5):

tips[:6]

Show your output.
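As a small illustration of the column arithmetic, the sketch below applies the same division to the five rows from step 2 only, so its output is not a substitute for the full-file output the exercise asks for.

```python
import pandas as pd

# totbill and tip values of the five rows shown in step 2.
tips = pd.DataFrame({
    'totbill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Dividing two columns works element by element and keeps the row index,
# so the result can be assigned straight back as a new column.
tips['tip_pct'] = tips['tip'] / tips['totbill']

print(tips.head(5))

# Slicing with [:3] selects rows by position, here the first three.
print(tips[:3])
```

Note that tip_pct computed this way is a fraction between 0 and 1; multiply by 100 if a percentage scale is preferred.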

8. Draw a histogram (the data points split into discrete, evenly spaced bins, with the number of data points in each bin plotted) of the tip percentages of the total bill, tips['tip_pct'], from step 7:

tips['tip_pct'].plot.hist(bins=50)

Show your output.
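To make the binning idea concrete without drawing anything, the sketch below uses numpy.histogram on the tip fractions of the five rows from step 2. It is meant to mirror the description of evenly spaced bins given above, not to reproduce how pandas draws the plot internally; the names counts, edges, and widths are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Tip fractions for the five rows shown in step 2.
tip_pct = pd.Series(
    [1.01 / 16.99, 1.66 / 10.34, 3.50 / 21.01, 3.31 / 23.68, 3.61 / 24.59],
    name='tip_pct',
)

# bins=50 splits the range from the smallest to the largest value into
# 50 intervals of equal width and counts the points falling in each one.
counts, edges = np.histogram(tip_pct, bins=50)
widths = np.diff(edges)

print(counts)
print(widths[:3])
```

Fifty bins give 51 edges, and every data point lands in exactly one bin, so the counts add up to the number of rows.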

9. To show summary statistics of the tip percentage tip_pct for the day, smoker, sex, and time columns, we can call describe on a groupby object:

tips.groupby('smoker')['tip_pct'].describe()

Show your output.

Similarly, show tip_pct aggregated on sex, day, and time.

Show your output.
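The same pattern works for any grouping column. As a hedged sketch, the example below groups the five rows from step 2 by sex; with so few rows the statistics are only a demonstration of the output layout, and the name by_sex is introduced here.

```python
import pandas as pd

# The five rows shown in step 2, reduced to the columns needed here.
tips = pd.DataFrame({
    'totbill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'sex': ['F', 'M', 'M', 'M', 'F'],
})
tips['tip_pct'] = tips['tip'] / tips['totbill']

# describe() on a grouped column returns one row per group with
# count, mean, std, min, the quartiles, and max as columns.
by_sex = tips.groupby('sex')['tip_pct'].describe()
print(by_sex)
```

Replacing 'sex' with 'day' or 'time' in the groupby call gives the other two summaries.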

10. Aggregating a Series or all of the columns of a DataFrame is a matter of using aggregate with the desired function or calling a method like mean or std. However, we may want to aggregate using a different function depending on the column, or multiple functions at once.

To group the tips by day and smoker:

grouped = tips.groupby(['day', 'smoker'])

grouped

Out[130]

To check and view the output for the grouped data:

grouped.describe()

Show your output.

11. To calculate the average tip_pct for each combination of day and smoker, select the column from the grouped object as grouped_pct and aggregate it:

grouped_pct = grouped['tip_pct']

grouped_pct.agg('mean')

day  smoker
Fri  No        0.151650
     Yes       0.174783
Sat  No        0.158048
     Yes       0.147906
Sun  No        0.160113
     Yes       0.187250
Thu  No        0.160298
     Yes       0.163863
Name: tip_pct, dtype: float64

12. Alternatively, if we pass a list of functions or function names instead, we get back a DataFrame with column names taken from the functions:

grouped_pct.agg(['mean', 'std'])

                 mean       std
day  smoker
Fri  No      0.151650  0.028123
     Yes     0.174783  0.051293
Sat  No      0.158048  0.039767
     Yes     0.147906  0.061375
Sun  No      0.160113  0.042347
     Yes     0.187250  0.154134
Thu  No      0.160298  0.038774
     Yes     0.163863  0.039389
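The difference between passing one function name and passing a list can be seen on a toy table. The six rows below are made up for illustration and are not taken from tips.txt; only the column names follow the exercise.

```python
import pandas as pd

# Made-up rows (not from tips.txt), just large enough to form four groups.
toy = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Fri', 'Sat', 'Sat', 'Sat'],
    'smoker': ['No', 'Yes', 'Yes', 'No', 'No', 'Yes'],
    'tip_pct': [0.10, 0.20, 0.30, 0.15, 0.25, 0.12],
})

grouped_pct = toy.groupby(['day', 'smoker'])['tip_pct']

# A single function name returns a Series indexed by (day, smoker).
means = grouped_pct.agg('mean')

# A list of names returns a DataFrame with one column per function.
stats = grouped_pct.agg(['mean', 'std'])

print(means)
print(stats)
```

Groups with a single row get NaN for std, since the sample standard deviation needs at least two values.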

13. Similarly, calculate the average of tip_pct aggregated on day and sex. Show your code and output.

14. With a DataFrame, we can also specify a list of functions to apply to all of the columns, or different functions per column. To compute the same three statistics (count, mean, and max) for the tip_pct and totbill columns:

grouped[['tip_pct', 'totbill']].agg(['count', 'mean', 'max'])

            tip_pct                     totbill
              count      mean       max   count       mean    max
day  smoker
Fri  No           4  0.151650  0.187735       4  18.420000  22.75
     Yes         15  0.174783  0.263480      15  16.813333  40.17
Sat  No          45  0.158048  0.291990      45  19.661778  48.33
     Yes         42  0.147906  0.325733      42  21.276667  50.81
Sun  No          57  0.160113  0.252672      57  20.506667  48.17
     Yes         19  0.187250  0.710345      19  24.120000  45.35
Thu  No          45  0.160298  0.266312      45  17.113111  41.19
     Yes         17  0.163863  0.241255      17  19.190588  43.11
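The text mentions applying different functions per column, which the call above does not show. One way to do that is to pass agg a dict mapping column names to functions, sketched here on four made-up rows (not from tips.txt); the particular choice of mean, max, and sum is arbitrary.

```python
import pandas as pd

# Made-up rows (not from tips.txt) forming two (day, smoker) groups.
toy = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Sat'],
    'smoker': ['No', 'No', 'Yes', 'Yes'],
    'totbill': [10.0, 20.0, 30.0, 40.0],
    'tip': [1.0, 4.0, 3.0, 10.0],
})
toy['tip_pct'] = toy['tip'] / toy['totbill']

grouped = toy.groupby(['day', 'smoker'])

# Keys are column names, values are the function or list of functions
# to apply to that column only.
per_column = grouped.agg({'tip_pct': ['mean', 'max'], 'totbill': 'sum'})
print(per_column)
```

Because at least one column receives a list of functions, the result has two-level column labels such as ('tip_pct', 'mean').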

15. Suppose we wanted to select the top five tip_pct values by group. First, write a function that selects the rows with the largest values in a particular column:

In: def top(df, n=5, column='tip_pct'):
        return df.sort_values(by=column)[-n:]

top(tips, n=5)

Show your output.

Then, group by smoker and call apply with the top() function:
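A minimal sketch of that last step follows, using six made-up rows rather than tips.txt. The argument names df and n in top are assumptions, and selecting the value columns before apply is a choice made here so that the grouping column is not passed into the function, which behaves the same across recent pandas versions.

```python
import pandas as pd


def top(df, n=5, column='tip_pct'):
    # Sort ascending by the chosen column and keep the last n rows,
    # i.e. the n rows with the largest values.
    return df.sort_values(by=column)[-n:]


# Made-up rows (not from tips.txt): three non-smokers, three smokers.
toy = pd.DataFrame({
    'smoker': ['No', 'No', 'No', 'Yes', 'Yes', 'Yes'],
    'totbill': [10.0, 20.0, 40.0, 10.0, 20.0, 50.0],
    'tip': [1.0, 3.0, 2.0, 2.0, 5.0, 4.0],
})
toy['tip_pct'] = toy['tip'] / toy['totbill']

# apply calls top once per smoker group and stacks the pieces, labelling
# each piece with its group key. Extra keyword arguments go through to top.
value_cols = ['totbill', 'tip', 'tip_pct']
result = toy.groupby('smoker')[value_cols].apply(top, n=2)
print(result)
```

The result is indexed by the group key and the original row label, so it stays clear which group each selected row came from.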
