Question

Data Science Python 3.0 problem:

Let X be the leading digit of a randomly selected number from a large accounting ledger. For example, if we randomly draw the number $20,695, then X = 2. People who make up numbers to commit accounting fraud tend to give X a (discrete) uniform distribution, i.e., P(X = x) = 1/9 for x in {1, 2, ..., 9}. However, there is empirical evidence suggesting that naturally occurring numbers (e.g., numbers in a non-fraudulent accounting ledger) have leading digits that do not follow a uniform distribution. Instead, they follow a distribution defined by the probability mass function f(x) = log10((x + 1) / x) for x = 1, 2, ..., 9.

Part A: Write a function pmf_natural that implements f(x). Your function should take in an integer x and return f(x) = P(X = x). Use your function to argue that f(x) is a well-defined probability mass function.

Python 3.0:

def pmf_natural(x):
    return 1.0
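
One way to fill in the stub, as a minimal sketch. It assumes the pmf is f(x) = log10((x + 1) / x) on x = 1, ..., 9, and it returns 0 outside that support, which is a choice made here and not something the problem states. The check at the end covers the two conditions for a valid pmf: every value is nonnegative, and the values sum to 1 (the sum telescopes to log10(10) = 1).

```python
import math


def pmf_natural(x):
    """Return f(x) = log10((x + 1) / x) for x in 1..9, and 0 otherwise."""
    if x in range(1, 10):
        return math.log10((x + 1) / x)
    return 0.0  # outside the support (an assumption of this sketch)


# Check the two pmf conditions numerically over the support 1..9.
probabilities = [pmf_natural(x) for x in range(1, 10)]
all_nonnegative = all(p >= 0 for p in probabilities)
total = sum(probabilities)

print("all values nonnegative:", all_nonnegative)
print("sum over support:", round(total, 12))
```

The rounding in the last line only hides floating-point noise in the sum; the exact argument is the telescoping product (2/1)(3/2)...(10/9) = 10 inside the logarithm.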

Part B: Use the function you wrote above to make stacked bar plots describing the pmf of the naturally occurring numbers as well as the discrete uniform distribution. Make sure that the x- and y-limits on your plots are the same so that the two distributions are easy to compare.
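
A possible sketch for Part B, reading "stacked" as two bar plots placed one above the other with shared axes. The helper names pmf_uniform and plot_pmfs, the colors, and the y-limit of 0.35 are illustrative choices, not part of the assignment. matplotlib is imported inside the plotting function so the bar heights can be computed without it.

```python
import math


def pmf_natural(x):
    """f(x) = log10((x + 1) / x) on 1..9 (assumed form of the natural pmf)."""
    return math.log10((x + 1) / x) if x in range(1, 10) else 0.0


def pmf_uniform(x):
    """Discrete uniform pmf on 1..9."""
    return 1 / 9 if x in range(1, 10) else 0.0


digits = list(range(1, 10))
natural_heights = [pmf_natural(x) for x in digits]
uniform_heights = [pmf_uniform(x) for x in digits]


def plot_pmfs():
    """Draw the two pmfs as vertically stacked bar plots with identical axes."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(8, 6),
                             sharex=True, sharey=True)
    axes[0].bar(digits, natural_heights, color="steelblue")
    axes[0].set_title("Naturally occurring leading digits")
    axes[1].bar(digits, uniform_heights, color="darkorange")
    axes[1].set_title("Discrete uniform leading digits")
    for ax in axes:
        ax.set_xticks(digits)
        ax.set_ylim(0, 0.35)  # same y-limits on both panels
        ax.set_ylabel("P(X = x)")
    axes[1].set_xlabel("Leading digit x")
    fig.tight_layout()
    return fig
```

Calling plot_pmfs() and then showing or saving the returned figure renders both panels. The largest bar is f(1) = log10(2), roughly 0.30, so a common upper limit of 0.35 leaves a little headroom in both plots.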

Part C: Write a function cdf_natural that implements the cumulative distribution function F(y) for X, and use it to compute the probability that the leading digit in a number is at most 4 and at most 5.

Python 3.0:

def cdf_natural(y):
    return 1.0
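
A minimal sketch of the cdf, assuming F(y) = P(X <= y) and the pmf f(x) = log10((x + 1) / x) on 1..9. It sums the pmf over the digits up to y, so values of y below 1 give 0 and values of 9 or more give 1. Because the sum telescopes, the result should agree with log10(floor(y) + 1) for 1 <= y <= 9, which gives log10(5) for "at most 4" and log10(6) for "at most 5".

```python
import math


def pmf_natural(x):
    """f(x) = log10((x + 1) / x) on 1..9 (assumed form of the natural pmf)."""
    return math.log10((x + 1) / x) if x in range(1, 10) else 0.0


def cdf_natural(y):
    """Return F(y) = P(X <= y) by summing the pmf over digits 1..floor(y)."""
    upper = min(math.floor(y), 9)  # the support stops at 9
    return sum((pmf_natural(x) for x in range(1, upper + 1)), 0.0)


print("P(X <= 4) =", cdf_natural(4))
print("P(X <= 5) =", cdf_natural(5))
```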

Part D: The data in tax_data.txt contains the taxable income for individuals in 1978. Use Pandas and the information from Parts A-C to determine whether or not the dataset is likely fraudulent. In addition to code and any graphical summaries, make sure to clearly justify your conclusion in words.

tax_data.txt can be found here:

https://raw.githubusercontent.com/dblarremore/csci3022/master/homework/homework3/tax_data.txt
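
A rough sketch of the comparison Part D asks for, under stated assumptions. The helper names leading_digit, observed_frequencies, and max_abs_deviation are hypothetical, and the functions take a plain iterable of numbers because the layout of tax_data.txt is not described here. The idea is to extract the leading digit of each income, tabulate the observed digit frequencies, and compare them with the natural pmf f(x) = log10((x + 1) / x).

```python
import math
from collections import Counter


def pmf_natural(x):
    """f(x) = log10((x + 1) / x) on 1..9 (assumed form of the natural pmf)."""
    return math.log10((x + 1) / x) if x in range(1, 10) else 0.0


def leading_digit(value):
    """Return the first nonzero digit of a number, e.g. 20695 -> 2."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None  # zero or a non-numeric value such as NaN


def observed_frequencies(values):
    """Return the relative frequency of each leading digit 1..9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no values with a leading digit")
    counts = Counter(digits)
    return {x: counts.get(x, 0) / len(digits) for x in range(1, 10)}


def max_abs_deviation(values):
    """Largest gap between observed frequencies and the natural pmf."""
    observed = observed_frequencies(values)
    return max(abs(observed[x] - pmf_natural(x)) for x in range(1, 10))
```

With Pandas, the file could be loaded with pd.read_csv on the URL above and the income column passed to observed_frequencies; inspect the file first, since its delimiter and column names are not given in the problem. Observed frequencies that sit close to the natural pmf, rather than close to 1/9 each, would be consistent with a non-fraudulent ledger. This is a descriptive comparison, not a formal statistical test.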

