Question

Benford's Law is an observation about the distribution of the frequencies of the first digits of the numbers in many different data sets. It is frequently found that the first digits are not uniformly distributed, but follow the logarithmic distribution

$$P(d) = \log_{10}\left(\frac{d+1}{d}\right).$$

That is, numbers starting with 1 are more common than those starting with 2, and so on, with those starting with 9 the least common. The probabilities are given below:

Digit Probability
1 0.301
2 0.176
3 0.125
4 0.097
5 0.079
6 0.067
7 0.058
8 0.051
9 0.046
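
The table can be reproduced directly from the formula. The following is a minimal sketch; the names `benford_probability` and `expected` are illustrative and not part of the assignment.

```python
import math

def benford_probability(d):
    """Expected first-digit probability P(d) = log10((d + 1) / d)."""
    return math.log10((d + 1) / d)

# Expected probability for each leading digit 1-9 (hypothetical name).
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d} {p:.3f}")
```

The nine probabilities sum to 1, because the product of the ratios (d+1)/d for d = 1 to 9 telescopes to 10.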

Benford's Law is most accurate for data sets which span several orders of magnitude, and can be proved to be exact for some infinite sequences of numbers.

Write a Python program that analyzes a set of integers to see how closely it follows Benford's Law. If you need some help thinking about how to do this, here is an excellent article that shows you how to do this in Excel. Get the name of the input file from the command line. Name this program benford.py.
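
One possible way to load the integers from an input file is sketched below. The file layout is an assumption (integers separated by whitespace or commas), and `read_integers` is a hypothetical helper name; the assignment does not specify either.

```python
def read_integers(path):
    """Read integers from a text file.

    Assumes the values are separated by whitespace and/or commas;
    the assignment does not specify the file layout.
    """
    with open(path) as handle:
        text = handle.read()
    # Treat commas as separators too, then split on any whitespace.
    return [int(token) for token in text.replace(",", " ").split()]
```

In the real program the path would come from the command line, for example `read_integers(sys.argv[1])`.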

(a) Demonstrate that the first digits of the first 500 Fibonacci numbers (see this Example) follow Benford's Law quite closely. You will need to generate these numbers and put them in a file or look them up and copy/paste them into a file. MAKE SURE YOU HAVE THE CORRECT NUMBERS. Name the file fib500.txt.
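
Generating the numbers in Python avoids copy/paste mistakes, since Python integers have arbitrary precision. This sketch assumes the sequence starts 1, 1, 2, ... (check this against the linked example) and writes one number per line, which is an assumed file layout. The helper names are illustrative.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the sequence starts 1, 1."""
    numbers = []
    a, b = 1, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b
    return numbers

def write_numbers(path, numbers):
    """Write one integer per line (an assumed layout for fib500.txt)."""
    with open(path, "w") as handle:
        for value in numbers:
            handle.write(f"{value}\n")

if __name__ == "__main__":
    write_numbers("fib500.txt", fibonacci(500))
```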

(b) The lengths of the amino acid sequences of 500 randomly chosen proteins are provided in the file protein_lengths.py. This file contains a list, naa, which can be imported at the start of your program with:

from protein_lengths import naa 

To what extent does the distribution of protein lengths obey Benford's Law?
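
Answering this comes down to tallying leading digits and comparing the observed fractions with the Benford probabilities. Below is a minimal sketch of the tallying step. The function names are hypothetical, and `sample` holds only the first ten values of naa so the block runs without protein_lengths.py.

```python
from collections import Counter

def first_digit(n):
    """Return the leading decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])

def first_digit_frequencies(values):
    """Return {digit: observed fraction} for digits 1-9, skipping zeros."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero values to analyze")
    return {d: counts[d] / total for d in range(1, 10)}

# First ten entries of naa, used here only as a small stand-in.
sample = [40055, 27711, 36332, 13309, 12808, 3425, 56868, 44400, 26557, 48423]
frequencies = first_digit_frequencies(sample)
```

With the full list, `first_digit_frequencies(naa)` gives the observed distribution to set against the table of expected probabilities.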

Your program will determine which analysis to perform (Fibonacci or proteins) by reading the command line. For example: "python3 benford.py fib" will perform the Fibonacci analysis and "python3 benford.py protein" performs the protein analysis.
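
The dispatch on the command-line argument might look like the sketch below. `ANALYSES` and `choose_analysis` are hypothetical names, and the usage message is an assumption; only the arguments `fib` and `protein` and the two headings come from the assignment.

```python
# Map each command-line keyword to its report heading.
ANALYSES = {"fib": "Fibonacci Analysis", "protein": "Amino Analysis"}

def choose_analysis(argv):
    """Return (mode, heading) for an argument list like ['benford.py', 'fib']."""
    if len(argv) != 2 or argv[1] not in ANALYSES:
        raise SystemExit("usage: python3 benford.py {fib|protein}")
    mode = argv[1]
    return mode, ANALYSES[mode]

# In benford.py this would be called as choose_analysis(sys.argv)
# after `import sys`; a literal list is used here for illustration.
mode, heading = choose_analysis(["benford.py", "fib"])
print(heading)
```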

Your output for both analyses should be formatted as follows: a heading line followed by 9 lines, one for each digit 1-9, each followed by the observed fraction in the format 0.###

Fibonacci Analysis (or Amino Analysis)
1 #.###
2 #.###
3 #.###
...
9 #.###
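
The required layout can be produced with a fixed-point format specifier. This is a sketch only; `format_report` is a hypothetical helper and the frequencies passed in are placeholder values, not results.

```python
def format_report(heading, frequencies):
    """Build the heading line plus one 'digit 0.###' line per digit 1-9."""
    lines = [heading]
    for d in range(1, 10):
        # Digits missing from the mapping are reported as 0.000.
        lines.append(f"{d} {frequencies.get(d, 0.0):.3f}")
    return "\n".join(lines)

# Placeholder values for illustration only.
report = format_report("Fibonacci Analysis", {1: 0.301, 2: 0.176})
print(report)
```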

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

protein_lengths.py

naa = [40055, 27711, 36332, 13309, 12808, 3425, 56868, 44400, 26557, 48423, 14844, 47584, 26641, 27510, 21004, 13527, 80355, 34317, 33004, 32609, 37652, 27852, 192897, 62837, 18936, 27551, 20574, 18708, 31629, 36769, 3394, 51287, 37785, 44999, 19053, 64524, 24274, 44360, 11626, 24942, 18248, 9086, 36636, 22415, 55285, 61596, 64841, 16850, 36145, 28193, 36236, 14643, 5064, 11241, 2890, 29797, 14445, 14332, 33444, 58127, 64262, 12374, 45264, 13555, 5026, 27469, 41704, 11652, 9654, 27854, 60503, 26307, 36306, 43643, 8511, 103416, 16914, 63771, 27967, 55336, 33749, 37857, 34322, 9149, 30535, 11454, 53351, 85349, 152093, 50425, 34097, 25820, 41280, 64104, 54594, 49515, 46720, 35481, 24628, 37274, 54113, 39747, 39373, 15803, 30657, 39552, 41612, 37797, 10416, 33914, 17017, 40889, 10288, 67021, 42538, 8436, 38042, 11930, 77894, 50805, 23632, 49670, 2614, 23954, 9860, 57545, 46826, 58679, 69640, 23940, 50590, 27693, 35705, 21997, 40607, 5414, 38285, 11261, 123691, 51860, 49265, 5125, 46084, 29732, 15568, 18376, 14352, 38818, 17392, 32460, 50364, 4421, 7161, 18344, 21850, 25299, 14129, 14530, 22150, 35136, 5170, 43089, 5748, 43761, 33491, 43027, 66908, 20856, 35835, 34878, 17926, 16568, 27354, 72621, 131102, 11916, 17526, 29761, 84144, 26944, 9300, 626, 23406, 62492, 4230, 4195, 5897, 10393, 64900, 25675, 29653, 20305, 6523, 26771, 13912, 38499, 44921, 9045, 44187, 1344, 7177, 32502, 37091, 88008, 22020, 6790, 15748, 19431, 56321, 5683, 20250, 20437, 29473, 44684, 25930, 30302, 14574, 136485, 35242, 49883, 23935, 1640, 33338, 1508, 70164, 36366, 70161, 36666, 36970, 39757, 82280, 37944, 40951, 27397, 69690, 35003, 66168, 87552, 126056, 57127, 40335, 63955, 125260, 14674, 21548, 14656, 21934, 7862, 24767, 10811, 32661, 46748, 25467, 58673, 11264, 60199, 18799, 6513, 2253, 36712, 22885, 17548, 29870, 37949, 15386, 62111, 51134, 1735, 18269, 82362, 34355, 13199, 16853, 39614, 46452, 13833, 53883, 37563, 36637, 22871, 39574, 30017, 1147, 13522, 44112, 36598, 12525, 10614, 
15627, 44802, 47193, 53677, 61054, 12945, 42905, 2558, 331774, 6734, 21457, 51491, 37197, 13542, 16495, 109616, 31379, 17175, 76105, 27734, 25276, 14537, 38191, 22069, 41348, 34746, 35426, 39343, 37929, 47420, 28223, 27755, 27007, 30816, 15623, 98138, 42984, 46130, 12264, 21960, 40755, 98353, 12675, 100728, 57881, 27907, 25722, 18483, 17548, 53622, 71649, 13684, 33692, 96542, 35028, 19657, 12132, 16538, 101121, 19229, 25289, 22407, 72784, 19841, 22639, 81729, 53888, 82644, 49803, 23756, 17963, 42399, 8525, 56083, 37827, 83002, 32990, 20027, 7450, 12528, 10693, 34700, 24628, 25052, 30321, 52766, 13469, 25037, 22778, 27076, 18913, 12355, 60058, 41671, 64229, 79737, 30952, 81983, 11241, 26588, 30434, 34335, 75246, 7161, 42843, 2138, 58323, 19976, 29223, 13253, 15704, 43887, 15506, 42639, 13955, 78452, 33828, 24313, 40600, 21985, 14930, 88702, 57545, 17980, 32503, 35761, 161851, 40083, 41989, 3540, 75244, 21494, 96423, 41632, 34773, 10560, 98540, 27657, 10265, 22645, 65203, 30809, 48566, 59685, 36875, 7296, 11701, 118950, 35185, 15649, 52043, 13751, 36057, 55694, 71607, 143626, 15604, 16713, 24177, 158598, 19304, 13964, 42171, 29018, 50340, 36697, 26844, 36021, 56938, 11576, 33854, 55018, 64000, 36293, 27211, 6570, 102953, 31491, 47414, 37642, 14435, 20439, 22358, 19793, 17889, 29894, 21665, 8835, 24870, 33024, 39666, 31076, 44038, 23011, 21942, 35205, 13883, 19044, 42900, 38668, 85282, 54053, 14979, 18090, 65513, 32238, 39368, 34105, 68725, 53162, 24792, 104422]
