Question
Benford's Law is an observation about the distribution of the frequencies of the first digits of the numbers in many different data sets. It is frequently found that the first digits are not uniformly distributed, but follow the logarithmic distribution
$$P(d) = \log_{10}\left(\frac{d+1}{d}\right).$$
That is, numbers starting with 1 are more common than those starting with 2, and so on, with those starting with 9 the least common. The probabilities are given below:
Digit | Probability |
---|---|
1 | 0.301 |
2 | 0.176 |
3 | 0.125 |
4 | 0.097 |
5 | 0.079 |
6 | 0.067 |
7 | 0.058 |
8 | 0.051 |
9 | 0.046 |
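The table can be reproduced directly from the formula. This is a minimal Python sketch; the function name `benford_probability` is hypothetical and not part of the assignment.

```python
import math


def benford_probability(d: int) -> float:
    """Expected Benford frequency of leading digit d (1-9): log10((d + 1) / d)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10((d + 1) / d)


# Reproduce the table above, rounded to three decimal places.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.3f}")

# The nine terms telescope: log10(2/1 * 3/2 * ... * 10/9) = log10(10) = 1.
print(f"{sum(benford_probability(d) for d in range(1, 10)):.6f}")
```

Because the product of the ratios (d+1)/d telescopes to 10, the nine probabilities always sum to exactly 1.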
Benford's Law is most accurate for data sets which span several orders of magnitude, and can be proved to be exact for some infinite sequences of numbers.
Write a Python program that analyzes a set of integers to see how closely it follows Benford's Law. If you need help getting started, here is an excellent article that shows how to do this in Excel. Get the name of the input file from the command line. Name this program benford.py.
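One possible core for benford.py, following the instruction to take the input file name from the command line, is sketched below. It assumes the file holds integers separated by whitespace and/or commas; the helper names `leading_digit`, `digit_frequencies` and `read_integers` are hypothetical.

```python
import sys
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the most significant decimal digit of a nonzero integer."""
    if n == 0:
        raise ValueError("0 has no nonzero leading digit")
    # str() is exact for arbitrarily large ints; a float log10 approach can misround near powers of 10.
    return int(str(abs(n))[0])


def digit_frequencies(numbers) -> dict:
    """Map each digit 1-9 to the fraction of nonzero numbers that start with it."""
    counts = Counter(leading_digit(n) for n in numbers if n != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero numbers to analyze")
    return {d: counts[d] / total for d in range(1, 10)}


def read_integers(path: str) -> list:
    """Read integers separated by whitespace and/or commas (assumed file layout)."""
    with open(path) as f:
        return [int(tok) for tok in f.read().replace(",", " ").split()]


def main(argv) -> None:
    if len(argv) != 2:
        print("usage: python3 benford.py <input-file>", file=sys.stderr)
        return
    for d, freq in digit_frequencies(read_integers(argv[1])).items():
        print(d, f"{freq:.3f}")


if __name__ == "__main__":
    main(sys.argv)
```

Zeros are skipped because they have no nonzero leading digit, and the nine fractions are computed over the remaining numbers only.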
(a) Demonstrate that the first digits of the first 500 Fibonacci numbers (see this Example) follow Benford's Law quite closely. You will need to generate these numbers and put them in a file or look them up and copy/paste them into a file. MAKE SURE YOU HAVE THE CORRECT NUMBERS. Name the file fib500.txt.
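A sketch for generating fib500.txt, assuming the sequence starts 1, 1, 2, 3, ... (a leading 0 would have no first digit) and one number per line. The names `first_fibonacci` and `write_numbers` are hypothetical.

```python
def first_fibonacci(count: int) -> list[int]:
    """Return the first `count` Fibonacci numbers, assuming F(1) = F(2) = 1."""
    fibs = []
    a, b = 1, 1
    for _ in range(count):
        fibs.append(a)
        a, b = b, a + b  # Python ints are arbitrary precision, so every value stays exact
    return fibs


def write_numbers(path: str, numbers) -> None:
    """Write one integer per line."""
    with open(path, "w") as f:
        for n in numbers:
            f.write(f"{n}\n")


if __name__ == "__main__":
    write_numbers("fib500.txt", first_fibonacci(500))
```

Integer addition keeps all 500 values exact (the 500th has 105 digits). A floating-point shortcut such as Binet's formula would lose precision long before that, which is one easy way to end up with incorrect numbers.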
(b) The lengths of the amino acid sequences of 500 randomly chosen proteins are provided in the file protein_lengths.py. This file contains a list, naa, which can be imported at the start of your program with:
from protein_lengths import naa
To what extent does the distribution of protein lengths obey Benford's Law?
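The assignment does not say how to measure "to what extent". One common choice, used here as an assumption rather than a requirement, is Pearson's chi-square goodness-of-fit test against the Benford expected counts, with 9 - 1 = 8 degrees of freedom. The names `benford_chi_square` and `follows_benford` are hypothetical.

```python
import math
from collections import Counter

# Upper 5% critical value of the chi-square distribution with 8 degrees of freedom
# (9 digit categories minus 1).
CHI2_CRITICAL_8DF_5PCT = 15.507


def benford_chi_square(numbers) -> float:
    """Pearson chi-square statistic comparing observed leading-digit counts to Benford's Law."""
    counts = Counter(int(str(abs(n))[0]) for n in numbers if n != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one nonzero number")
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10((d + 1) / d)  # expected count for digit d
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


def follows_benford(numbers) -> bool:
    """True if the chi-square test does not reject Benford's Law at the 5% level."""
    return benford_chi_square(numbers) <= CHI2_CRITICAL_8DF_5PCT
```

With protein_lengths.py in the same directory, `from protein_lengths import naa` followed by `benford_chi_square(naa)` gives the statistic for the protein data. A value below the critical value means the test finds no significant departure from Benford's Law at that level. It does not prove that the data follow the law.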
Your program will determine which analysis to perform (Fibonacci or proteins) by reading the command line. For example: "python3 benford.py fib" will perform the Fibonacci analysis and "python3 benford.py protein" performs the protein analysis.
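A sketch of the keyword dispatch described above. It assumes fib500.txt holds one integer per line and uses the headings from the required output format. The names `ANALYSES`, `choose_analysis` and `load_numbers` are hypothetical.

```python
import sys
from collections import Counter

# Keyword accepted on the command line -> heading line for the report
ANALYSES = {"fib": "Fibonacci Analysis", "protein": "Amino Analysis"}


def choose_analysis(argv) -> str:
    """Return the requested analysis keyword, or raise ValueError if missing/unknown."""
    if len(argv) != 2 or argv[1] not in ANALYSES:
        raise ValueError("usage: python3 benford.py fib|protein")
    return argv[1]


def load_numbers(mode: str) -> list:
    """Load the data set for the chosen analysis."""
    if mode == "fib":
        # Assumes fib500.txt holds one integer per line.
        with open("fib500.txt") as f:
            return [int(line) for line in f if line.strip()]
    from protein_lengths import naa  # the list provided with the assignment
    return list(naa)


def main(argv) -> None:
    try:
        mode = choose_analysis(argv)
    except ValueError as err:
        print(err, file=sys.stderr)
        return
    # Count leading digits; str() is exact even for 100-digit Fibonacci numbers.
    counts = Counter(int(str(n)[0]) for n in load_numbers(mode) if n > 0)
    total = sum(counts.values())
    if total == 0:
        print("no positive numbers found", file=sys.stderr)
        return
    print(ANALYSES[mode])
    for d in range(1, 10):
        print(d, f"{counts[d] / total:.3f}")


if __name__ == "__main__":
    main(sys.argv)
```

The assignment suggests importing naa at the top of the program. This sketch defers the import into `load_numbers` so that the Fibonacci mode still runs when protein_lengths.py is absent. Either placement works when the file is present.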
Your output for both analyses should be formatted as follows: a heading line, followed by 9 lines, each containing a digit (1-9) and the fraction of numbers with that leading digit, in the format 0.###.
Fibonacci Analysis (or Amino Analysis)
1 #.###
2 #.###
3 #.###
...
9 #.###
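A small formatter for this layout is sketched below. The name `format_report` is hypothetical, and the frequencies in the demo call are made up only to show the layout. They are not results.

```python
def format_report(title: str, frequencies: dict) -> str:
    """Heading line followed by one 'digit fraction' line per digit 1-9, fraction as 0.###."""
    lines = [title]
    for d in range(1, 10):
        lines.append(f"{d} {frequencies.get(d, 0.0):.3f}")
    return "\n".join(lines)


# Made-up frequencies, only to show the layout (not real results).
print(format_report("Fibonacci Analysis", {1: 0.5, 2: 0.25, 3: 0.25}))
```

Digits that never occur as a leading digit are still printed, as 0.000, so the report always has exactly nine digit lines.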
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
protein_lengths.py
naa = [40055, 27711, 36332, 13309, 12808, 3425, 56868, 44400, 26557, 48423, 14844, 47584, 26641, 27510, 21004, 13527, 80355, 34317, 33004, 32609, 37652, 27852, 192897, 62837, 18936, 27551, 20574, 18708, 31629, 36769, 3394, 51287, 37785, 44999, 19053, 64524, 24274, 44360, 11626, 24942, 18248, 9086, 36636, 22415, 55285, 61596, 64841, 16850, 36145, 28193, 36236, 14643, 5064, 11241, 2890, 29797, 14445, 14332, 33444, 58127, 64262, 12374, 45264, 13555, 5026, 27469, 41704, 11652, 9654, 27854, 60503, 26307, 36306, 43643, 8511, 103416, 16914, 63771, 27967, 55336, 33749, 37857, 34322, 9149, 30535, 11454, 53351, 85349, 152093, 50425, 34097, 25820, 41280, 64104, 54594, 49515, 46720, 35481, 24628, 37274, 54113, 39747, 39373, 15803, 30657, 39552, 41612, 37797, 10416, 33914, 17017, 40889, 10288, 67021, 42538, 8436, 38042, 11930, 77894, 50805, 23632, 49670, 2614, 23954, 9860, 57545, 46826, 58679, 69640, 23940, 50590, 27693, 35705, 21997, 40607, 5414, 38285, 11261, 123691, 51860, 49265, 5125, 46084, 29732, 15568, 18376, 14352, 38818, 17392, 32460, 50364, 4421, 7161, 18344, 21850, 25299, 14129, 14530, 22150, 35136, 5170, 43089, 5748, 43761, 33491, 43027, 66908, 20856, 35835, 34878, 17926, 16568, 27354, 72621, 131102, 11916, 17526, 29761, 84144, 26944, 9300, 626, 23406, 62492, 4230, 4195, 5897, 10393, 64900, 25675, 29653, 20305, 6523, 26771, 13912, 38499, 44921, 9045, 44187, 1344, 7177, 32502, 37091, 88008, 22020, 6790, 15748, 19431, 56321, 5683, 20250, 20437, 29473, 44684, 25930, 30302, 14574, 136485, 35242, 49883, 23935, 1640, 33338, 1508, 70164, 36366, 70161, 36666, 36970, 39757, 82280, 37944, 40951, 27397, 69690, 35003, 66168, 87552, 126056, 57127, 40335, 63955, 125260, 14674, 21548, 14656, 21934, 7862, 24767, 10811, 32661, 46748, 25467, 58673, 11264, 60199, 18799, 6513, 2253, 36712, 22885, 17548, 29870, 37949, 15386, 62111, 51134, 1735, 18269, 82362, 34355, 13199, 16853, 39614, 46452, 13833, 53883, 37563, 36637, 22871, 39574, 30017, 1147, 13522, 44112, 36598, 12525, 10614, 
15627, 44802, 47193, 53677, 61054, 12945, 42905, 2558, 331774, 6734, 21457, 51491, 37197, 13542, 16495, 109616, 31379, 17175, 76105, 27734, 25276, 14537, 38191, 22069, 41348, 34746, 35426, 39343, 37929, 47420, 28223, 27755, 27007, 30816, 15623, 98138, 42984, 46130, 12264, 21960, 40755, 98353, 12675, 100728, 57881, 27907, 25722, 18483, 17548, 53622, 71649, 13684, 33692, 96542, 35028, 19657, 12132, 16538, 101121, 19229, 25289, 22407, 72784, 19841, 22639, 81729, 53888, 82644, 49803, 23756, 17963, 42399, 8525, 56083, 37827, 83002, 32990, 20027, 7450, 12528, 10693, 34700, 24628, 25052, 30321, 52766, 13469, 25037, 22778, 27076, 18913, 12355, 60058, 41671, 64229, 79737, 30952, 81983, 11241, 26588, 30434, 34335, 75246, 7161, 42843, 2138, 58323, 19976, 29223, 13253, 15704, 43887, 15506, 42639, 13955, 78452, 33828, 24313, 40600, 21985, 14930, 88702, 57545, 17980, 32503, 35761, 161851, 40083, 41989, 3540, 75244, 21494, 96423, 41632, 34773, 10560, 98540, 27657, 10265, 22645, 65203, 30809, 48566, 59685, 36875, 7296, 11701, 118950, 35185, 15649, 52043, 13751, 36057, 55694, 71607, 143626, 15604, 16713, 24177, 158598, 19304, 13964, 42171, 29018, 50340, 36697, 26844, 36021, 56938, 11576, 33854, 55018, 64000, 36293, 27211, 6570, 102953, 31491, 47414, 37642, 14435, 20439, 22358, 19793, 17889, 29894, 21665, 8835, 24870, 33024, 39666, 31076, 44038, 23011, 21942, 35205, 13883, 19044, 42900, 38668, 85282, 54053, 14979, 18090, 65513, 32238, 39368, 34105, 68725, 53162, 24792, 104422]