Question
CSc 110 - Benford's Law
Benford's Law is a mathematical law that describes the behavior of naturally-occurring numbers in some kinds of numerical data sets. I recommend that you watch this video before proceeding to get an explanation:
Benford's Law is useful for distinguishing naturally occurring data from randomized or made-up data. It has been used in the real world to detect election fraud (for example, in the 2009 Iranian election). It has also been used as evidence in criminal cases in the US.
In this PA, you'll be writing a program that reads in a data set and prints out a plot of the first digits. Then, you can look at the plot to determine whether it conforms to the law or not! Name your file benfords_law.py. You should organize the code into several functions: main, one for loading the file, one for counting the occurrences, and one for printing the plot.
The Input File
Your program should ask the user for the name of an input file, which your program should expect to be formatted as CSV. If you don't know what a CSV file is, or forgot, go watch the video quiz that covered it! Shown below is an example of the program prompting the user for a file name, the typed file name (places.csv), and then the printed plot.
Data file name: places.csv
1 | ###############################
2 | ##############
3 | ##########
4 | ########
5 | #######
6 | #####
7 | ####
8 | ####
9 | ####
Follows Benford's Law
After opening up the input file, the program should search through the CSV data for numerical values. The way you should do so is as follows:
- Create an empty list, to which you will append every number you find
- Loop through each line of the file
- For each line, split on a comma (due to it being CSV)
- For each element that you get from splitting, if the FIRST character and the LAST character are both numeric digits, and if the first digit is not 0, then convert the string to a float and append it to the list of numbers
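The steps above can be sketched roughly as follows. This is only one possible reading of the rules, not the official solution: the function name load_numbers is hypothetical, it takes any iterable of lines (an open file works) so it is easy to try out, and it assumes that every element passing the first/last character check is a valid float.

```python
def load_numbers(lines):
    """Collect numeric values from an iterable of CSV-formatted lines.

    Hypothetical helper: the name and signature are assumptions,
    not something the assignment prescribes.
    """
    numbers = []
    for line in lines:
        # Remove the trailing newline, then split on commas (CSV).
        for element in line.strip().split(","):
            # Keep the element only if it is non-empty, starts with a
            # digit 1-9 (so not 0), and ends with a digit 0-9.
            if (element
                    and element[0] in "123456789"
                    and element[-1] in "0123456789"):
                # Assumption: such an element always parses as a float.
                numbers.append(float(element))
    return numbers
```

With a real file, this could be called as numbers = load_numbers(open(file_name)), ideally inside a with block so the file is closed afterwards.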
For example, say you had this data file named places.csv:
region,population
pima,1234
georgia,145
steele,10
tampa,1700
greece,1729
rome,1711
milan,219
tucson,231
tuscany,20001
florence,301
nigeria,3879
newyork,404
phoenix,40123
belgium,505
madrid,502
nogales,601
brussels,712
tempe,81231
anthem,91231
After reading it in, your numbers list should be:
numbers = [1234.0, 145.0, 10.0, 1700.0, 1729.0, 1711.0, 219.0, 231.0, 20001.0, 301.0, 3879.0, 404.0, 40123.0, 505.0, 502.0, 601.0, 712.0, 81231.0, 91231.0]
In the next step, you should use this list of numbers to build the plot.
The Plot
In order to create the plot, you will first have to loop through the numbers list and count how many times a number starts with the digit 1, the digit 2, the digit 3, and so on up to 9. I recommend that you use a dictionary for this counting. If you have a floating-point number x, you can get the first digit as an int by doing int(str(x)[0]). Based on the places.csv data shown earlier, the counts dictionary should be as follows after counting:
counts = {1: 6, 2: 3, 3: 2, 4: 2, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1}
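As a minimal sketch of the counting step, the function below uses the int(str(x)[0]) trick on each number. The name count_first_digits is hypothetical, and starting every digit at 0 is an assumption made so that digits that never occur still have an entry. It also assumes every number is positive and does not start with 0, which the loading rules already guarantee.

```python
def count_first_digits(numbers):
    """Count how many numbers start with each digit 1 through 9.

    Hypothetical helper; assumes every number is positive and
    its string form starts with a digit from 1 to 9.
    """
    # Start every digit at zero so missing digits still show up.
    counts = {digit: 0 for digit in range(1, 10)}
    for x in numbers:
        # First character of the string form is the leading digit.
        first_digit = int(str(x)[0])
        counts[first_digit] += 1
    return counts
```

Called on the places.csv numbers list, this would produce the counts dictionary shown above.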
If you forgot how to use a dictionary to count things, go watch the video quiz where I showed how to do so! After counting, loop through the numbers 1 through 9 and figure out the percentage that each occurs. You will use these percentages both to print out the bar chart, and to check if the data follows the law. The way that you would calculate the percentage for a particular digit, as an integer, is:
int((count_for_digit / length_of_numbers_list) * 100)
The number of # characters for a digit in the plot should equal the percentage of the data in which that digit appears first. For example, in the places.csv data, 3 numbers start with the digit 2 out of a total of 19 numbers in the data set, so the percentage is int((3 / 19) * 100) = 15. Thus, print 15 hashtags for 2. For each row of the plot, print out the digit, a vertical bar, and then the hashtags. The plot that should print based on the places.csv example is:
1 | ###############################
2 | ###############
3 | ##########
4 | ##########
5 | ##########
6 | #####
7 | #####
8 | #####
9 | #####
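The percentage and bar rules described above might be sketched like this. The names plot_rows and print_plot are hypothetical. Building the rows as strings first is a design choice made here so the rows can be checked without capturing printed output, and the sketch assumes counts has an entry for every digit 1 through 9 and that total is greater than 0.

```python
def plot_rows(counts, total):
    """Build one plot row per digit, e.g. '2 | ###############'.

    Hypothetical helper; assumes counts has keys 1..9 and total > 0.
    """
    rows = []
    for digit in range(1, 10):
        # Integer percentage, truncated as in the assignment's formula.
        percent = int((counts[digit] / total) * 100)
        rows.append(str(digit) + " | " + "#" * percent)
    return rows


def print_plot(counts, total):
    """Print the bar chart, one row per digit."""
    for row in plot_rows(counts, total):
        print(row)
```

For the places.csv counts and a total of 19, the first row has 31 hashtags and the second has 15, matching the plot above.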
Does it follow the Law?
You should also determine whether the data follows Benford's Law. For the purposes of this PA, a data set follows Benford's Law if the percentage of occurrences of each digit is no more than 10 percentage points above, and no more than 5 percentage points below, the expected percentage in the following table.
digit | percent |
---|---|
1 | 30% |
2 | 17% |
3 | 12% |
4 | 9% |
5 | 7% |
6 | 6% |
7 | 5% |
8 | 5% |
9 | 4% |
If every digit follows, then print out Follows Benford's Law. Otherwise, print out Does not follow Benford's Law.
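One way the check could look is sketched below. It rests on assumptions the assignment does not spell out: the tolerance is treated as percentage points (so digit 1 may range from 25 to 40), the bounds are treated as inclusive, and percents is a dictionary of the integer percentages computed for the plot. The names EXPECTED_PERCENTS and follows_benfords_law are hypothetical.

```python
# Expected first-digit percentages from the table in the assignment.
EXPECTED_PERCENTS = {1: 30, 2: 17, 3: 12, 4: 9, 5: 7, 6: 6, 7: 5, 8: 5, 9: 4}


def follows_benfords_law(percents):
    """Return True if every digit's percentage is within tolerance.

    Assumptions: tolerance is in percentage points (minus 5, plus 10)
    and both bounds are inclusive.
    """
    for digit, expected in EXPECTED_PERCENTS.items():
        if not (expected - 5 <= percents[digit] <= expected + 10):
            return False
    return True
```

The main function could then print Follows Benford's Law when this returns True and Does not follow Benford's Law otherwise.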
Examples
Population Data
The populations.csv file contains population information from many countries across the map. If you download this file and run it with your code, you should get:
Data file name: populations.csv
1 | ##################################
2 | ###############
3 | ###########
4 | ########
5 | ########
6 | ######
7 | ####
8 | #####
9 | ####
Follows Benford's Law
Stock Data
The stocks.csv file contains open, max, min, and closing prices for stocks traded on the NYSE from 10/7/2019. If you run the code with this data, you should get:
Data file name: stocks.csv
1 | ##############################
2 | #########################
3 | ##########
4 | #######
5 | #######
6 | #####
7 | ####
8 | ###
9 | ####
Follows Benford's Law
Random Data
The random_numbers.csv file contains a bunch of randomly generated numbers. Due to this, we should NOT expect it to follow Benford's law. When the code is run with these numbers, you should get:
Data file name: random_numbers.csv
1 | ##########
2 | ###########
3 | ############
4 | ############
5 | ##########
6 | ############
7 | #########
8 | ##########
9 | ###########
Does not follow Benford's Law