Part I: Find 50 numbers in documents that are available to the general public (newspapers, magazines, etc.) and record the first nonzero digit of each. Summarize the results in the relative frequency table below (be sure to use the "first digits" column). Identify the source(s) of your numbers here.
Part II: Find 50 numbers in documents available to the general public and record the last nonzero digit of each. Summarize the results in the relative frequency table below (be sure to use the "last digits" column). Identify the source(s) of your numbers here.
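If you type your numbers into a computer, a small helper can pull out the digits for Parts I and II. This is a minimal sketch, not part of the original worksheet. The function names `first_nonzero_digit` and `last_nonzero_digit` and the sample values are hypothetical. Numbers are kept as text exactly as printed, so symbols such as "$", ",", and "." are skipped, and trailing zeros are not lost the way they would be if the numbers were converted to floats.

```python
def first_nonzero_digit(number: str) -> int:
    """Return the first nonzero digit of a number written as text, e.g. "0.0072" -> 7."""
    for ch in number:
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"no nonzero digit in {number!r}")


def last_nonzero_digit(number: str) -> int:
    """Return the last nonzero digit of a number written as text, e.g. "3,400" -> 4."""
    for ch in reversed(number):
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"no nonzero digit in {number!r}")


# Hypothetical numbers as they might appear in print (not real data).
sample = ["1,250", "0.0072", "$38.50", "914", "2004"]
print([first_nonzero_digit(x) for x in sample])  # [1, 7, 3, 9, 2]
print([last_nonzero_digit(x) for x in sample])   # [5, 2, 5, 4, 4]
```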
Part III: If the digits 1 through 9 occurred with equal frequency, then the probability of digit n would be $P(n) = 1/9$ for each $n = 1, 2, \ldots, 9$. Comment on how closely your first digits follow this distribution.
Part IV: Again using $P(n) = 1/9$ for each digit n, comment on how closely your last digits follow this distribution.
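One way to put a number on "how closely" for Parts III and IV is the chi-square goodness-of-fit statistic, a standard technique that the worksheet itself does not ask for. The sketch below assumes your digits are stored in a Python list; the function name `chi_square_vs_uniform` and the example digits are hypothetical. With nine digit categories (8 degrees of freedom), a statistic above roughly 15.5 would be unusual at the 5% level if the digits really were uniform, although with only 50 observations the test is a rough guide at best.

```python
from collections import Counter


def chi_square_vs_uniform(digits):
    """Chi-square goodness-of-fit statistic comparing observed digits 1-9
    with the uniform model P(n) = 1/9."""
    n = len(digits)
    counts = Counter(digits)
    expected = n / 9  # expected count of each digit under the uniform model
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(1, 10))


# Hypothetical digits (not real data); replace with your 50 first or last digits.
digits = [1, 1, 2, 3, 1, 5, 9, 2, 1, 4]

# Side-by-side view: digit, observed relative frequency, uniform probability.
for d in range(1, 10):
    print(d, f"{digits.count(d) / len(digits):.2f}", f"{1 / 9:.3f}")
print("chi-square:", round(chi_square_vs_uniform(digits), 2))
```

A statistic of 0 means every digit appeared exactly equally often; larger values mean the counts stray further from 1/9.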
Part V: More than 100 years ago, before the advent of scientific calculators, logarithms and other transcendental functions were recorded in tables, which were then bound into books. The astronomer and mathematician Simon Newcomb (1835-1909) observed that the early pages of logarithm books were more worn and smudged than the later pages. This suggested that the early pages, which list numbers beginning with small digits, were consulted more often than the pages for numbers beginning with large digits. After investigating, he conjectured that first digits follow a particular probability distribution: the probability that the first digit is n is $P(n) = \log_{10}(1 + 1/n)$ for $n = 1, 2, \ldots, 9$. Newcomb published his results in a brief article in the American Journal of Mathematics in 1881, but he offered no proof of his conjecture.
Fifty-seven years later, physicist Frank Benford of General Electric noticed the same phenomenon with logarithm books and, apparently unaware of Newcomb's earlier work, postulated the same probability distribution. Benford tested this distribution on a wide range of data sets, from river basin areas and population figures to baseball statistics and numbers appearing in Reader's Digest articles. He found that the data fit the logarithmic model amazingly well.
This distribution is now known as Benford's distribution and has many applications, including computer design and the detection of fraud or fabricated data in financial documents and income tax returns.
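As a quick check, not part of the original worksheet, that Newcomb's formula defines a valid probability distribution: the nine probabilities telescope to exactly 1. The formula also makes digit 1 far more likely than digit 9, whereas the equal-frequency model gives each digit $1/9 \approx 0.111$.

$$
\sum_{n=1}^{9} \log_{10}\left(1 + \frac{1}{n}\right)
= \sum_{n=1}^{9} \left[\log_{10}(n+1) - \log_{10} n\right]
= \log_{10} 10 - \log_{10} 1 = 1,
\qquad
P(1) = \log_{10} 2 \approx 0.301,
\quad
P(9) = \log_{10}\tfrac{10}{9} \approx 0.046.
$$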
| Digit | Relative frequency* (first digits) | Relative frequency* (last digits) | Benford's distribution |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | | | |
| 7 | | | |
| 8 | | | |
| 9 | | | |
*Remember: The relative frequency is the frequency divided by the number of observations; it approximates the "true" probability.
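The definition above translates directly into code. This minimal sketch assumes your digits are in a Python list; the function name `relative_frequencies` and the example digits are hypothetical. With 50 observations, each number contributes 1/50 = 0.02, and the nine relative frequencies in each column always add up to 1.

```python
from collections import Counter


def relative_frequencies(digits):
    """Relative frequency of each digit 1-9: its frequency divided by the
    number of observations."""
    n = len(digits)
    counts = Counter(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


# Hypothetical digits (not real data).
rf = relative_frequencies([1, 1, 2, 3, 1])
for d, value in rf.items():
    print(d, f"{value:.2f}")  # one table row per digit
```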
Part VI: Use Benford's distribution to calculate the probability of each digit and record your results in the table above. Comment on how closely your distribution of first digits follows Benford's distribution. Why are they not exactly the same?
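The sketch below computes the Benford probabilities for the last column and then illustrates one likely reason your first digits will not match them exactly. Even data that follow Benford's law perfectly produce relative frequencies that wander from the probabilities when only 50 numbers are collected. The simulation uses Python's standard `random.Random.choices`; the name `simulate_relative_frequencies` and the seed are hypothetical choices, and the simulated values are illustrative, not data. Another possibility worth considering is that some of your sources, such as numbers confined to a narrow range, may not follow Benford's law at all.

```python
import math
import random
from collections import Counter

# Benford's distribution: P(n) = log10(1 + 1/n) for n = 1, ..., 9.
benford = {n: math.log10(1 + 1 / n) for n in range(1, 10)}


def simulate_relative_frequencies(sample_size, seed=None):
    """Draw sample_size first digits from Benford's distribution and return
    their relative frequencies, mimicking a class collecting real data."""
    rng = random.Random(seed)
    digits = rng.choices(list(benford), weights=list(benford.values()), k=sample_size)
    counts = Counter(digits)
    return {n: counts.get(n, 0) / sample_size for n in range(1, 10)}


# Compare the exact probabilities with one simulated sample of 50 numbers.
sim = simulate_relative_frequencies(50, seed=1)
for n in range(1, 10):
    print(f"{n}: Benford {benford[n]:.3f}  simulated {sim[n]:.2f}")
```

Rerunning with different seeds gives a feel for how much a 50-number sample typically varies; larger sample sizes tend to land closer to the Benford column.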