Often the first step in finding the probability of an event is identifying the probability distribution that governs it. Good statisticians can identify the correct distribution right away; if you learn some simple rules, you can specify the correct distribution just like the experts. In this post, I describe how to identify two of the most common discrete distributions: binomial and geometric.
Requirements for a binomial distribution
- There is a fixed number, n, of trials.
- All trials are independent.
- The probability, p, of success in each trial is a constant.
- We are interested in the number of successes in the n trials.
If these conditions are met, you have a binomial distribution. Here’s an example question: “A manufacturing process produces widgets with a known defect rate of 2%. If 100 widgets are selected at random, what is the probability that exactly three are defective?” Here, the 100 selected widgets are the 100 trials (a fixed number), and a “success” is finding a defective widget. Because the widgets are selected at random, the trials are independent, and the probability of finding a defective widget in each trial is a constant 2%. We are looking for the probability of finding exactly three defective widgets, so we are counting the number of successes. All four conditions hold, so we have a binomial distribution.
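To make the widget example concrete, here is a minimal Python sketch that evaluates the standard binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k), for n = 100, p = 0.02, and k = 3. The function name binomial_pmf is my own choice for illustration, not something from a particular library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Widget example: 100 widgets (trials), 2% defect rate, exactly 3 defective.
prob_three_defective = binomial_pmf(3, 100, 0.02)
print(f"P(exactly 3 defective) = {prob_three_defective:.4f}")
```

The result comes out to roughly 0.18, so finding exactly three defective widgets in a sample of 100 is not a rare outcome at this defect rate.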
Note that if we sample without replacement from a finite population, the probability of success changes slightly each time we select an item, so strictly speaking we do not have a binomial distribution. However, if the population is large relative to the sample, the probability changes very little and we can approximate the problem with a binomial distribution. A common rule of thumb for using the binomial approximation is to require that the population size, N, be at least ten times the sample size, n.
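One way to see how good the approximation is: compare the binomial answer with the exact answer for sampling without replacement, which is given by the hypergeometric distribution (a distribution not otherwise covered in this post). The sketch below is only an illustration. The population sizes of 1,000 and 10,000 widgets, and the assumption that exactly 2% of each population is defective, are mine.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeometric_pmf(k, N, K, n):
    """Probability of exactly k defectives in a sample of n drawn without
    replacement from a population of N items, K of which are defective."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

n, k, p = 100, 3, 0.02
approx = binomial_pmf(k, n, p)

# Assumed populations: N = 10n and N = 100n, each with exactly 2% defective.
exact_small = hypergeometric_pmf(k, N=1_000, K=20, n=n)
exact_large = hypergeometric_pmf(k, N=10_000, K=200, n=n)

print(f"binomial approximation:       {approx:.4f}")
print(f"exact, population of 1,000:   {exact_small:.4f}")
print(f"exact, population of 10,000:  {exact_large:.4f}")
```

With these assumed numbers, the population of 1,000 sits right at the ten-times rule of thumb, and the exact and approximate answers should already be close. The larger population should agree with the binomial value even more closely.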
Requirements for a geometric distribution
- All trials are independent.
- The probability, p, of success in each trial is a constant.
- We count the number of trials until the first success.
If these conditions are met, you have a geometric distribution. Note that the conditions are similar to the binomial distribution conditions, but the number of trials is not fixed and we are not counting the number of successes. Here’s an example question: “A manufacturing process produces widgets with a known defect rate of 2%. What is the probability that 100 good widgets are inspected before the first defective widget is found?” As before, a “success” is finding a defective widget, so inspecting 100 good widgets first means the first success occurs on trial 101. Note that this question looks somewhat similar to the previous example, but this time the number of trials is not fixed in advance and we are counting trials until the first success, which leads to a geometric distribution.
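The geometric example can be worked the same way. Under the convention used above (counting the trial on which the first success occurs), the probability that the first success falls on trial k is (1 - p)^(k - 1) p. The following minimal sketch applies that formula to the widget question; the function name geometric_pmf is my own label for illustration.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...),
    given independent trials with success probability p."""
    return (1 - p)**(k - 1) * p

# Widget example: 100 good widgets, then the first defective one on trial 101.
prob_first_defect_on_101 = geometric_pmf(101, 0.02)
print(f"P(first defective widget on trial 101) = {prob_first_defect_on_101:.5f}")
```

The answer is small, roughly 0.003, because it requires 100 good widgets in a row followed immediately by a defective one.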