Supported browsers: Internet Explorer 6.0+, Firefox 3.0+, Safari 3.0+, Chrome 5.0+, iPhone 4 (partial support; you may have to reload the page). Some graphics may not work in Opera.

Demonstration of probability distributions with balls falling through a pin board.

Let's see what happens when we drop balls, one at a time, through a narrow chute, with each ball moving left or right every time it hits a pin. See the diagram below. The pins are arranged in a regular grid of rows and columns, and each row is shifted sideways by half a column relative to the row above it. Below the last row, each ball comes to rest in one of the bins, eleven bins in our case. Each ball hits ten pins before entering one of the eleven bins. (In our case, the final pin is the top of a wall between bins.)
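
To make the pin-to-bin bookkeeping concrete, here is a minimal Python sketch of one ball's trip, assuming each pin is an independent left/right choice and numbering the bins 1 to 11 from left to right. The names `drop_ball`, `PIN_ROWS`, and `p_right` are hypothetical, chosen for illustration; this is not the demo's own source code.

```python
import random

PIN_ROWS = 10  # each ball hits ten pins on its way down

def drop_ball(p_right=0.5, rng=None):
    """Follow one ball through the pin board (hypothetical helper).

    Returns the list of moves ('L' or 'R') and the bin number (1..11).
    """
    rng = rng or random.Random()
    moves = []
    for _ in range(PIN_ROWS):
        # at each pin the ball goes right with probability p_right, otherwise left
        moves.append("R" if rng.random() < p_right else "L")
    # each 'R' moves the ball half a column right and each 'L' half a column left,
    # so after ten pins the bin depends only on how many times it went right
    bin_number = 1 + moves.count("R")
    return moves, bin_number

moves, bin_number = drop_ball()
print("".join(moves), "-> bin", bin_number)
```

A ball that goes right at every pin ends in bin 11, one that always goes left ends in bin 1, and any mix of five rights and five lefts, in any order, ends in the center bin, number 6.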

We are interested in how many balls fall into each bin; the fraction of balls in each bin forms a (discrete) probability distribution. In our demo, a ball starts falling as soon as it clears a pin: we won't allow balls to shoot horizontally across multiple columns. When a ball hits a pin, there are only two possible outcomes: fall immediately to the left or fall immediately to the right. Because each of the ten pins has exactly two outcomes, with the same probabilities at every pin and independently of the other pins, a ball's bin depends only on how many times it moved right, and the distribution of balls in bins is called a binomial probability distribution.
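
For reference, the textbook binomial formula gives the chance of landing in each bin. It assumes every pin sends a ball right with the same probability p, independently of the other pins; the symbols p and k are notation introduced here, with bins numbered k = 1 through 11 from left to right. A ball lands in bin k exactly when k - 1 of its ten moves are to the right:

```latex
P(\text{bin } k) = \binom{10}{k-1}\, p^{\,k-1}\,(1-p)^{\,11-k}, \qquad k = 1, 2, \dots, 11

\text{mean bin number} = 1 + 10p, \qquad \text{standard deviation} = \sqrt{10\,p\,(1-p)}
```

For equal probabilities (p = 1/2) this gives a center-bin probability of C(10, 5)/2^10 = 252/1024 ≈ 24.6%, a mean of 6, and a standard deviation of √2.5 ≈ 1.58. For p = 3/4 the mean moves to 8.5 and the standard deviation shrinks to √1.875 ≈ 1.37. These are theoretical values; the statistics computed from actual dropped balls approach them as more balls accumulate.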

(If the drawing canvas does not appear, is very short, or is blank, reload the page. Except on multitouch devices, you can resize the canvas by dragging and releasing the magenta drag handle at its lower right; the demo will then restart.)

  1. For our first demo, when a ball hits a pin, we'll give it equal probability of moving left or right. Since the ball never disappears, the probabilities of the two outcomes add up to one (or 100%). Because the probabilities of moving left and right are equal, each is one half (0.5, or 50%), the same as flipping a coin and getting heads or tails. For this special case of equal probabilities, the distribution of balls in bins is a symmetric binomial probability distribution.
  2. Each time we drop a ball, we'll increment the ball counter displayed at the upper right of the simulation. The balls start off slowly so that you can see each path through the pin board; then they speed up so that we can see the probability distribution of hundreds or thousands of balls. When a ball reaches a bin, we'll color it gray.
  3. Rather than show bins tall enough to hold hundreds of balls, let's replace the bins with a bar chart showing the percentage of balls in each bin. The height of each blue bar is the number of balls in that bin divided by the total number of balls that have reached the bins. We are interested in the shape of the blue probability distribution, not the exact numbers. The bar chart is scaled up so we can see it better: the top of a bin corresponds to about 30%, but don't be concerned about the exact values.

    Once thousands of balls are in the bins, the bar chart starts to look symmetrical and centered on the center bin. Let's learn a little about the statistics of the probability distribution.
  4. Let's number the bins one through eleven, left to right, so the center bin is number six. Let's give each ball a value equal to the number of the bin it landed in. What is the average value of all the balls? We add up all the ball values and then divide by the total number of balls in all bins. The average value is also called the arithmetic mean, the mean value, or simply the mean. As each new ball enters a bin, the simulation recalculates the mean for the balls in the bins and displays it as the green vertical line; the mean is at the horizontal center of that line.

    When there aren't many balls in the bins, the mean value can change quite a bit as each new ball reaches its bin. You can see this by reloading the page and clicking through to this fourth instruction before many balls have fallen. However, once hundreds of balls are in the bins, the mean settles at or near the center of the center bin, and the blue bar chart takes on the shape of a bell, like one hanging in a church belfry. This shape is closely approximated by the bell curve, also called a Gaussian distribution or normal distribution. The mean is useful, but it does not tell us how spread out the values are: whether the bell is tall and narrow or short and broad.
  5. We would like a standardized way to describe the spread, or variation, of the values (how narrow or broad the bell is). We will use the measure called the sample standard deviation. As each new ball enters a bin, the simulation recalculates the sample standard deviation for the balls in the bins and displays it as two red vertical lines, which are always equally distant from the green mean line, one on each side. The standard deviation varies as balls are added, but it stabilizes once hundreds of balls are in the bins.

    The mean, the standard deviation, and some mathematics we won't do here give us a very useful approximation, or model, of the detailed data of a probability distribution.
  6. Let's change the demo. Instead of equal probabilities of moving left or right, let's make the probability of moving right three times the probability of moving left. This is easy to do in the simulation but hard to do with a physical pin board. Now the probability of moving right is 0.75 (75%) and the probability of moving left is 0.25 (25%). The total probability is still 1 (100%), as it must be, but this is no longer like flipping a fair coin.
  7. Now the mean is no longer in the center bin; it shifts to the right. The bar chart is not symmetric about the mean, and the standard deviation is somewhat smaller than in the equal-probability case.
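
The statistics described in steps 3 through 7 can be reproduced offline with a short Python sketch. It assumes independent pins and uses hypothetical helper names (`simulate`, `summarize`); it is not the demo's source code. The standard library's `statistics.stdev` computes the sample standard deviation, dividing by n - 1.

```python
import random
import statistics

def simulate(n_balls, p_right, rows=10, seed=0):
    """Drop n_balls through a board where each ball meets `rows` pins.

    Returns one bin number (1..rows + 1) per ball.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    # bin number = 1 + number of rightward moves out of `rows` pins
    return [1 + sum(rng.random() < p_right for _ in range(rows))
            for _ in range(n_balls)]

def summarize(bins, n_bins=11):
    """Percentage of balls per bin, mean bin number, and sample standard deviation."""
    percent = [100 * bins.count(k) / len(bins) for k in range(1, n_bins + 1)]
    mean = statistics.mean(bins)
    sd = statistics.stdev(bins)  # sample standard deviation (divides by n - 1)
    return percent, mean, sd

for p in (0.5, 0.75):
    bins = simulate(5000, p)
    percent, mean, sd = summarize(bins)
    print(f"p_right={p}: mean={mean:.2f}, sample sd={sd:.2f}")
    print("  % per bin:", " ".join(f"{x:4.1f}" for x in percent))
```

With a few thousand balls, the printed mean should land near 6 for equal probabilities and near 8.5 for the 75% rightward case, with sample standard deviations near 1.58 and 1.37, in line with the binomial values 1 + 10p and √(10p(1 - p)).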

We learned about the mean, or average value, of all the balls in the bins. There is another useful statistical value - the median - which unfortunately is frequently confused with the mean. Sometimes the mean and median are nearly identical, but in other cases they are vastly different. Let's learn about the median without using the pin-board demo.

Suppose we have a set of values. We sort the values from smallest to largest, keeping any duplicates. After sorting, the value in the middle of the sorted list is the median. Let's try an example. Here is a set of data: {295, 4, 1}. What are the mean and median values?

There are three values in this data set. The mean (average) value is (295 + 4 + 1)/3 = 300/3 = 100. Now let's calculate the median. Here is the data sorted from least to most: {1, 4, 295}. The middle of the three values is the second value in the sorted list: 4. The mean is 100 and the median is 4: very different, so we should not confuse them.

Let's try a different example. Here is a different set of data: {295, 4, 1, 20}. The mean (average) value is (295 + 4 + 1 + 20)/4 = 320/4 = 80. Now let's calculate the median. Here is the data sorted from least to most: {1, 4, 20, 295}. Since there are four values (an even number), no single value is in the middle; the middle falls somewhere between 4 and 20. We use the exact midpoint between them: (4 + 20)/2 = 24/2 = 12. The mean is 80 and the median is 12: again very different, so we should not confuse them.
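
Both examples can be checked with a small Python sketch. The `mean` and `median` functions below are hand-written helpers for illustration that follow the steps described above; Python's standard `statistics` module also provides `statistics.mean` and `statistics.median`.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted list, or the average of the two middle values."""
    ordered = sorted(values)  # duplicates are kept, not removed
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: midpoint of the two middle values

for data in ([295, 4, 1], [295, 4, 1, 20]):
    print(data, "mean =", mean(data), "median =", median(data))
# prints:
# [295, 4, 1] mean = 100.0 median = 4
# [295, 4, 1, 20] mean = 80.0 median = 12.0
```

The single large value 295 pulls the mean far upward, while the median, which depends only on the middle of the sorted list, barely notices it.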

This demo correctly represents the probability distributions but does not faithfully represent gravitational acceleration, air resistance, or the elastic (bouncy) collisions of the balls with the pins.

This demo was suggested by my friend Paulo.

I hope you found this interesting, useful, and/or fun. Is there a demo you would like me to add? Would you like to be notified when a new demo is available? Links for sharing, reporting a problem, or emailing me are available in the pull-down menu at the top of the page. Feel free to link to my pages, screencast them to YouTube, or reuse my source code with attribution (MIT-style license).