I was talking with a student about independence the other day and realized that the student was thinking of independence as meaning unrelated. In everyday speech, we probably do think of two independent outcomes as being unconnected. For example, we say that the American states won their independence from England when they broke their governing ties with England.

However, statistical independence should be viewed in a different way. It refers to events A and B such that A provides no information on whether the event B occurs. If A and B (remember, these are sets) have no overlap (mutually exclusive or disjoint), then knowing that the outcome was in A would make it impossible for the outcome to be in B. Thus, the event A provides information regarding event B, and these events are not statistically independent. Similarly, if I knew that the outcome was not in A (i.e., the outcome is in A'), then B makes up a larger portion of the remaining possible outcomes. Again, knowing that the outcome is in A' gives information about whether the outcome is in event B.

We thus draw the conclusion that independent events must overlap. In fact, they must overlap in a very significant way. Suppose that A and B are independent and that P(A)=0.25 while P(B)=0.4. By the definition of independence, we must have P(A∩B)=P(A)P(B)=(0.25)(0.4)=0.1. To make this concrete, imagine that the sample space Ω has 100 possible, equally likely outcomes. Then A includes 25 outcomes and B includes 40 outcomes. Our calculation then requires that the intersection A∩B includes 10 outcomes.

Now, notice what this means about conditional probabilities. If we restrict our attention to the set A (i.e., we are given that event A has occurred), then the newly restricted outcome space has 25 outcomes. If we ask the probability that B occurs given this information, we know that 10 of these outcomes belong to B. So P(B|A)=10/25=0.4. Voila! This is exactly the same as P(B). Similarly, if we are given that event B has occurred, then 10 of the 40 outcomes available belong to A, so that P(A|B)=10/40=0.25=P(A).
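This check can be carried out directly by enumeration. The sketch below uses one particular arrangement of the 100 outcomes satisfying the counts above (the specific sets chosen are just an illustration); exact fractions avoid any floating-point fuzz.

```python
from fractions import Fraction

# One arrangement of 100 equally likely outcomes matching the example:
# A holds 25 outcomes, B holds 40, and their overlap holds exactly 10.
A = set(range(25))          # outcomes 0..24
B = set(range(15, 55))      # outcomes 15..54; overlap with A is 15..24
both = A & B
assert len(both) == 10

def P(event):
    # probability of an event in a 100-outcome equally likely space
    return Fraction(len(event), 100)

# Conditioning restricts the outcome space to the given event.
P_B_given_A = Fraction(len(both), len(A))   # 10/25
P_A_given_B = Fraction(len(both), len(B))   # 10/40

print(P_B_given_A == P(B))  # True: P(B|A) = P(B) = 2/5
print(P_A_given_B == P(A))  # True: P(A|B) = P(A) = 1/4
```

Any other placement of the sets with the same three counts (25, 40, and 10) would give the same conclusion, which is the point of the proportionality argument below.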

How do we summarize this idea? Well, if events A and B are independent, then each event must contain a restricted but proportional representation of the other independent event. One way to do this is to imagine that A and B lie at right angles and overlap. For the above example, we can arrange the 100 outcomes into 5 rows and 20 columns.

Then A might represent the first five columns (5×5=25 outcomes) while B represents the first two rows (2×20=40 outcomes). The intersection A∩B is the 2-by-5 block where they cross, containing the required 10 outcomes. Being given the information that A occurs collapses the larger picture into a reduced picture consisting of only those five columns. However, the fraction of rows represented by B remains the same, and P(B|A)=P(B).

This idea extends to more than two events. To have three independent events A, B, and C, we can imagine a three-dimensional grid corresponding to the outcome space, so that A represents a simple division in one direction, B represents a second direction, and C represents the third direction. Conditioning on event A corresponds to collapsing the space in A's direction. But the fractions represented by B and C remain exactly in the same proportions as they were originally.
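The three-dimensional picture can be sketched the same way. The 4×5×10 box below is a hypothetical example (the dimensions are my choice, not from the text): each event fixes a slab in one direction, and collapsing onto A leaves the B and C proportions untouched.

```python
from itertools import product
from fractions import Fraction

# Hypothetical 4 x 5 x 10 grid of 200 equally likely outcomes.
# Each event is a slab perpendicular to one axis.
space = set(product(range(4), range(5), range(10)))
A = {w for w in space if w[0] < 1}   # 1 of 4 layers:  P(A) = 1/4
B = {w for w in space if w[1] < 2}   # 2 of 5 layers:  P(B) = 2/5
C = {w for w in space if w[2] < 3}   # 3 of 10 layers: P(C) = 3/10

def P(event):
    return Fraction(len(event), len(space))

# The slabs intersect in proportional blocks, so the events
# are independent in every combination.
print(P(A & B) == P(A) * P(B))             # True
print(P(A & B & C) == P(A) * P(B) * P(C))  # True

# Collapsing onto A (conditioning) preserves B's proportion.
print(Fraction(len(A & B), len(A)) == P(B))  # True: P(B|A) = P(B)
```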

## Wednesday, January 23, 2008

## Tuesday, January 22, 2008

### Simplifying with Factorials

As I was grading the second quiz, I saw that I should point out that you should simplify as much as possible whenever you see a fraction involving factorials. For example, suppose you saw the fraction 10!/(4!6!). You should know that you can immediately cancel the 6! against the last six factors of 10! to get (10*9*8*7)/(4*3*2*1). But instead of multiplying everything out, you should cancel the 8 with 4*2 and the 9 with 3 to get 10*3*7=210.
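You can confirm the hand simplification against a brute-force computation. The sketch below does both, and also notes that this particular fraction is the binomial coefficient C(10, 4):

```python
from math import factorial, comb

# Brute force: compute 10!/(4!*6!) directly.
brute = factorial(10) // (factorial(4) * factorial(6))

# Hand simplification: cancel 6! against the tail of 10!,
# then cancel 8 with 4*2 and 9 with 3, leaving 10*3*7.
simplified = 10 * 3 * 7

print(brute, simplified, comb(10, 4))  # all three print 210
```

Of course, the point of simplifying by hand is that on a quiz you have no computer, and 10*3*7 is far easier than dividing 3628800 by 17280.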

If you just multiply things out and do not simplify as you go, you will find that you get some awful numbers whose factors are hard to find. In fact, you were probably already looking right at the factors before you multiplied.


## Monday, January 21, 2008

### Discrete Random Variables (Section 2.1)

My philosophy is that class time should be used to facilitate learning and not simply to reiterate concepts that the book already explains adequately. I readily acknowledge that I am still learning how to accomplish such a feat, especially to help students overcome the tendency to avoid reading their textbook.

As we leave chapter 1 where we learned basic ideas of the probability of events, we begin chapter 2 where we will focus on a family of random variables of the discrete type.

Comparing Definition 2.1-1 with the definition I gave in my first week slides, you should notice that there is a distinction between the outcome space of the experiment and the space of the random variable. The outcome space should represent the detailed description of the experiment, while the space of the random variable is the range of the random variable (as a function of the outcome or sample space). Pages 58 and 59 provide an important philosophical guide for what we are trying to accomplish and point out that observations might help us to estimate the probabilities associated with the random variable. However, we can often use basic assumptions to create a mathematical model for these probabilities. This chapter introduces a number of models that describe discrete random behavior.

Definition 2.1-2 is very important, introducing the definition of the probability mass function. Problem 3 in the textbook helps you test whether you understand the basic ideas. One of the major points you need to remember is that for discrete type random variables, properties (b) and (c) compute probabilities using summations. When we get to continuous type random variables, the corresponding properties will replace summation with integration.
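As a small sketch of those properties, here is a hypothetical pmf (a fair six-sided die, my choice of example rather than the textbook's) checked against the three defining conditions:

```python
from fractions import Fraction

# Hypothetical pmf for X on the space {1,...,6}: a fair die.
space = range(1, 7)
f = {x: Fraction(1, 6) for x in space}

# (a) f(x) > 0 for every x in the space of X
assert all(f[x] > 0 for x in space)

# (b) the probabilities sum to 1 over the space
assert sum(f.values()) == 1

# (c) P(X in A) is the sum of f over the outcomes in A
A = {2, 4, 6}
print(sum(f[x] for x in A))  # 1/2
```

For a continuous random variable, the analogous checks in (b) and (c) would use integrals of a density instead of these sums.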

In addition to basic principles (probability mass function (the mathematical model's prediction) vs relative frequency (statistical estimate), bar graph vs histogram), we meet the first model for a random variable---the hypergeometric distribution---which describes choosing n objects from a total collection of two types of objects. The model is created by considering exactly the types of calculations used in chapter 1, by counting how many ways to select n objects from a total of N objects (denominator) and then also counting how many ways to choose x of the first type and n-x of the second type (numerator).
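The counting recipe above translates directly into code. The sketch below writes the pmf exactly as the numerator-over-denominator count (the particular values N=50, K=10, n=5 are assumptions for illustration, not from the text):

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): draw n objects without replacement from N objects,
    K of the first type; X counts how many of the first type appear."""
    # numerator: ways to pick x of type 1 and n-x of type 2
    # denominator: ways to pick any n of the N objects
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Illustrative numbers: 50 objects, 10 of the first type, draw 5.
N, K, n = 50, 10, 5
probs = [hypergeom_pmf(x, N, K, n) for x in range(n + 1)]

print(abs(sum(probs) - 1.0) < 1e-12)  # True: the pmf sums to 1
```

The fact that these probabilities sum to 1 is exactly the statement that the numerator counts, summed over x, exhaust all C(N, n) equally likely selections.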

### Spring 2008 Introduction

Welcome to my math blog. Last semester, I created a blog that would be specific to one course. But it seems silly to keep creating new blogs for each course. So I'm going to experiment with using one blog for a sequence of courses.

This semester (Spring 2008), I am teaching Math 318, an introduction to probability and statistics. Most of the blog entries will correspond to that course for the next few months.

