## Wednesday, December 3, 2008

### Values, Equations, and Theorems

There is some confusion about theorems. For example, consider the Mean Value Theorem: If f is continuous on [a,b] and differentiable on (a,b), then there is some value c∈(a,b) so that f'(c)=(f(b)-f(a))/(b-a).

Some students think that the ratio (f(b)-f(a))/(b-a) is the Mean Value Theorem. But it is not; it is just a value that is called the average rate of change of f between a and b. You can compute this value as long as both f(a) and f(b) exist. It has nothing to do with derivatives or continuity.

Other students think that the equation f'(c)=(f(b)-f(a))/(b-a) is the Mean Value Theorem. This is closer to the truth, but it is still incorrect. First of all, what is c? Second, this statement may not be true. For example, suppose that f(x)=-1 if x<0 and f(x)=1 if x>0. Suppose that a=-2 and b=+2. Then the ratio (f(b)-f(a))/(b-a) is equal to 1/2. But f'(x)=0 everywhere except at x=0, where f'(0) does not exist. So there is no value c for which f'(c)=1/2.

Even closer is to say that f'(c)=(f(b)-f(a))/(b-a) for some c between a and b. This is actually the conclusion of the Mean Value Theorem. It requires the entire statement, particularly the statement that the equation is true for some c and that the value c must be between a and b.

But the example above is one where the conclusion of the Mean Value Theorem is false. That does not mean that the Mean Value Theorem itself is false. After all, it is a theorem, and that means that it has been proved to be true always. The part that is missing is the hypothesis for the theorem. The conclusion can only be guaranteed to be true using the theorem if the hypotheses are all satisfied. In this case, you must also check (or give a reason why) the function f is continuous on the closed interval [a,b] and differentiable on the open interval (a,b).
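The role of the hypotheses can also be checked numerically. The following Python sketch is only an illustration: the helper names `average_rate` and `numerical_derivative`, the sample points, and the comparison function x^3 are assumptions chosen for this example. It computes the average rate of change for the step function above, where no c works, and for x^3 on [-2,2], where the hypotheses hold and a suitable c can be found.

```python
import math


def average_rate(f, a, b):
    """Average rate of change of f between a and b."""
    return (f(b) - f(a)) / (b - a)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x); an approximation only."""
    return (f(x + h) - f(x - h)) / (2 * h)


def step(x):
    """The step function from the example; it is not defined at x = 0."""
    if x == 0:
        raise ValueError("step is not defined at 0")
    return -1.0 if x < 0 else 1.0


def cube(x):
    """A function that is continuous and differentiable everywhere."""
    return x ** 3


# Step function: the ratio is 1/2, but the slope is 0 wherever it exists.
step_ratio = average_rate(step, -2, 2)
step_slopes = [numerical_derivative(step, x) for x in (-1.5, -0.5, 0.5, 1.5)]

# Cube: the hypotheses hold, so some c in (-2, 2) has 3c^2 equal to the ratio.
cube_ratio = average_rate(cube, -2, 2)
c = math.sqrt(cube_ratio / 3)

print(step_ratio, step_slopes)
print(cube_ratio, c, 3 * c ** 2)
```

Sampling a few slopes does not prove anything by itself, of course; it only illustrates why the step function cannot satisfy the conclusion while the cubic can.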

Similarly, (∫_a^b f(x) dx)/(b-a) computes the average value of a function f on an interval [a,b]. The value can be computed anytime the function is integrable over the interval [a,b]. The Mean Value Theorem for Integrals has nothing to do (in principle) with this calculation.

However, if f is continuous on [a,b], then
f(c) = (∫_a^b f(x) dx)/(b-a)
for some c∈(a,b). This entire statement comprises the Mean Value Theorem for Integrals. The hypothesis that must be verified to use the theorem is that f is continuous on [a,b]. The conclusion is that you are guaranteed that
f(c) = (∫_a^b f(x) dx)/(b-a)
for at least one value c between a and b.
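To see the difference between the average value (a number) and the theorem (a guarantee about some c), here is a small Python sketch. The function x^2, the interval [0,3], and the helper name `average_value` are assumptions made for illustration, and the integral is only approximated with a midpoint sum.

```python
import math


def average_value(f, a, b, n=100_000):
    """Approximate the average value of f on [a, b] with a midpoint sum."""
    width = (b - a) / n
    total = sum(f(a + (i + 0.5) * width) for i in range(n))
    return total * width / (b - a)


def square(x):
    return x ** 2


avg = average_value(square, 0.0, 3.0)  # the exact average is 9/3 = 3
c = math.sqrt(avg)                     # solves c^2 = avg, so f(c) = avg

print(avg, c)
```

Because x^2 is continuous on [0,3], the theorem promises that such a c exists in the interval; the code merely finds it for this one function.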

## Tuesday, December 2, 2008

### Limit or Function Evaluation?

I've noticed that some students are perplexed about when they use a limit or function evaluation. I presume that the cause of this confusion is that students have learned that you evaluate a limit by plugging in a value. But this is only because nearly all functions that they work with are continuous.

You use a limit evaluation when you need to know what the value of the function should be by using information from the side of the point of interest. When using a limit, you must use limit notation: lim_{x → c} f(x). Then you use the appropriate rules of limits to evaluate (and hopefully, the function is continuous).

You use a function evaluation when you need to know the value of the function at an actual point. There is no limit involved, just function evaluation. You just use function notation, say f(c), and compute the value defined by the function.

For example, suppose you are calculating an instantaneous rate of change as the limit of an average rate of change. The average rate of change only makes sense when the interval of interest includes two points (endpoints of an interval). The instantaneous rate of change is found by seeing what the value of the average rate of change does when the two points move closer to each other, or more particularly, as the second point approaches the first point.

On the other hand, suppose that you know the derivative, which is itself a function. Then the instantaneous rate of change is calculated by function evaluation using the derivative function. (The limit was already used to create this new function.)
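The two situations can be put side by side in a short Python sketch. The function x^2, the point x=3, and the names `average_rate` and `f_prime` are assumptions chosen for illustration.

```python
def f(x):
    return x ** 2


def f_prime(x):
    """Derivative of f, found once by taking the limit by hand."""
    return 2 * x


def average_rate(f, a, b):
    """Average rate of change of f between the two points a and b."""
    return (f(b) - f(a)) / (b - a)


# Limit: move the second point toward the first point x = 3.
rates = [average_rate(f, 3, 3 + h) for h in (0.1, 0.01, 0.001, 0.0001)]

# Function evaluation: the derivative function is simply evaluated at 3.
slope = f_prime(3)

print(rates)
print(slope)
```

The list of average rates only approaches the instantaneous rate, while evaluating the derivative function gives it directly.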

In particular, I have noticed this problem when dealing with finding extreme values of a function. When the interval of interest is an open interval, we are acting as though the domain does not include the endpoints. So, with this restricted domain, evaluation of the function is not possible (since the points are not in the domain). So we must use the information about the function immediately adjacent to the endpoints, and this is with a limit. In this context, the value found in the limit is not achieved at the endpoint, although it might be achieved somewhere else in the domain.

On the other hand, if the interval is a closed interval, then the endpoints are included in the restricted domain. If the function is continuous at these points, then evaluating the function directly is appropriate, since the point is in the domain.

A final remark: if the function is discontinuous at some point in the interval, you must also check the limits at that point when looking for extreme values.

## Monday, November 17, 2008

### Differential Equations Project Tips

As some questions are asked more regularly, I thought I'd provide some general discussion here.

(1) Start with the proposed form of X(t).  Compute X'(t) and X''(t) based on that form.  Then use those calculations to discover when X'' + k/m X = 0.

(2) The question "mean physically about the mass on a spring" is not asking you to think about the mass (as in measurement) but is asking you to think about what the statement X(0)=1 means about the state of the mass at time t=0 and what the statement X'(0)=0 means about the state of the mass at time t=0.

(3) X(0) is a constant and has derivative d/dt[X(0)] = 0.  Recall that dX/dt > 0 implies that X is increasing, dX/dt < 0 implies that X is decreasing, and dX/dt = 0 implies that X is stationary (instantaneous rate = 0).

(4) An arbitrary quantity A is proportional to some other quantity B if it is always the case that A = k B for some constant value k (the constant of proportionality).  Now interpret the statements to identify what pieces of the equation are proportional to what.

(5) Although you have studied e^x in precalculus, you will not use the logarithm at all in this work.  Instead, I just want you to consider some function that has the special property that exp' = exp.  (This sentence is analogous to sin'=cos and cos'=-sin.)  However, you do need to think about the chain rule: X(t) = A exp(rt)  (Since the argument is not simply t, you must use the chain rule.)  This problem is exactly analogous to Step 1.

(6) You will get something like
dX/dt = "formula involving X and a, b, and m"
X is increasing when "formula" > 0, decreasing when "formula" < 0, and stationary when "formula" = 0.  So use your skills with algebra (think sign analysis) to find conditions when these are the case.

(7) You need to understand the relationship between a rate of change and an actual change.  To understand this as well as possible, see the section we skipped in Chapter 3 (last section).  But what you essentially need is that we will follow the tangent line for the time increment Δt.  How much change is there when the rate of change and the duration of time are both known?

(8) The new version of Excel has some unanticipated differences from what I had when I wrote the project.  The labels are not assigned from a menu anymore.  Instead of the 3-step process that is described, you just click in the label field in the header section of Excel and type in the new label and then hit enter.

The calculations you see in the first few lines should exactly match your hand calculations in part (7).

Do not print the spreadsheet (it takes WAY too many pages).  That is why I ask you to submit your spreadsheet on Blackboard as part of the project.

(9) I must receive a print out of the graph -- hand drawn figures are not acceptable.   Ideally, this entire project report would be typed (perhaps using Equation Editor for the equations), with the figures naturally fitting in.

(10) Make hypotheses and test your hypotheses.
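For tip (7), the idea of following the tangent line for a time increment Δt can be sketched in Python before you build the spreadsheet. Everything specific here is a placeholder: the `rate` formula, the starting value, and the step size are hypothetical and are not the ones from the project.

```python
def rate(x):
    """Hypothetical rate of change dX/dt; replace with the project's formula."""
    return 2.0 - 0.5 * x


def euler_step(x, dt):
    """Follow the tangent line for one time increment: change = rate * dt."""
    return x + rate(x) * dt


def euler_table(x0, dt, steps):
    """Return the list of (t, X) pairs produced by repeated tangent-line steps."""
    rows = [(0.0, x0)]
    t, x = 0.0, x0
    for _ in range(steps):
        x = euler_step(x, dt)
        t += dt
        rows.append((t, x))
    return rows


for t, x in euler_table(x0=1.0, dt=0.1, steps=3):
    print(f"t = {t:.1f}, X = {x:.4f}")
```

Each row plays the role of one spreadsheet line: the new value is the old value plus the rate of change multiplied by the duration of time.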

### Exponential Project Tips

As some questions are asked more regularly, I thought I'd provide some general discussion here.

(1) exp is the name of the function, just as sin and cos are names of functions. From calculus, you learn that sin'=cos and cos'=-sin.  This step shows that exp_b' = ln b * exp_b. (That is, differentiation leaves the function alone except for a constant multiple. But be careful where the chain rule is needed!)

(2) You do not need to use the limit definition (epsilons and deltas).  Instead, for perhaps the easiest solution, you should think about how to finish the statement:
lim b^x = lim [(b^x-1)/x ... ]
That is, if you start with (b^x-1)/x, what do you do to that expression to leave only b^x.  Then use elementary limit rules to compute your resulting limit.

(3) One method is to use the method of substitution for limits (change of variables) and then use an identity for the function so that the result of Step 2 is applied --- this method mimics what is done to show that sin x is continuous everywhere.  A second method is to use a general theorem that makes continuity an obvious conclusion of the results from Step 1.

(4) You must start with a statement like:
ln (1/b) = lim_{x → 0} [(1/b)^x - 1]/x
There are two easy approaches: (1) Find a common denominator to rewrite this as a simple fraction before continuing or (2) Think of (1/b)^x as b to some appropriate power and then use a limit substitution.

(5) Since you do not know the derivative of ln x, it is incorrect to use the Mean Value Theorem applied to the logarithm.  Instead, you should apply the MVT to the function exp_b(x) on an interval so that b^a and b^b are incredibly easy and where it is clear which value is larger (so that you know if the average rate of change is positive or negative).  You may use the fact that b^x is positive for all values of x.

(6) The function f_b(x) is a linear function. You should write it in slope-intercept form (e.g., mx+b).

(7) Do not attempt to solve the equation f_b(x) = exp_b(x).  There is one obvious solution from the definition: x=0.  But the formulas themselves do not explain why there would not be more solutions.  Instead, you should define a function (perhaps g) so that
g(x) = exp_b(x) - f_b(x).
You know that g(0) = 0.  You need to show that g(x)>0 for all x ≠ 0.  My hint suggested Rolle's theorem, but I have since found that the Mean Value Theorem helps even more.  Use the Mean Value Theorem to show that for x>0, the average rate of change between 0 and x must be positive.  What about x<0?

(8) You may not use a limit form of the type b^∞. You may take a limit of the function f_b(x) because that is of a form we know how to work with. Then you should use the result of (7) to conclude what the limit of exp_b(x) must be.

(9) and (10) put all of the previous steps together to perform analysis similar to Sections 4.2 and 4.4 to understand the graph.
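If you want a sanity check on the limit behind steps (1) and (2), the expression (b^x-1)/x can be tabulated for small x. This Python sketch is only numerical evidence, not a proof; the base b=2, the sample values of x, and the name `difference_quotient` are assumptions chosen for illustration, and `math.log` is used only as a reference value for ln b.

```python
import math


def difference_quotient(b, x):
    """The expression (b^x - 1)/x, which is undefined at x = 0."""
    return (b ** x - 1) / x


b = 2.0
estimates = [difference_quotient(b, x) for x in (0.1, 0.01, 0.001, 1e-6)]

print(estimates)
print(math.log(b))
```

Notice that the expression cannot be evaluated at x=0 itself, which is exactly why the project works with a limit.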

## Friday, October 24, 2008

### Spotlight: Math Games

The other day before class, I introduced a little game called Sprouts. I found the rules summarized at the MAA website. There is also a nice discussion on the Science News website. I find it interesting that such a simple game can be analyzed using mathematical properties.

One of my favorite "math" games is a game called Eleusis. This game was invented by Robert Abbott as an analogy of the scientific method. So perhaps we should call this a "science" game. The game is played with a deck (or multiple decks) of playing cards. I turn over the first card and then think of a pattern that would start with that card. Now, the remaining players take turns attempting to choose a card from their hands that they believe would be a valid next card in the pattern. If they are correct, I leave it there. If they are incorrect, I move the card out of the sequence and below the card they tried to follow (for future reference).

The goal for the players is to eventually arrive at a hypothesis that they believe explains the pattern. By playing cards, they attempt to critically assess whether their hypothesis is a complete explanation of the pattern. This mimics the scientific method because we see patterns in how nature functions, and through experiment we attempt to see if controlled efforts are consistent with or contradict our acting hypotheses.

Try it out? Let me know how the game goes.

## Friday, October 10, 2008

### Mathematical Induction

The principle of mathematical induction is a topic that our textbook unfortunately skips over. It is used when we want to prove a rule that applies to positive integers. Often, it is the argument that is needed when you want to say, "See! It works the same way for this case and that case, so the pattern will just keep repeating." But to say that a pattern keeps repeating is exactly what we should attempt to make more precise.

The natural numbers are the set of all positive integers: 1, 2, 3, 4, .... It is the dot-dot-dot that creates the problem. Using "..." attempts to tell us that the pattern continues. But what exactly is the pattern? For the natural numbers, the pattern is that you just add one to the previous number. So here is one way of describing the natural numbers, and it is what motivates the principle of mathematical induction.
• 1 is in the set
• For every number that is in the set, call it n, we also have n+1 in the set.
We could restate this using an implication:
• 1 is in the set
• If n is in the set, then (n+1) is in the set.
And that is what we do for all applications of mathematical induction. We provide a starting point (such as 1 is in the set). Then we establish an implication that if a statement is true for one value (n is in the set), then it must also be that the statement is true for the next value (n+1 is in the set).

Here is an example from our past that should have used induction.

Theorem: x^n is continuous for n=1, 2, 3, ....

Scratchwork:
Before proving this statement, we should think how we might attempt this without induction. Well, f(x) = x^n is really a product x^n = x · x · x ... x, where there are n factors of x. (See how the "..." allows us to hand-wave our notation?) Well, we know that the limit of each factor x will just go to the value c, so the limit must be lim_{x → c} f(x) = c^n. That use of "..." keeps us from clearly stating how we used the limit of a product, other than again referring to a pattern: "Use the limit of a product n-1 times." The use of induction makes this precise.

Proof:
We prove by induction.
1) First, we show that f(x) = x^1 = x is continuous. (This is the starting point)
But this is already known:
lim_{x → c} x = c.
So f is continuous at any point c.
The statement is true when n=1.
2) Second, we assume that f(x) = x^n is continuous and now show that this implies that g(x) = x^(n+1) is also continuous. (This is the inductive step)
So assume that f is continuous.
g(x) = x f(x) (Relate the new function in terms of what is assumed)
So using the limit of a product:
lim_{x → c} g(x) = lim_{x → c} x f(x) = c f(c) = c · c^n = c^(n+1) = g(c)
So x^(n+1) is continuous whenever x^n is continuous.

So by induction, since the statement is true for n=1, and whenever the statement is true for one value n it is also true for the next value n+1, the statement is true for all integers starting with 1.
(End of Proof)

Sometimes induction is compared to reaching different rungs on a ladder. The first statement is what allows you to climb onto the first step. The inductive implication is what says that if you have already reached one rung, then you can move to the next rung. Putting the two together, you first climb on the ladder's first rung. Then you know that you can climb from the first to the second rung, from the second rung to the third rung, from the third to the fourth, and so on forever. The implication, in one fell swoop, justifies climbing each step from the previous. The principle of mathematical induction replaces the uncertainty in "..." with a precise argument.
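The same two-part structure (a starting point plus a rule for getting from n to n+1) is how a recursive definition works in programming. As a loose analogy, and not as part of the proof, here is a Python sketch of x^n defined without any "...". The function name `power` is an assumption made for illustration.

```python
def power(x, n):
    """Compute x^n for n = 1, 2, 3, ... with no "..." in the definition."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return x                    # starting point: x^1 = x
    return x * power(x, n - 1)      # inductive step: x^(n+1) = x * x^n


print(power(2, 5))
print(power(1.5, 3))
```

The second `return` mirrors the line g(x) = x f(x) in the proof: the next case is written in terms of the previous one.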

## Monday, October 6, 2008

### Intermediate Value Theorem

A theorem is a statement that is always true because it has been proved. Theorems are usually stated as implications. That is, they usually are stated as "If [something is true], then [something else is true]." However, this does not mean that the hypothesis (what appears as [something is true]) is actually true. Nor does it mean that the conclusion (the statement instead of [something else is true]) is true. It means that you are guaranteed to know that the conclusion is true whenever the hypothesis is true.

When applying a theorem, it is your task to establish that the hypothesis is true. Then, by stating the theorem, you are allowed to state that the conclusion is also true.

Here is an example using the Intermediate Value Theorem. Recall that the theorem states that if you have a function f that is continuous on a closed interval [a,b] (where a and b can be any numbers with a < b), then for any y-value C between the values f(a) and f(b), you are guaranteed to be able to find a value x such that a < x < b and f(x) = C.
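When the hypothesis does hold, the guarantee can be turned into a search. The Python sketch below uses bisection, which is one common way to locate such an x; the function name `find_intermediate`, the tolerance, and the sample function x^3 are assumptions chosen for illustration. The search is only trustworthy if f really is continuous on [a,b].

```python
def find_intermediate(f, a, b, target, tol=1e-10):
    """Bisection search for x in [a, b] with f(x) = target.

    Assumes f is continuous on [a, b] and target lies between f(a) and f(b).
    """
    lo, hi = a, b
    if not min(f(lo), f(hi)) <= target <= max(f(lo), f(hi)):
        raise ValueError("target is not between f(a) and f(b)")
    increasing_side = f(lo) <= f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half whose endpoint values still bracket the target.
        if (f(mid) <= target) == increasing_side:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cube(x):
    return x ** 3


x = find_intermediate(cube, 0.0, 2.0, target=5.0)
print(x, cube(x))
```

If the function has a jump inside the interval, the loop still returns a number, but nothing guarantees that f at that number equals the target. Checking the hypothesis remains your job.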

Here is a hypothetical situation. My car holds 12 gallons of gasoline. (That is not the hypothetical part -- I have actually filled the tank :-) I have installed an automated gas-tank tracking system that records the amount of gas as a function of the car's mileage. (Yep, that's the hypothetical part) If you ask me how much gas I had when the car was at 97,034 miles, then I can tell you it had exactly 5.93 gallons of gasoline.

Last week, I filled up my tank when the car was at 98,012 miles. This morning, I checked my car and it now records the tank as having 1.45 gallons and 98,143 miles. (All figures are also hypothetical, including mileage) So here is a question: will I actually be able to identify a mileage on the car between that last fill up and today when the car contained exactly 4.7 gallons?

Hmm. Let's see. Imagine that we use the variable x to represent the mileage on the car. Also, let f be a function that measures the gallons in the car f(x) when the mileage is x. We know that f(98,012) = 12 and f(98,143) = 1.45. So C=4.7 is between f(98,012) and f(98,143). What does the Intermediate Value Theorem say?

Now, before you go on, I need to tell you a story. On Friday, I needed to mow the lawn. My backyard is pretty large, so it takes a while. Funny thing! I ran out of gas. I knew I had recently filled the car, so I found my gas siphon and pumped a gallon out of the car's tank and into my gas can. Phew! Glad that was available! Finished the lawn with nary a problem.

So what did you answer?

(Extra credit toward quiz grade if you answer correctly this week by e-mail: waltondb at jmu dot edu)

## Thursday, October 2, 2008

### Common Limit Issues

Part of the challenge of mathematics is learning the language of mathematics. Mathematics is meant to be spoken, kind of like poetry. But in my class, I'm not wanting a cinquain. Scattered thoughts that are related, but not directly connected in sentences, do not a coherent message provide.

So, here are the top ten problems in limits:

1. Dropping "lim" suddenly. If you have two expressions f(x) = g(x) where g(x) is a simplified version of f(x) (cancelled something), you should write lim f(x) = lim g(x) and NOT lim f(x)=g(x). That "lim" doesn't apply to both sides of the equal sign.

2. Writing "lim" too many times. Just because you don't want to forget to write "lim" doesn't mean you write it in front of everything. You keep writing "lim" while you massage the formula into a form where you can decide the limit. As soon as you are allowed to "plug-in" the value, you have just "taken the limit" and you should stop writing "lim".

Example: f(x) = (x^2-4)/(x-2) and g(x) = x+2. We know that f(x)=g(x) for x ≠ 2. So
lim_{x → 2} (x^2-4)/(x-2) = lim_{x → 2} (x+2) = 2+2 = 4.

3. Lonely "lim". "lim" is not simply an abbreviation for the word "the limit". It is an operator wanting to do something to a formula. It needs a formula next to it at all times. It is lonely without a formula. So when students write "lim = 3", clearly intending to say, "the limit is 3", they are really saying, "the limit of is 3". Of what? And that is the problem. The limit is lonely and has nothing to act on.

4. Stopping at a limit form. Just because you see a zero (0) in the denominator in the limit form does not mean you are done. If the limit has form 0/0, you must try to factor and cancel. If the limit has form L/0, you must identify the sign of the function to decide whether it is going to +∞ or -∞.

5. Writing "=" for undefined values. (Don't do that!) Use a limit form notation to indicate that the denominator is 0 or terms go to infinity.

6. Piecewise using x=a. A piecewise function that has a formula when x=a is a distractor for limits. Remember, a limit always determines what the function would predict if you came from the sides. So a limit never checks at x=a.

7. Writing f(a) instead of lim f(x). This is another piecewise function issue. To check the value predicted by the two sides, you need to say you are checking the sides (meaning limit). So you must write that it is a limit.

8. Using rules for x going to infinity at a. When x goes to infinity, we can ignore any terms that look like 1/x since those terms go to zero as x goes to infinity. However, when x goes to a, those terms are still numbers other than zero. Don't just forget about them (and don't even factor out the dominant terms).

9. Step-by-step when not required. Unless I explicitly ask you to show that the limit has a value using the elementary limit rules, you should just compute the limit. You don't need to spend time showing the step-by-step justification unless asked.

10. No work at all. Often you can see the limit from a graph (say on a calculator). But you need to show a reason on the paper based on mathematics that gives your answer. At the very least, say you looked at a calculator to motivate your answer.
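The example from item 2 can be explored numerically. This Python sketch is only a way to motivate the answer (see item 10), not a replacement for the written work; the sample x-values are assumptions chosen for illustration.

```python
def f(x):
    """Original formula; dividing by zero makes it undefined at x = 2."""
    return (x ** 2 - 4) / (x - 2)


def g(x):
    """Simplified formula; agrees with f whenever x != 2."""
    return x + 2


for x in (1.9, 1.99, 1.999, 2.001, 2.01, 2.1):
    print(x, f(x), g(x))

try:
    f(2)
except ZeroDivisionError:
    print("f(2) is undefined, but g(2) =", g(2))
```

The table comes from the sides of x=2 and never uses x=2 itself, which is the point of item 6: the limit describes what the function predicts near the point, not what happens at it.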

### Epsilons, Deltas and Limits... Oh My!

Yes, Toto, writing proofs of limits can be as scary as the wicked witch from the east! But do not fear, with the right direction, we can squash those problems with ease.

The first step is to realize that we are proving a limit based on its definition. Suppose we need to prove the statement written in its general form:
lim_{x → a} f(x) = L
Notice that this really is saying that when x is a value close to a, the value of f(x) is close to L. The mathematical statement of this says:
"For any ε > 0, there exists a value δ > 0 so that if 0 < |x-a| < δ then |f(x)-L| < ε."

Now just because the Scarecrow is flapping in the breeze, we don't need to be afraid of this complicated looking formula. Our task is to be able to find a formula for δ in terms of ε so that once you know that the value of x is within δ of the value a (δ says how close), then the value f(x) is within ε of the value L.

To reach that fabled wizard of mathematics called a proof, we just need to follow the yellow brick road outlined below. The proof will always take a form involving four steps corresponding to the four parts of the definition:

1. For any ε > 0: We need to create a proof that works for any ε > 0. So that we have a value to work with, we start with any ε with the requirement ε > 0. So the first statement of the proof is something like, "Given ε > 0" or "Suppose ε > 0" or "Let ε > 0."

2. ... there exists δ > 0 such that: The second step is that we need to provide a recipe for how to provide δ > 0 that will make the rest of the statement true. Unfortunately, by the time we reach this step of the proof, we don't yet know what the right recipe is. Personally, I just write, "Let δ=____" and leave enough space to fill in later.

3. ... if 0 < |x-a| < δ ...: We are starting to prove an implication (if...then... statement). We are successful if we can show the conclusion is true whenever the hypothesis is true. So, to accomplish this, we assume that the hypothesis is true and see what happens. I write, "Assume 0 < |x-a| < δ."

4. ... then |f(x)-L| < ε: This is the conclusion of the implication. And this is also the hardest part of the proof. In the middle of completing the work in this proof, we will discover the recipe needed for δ. At that time, we can go back and fill in the missing pieces.

So the 4th step is the hard one. Don't be cowardly like the lion and give up; there is a method to this as well. For the polynomials that we work with, the value of |f(x)-L| will always factor into something of the form:
|f(x)-L| = |x-a| |"stuff"|
We know from step 3 that |x-a|< δ. We want |"stuff"| to be less than or equal to a number, which for convenience in discussion we'll call k. Once we find that number k, then we know:
|f(x)-L| < δ k
So part of our recipe will be to make sure that δ≤ε/k. If the recipe requires no other parts, we can even just use δ=ε/k. With this knowledge, we will have found:
|f(x)-L| < δ k ≤ (ε/k)k = ε

All would be well, except that the wicked professor of the west hasn't yet told you how to find k. Let's step back a moment to the |"stuff"| factor. If you don't, I'll send my winged monkeys to bring you back :-). When we're lucky (and f(x)=mx+b), the factor is already a number. But for any other problem, there will be a formula that still involves x. In these cases, without knowing more about x, we won't know how big the extra "stuff" can become. In order to keep a handle on this "stuff" we are going to require for our recipe that δ itself never gets too large. For the simplest cases, we can require δ ≤ 1. And this means that we can take advantage of knowing that x will be between a-1 and a+1.

In a general problem, we could find the largest value for |"stuff"| using values of x between a-1 and a+1. This might take a bit of work. But for f(x) that is quadratic, "stuff" is going to be another linear looking term. We want that "stuff" to involve x-a, so use x = (x-a) + a.

For example, when a=2, the term x+1 can be rewritten x+1 = (x-2)+2+1 = (x-2)+3. The awesome Triangle Inequality then tells us:
|x+1| = |(x-2)+3| ≤ |x-2| + 3
But we know that |x-2|<δ and we required δ ≤ 1 for our recipe. So |x+1|< 4.

For another example, suppose that f(x)=x^2-x, a=3, and L=6. We assume 0<|x-3|<δ and rewrite
|f(x)-L| = |x^2-x-6| = |x-3||x+2|
We know |x-3|<δ and we need to find a number k so that |x+2|≤k. Since x is in "stuff", we require δ ≤ 1 and use the triangle inequality:
|x+2| = |(x-3)+3+2| ≤ |x-3|+5 < δ+5 ≤ 6
So |x+2| < 6 (This is our value k=6). Thus we also want δ ≤ ε/6 in our recipe. Both requirements are taken care of by the formula δ = min(1, ε/6). So now we know
|x^2-x-6| = |x+2||x-3| < 6δ ≤ 6(ε/6) = ε

We have arrived at the emerald city of our desire and proved the limit statement
lim_{x → 3} (x^2-x) = 6.

But the proof needs to be in the right order:
Given ε>0.
Let δ = min(1, ε/6).
Assume 0 < |x-3| < δ.
|x^2-x-6| = |x-3||x+2|
|x+2| = |(x-3) + 5|
|x+2| ≤ |x-3| + 5
|x+2| < δ + 5 and δ ≤ 1 so |x+2| < 6
|x^2-x-6| = |x-3||x+2| < 6δ and δ ≤ ε/6
So |x^2-x-6| < ε
Thus, for all ε>0, there exists δ>0 so that if 0 < |x-3| < δ, then |(x^2-x)-6| < ε.
Therefore, lim_{x → 3} (x^2-x) = 6
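A recipe for δ can be spot-checked before writing the final proof. The Python sketch below tests the recipe δ = min(1, ε/6) at sample points; the helper names `delta_for` and `check`, the sample values of ε, and the number of sample points are assumptions chosen for illustration. Passing the check is evidence that the recipe is reasonable, but only the written proof covers every x.

```python
def f(x):
    return x ** 2 - x


def delta_for(epsilon):
    """The recipe from the proof: delta = min(1, epsilon/6)."""
    return min(1.0, epsilon / 6)


def check(epsilon, samples=1000):
    """Test |f(x) - 6| < epsilon at sample points with 0 < |x - 3| < delta."""
    delta = delta_for(epsilon)
    for i in range(1, samples):
        offset = delta * i / samples      # strictly between 0 and delta
        for x in (3 - offset, 3 + offset):
            if not abs(f(x) - 6) < epsilon:
                return False
    return True


for epsilon in (10.0, 1.0, 0.01):
    print(epsilon, delta_for(epsilon), check(epsilon))
```

Trying a careless recipe such as δ = ε in place of min(1, ε/6) is a quick way to see why the factor |x+2| has to be controlled.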

## Monday, September 15, 2008

### Proofs and Even/Odd Functions

Here is the second part of my chat discussion. I have edited this some, but I hope the essence of the questions and answers remains.

[Student]: ok, so im still iffy on how to structure proofs, ill know the assumption (obviosly) and the conclusion, but I'm unsure how to structure the premises and justify them
[Professor]: Well, basic ideas include starting the proof with your assumptions.
These are the basic statements you know are true.
[Student]: right

[Professor]: The very last line of the proof should match the conclusion. (and it shouldn't appear earlier) The hard part is what comes in between :-)
[Professor]: But seriously, usually, you take a look at your conclusion and see what type of statement it requires.

[Professor]: If it is an equation that is needed, then you can usually start with one side of the equation and then see what you can put on the other side that you know must be true. (Usually using a definition or algebra)

[Professor]: Then you see how you can use your assumptions to create a statement that leads to your conclusion.

After turning to an actual problem, we started to get into more specifics. The problem was stated as: Suppose that f1 and f2 are odd functions. Prove that f1*f2 is an even function.

[Professor]: What is given knowledge?
[Student]: f1 and f2 are both odd and g1 and g2 are even
[Professor]: What is desired? (conclusion)
[Student]: f1 * f2 is even
[Professor]: What does it mean that f1 is odd?
[Student]: that f(-x) = -f(x)
[Professor]: Make the f into f1, and correct.
[Student]: ah ok ya
[Professor]: So in the actual proof, one line would be...
f1 is an odd function
[Professor]: The next line would interpret...
f1(-x) = -f1(x) for all x in domain of f1
[Professor]: Repeat these two lines for f2.
[Professor]: So far, we have simply restated facts based on the assumptions and our knowledge of odd functions.
[Student]: and that is all stuff that is under the given part? rite
[Professor]: In the proof, it doesn't belong to a "given" part per se, but the reason on the right hand side (if in tabular form) would be that the statement was "Given"
[Professor]: So our proof now has 4 statements.

[Professor]: Now, (scratch work) look at what you need to show: f1*f2 is even. What does that mean?
[Student]: ok ya i have no idea, i feel like it would involve one of the multiplicative properties involving f1 and f2 but i dont really know where to start
[Professor]: What does it mean if I said that G is an even function? g(x) = g(-x) My name was G, so it would be G(x)=G(-x).
[Professor]: But the name could be Brian: Brian(-x)=Brian(x). Voila! I [Brian] am even.
[Professor]: f1*f2 is the name of a function. And for f1*f2 to be even, you need: f1*f2(-x) = f1*f2(x)
[Professor]: It just happens that the name looks like a formula.
[Student]: theres noit a step in between there?
[Student]: to justify how multiplying the functions is ok
[Professor]: Well, there are still steps. But we need to know what we are aiming for.
[Student]: ah
[Professor]: You know what f1 and f2 are. They are odd functions. But exactly what is this new function that we call f1*f2?
[Student]: idk f1(x) * f2(x)
[Professor]: Exactly!
[Student]: or f1(-x)
[Professor]: No
[Student]: * f2(-x)
[Student]: no ok
[Professor]: So we know f1*f2(x) = f1(x)*f2(x)
[Student]: yep
[Professor]: And it looks like you were in the middle of the thought: f1*f2(-x) = f1(-x)*f2(-x)
[Student]: ya
[Professor]: Okay, things are starting to come together.
[Student]: sort of
[Professor]: We want to show f1*f2(-x)=f1*f2(x)
[Student]: substitution?
[Professor]: Yep. Back to the proof. We need to use substitution to get our final result. But we do it one step at a time.

[Professor]: I'll type the left hand side, you type the right hand side.
[Professor]: f1*f2(-x) = ?
[Student]: f1(-x) * f2(-x)
[Professor]: Good. But f1(-x) = ? and f2(-x) = ?
[Student]: f1(x) * f2(x)
[Student]: oh
[Professor]: And finally, f1(x)*f2(x) = ?
[Student]: f1*f2(x)
[Professor]: Precisely. Putting those together, we now know f1*f2(-x) = f1*f2(x).
[Professor]: And that means ...
[Professor]: f1 * f2 is even
[Professor]: Q.E.D. :-)
[Student]: sooo i guess i then assemble all of those steps or just some of them
[Professor]: You can leave out the parts that I said belonged as scratch work.
[Professor]: There were four lines associated with the given information. Then I said "back to the proof" and there were probably four more lines. That is the proof.
[Student]: ic ok
[Student]: ugh, this is def not my fav stuff, i cant believe i haven't learned this before

In summary, here is our proof:
f1 is an odd function
f1(-x) = -f1(x) for x in domain
f2 is an odd function
f2(-x) = -f2(x) for x in domain
f1*f2(-x) = f1(-x) * f2(-x)
= (-f1(x))*(-f2(x))
= f1(x)*f2(x)
= f1*f2(x)
So f1*f2(-x) = f1*f2(x) for x in domain
f1*f2 is an even function
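
A quick numerical sanity check of this result can be sketched in Python. It is not a proof, only a spot check at a few sample points. The particular odd functions chosen here (x^3 and sin) and the helper name `product` are assumptions made for illustration.

```python
import math

def f1(x):
    # Hypothetical odd function chosen for illustration: f1(-x) = -f1(x)
    return x ** 3

def f2(x):
    # Another odd function: sin(-x) = -sin(x)
    return math.sin(x)

def product(f, g):
    """Return the function named f*g, defined by (f*g)(x) = f(x) * g(x)."""
    def fg(x):
        return f(x) * g(x)
    return fg

f1_times_f2 = product(f1, f2)

# Compare (f1*f2)(-x) with (f1*f2)(x) at a few sample points.
samples = [0.5, 1.0, 2.0, 3.7]
product_is_even = all(
    math.isclose(f1_times_f2(-x), f1_times_f2(x)) for x in samples
)
print(product_is_even)
```

Notice that `f1_times_f2` is the function, while `f1_times_f2(x)` is a value, which is the same distinction made in the chat.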

Now, back to the discussion. Another problem dealt with composition. That is, students were to prove: If f1 is an odd function and g1 is an even function, then g1•f1 is an even function. Here is where part of that discussion went.

[Student]: so how do the g1(f1(x)) differ, i guess proving that f1(x) is odd, and then proving then that g1(x) is odd because of f1(x)?
[Professor]: One key point is that the definition of odd or even --- f(-x) = -f(x) and g(-x)=g(x) --- is that the x is simply a place-holder and the same statement would be true regardless of what is in the place of the x.
[Professor]: So for example f(-(x+2)) = -f(x+2), where the "x" was actually the formula x+2.
[Student]: right
[Professor]: Now, be more specific on your question related to composition.
[Student]: so once i prove that f1 is even then it would be the same justification for g1
[Professor]: Careful! f1 is odd (from the given information), so you can't prove it is even.
[Student]: well, opposite justification
[Student]: I'm still not sure of the question.
[Professor]: g1 alone is known to be even. There is nothing to prove about that.
[Student]: ok
[Student]: timeout, so all id have to do is justify g1 as being even regardless of f1?
[Professor]: Why? What are your steps?
[Professor]: And what are you trying to prove?
[Student]: so its given that g1 is even and f1 is odd and we're trying to prove that g1(f1(x)) is even
[Professor]: Technically, g1•f1 is even.
[Professor]: g1(f1(x)) is a value, not a function.
[Professor]: So you need to show g1•f1(-x) = g1•f1(x).
[Student]: wait im confused
[Student]: ok
[Student]: i get it
[Student]: couldnt you just subsitute -x for f1(-x)
[Professor]: No for what you said. f1(-x) and -x are different, so they don't substitute.
[Professor]: But I don't think that is what you were thinking. Try to restate.
[Student]: you could justify g1(f1(x)) = g1(f1(x)) with g(x)=g(-x)
[Student]: -f1(x)*
[Professor]: I think you're on the right track. But be careful that you go step by step.
[Professor]: What are all the steps?
[Professor]: I'll start it off... g1•f1(-x) = ?
[Student]: thats where im confused, do you need a a step for that?
[Professor]: Yes. You must relate the function (g1•f1) to the rest of the formulas.
[Student]: g1(f1(-x))= g1(f1(x) )
[Professor]: What justifies that? The -x belongs to f1, not g1
[Student]: g1(-f1(x)) def of odd fcn
[Professor]: So don't skip that step
[Student]: so whats after that then?
[Professor]: Well, why don't you summarize the statements so far. Start again with: g1•f1(-x) = ...
[Student]: g1(f1(-x)) = g1(-f1(x)) b/c of the def of odd fcns
[Professor]: Very good. Now, what does g1 do when you have g1(-[anything])?
[Student]: g1(-x)=-g1(x) ? but that would make it odd
[Professor]: So use the fact that g1 is even. g1(-x)=?
[Student]: g1(x)
[Professor]: So g1(-U) = g1(U) or g1(-f1(x)) = g1(f1(x))
[Student]: k
[Professor]: It doesn't matter what appears.
[Professor]: g1(-[stuff]) = g1([stuff])
[Professor]: So you left off at: g1•f1(-x) = g1(f1(-x)) = g1(-f1(x)) = ... (finish it off)
[Student]: g1(-f1(x))=g1(f1(x))
[Professor]: And how does that relate to g1•f1?
[Student]: g1(f1(x)) = even
[Professor]: not equal even. g1(f1(x)) = g1• f1(x).
[Professor]: Recall, you are working with the function g1•f1. You need to show g1•f1(-x) = g1•f1(x).
[Student]: i just got a little confused
[Student]: whats the diff between g1(f1(x)) = g1•f1(x)
[Student]: ah i c nev mind
[Professor]: same value but only g•f can be called the name of the function
[Student]: ok
[Student]: so g1•f1(x) = g1•f1(-x)
[Professor]: That would be the final line to show that g1•f1 is an even function.
[Student]: ok that makes sense

And that leads us to our second proof:
f1 is an odd function
f1(-x) = -f1(x) for x in domain
g1 is an even function
g1(-x) = g1(x) for x in domain
g1•f1(-x) = g1(f1(-x))
= g1(-f1(x))
= g1(f1(x))
= g1•f1(x)
So g1•f1(-x) = g1•f1(x) for x in domain
g1•f1 is an even function
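
The same kind of spot check works for composition. This is only a sketch: the choices f1(x) = x^3 (odd) and g1(x) = cos(x) (even), and the helper name `compose`, are assumptions for illustration, and checking a few points does not replace the proof.

```python
import math

def f1(x):
    # Hypothetical odd function: f1(-x) = -f1(x)
    return x ** 3

def g1(x):
    # Hypothetical even function: g1(-x) = g1(x)
    return math.cos(x)

def compose(g, f):
    """Return the function g•f, defined by (g•f)(x) = g(f(x))."""
    def g_of_f(x):
        return g(f(x))
    return g_of_f

g1_after_f1 = compose(g1, f1)

samples = [0.3, 1.0, 1.5, 2.2]
composition_is_even = all(
    math.isclose(g1_after_f1(-x), g1_after_f1(x)) for x in samples
)
print(composition_is_even)
```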

Do you know how to do the justifications of each line now?

### Domain and Codomain

I had a nice chat this afternoon with one of my students. The first topic had to do with the notation f : D→S. Here is what we said:

hi prof walton, i have a quick question
Shoot (but don't hurt me.)
haha, ok so i am under the impression that when functions
say f: x → x that means that the domain and the codomain
are the same but the ranges can be different correct?

The codomain (after arrow) lists the type of numbers
that might be values in the output, while the domain
(before arrow) lists all of the numbers that are in the
list of potential inputs. The range is the list of numbers
that actually are outputs. Is this the question?

yep, so the range is a sub"field" of the codomain rite
it lists all possibles while the range is what numbers
are in the function ?

subset instead of sub"field". Otherwise, yes.
haha ya i was lookin for that word
The cheapest answer for codomain would be simply R
(all real numbers). If the codomain is listed as something
more specific, that helps us understand the function better,
but we still might skip some of the numbers in the set.
ic, so when im looking at f and g that have the same
codomain, that then would not imply that f(g) = g(f)
because the ranges could be different?
or is
f(g) being = to g(f) relational to the domain only
Equality of functions requires that they have the same
domain and the same values at every point in that
domain.
If the domain is different, then the functions
must not be equal.
If the functions have different
values for any point, then the functions are not equal.

so d → s just means that the domain could produce
these outputs right?

That's right. The outputs must be somewhere in the
list known as S.
And the inputs only make sense if they
are in D.

but s just means possible outputs because its the codomain rite
Yes, because it is the codomain (after arrow)
rite rite rite, gotcha, this stuff is weird

## Monday, September 8, 2008

### Fun with Functions

Cryptography is an interesting application of functions. A cipher allows you to take text (for example) and encrypt it into a new form of information that can then be transmitted. Most children learn a particularly simple cipher that is called a substitution cipher. Such a cipher does a direct translation of letter for letter. Below is a simple example, motivated by the "stage-appearance" of the letters in the phrase "The quick brown fox jumps over the lazy dog."

Other simple ciphers include shifting the alphabet a fixed number of letters or reversing the alphabet.

If we think of encrypting a message as being a function from character sequences to new character sequences, then we might imagine applying two different encryptions one after the other. This is function composition. Or we might want to decrypt a message. This is applying the inverse function. In particular, note that for an encryption method to be useful, the inverse must exist. That is, the function must be one-to-one.
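
To make the composition and inverse ideas concrete, here is a minimal Python sketch. It uses the two simple ciphers mentioned above (reversing the alphabet and shifting by a fixed number of letters), not the cipher used for the puzzle. All of the function names are made up for this illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def make_substitution(key_alphabet):
    """Build a letter-for-letter cipher: A..Z maps onto key_alphabet in order."""
    return dict(zip(ALPHABET, key_alphabet))

def make_shift(k):
    """Build a shift cipher that moves each letter k places forward."""
    return {c: ALPHABET[(i + k) % 26] for i, c in enumerate(ALPHABET)}

def compose(second, first):
    """Function composition: apply `first`, then `second`."""
    return {c: second[first[c]] for c in ALPHABET}

def invert(table):
    """Inverse function; this only works because the cipher is one-to-one."""
    return {v: k for k, v in table.items()}

def apply(table, text):
    """Encrypt letters and leave spaces and punctuation alone."""
    return "".join(table.get(ch, ch) for ch in text)

reverse = make_substitution(ALPHABET[::-1])
shift3 = make_shift(3)
encrypt = compose(shift3, reverse)   # reverse the alphabet, then shift by 3
decrypt = invert(encrypt)

message = "HAVE FUN"
secret = apply(encrypt, message)
recovered = apply(decrypt, secret)
print(secret, "->", recovered)
```

If two letters were sent to the same letter, `invert` would silently lose information, which is exactly why a useful cipher must be one-to-one.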

Here is a message that I constructed using the QUICKBROWN cipher (above) followed by a shift cipher where an A becomes an R. Have fun!

JLWH EL MBSJ KJ LPMGK VGLHSM ZOG DCSX TGKHL.

## Wednesday, August 27, 2008

### Absolute Value Inequalities

Well, perhaps I muddied the water for some of you. Sorry about that. In terms of skills, when you solve |u|<a where u is any expression and a is another expression [I thought it was for constants, but it turns out to always work], you can solve by finding the intersection (and) of the solutions to u<a and to -u<a, which we write u<a and -u<a. When you solve |u|>a, you find the union (or) of the solutions to u>a and to -u>a, which we write u>a or -u>a.

My explanation in the supplemental handout was to motivate why this works. After all, the course is not just about skills, but it is also about justification. The absolute value is a piecewise-defined function. That is, there are different rules depending on the value of the expression being worked with. The skill-based method that works for absolute value does not work for other piecewise-defined functions. But thinking about each of the "pieces" separately and joining them properly will always work.
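
If you want to convince yourself that the "and"/"or" rules agree with the absolute value itself, a brute-force check over a grid of numbers is easy to sketch in Python. The function names and the particular grid of test values are assumptions for illustration, and a finite check like this is evidence, not justification.

```python
def abs_less_than(u, a):
    return abs(u) < a

def split_less_than(u, a):
    # Intersection ("and") of the two pieces
    return (u < a) and (-u < a)

def abs_greater_than(u, a):
    return abs(u) > a

def split_greater_than(u, a):
    # Union ("or") of the two pieces
    return (u > a) or (-u > a)

# Test values from -5 to 5 in steps of 0.5, including negative values of a.
values = [x / 2 for x in range(-10, 11)]

less_agrees = all(
    abs_less_than(u, a) == split_less_than(u, a)
    for u in values for a in values
)
greater_agrees = all(
    abs_greater_than(u, a) == split_greater_than(u, a)
    for u in values for a in values
)
print(less_agrees, greater_agrees)
```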

Since the handout was a first edition, I'm curious where you found the biggest issues.

## Monday, August 25, 2008

### Solving Equations Graphically

So, in class today, some of you may have been wondering about the computer program that I was using. This utility is called Grapher and it is installed on any recent Mac OS computer. You'll find it in the Utilities folder within Applications.

I was wondering then whether there is a similar resource available for Windows computers as well. Doing a Google search on "Graphing Calculator" I found the following possibility: GraphCalc. I don't have immediate access to check this out, so I'd certainly welcome some comments here as to how well it works.

Now that you have something to work with (and a calculator will work as well, just a little slower), here is something interesting to notice. To solve ax=sin(x), we need to plot y=ax and y=sin(x). You also need to choose a value of a. In Grapher, you would add a New Equation (Cmd-Opt-N) like a=0.25. You now need to find where these graphs intersect.

In class, we learned that we can create new equations that have the same solutions by performing the same operation to both sides (other than division, where we worry about division by zero). So we could get a new equation like a=sin(x)/x. Now we plot y=a and y=sin(x)/x. If you add these as two new graphs instead of getting rid of the old plots, you can compare the two equations graphically. Here is the plot:
I used different colors to distinguish which intersections I was looking for.

For the original equation (ax=sin(x), shown in red), we see there are three intersections. For the new equation (a=sin(x)/x, shown in green), we only have two intersections. But those two intersections agree exactly with the original (see the highlighted intersection at the same value of x marked by circles), and the third corresponds to x=0 which disappeared because we divided both sides by x.
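
If you do not have a graphing utility handy, the intersections can also be located numerically. The sketch below uses simple bisection in Python, which is a substitute technique and not what Grapher does internally. The bracketing interval [1, 3] and the helper name `bisect` are assumptions, chosen because sin(x)/x - a changes sign there when a=0.25.

```python
import math

def bisect(func, lo, hi, tol=1e-12):
    """Find a zero of func between lo and hi, assuming a sign change."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # zero lies in the upper half
        else:
            hi = mid                # zero lies in the lower half
    return (lo + hi) / 2

a = 0.25

def divided_equation(x):
    # a = sin(x)/x rewritten as sin(x)/x - a = 0 (not defined at x = 0)
    return math.sin(x) / x - a

root = bisect(divided_equation, 1.0, 3.0)

# The original equation a*x = sin(x) is satisfied at root, at -root, and at 0.
for x in (-root, 0.0, root):
    print(x, a * x - math.sin(x))
```

The solution x=0 never shows up when working with `divided_equation`, because that function cannot even be evaluated there.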

### Calculus I -- Welcome Fall 2008

Welcome to JMU and your first semester of calculus. I'm excited this semester to be teaching calculus again. I hope that you're excited as well.

So, while I was preparing for class, I came across MIT's open courseware program. This is a pretty amazing collection of knowledge that is freely accessible. Feel free to browse it. In particular, I found a text written by Gilbert Strang that will be an outstanding parallel reference for our course this year. I'm especially impressed with how he makes the text conversational in style rather than the more typical dry style of math textbooks.

We'll be pushing through the first chapter of our official textbook very quickly as it should be a review of mathematics that you have already taken.

See you in class!

## Monday, March 31, 2008

### Central Limit Theorem

The central limit theorem allows us to use a standard normal random variable as an approximating proxy for a properly centered and rescaled sample mean.

First, for any random variable Y with expected value E[Y]=μ_y and variance Var(Y)=σ_y^2, if we center and rescale as W=(Y-μ_y)/σ_y, then W will always have expected value E[W]=0 and variance Var(W)=1. But it is not the case that W behaves like a standard normal distribution unless Y is somehow special. However, in special cases, when Y can be represented in terms of a sum of many independent and identically distributed random variables, then the central limit theorem can help us.

In particular, if Y is the sample mean of a random sample X_1, X_2, ..., X_n, then Y=(X_1+...+X_n)/n. Suppose that each X_i has expected value E[X_i]=μ_x and variance Var(X_i)=σ_x^2. Then we know that μ_y=μ_x and σ_y^2=σ_x^2/n. Our centered and rescaled version of Y can be expressed in terms of the parameters for X: W=(Y-μ_y)/σ_y = (Y-μ_x)/(σ_x/√n). Because Y can be defined in terms of a constant times the sum of i.i.d. random variables, W has an approximately standard normal distribution.

As a second example, suppose that Y is the sum of a random sample X_1, X_2, ..., X_n, with Y=X_1+...+X_n. Suppose that each X_i has expected value E[X_i]=μ_x and variance Var(X_i)=σ_x^2. Then we know that μ_y=nμ_x and σ_y^2=nσ_x^2. Our centered and rescaled version of Y can be expressed in terms of the parameters for X: W=(Y-μ_y)/σ_y = (Y-nμ_x)/((√n)σ_x). Again, since Y is a sum of i.i.d. random variables, W has an approximately standard normal distribution.

Quite a few of our known distributions can be described as a sum of i.i.d. random variables. The most famous is the binomial distribution. Suppose that Y has a binomial distribution with n trials and probability p. Then we can think of Y as the sum of n i.i.d. Bernoulli random variables X_1,...,X_n, each with parameter p. Then μ_x=E[X_i]=p and σ_x^2=p(1-p), leading to μ_y=np and σ_y^2=np(1-p). Consequently, W=(Y-np)/√(np(1-p)) has an approximately standard normal distribution. In this example, we usually find that we need np≥5 and n(1-p)≥5 for the approximation to be good.
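
To see how good the approximation is for a binomial random variable, we can compare an exact binomial probability with the normal proxy. This is only a sketch. The values n=100, p=0.3, and the cutoff 35 are made up for illustration, and the half-unit continuity correction is an extra refinement that is not discussed above.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P[Y <= k] for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def standard_normal_cdf(z):
    """P[Z <= z] for a standard normal Z, written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.3            # hypothetical values with np >= 5 and n(1-p) >= 5
k = 35

mu_y = n * p                           # mean of Y
sigma_y = math.sqrt(n * p * (1 - p))   # standard deviation of Y

exact = binomial_cdf(k, n, p)
# Center and rescale; the +0.5 is a continuity correction (an added assumption).
approx = standard_normal_cdf((k + 0.5 - mu_y) / sigma_y)

print(exact, approx)
```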

Other random variables can also be represented as a sum of simpler i.i.d. random variables. The Poisson distribution for a large value of λ can be rewritten as the sum of many smaller Poisson RVs with small values of λ (X_1,...,X_n each Poisson with rate λ/n). The Gamma distribution (including Chi-Square) can similarly be written as a sum of many simpler Gamma (or Chi-Square) random variables. In each of these cases, a centered and rescaled version of the random variable W=(Y-μ_y)/σ_y will be approximately a standard normal distribution.

## Saturday, February 9, 2008

### Success with Random Variables

So this afternoon, I finished grading a quiz focusing on random variables associated with a sequence of Bernoulli trials. Based on these quizzes, I'm trying to understand what makes a probability course so difficult for students to understand. The other night, I was at a department social and talking with a professor who has taught Math 318 many times. He came right out and stated (without me giving any prompting) that he is always surprised at how students have such a hard time with the course, even though the actual mathematics involved in the course are fairly straight-forward.

In the last entry, I noted that at least part of the challenge is that there are so many different types of functions that appear. But I think that a significant issue comes back to confusion about what random variables represent and how they relate to questions that are posed. Recall that a random variable summarizes some aspect of a random experiment as a single number. (We will soon generalize to the ability to summarize with multiple random variables.) Typical questions in probability focus on the probability of events and the average of certain quantities. Almost always, we answer such questions by identifying an appropriate random variable. We then decide how to characterize events in terms of that random variable, or how to express the quantity being averaged in terms of that random variable.

The examples of random variables related to a sequence of Bernoulli trials provide our first real example of this type of reasoning, and I think this transition is part of why the topic was difficult for many of you. First, we must remember that the random variable is not the same as the random experiment, but simply one of many different ways to summarize an aspect of that experiment. The experiment itself is characterized completely by the sequence of Bernoulli trials, each of which is going to be either a success or a failure. Random variables will be measurements that are related to these successes or failures, and we must choose an appropriate measurement that will allow us to answer the questions of interest.

On our quiz last week, I gave the following scenario: "Apples are packaged in 3-pound bags. Suppose that 4% of the time, the bag weighs less than 3 pounds. You select bags randomly and weigh them in order." In principle, there is an unlimited supply of bags of apples, and we indefinitely select a random bag and weigh it. If we were to weigh enough bags over a long enough time, we would find that about 4% of the bags are underweight. However, for any particular number of weighings, we could have found any number of underweight bags. The random experiment is described by an infinite sequence of F's and U's, with an F if the bag was full and a U if the bag was underweight.

So now consider the first question: "What is the probability that you must weigh at least 20 bags before you find an underweight bag?" One method that we could use to do this is to directly understand the event in question. That is, we could create a tree diagram that would eventually give enough information to answer the question. Unfortunately, this tree diagram would be much too cumbersome to answer a question about the first 20 bags weighed: 2^20 different outcomes after 20 weighings. So we try to see if there is a random variable for this random experiment that has enough information to answer the question instead.

There are actually multiple ways to choose a random variable to answer the question. The best random variable is a geometric random variable, which counts the number of Bernoulli trials until we see the first success. Since our question considers when we find the first underweight bag, we define success for our purposes as weighing a bag as underweight. Thus, X counts the number of weighings until we find the first underweight bag. If the first bag is underweight, X=1. But if the first 5 bags are full and the 6th bag is underweight, X=6. Every path on our hypothetical tree diagram is associated with a particular value of the random variable. Since an underweight bag will be chosen with probability p=0.04, we use X~Geometric(p=0.04). Now, having chosen a random variable, we must determine how we answer the question in terms of that random variable. The event of interest, "weigh at least 20 bags before you find an underweight bag," corresponds to an event, "X is greater than or equal to 20". Thus, we wish to compute P[X≥20].

Another way that we could answer the question is to consider that finding an underweight bag for the first time on or after the 20th weighing means that the first 19 bags must have all been full. Consequently, we could answer our question using a Binomial random variable with n=19. So let X count the number of underweight bags out of the first 19 that are weighed. Again, a "success" is finding an underweight bag, so that X~Binomial(n=19,p=0.04). Our event can now be restated as saying that "X is equal to 0". But remember that the X in this paragraph is different from the random variable in the previous paragraph. For this random variable, our question will be answered by computing P[X=0].
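
The two approaches can be checked against each other with a short Python sketch. The variable names are made up, and the geometric p.m.f. used is the standard one for "number of trials until the first success," P[X=k]=p(1-p)^(k-1).

```python
import math

p = 0.04   # probability that a bag is underweight ("success")

# Geometric approach: P[X >= 20] = 1 - P[X <= 19],
# where P[X = k] = p * (1 - p)**(k - 1) for k = 1, 2, 3, ...
geometric_tail = 1 - sum(p * (1 - p) ** (k - 1) for k in range(1, 20))

# Binomial approach: no underweight bags among the first 19 weighed.
n = 19
binomial_zero = math.comb(n, 0) * p**0 * (1 - p) ** n

print(geometric_tail, binomial_zero)
```

Both numbers are the same probability, (0.96)^19, reached through two different random variables.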

The third question on the quiz asked, "What is the probability that you find the 5th underweight bag before you weigh 80 bags?" The best choice for a random variable that will answer our question is to count the number of trials until you do find the 5th underweight bag. If we say that a trial is a success if the bag is underweight, then we are counting trials until the 5th success. That is, our random variable X is a negative binomial random variable with r=5 (the number of successes needed to stop counting) and p=0.04 (the success probability). We write this: X~Neg.Binom.(r=5,p=0.04). To answer the question, we must realize that the event of interest is to say that X<80. (We stop counting before we reach 80.) Thus, the answer would be P[X<80].

And of course, as I mentioned earlier, we could actually compute the probability using another type of random variable. For this problem, we must find another way to describe the event of interest. One way to do this is to realize that we find the fifth underweight bag prior to the 80th weighing if there are at least 5 underweight bags by the time we weigh the 79th bag. This can be represented using a binomial random variable. Let our random variable X count the number of underweight bags in the first 79 weighed. Thus, X~Binomial(n=79, p=0.04). The probability we wish to compute is P[X≥5].
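
Here again the two random variables must give the same answer, and a Python sketch can confirm it. The names are made up for illustration. The negative binomial p.m.f. used is the standard one for "number of trials until the r-th success," and the binomial event is "at least 5 underweight bags among the first 79."

```python
import math

p = 0.04   # probability that a bag is underweight
r = 5      # number of underweight bags we are waiting for

# Negative binomial approach: P[X < 80] = sum of P[X = k] for k = 5, ..., 79,
# where P[X = k] = C(k-1, r-1) * p**r * (1-p)**(k-r).
neg_binomial_prob = sum(
    math.comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)
    for k in range(r, 80)
)

# Binomial approach: at least 5 underweight bags in the first 79 weighed,
# computed as 1 - P[X <= 4].
n = 79
binomial_prob = 1 - sum(
    math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r)
)

print(neg_binomial_prob, binomial_prob)
```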

In summary, to do well in probability, you will need to think probabilistically. Before you can actually compute a quantity (which usually involves fairly straightforward mathematics), you must identify a random variable and understand how to answer the question in terms of that random variable. Then you can use one of the appropriate functions to finally answer the question, whether it is a probability or an expectation.

## Monday, February 4, 2008

### Random Variables

There seems to be a lot of confusion in relation to random variables. Part of this has to do with students being pressed onto new subjects without necessarily understanding the previous material adequately. Part of this has to do with the plethora of new functions that are introduced in relation to these random variables (e.g., a random variable X is itself a function on the probability space, and it has a p.m.f., a m.g.f., and a c.d.f.). And part of this confusion has to do with a variety of ways to compute things: probabilities, expectations, and moments. Before you allow yourself to be engulfed by the onslaught of waves of probability theory crashing down on you, take a breath (of air) and consider the following.

Probabilities most fundamentally describe random experiments. The probability space (or sample space) describes the possible outcomes of the experiments in as much detail as necessary to completely characterize that experiment. A random variable is a single numeric summary of the nature of the experimental outcome. There are typically many different outcomes of the experiment that can lead to the same value for the random variable. Furthermore, a single experiment can provide the information for many different random variables, each of which summarizes a different aspect of the experiment.

For example, suppose our experiment consists of rolling two standard 6-sided dice, one of which is red and the other is green. There are 36 distinct outcomes. We might be interested in the value of the red die, which value we could assign to a random variable R. We might instead be interested in the value of the green die, which value we could assign to a random variable G. Other values of interest might be the largest of the two values, the smallest of the two values, the sum of the values, the difference between the dice, the greatest common factor between the values, etc. The list could go on forever, with each summary value corresponding to a different random variable.

The random variables R and G are independent because the roll of one does not influence the other. They are also identically distributed. The probability mass function (p.m.f.) for the random variable describes the probabilities of individual values for R. We either need a formula for an arbitrary value or a table showing all of the values to describe this function. For this random variable, we have f(x)=1/6 for each of the values in the support S={1,2,3,4,5,6}. The expected value is defined as a sum over all possible values in the support, with each value x weighted by its probability f(x).
For a single die roll (R or G), we have E[R] = (1/6)(1)+(1/6)(2)+...+(1/6)(6) = 3.5.

Let us now consider another random variable, S, which represents the sum of the two values. That is, S=R+G. The support for S is the set {2,3,...,12}. We could compute the p.m.f. for this random variable: f(7)=6/36, f(6)=f(8)=5/36, f(5)=f(9)=4/36, f(4)=f(10)=3/36, f(3)=f(11)=2/36, and f(2)=f(12)=1/36. And we could compute the expected value of the random variable using the definition of mathematical expectation:

E[S]=2(1/36)+3(2/36)+...+10(3/36)+11(2/36)+12(1/36).

However, since S=R+G, we have a lovely little theorem that allows us to use a sum rule:
E[S] = E[R+G] = E[R]+E[G] = 3.5+3.5 = 7.

This is much easier than calculating using the definition.

So, we learn a lesson: if we can express a random variable as a sum of easier random variables, expected value may be more effectively calculated using these individual terms.

To continue, let us consider the distance between the two rolls. That is, we introduce a random variable X = |R-G| to represent the distance between the rolls. The smallest possible value for X is 0, which occurs when the dice are the same. The largest possible value for X is 5, which occurs when one die is 1 and the other is 6. So the support for X is the set S={0,1,2,3,4,5}. In this example, a table for the p.m.f. is much easier, computed based on the number of ways to obtain each distance:
f(0)=6/36, f(1)=10/36, f(2)=8/36, f(3)=6/36, f(4)=4/36, f(5)=2/36

For this problem, the random variable X is not easily expressed as a sum. So we must compute expected value using the original definition:

E[X]=0(6/36)+1(10/36)+2(8/36)+3(6/36)+4(4/36)+5(2/36)=70/36.
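
All of these tables can be rebuilt by listing the 36 outcomes and summarizing each one. The Python sketch below does this with exact fractions. The helper names `pmf` and `expected_value` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes (red die, green die).
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, 36)

def pmf(summary):
    """Build the p.m.f. table of the random variable summary(r, g)."""
    table = {}
    for r, g in outcomes:
        value = summary(r, g)
        table[value] = table.get(value, 0) + prob
    return table

def expected_value(table):
    """Definition of expectation: sum of value times probability."""
    return sum(value * pr for value, pr in table.items())

pmf_sum = pmf(lambda r, g: r + g)            # S = R + G
pmf_distance = pmf(lambda r, g: abs(r - g))  # X = |R - G|

print(expected_value(pmf_sum))        # agrees with E[R] + E[G]
print(expected_value(pmf_distance))   # 70/36, shown in lowest terms
```

The same `outcomes` list feeds both random variables, which is the point: one experiment, many different summaries.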

In summary, a random variable is a single number that summarizes some aspect of a random experiment. The p.m.f. of the random variable gives probabilities of individual values. The expected value (or mathematical expectation) computes the average of a random quantity weighted by appropriate probabilities of those values.

## Wednesday, January 23, 2008

### Statistical Independence

I was talking with a student about independence the other day and realized that the student was thinking of independence as being unrelated. In usual speech, we probably think of two outcomes as independent when they are not connected. For example, we say that the American states won their independence from England when they broke the governing ties with England.

However, statistical independence should be viewed in a different way. It refers to events A and B so that A provides no information on whether the event B occurs or not. If A and B (remember, these are sets) have no overlap (mutually exclusive or disjoint), then if you knew that the outcome was in A, then it would be impossible that the outcome is in B. Thus, the event A is providing information regarding event B and these are not statistically independent. Similarly, if I knew that the outcome was not in A (i.e., the outcome is in A'), then we know that B is a larger portion of the remaining possible outcomes. Again, that the outcome is in A' is giving information whether the outcome is in event B.

We thus draw the conclusion that independent events must overlap. In fact, they must overlap in a very significant way. Suppose that A and B are independent and that P(A)=0.25 while P(B)=0.4. By the definition of independence, we must have P(A∩B)=P(A)P(B)=(0.25)(0.4)=0.1. To make this concrete, imagine that the sample space Ω has 100 possible, equally likely outcomes. Then A includes 25 outcomes and B includes 40 outcomes. Our calculation then requires that the intersection A∩B includes 10 outcomes.

Now, notice what this means about conditional probabilities. If we restrict our attention to the set A (i.e., we are given that event A has occurred), then the newly restricted outcome space has 25 outcomes. If we ask the probability that B occurs given this information, we know that 10 of these outcomes belong to B. So P(B|A)=10/25=0.4. Voila! This is exactly the same as P(B). Similarly, if we are given that event B has occurred, then 10 of the 40 outcomes available belong to A so that P(A|B) = 10/40=0.25=P(A).

How do we summarize this idea? Well, if events A and B are independent, then each event must have a restricted but proportional representation of the other independent event. One way that we can do this is imagine that A and B are at right angles and overlap. For the above example, we can arrange the 100 items into 5 rows and 20 columns.

Then A might represent the first five columns (5×5=25) while B represents the first two rows (2×20=40). Being given information that A occurs collapses the larger picture into a reduced picture consisting of only five columns. However, the fraction of rows represented by B remains the same and P(B|A)=P(B).
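
The grid picture is easy to check by counting. In the Python sketch below, A is taken to be the first five columns so that it contains 25 of the 100 outcomes, matching P(A)=0.25, and B is the first two rows. The helper names `P` and `P_given` are made up for illustration.

```python
from fractions import Fraction

rows, cols = 5, 20
omega = {(r, c) for r in range(rows) for c in range(cols)}   # 100 outcomes

A = {(r, c) for (r, c) in omega if c < 5}   # first five columns: 25 outcomes
B = {(r, c) for (r, c) in omega if r < 2}   # first two rows: 40 outcomes

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

def P_given(event, given):
    """Conditional probability: restrict attention to the outcomes in `given`."""
    return Fraction(len(event & given), len(given))

print(P(A), P(B), P(A & B))           # 1/4, 2/5, 1/10
print(P_given(B, A), P_given(A, B))   # same as P(B) and P(A)
```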

This idea expands to more than two events. To have three independent events A, B, and C, we must imagine a three-dimensional grid corresponding to the outcome space so that A represents a simple division in one direction, B represents a second direction, and C represents the third direction. Conditioning on event A corresponds to collapsing the space in A's direction. But the B and C fractions of A remain exactly in the same proportion as they were originally.

## Tuesday, January 22, 2008

### Simplifying with Factorials

As I was grading the second quiz, I see that I should point out that you should simplify as much as possible whenever you see a fraction involving factorials. For example suppose that you saw a fraction: 10!/(4!6!). You should know that immediately you can cancel the last 6 numbers to get: (10*9*8*7)/(4*3*2*1). But instead of multiplying out, you should cancel the 8 with 4*2 and the 9 with 3 to get: (10*3*7)=210.
If you just multiply things out and do not simplify as you go, you will find that you get some awful numbers that are hard to factor. In fact, you were probably already looking at the factors.
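
You can confirm that the cancellation gives the same answer as the brute-force calculation with a few lines of Python. This is only a check of the arithmetic; the variable names are made up.

```python
import math

# Brute force: multiply everything out first (large intermediate numbers).
direct = math.factorial(10) // (math.factorial(4) * math.factorial(6))

# Cancel the 6! first, leaving four factors on top and 4! on the bottom.
cancelled = (10 * 9 * 8 * 7) // (4 * 3 * 2 * 1)

# Cancel 8 with 4*2 and 9 with 3, leaving only small factors.
simplified = 10 * 3 * 7

print(math.factorial(10), direct, cancelled, simplified)
```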

## Monday, January 21, 2008

### Discrete Random Variables (Section 2.1)

My philosophy is that class time should be used to facilitate learning and not simply to reiterate concepts that the book already explains adequately. I readily acknowledge that I am still learning how to accomplish such a feat, especially to help students overcome the tendency to avoid reading their textbook.

As we leave chapter 1 where we learned basic ideas of the probability of events, we begin chapter 2 where we will focus on a family of random variables of the discrete type.
Comparing Definition 2.1-1 with the definition I gave in my first week slides, you should notice that there is a distinction between the outcome space of the experiment and the space of the random variable. The outcome space should represent the detailed description of the experiment, while the space of the random variable is the range of the random variable (as a function of the outcome or sample space). Pages 58 and 59 provide an important philosophical guide for what we are trying to accomplish and point out that observations might help us to estimate the probabilities associated with the random variable. However, we can often use basic assumptions to create a mathematical model for these probabilities. This chapter introduces a number of models that describe discrete random behavior.

Definition 2.1-2 is very important, introducing the definition of the probability mass function. Problem 3 in the textbook helps test if you understand the basic ideas. One of the major points you need to remember is that for discrete type random variables, properties (b) and (c) compute probabilities using summations. When we get to continuous type random variables, the corresponding properties will replace summation with integration.

In addition to basic principles (probability mass function (mathematical model's prediction) vs relative frequency (statistical estimate), bar graph vs histogram), we meet the first model for a random variable---the hypergeometric distribution---which describes choosing n objects from a total collection of two types of objects. The model is created by considering exactly the types of calculations used in chapter 1, by counting how many ways to select n objects from a total of N objects (denominator) and then also counting how many ways to choose x of the first type and n-x of the second type (numerator).
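
The counting description of the hypergeometric model translates almost word for word into code. The Python sketch below is an illustration only: the names `n1` and `n2` for the number of objects of each type (so that N=n1+n2) and the example numbers are assumptions, not notation taken from the textbook.

```python
import math
from fractions import Fraction

def hypergeometric_pmf(x, n, n1, n2):
    """P[X = x]: choose n objects from n1 of type one and n2 of type two,
    and get exactly x of type one."""
    numerator = math.comb(n1, x) * math.comb(n2, n - x)   # x of type 1, n-x of type 2
    denominator = math.comb(n1 + n2, n)                   # any n of the N objects
    return Fraction(numerator, denominator)

# Hypothetical example: 4 objects of the first type, 6 of the second, choose 3.
n1, n2, n = 4, 6, 3
table = {x: hypergeometric_pmf(x, n, n1, n2) for x in range(0, n + 1)}

for x, prob in table.items():
    print(x, prob)
```

Adding up the probabilities in `table` gives exactly 1, which is one of the properties a p.m.f. must satisfy.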

### Spring 2008 Introduction

Welcome to my math blog. Last semester, I created a blog that would be specific to one course. But it seems silly to keep creating new blogs for each course. So I'm going to experiment with using one blog for a sequence of courses.
This semester (Spring 2008), I am teaching Math 318, an introduction to probability and statistics. Most of the blog entries will correspond to that course for the next few months.