Monday, September 8, 2008

Fun with Functions

Cryptography is an interesting application of functions. A cipher allows you to take text (for example) and encrypt it into a new form of information that can then be transmitted. Most children learn a particularly simple cipher called a substitution cipher, which translates the message letter for letter. Below is a simple example, motivated by the "stage-appearance" of the letters in the phrase "The quick brown fox jumps over the lazy dog."



Other simple ciphers include shifting the alphabet a fixed number of letters or reversing the alphabet.

If we think of encrypting a message as a function from character sequences to character sequences, then we might imagine applying two different encryptions one after the other. This is function composition. Or we might want to decrypt a message. This is applying the inverse function. In particular, note that for an encryption method to be useful, the inverse must exist. That is, the function must be one-to-one.
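
Here is a minimal sketch of these ideas in Python. The substitution alphabet below is my guess at the "stage-appearance" table (the actual table was in the figure above), and the message is just an illustration:

    # A cipher is a one-to-one function on letters.  Below: a substitution cipher,
    # a shift cipher (A becomes R), their composition, and the inverses used to decrypt.
    import string

    PLAIN = string.ascii_uppercase
    # Letters of the pangram in order of first appearance (assumed mapping A->T, B->H, ...).
    SUB = "THEQUICKBROWNFXJMPSVLAZYDG"
    SHIFT = PLAIN[17:] + PLAIN[:17]          # shift of 17 places, so A becomes R

    def make_cipher(target):
        """Return (encrypt, decrypt) functions for a 26-letter substitution alphabet."""
        enc = str.maketrans(PLAIN, target)
        dec = str.maketrans(target, PLAIN)   # the inverse exists because the map is one-to-one
        return (lambda s: s.translate(enc)), (lambda s: s.translate(dec))

    sub_enc, sub_dec = make_cipher(SUB)
    shift_enc, shift_dec = make_cipher(SHIFT)

    message = "FUNCTIONS ARE FUN"
    encrypted = shift_enc(sub_enc(message))    # composition: substitution first, then shift
    decrypted = sub_dec(shift_dec(encrypted))  # inverses applied in the reverse order
    print(encrypted, "->", decrypted)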

Here is a message that I constructed using the QUICKBROWN cipher (above) followed by a shift cipher where an A becomes an R. Have fun!

JLWH EL MBSJ KJ LPMGK VGLHSM ZOG DCSX TGKHL.

Wednesday, August 27, 2008

Absolute Value Inequalities

Well, perhaps I muddied the water for some of you. Sorry about that. In terms of skills, to solve |u|<a, where u is any expression and a is another expression [I thought this only worked when a is a constant, but it turns out to work in general], you find the intersection (and) of the solutions to u<a and to -u<a, which we write as u<a and -u<a. To solve |u|>a, you find the union (or) of the solutions to u>a and to -u>a, which we write as u>a or -u>a.
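
As a quick check on the skill, here is a small Python sketch using an example of my own (|2x - 1| < 5, not one from the handout). It compares the "and" solution, -2 < x < 3, against a direct test of the original inequality:

    # Solve |2x - 1| < 5 by the skill: 2x - 1 < 5 AND -(2x - 1) < 5, i.e. -2 < x < 3.
    # Then confirm numerically that this matches the original inequality.
    def skill_solution(x):
        return (2*x - 1 < 5) and (-(2*x - 1) < 5)

    def direct_check(x):
        return abs(2*x - 1) < 5

    samples = [k / 10 for k in range(-100, 101)]      # a grid from -10 to 10
    assert all(skill_solution(x) == direct_check(x) for x in samples)
    print("Both methods agree: the solution set is -2 < x < 3.")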

My explanation in the supplemental handout was to motivate why this works. After all, the course is not just about skills, but it is also about justification. The absolute value is a piecewise-defined function. That is, there are different rules depending on the value of the expression being worked with. The skill-based method that works for absolute value does not work for other piecewise-defined functions. But thinking about each of the "pieces" separately and joining them properly will always work.

Since the handout was a first edition, I'm curious where you found the biggest issues.

Monday, August 25, 2008

Solving Equations Graphically

So, in class today, some of you may have been wondering about the computer program that I was using. This utility is called Grapher, and it is installed on any recent Mac OS computer. You'll find it in the Utilities folder within Applications.

I was wondering then whether there is a similar resource available for Windows computers as well. Doing a Google search on "Graphing Calculator" I found the following possibility: GraphCalc. I don't have immediate access to check this out, so I'd certainly welcome some comments here as to how well it works.

Now that you have something to work with (and a calculator will work as well, just a little slower), here is something interesting to notice. To solve ax=sin(x), we need to plot y=ax and y=sin(x). You also need to choose a value of a. In Grapher, you would add a New Equation (Cmd-Opt-N) like a=0.25. You now need to find where these graphs intersect.

In class, we learned that we can create new equations that have the same solutions by performing the same operation to both sides (other than division, where we worry about division by zero). So we could get a new equation like a=sin(x)/x. Now we plot y=a and y=sin(x)/x. If you add these as two new graphs instead of getting rid of the old plots, you can compare the two equations graphically. Here is the plot:
I used different colors to distinguish which intersections I was looking for.

For the original equation (ax=sin(x), shown in red), we see there are three intersections. For the new equation (a=sin(x)/x, shown in green), we only have two intersections. But those two intersections agree exactly with the original (see the highlighted intersection at the same value of x marked by circles), and the third corresponds to x=0 which disappeared because we divided both sides by x.
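
If you want to check those intersections numerically rather than by eye, here is a rough Python sketch (my own stand-in for reading the Grapher plot) that locates the solutions of 0.25x = sin(x) by scanning for sign changes and refining each one with bisection:

    # Find where 0.25*x = sin(x) by locating sign changes of f(x) = 0.25*x - sin(x)
    # on a grid, then refining each crossing with bisection.
    import math

    a = 0.25
    f = lambda x: a * x - math.sin(x)

    def bisect(lo, hi, tol=1e-10):
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    roots = []
    xs = [k / 100 for k in range(-500, 501)]          # scan from -5 to 5 in steps of 0.01
    for left, right in zip(xs, xs[1:]):
        if f(left) == 0:
            roots.append(left)
        elif f(left) * f(right) < 0:
            roots.append(round(bisect(left, right), 6))

    print(roots)   # three solutions: one negative, x = 0, and one positive near 2.47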

Calculus I -- Welcome Fall 2008

Welcome to JMU and your first semester of calculus. I'm excited this semester to be teaching calculus again. I hope that you're excited as well.

So, while I was preparing for class, I came across MIT's OpenCourseWare program. This is a pretty amazing collection of knowledge that is freely accessible. Feel free to browse it. In particular, I found a text written by Gilbert Strang that will be an outstanding parallel reference for our course this year. I'm especially impressed with how he makes the text conversational in style rather than following the more typical dry style of math textbooks.

We'll be pushing through the first chapter of our official textbook very quickly as it should be a review of mathematics that you have already taken.

See you in class!

Monday, March 31, 2008

Central Limit Theorem

The central limit theorem allows us to use a standard normal random variable as an approximating proxy for a properly centered and rescaled sample mean (or sum).

First, for any random variable Y with expected value E[Y]=μy and variance Var(Y)=σy2, if we center and rescale as W=(Y-μy)/σy, then W will always have expected value E[W]=0 and variance Var(W)=1. But it is not the case that W behaves like a standard normal distribution unless Y is somehow special. However, when Y can be represented as a sum of many independent and identically distributed random variables, the central limit theorem can help us.

In particular, if Y is the sample mean of a random sample X1, X2, ..., Xn, then Y=(X1+...+Xn)/n. Suppose that each Xi has expected value E[Xi]=μx and variance Var(Xi)=σx2. Then we know that μyx and σy2x2/n. Our centered and rescaled version of Y can be expressed in terms of the parameters for X: W=(Y-μy)/σy = (Y-μx)/(σx/√n). Because Y can be defined in terms of a constant times the sum of i.i.d. random variables, W has an approximately standard normal distribution.
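
If you want to see this numerically, here is a small Python simulation sketch (the exponential distribution and the sample size are my own choices for illustration):

    # Simulate W = (Y - mu_x) / (sigma_x / sqrt(n)) where Y is the mean of n
    # exponential(1) observations (so mu_x = 1 and sigma_x = 1), and check that
    # W behaves like a standard normal random variable.
    import math, random

    random.seed(0)
    n, reps = 50, 20000
    mu_x, sigma_x = 1.0, 1.0

    ws = []
    for _ in range(reps):
        y = sum(random.expovariate(1.0) for _ in range(n)) / n
        ws.append((y - mu_x) / (sigma_x / math.sqrt(n)))

    mean_w = sum(ws) / reps
    var_w = sum((w - mean_w) ** 2 for w in ws) / reps
    within = sum(abs(w) < 1.96 for w in ws) / reps
    print(mean_w, var_w, within)   # roughly 0, 1, and 0.95, as the CLT predicts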

As a second example, suppose that Y is the sum of a random sample X1, X2, ..., Xn, with Y=X1+...+Xn. Suppose that each Xi has expected value E[Xi]=μx and variance Var(Xi)=σx2. Then we know that μy=nμx and σy2=nσx2. Our centered and rescaled version of Y can be expressed in terms of the parameters for X: W=(Y-μy)/σy = (Y-nμx)/((√n)σx). Again, since Y can be defined in terms of a constant times the sum of i.i.d. random variables, W has an approximately standard normal distribution.

Quite a few of our known distributions can be described as a sum of i.i.d. random variables. The most famous is the binomial distribution. Suppose that Y has a binomial distribution with n trials and probability p. Then we can think of Y as the sum of n i.i.d. Bernoulli random variables X1,...,Xn, each with parameter p. Then μx=E[Xi]=p and σx2=p(1-p), leading to μy=np and σy2=np(1-p). Consequently, W=(Y-np)/√(np(1-p)) has an approximately standard normal distribution. In this example, we usually find that we need np≥5 and n(1-p)≥5 for the approximation to be good.
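
Here is a short Python check of how good the approximation is. The numbers n = 100 and p = 0.3 are my own illustration (they satisfy np ≥ 5 and n(1-p) ≥ 5), and the half-unit shift in the approximation is the usual continuity correction, which is a refinement beyond what we discussed above:

    # Compare an exact binomial probability with its normal approximation via
    # W = (Y - np) / sqrt(np(1-p)).
    import math

    n, p = 100, 0.3
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))

    def binom_cdf(k):
        return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

    def normal_cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    exact = binom_cdf(35)                        # P[Y <= 35], exactly
    approx = normal_cdf((35.5 - mu) / sigma)     # normal approximation with continuity correction
    print(exact, approx)                         # both are roughly 0.88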

Other random variables can also be represented as sums of simpler i.i.d. random variables. A Poisson random variable with a large rate λ can be rewritten as the sum of many Poisson random variables with smaller rates (X1,...,Xn, each Poisson with rate λ/n). The Gamma distribution (including the Chi-Square) can similarly be written as a sum of many simpler Gamma (or Chi-Square) random variables. In each of these cases, a centered and rescaled version of the random variable, W=(Y-μy)/σy, will be approximately a standard normal distribution.

Saturday, February 9, 2008

Success with Random Variables

So this afternoon, I finished grading a quiz focusing on random variables associated with a sequence of Bernoulli trials. Based on these quizzes, I'm trying to understand what makes a probability course so difficult for students to understand. The other night, I was at a department social and talking with a professor who has taught Math 318 many times. He came right out and stated (without me giving any prompting) that he is always surprised at how students have such a hard time with the course, even though the actual mathematics involved in the course are fairly straight-forward.

In the last entry, I noted that at least part of the challenge is that there are so many different types of functions that appear. But I think that a significant issue comes back to confusion about what random variables represent and how they relate to questions that are posed. Recall that a random variable summarizes some aspect of a random experiment as a single number. (We will soon generalize to the ability to summarize with multiple random variables.) Typical questions in probability focus on the probability of events and the average of certain quantities. Almost always, we answer such questions by identifying an appropriate random variable. We then decide how to characterize events in terms of that random variable, or how to express the quantity being averaged in terms of that random variable.

The examples of random variables related to a sequence of Bernoulli trials provide our first real example of this type of reasoning, and I think this transition is part of why the topic was difficult for many of you. First, we must remember that the random variable is not the same as the random experiment, but simply one of many different ways to summarize an aspect of that experiment. The experiment itself is characterized completely by the sequence of Bernoulli trials, each of which is going to be either a success or a failure. Random variables will be measurements that are related to these successes or failures, and we must choose an appropriate measurement that will allow us to answer the questions of interest.

On our quiz last week, I gave the following scenario: "Apples are packaged in 3-pound bags. Suppose that 4% of the time, the bag weighs less than 3 pounds. You select bags randomly and weigh them in order." In principle, there is an unlimited supply of bags of apples, and we repeatedly select a random bag and weigh it. If we were to weigh enough bags over a long enough time, we would find that about 4% of the bags are underweight. However, for any particular number of weighings, we could have found any number of underweight bags. The random experiment is described by an infinite sequence of F's and U's, with an F if the bag was full and a U if the bag was underweight.

So now consider the first question: "What is the probability that you must weigh at least 20 bags before you find an underweight bag?" One method that we could use is to directly understand the event in question. That is, we could create a tree diagram that would eventually give enough information to answer the question. Unfortunately, this tree diagram would be much too cumbersome for a question about the first 20 bags weighed: there are 2^20 different outcomes after 20 weighings. So we try to see if there is a random variable for this random experiment that has enough information to answer the question instead.

There are actually multiple ways to choose a random variable to answer the question. The best random variable is a geometric random variable, which counts the number of Bernoulli trials until we see the first success. Since our question considers when we find the first underweight bag, we define success for our purposes as weighing a bag as underweight. Thus, X counts the number of weighings until we find the first underweight bag. If the first bag is underweight, X=1. But if the first 5 bags are full and the 6th bag is underweight, X=6. Every path on our hypothetical tree diagram is associated with a particular value of the random variable. Since an underweight bag will be chosen with probability p=0.04, we use X~Geometric(p=0.04). Now, having chosen a random variable, we must determine how we answer the question in terms of that random variable. The event of interest, "weigh at least 20 bags before you find an underweight bag," corresponds to an event, "X is greater than or equal to 20". Thus, we wish to compute P[X≥20].

Another way that we could answer the question is to consider that finding an underweight bag for the first time on or after the 20th weighing means that the first 19 bags must have all been full. Consequently, we could answer our question using a Binomial random variable with n=19. So let X count the number of underweight bags out of the first 19 that are weighed. Again, a "success" is finding an underweight bag, so that X~Binomial(n=19,p=0.04). Our event can now be restated as saying that "X is equal to 0". But remember that the X in this paragraph is different from the random variable in the previous paragraph. For this random variable, our question will be answered by computing P[X=0].
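
Here is a quick Python sketch checking that the two random variables give the same answer; both probabilities reduce to 0.96^19:

    # Two random variables, one answer: P[geometric X >= 20] should equal
    # P[binomial X = 0 with n = 19], and both should equal 0.96**19.
    p = 0.04

    # Geometric: sum the p.m.f. f(k) = (1-p)**(k-1) * p for k = 20, 21, ...
    # (truncated far out in the tail, where the remaining terms are negligible)
    geometric_tail = sum((1 - p) ** (k - 1) * p for k in range(20, 2000))

    # Binomial with n = 19: P[X = 0] = (1-p)**19
    binomial_zero = (1 - p) ** 19

    print(geometric_tail, binomial_zero, (1 - p) ** 19)   # all about 0.4604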

The third question on the quiz asked, "What is the probability that you find the 5th underweight bag before you weigh 80 bags?" The best choice for a random variable is to count the number of trials until you find the 5th underweight bag. If we say that a trial is a success if the bag is underweight, then we are counting trials until the 5th success. That is, our random variable X is a negative binomial random variable with r=5 (the number of successes needed to stop counting) and p=0.04 (the success probability). We write this: X~Neg.Binom.(r=5,p=0.04). To answer the question, we must realize that the event of interest is X<80. (We stop counting before we reach 80.) Thus, the answer would be P[X<80].

And of course, as I mentioned earlier, we could actually compute the probability using another type of random variable. For this problem, we must find another way to describe the event of interest. One way to do this is to realize that we find the fifth underweight bag prior to the 80th weighing exactly when there are at least 5 underweight bags among the first 79 weighed. This can be represented using a binomial random variable. Let our random variable X count the number of underweight bags in the first 79 weighed. Thus, X~Binomial(n=79, p=0.04). The probability we wish to compute is P[X≥5].
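
And again, a short Python sketch (computing each probability directly from its p.m.f.) confirms that the two descriptions agree:

    # Check that P[negative binomial X <= 79] equals P[binomial(n = 79) X >= 5].
    import math

    p, r, n = 0.04, 5, 79

    # Negative binomial p.m.f.: the r-th success occurs on trial k with probability
    # C(k-1, r-1) * p**r * (1-p)**(k-r), for k = r, r+1, ...
    negbin = sum(math.comb(k - 1, r - 1) * p**r * (1 - p)**(k - r) for k in range(r, n + 1))

    # Binomial(n = 79): P[X >= 5] = 1 - P[X <= 4]
    binom = 1 - sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(5))

    print(negbin, binom)   # the two values match (roughly 0.21)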

In summary, to do well in probability, you will need to think probabilistically. Before you can actually compute a quantity (which usually involves fairly straightforward mathematics), you must identify a random variable and understand how to answer the question in terms of that random variable. Then you can use one of the appropriate functions to finally answer the question, whether it is a probability or an expectation.

Monday, February 4, 2008

Random Variables

There seems to be a lot of confusion in relation to random variables. Part of this has to do with students being pushed on to new subjects without necessarily understanding the previous material adequately. Part of this has to do with the plethora of new functions that are introduced in relation to these random variables (e.g., a random variable X is itself a function on the probability space, and it has a p.m.f., an m.g.f., and a c.d.f.). And part of this confusion has to do with the variety of quantities we compute: probabilities, expectations, and moments. Before you allow yourself to be engulfed by the onslaught of waves of probability theory crashing down on you, take a breath (of air) and consider the following.

Probabilities most fundamentally describe random experiments. The probability space (or sample space) describes the possible outcomes of the experiments in as much detail as necessary to completely characterize that experiment. A random variable is a single numeric summary of the nature of the experimental outcome. There are typically many different outcomes of the experiment that can lead to the same value for the random variable. Furthermore, a single experiment can provide the information for many different random variables, each of which summarizes a different aspect of the experiment.

For example, suppose our experiment consists of rolling two standard 6-sided dice, one of which is red and the other green. There are 36 distinct outcomes. We might be interested in the value of the red die, which we could assign to a random variable R. We might instead be interested in the value of the green die, which we could assign to a random variable G. Other values of interest might be the larger of the two values, the smaller of the two values, the sum of the values, the difference between the dice, the greatest common factor of the values, etc. The list could go on forever, with each summary value corresponding to a different random variable.

The random variables R and G are independent because the roll of one does not influence the other. They are also identically distributed. The probability mass function (p.m.f.) for the random variable R describes the probabilities of its individual values. We either need a formula for an arbitrary value or a table showing all of the values to describe this function. For this random variable, we have f(x)=1/6 for each of the values in the support S={1,2,3,4,5,6}. The expected value is defined as a sum over all possible values in the support: E[R] = Σ x f(x), where the sum runs over the values x in S.
For a single die roll (R or G), we have E[R] = 1/6(1)+1/6(2)+...+1/6(6) = 3.5.

Let us now consider another random variable, S, which represents the sum of the two values. That is, S=R+G. The support for S is the set {2,3,...,12}. We could compute the p.m.f. for this random variable: f(7)=6/36, f(6)=f(8)=5/36, f(5)=f(9)=4/36, f(4)=f(10)=3/36, f(3)=f(11)=2/36, and f(2)=f(12)=1/36. And we could compute the expected value of the random variable using the definition of mathematical expectation:

E[S]=2(1/36)+3(2/36)+...+10(3/36)+11(2/36)+12(1/36).

However, since S=R+G, we have a lovely little theorem that allows us to use a sum rule:
E[S] = E[R+G] = E[R]+E[G] = 3.5+3.5 = 7.

This is much easier than calculating using the definition.

So, we learn a lesson: if we can express a random variable as a sum of easier random variables, expected value may be more effectively calculated using these individual terms.
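
To see the two computations side by side, here is a small Python enumeration of the 36 outcomes (just a check, not something you would need on a quiz):

    # Compute E[S] two ways: directly from the definition over the 36 equally
    # likely outcomes, and as E[R] + E[G].
    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))   # all (red, green) pairs

    e_s_direct = sum(r + g for r, g in outcomes) / 36
    e_r = sum(r for r, g in outcomes) / 36
    e_g = sum(g for r, g in outcomes) / 36

    print(e_s_direct, e_r + e_g)   # both are 7.0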

To continue, let us consider the distance between the two rolls. That is, we introduce a random variable X = |R-G| to represent the distance between the rolls. The smallest possible value for X is 0, which occurs when the dice are the same. The largest possible value for X is 5, which occurs when one die is 1 and the other is 6. So the support for X is the set S={0,1,2,3,4,5}. In this example, a table for the p.m.f. is much easier, computed based on the number of ways to obtain each distance:
f(0)=6/36, f(1)=10/36, f(2)=8/36, f(3)=6/36, f(4)=4/36, f(5)=2/36

For this problem, the random variable X is not easily expressed as a sum. So we must compute expected value using the original definition:

E[X]=0(6/36)+1(10/36)+2(8/36)+3(6/36)+4(4/36)+5(2/36)=70/36.
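
The same kind of enumeration recovers the p.m.f. table and the expected value for X = |R - G|:

    # Build the p.m.f. of X = |R - G| by counting outcomes, then compute E[X].
    from itertools import product
    from collections import Counter

    counts = Counter(abs(r - g) for r, g in product(range(1, 7), repeat=2))
    pmf = {x: counts[x] / 36 for x in sorted(counts)}
    e_x = sum(x * prob for x, prob in pmf.items())

    print(pmf)             # matches the table above: 6/36, 10/36, 8/36, 6/36, 4/36, 2/36
    print(e_x, 70 / 36)    # both about 1.944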


In summary, a random variable is a single number that summarizes some aspect of a random experiment. The p.m.f. of the random variable gives probabilities of individual outcomes. The expected value (or mathematical expectation) computes the average of a random quantity weighted by appropriate probabilities of those values.