Monday, March 31, 2008

Central Limit Theorem

The central limit theorem allows us to use a standard normal random variable as an approximating proxy for a properly centered and rescaled sample mean.

First, for any random variable Y with expected value E[Y]=μy and variance Var(Y)=σy2, if we center and rescale as W=(Y-μy)/σy, then W will always have expected value E[W]=0 and variance Var(W)=1. But it is not the case that W behaves like a standard normal random variable unless Y is somehow special. However, in special cases, when Y can be represented as a sum of many independent and identically distributed random variables, the central limit theorem can help us.

In particular, if Y is the sample mean of a random sample X1, X2, ..., Xn, then Y=(X1+...+Xn)/n. Suppose that each Xi has expected value E[Xi]=μx and variance Var(Xi)=σx2. Then we know that μy=μx and σy2=σx2/n. Our centered and rescaled version of Y can be expressed in terms of the parameters for X: W=(Y-μy)/σy = (Y-μx)/(σx/√n). Because Y can be defined in terms of a constant times the sum of i.i.d. random variables, W has an approximately standard normal distribution.
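A quick simulation makes this concrete. The sketch below (an illustration of the theorem, not part of the original discussion) takes the Xi to be exponential with rate 1, so that μx=1 and σx=1, and checks that the standardized sample mean W has mean near 0, standard deviation near 1, and roughly 95% of its values inside ±1.96, as a standard normal would:

```python
import math
import random
import statistics

random.seed(42)

# Assumed illustration: X_i ~ Exponential(rate 1), so mu_x = 1 and sigma_x = 1.
mu_x, sigma_x = 1.0, 1.0
n = 50        # sample size for each sample mean
reps = 20000  # number of simulated sample means

ws = []
for _ in range(reps):
    # Y is the sample mean of n i.i.d. exponential draws
    y = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    # Center and rescale: W = (Y - mu_x) / (sigma_x / sqrt(n))
    w = (y - mu_x) / (sigma_x / math.sqrt(n))
    ws.append(w)

print(round(statistics.fmean(ws), 2))  # should be close to 0
print(round(statistics.stdev(ws), 2))  # should be close to 1
# For a standard normal, about 95% of values fall within +/-1.96
print(round(sum(abs(w) < 1.96 for w in ws) / reps, 2))
```

Any distribution with a finite mean and variance could be substituted for the exponential here; only the values of μx and σx in the standardization would change.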

As a second example, suppose that Y is the sum of a random sample X1, X2, ..., Xn, with Y=X1+...+Xn. Suppose that each Xi has expected value E[Xi]=μx and variance Var(Xi)=σx2. Then we know that μy=nμx and σy2=nσx2. Our centered and rescaled version of Y can be expressed in terms of the parameters for X: W=(Y-μy)/σy = (Y-nμx)/((√n)σx). Again, since Y is itself a sum of i.i.d. random variables, W has an approximately standard normal distribution.

Quite a few of our known distributions can be described as a sum of i.i.d. random variables. The most famous is the binomial distribution. Suppose that Y has a binomial distribution with n trials and probability p. Then we can think of Y as the sum of n i.i.d. Bernoulli random variables X1,...,Xn, each with parameter p. Then μx=E[Xi]=p and σx2=p(1-p), leading to μy=np and σy2=np(1-p). Consequently, W=(Y-np)/√(np(1-p)) has an approximately standard normal distribution. In this example, we usually find that we need np≥5 and n(1-p)≥5 for the approximation to be good.
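To see how good the binomial approximation is, we can compare the exact binomial CDF to the normal approximation at a single point. This sketch uses assumed parameters n=40 and p=0.3 (so np=12 and n(1-p)=28, comfortably satisfying both rules of thumb) and applies the usual continuity correction:

```python
import math

# Assumed illustration: Y ~ Binomial(n=40, p=0.3), so np = 12 >= 5 and n(1-p) = 28 >= 5.
n, p = 40, 0.3
mu = n * p                        # mu_y = np
sigma = math.sqrt(n * p * (1 - p))  # sigma_y = sqrt(np(1-p))

def binom_cdf(k):
    """Exact P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

k = 15
exact = binom_cdf(k)
# Continuity correction: P(Y <= k) is approximated by P(Z <= (k + 0.5 - mu)/sigma)
approx = phi((k + 0.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))
```

The two probabilities agree to about two decimal places; shrinking np or n(1-p) below 5 makes the gap noticeably larger.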

Other random variables can also be represented as a sum of simpler i.i.d. random variables. The Poisson distribution with a large value of λ can be rewritten as the sum of many Poisson RVs with small rates (X1,...,Xn each Poisson with rate λ/n). The Gamma distribution (including Chi-Square) can similarly be written as a sum of many simpler Gamma (or Chi-Square) random variables. In each of these cases, a centered and rescaled version of the random variable, W=(Y-μy)/σy, will have an approximately standard normal distribution.
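The Poisson case can be checked the same way as the binomial. In this sketch (with an assumed rate of λ=100, so Y can be viewed as the sum of 100 i.i.d. Poisson(1) variables) we have μy=λ and σy=√λ, and the exact Poisson CDF is compared to the continuity-corrected normal approximation:

```python
import math

# Assumed illustration: Y ~ Poisson(lam) with lam = 100, viewed as a sum of
# 100 i.i.d. Poisson(1) random variables, so mu_y = lam and sigma_y = sqrt(lam).
lam = 100
mu, sigma = lam, math.sqrt(lam)

def poisson_cdf(k):
    """Exact P(Y <= k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k + 1))

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

k = 110
exact = poisson_cdf(k)
# Continuity correction: P(Y <= k) is approximated by P(Z <= (k + 0.5 - mu)/sigma)
approx = phi((k + 0.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))
```

As with the binomial, the approximation improves as λ grows, since Y is then the sum of more and more small Poisson pieces.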