Thursday, January 3, 2013

Oh yeah! I've got infinity plus one!

Earlier this week, my brother Chris sent me a Facebook question posed by my preschool-aged nephew:  "What is infinity minus infinity?  What is zero minus infinity?"  I'm sure most of us at some point engaged in the one-upmanship game of making bigger numbers than our sibling or friend.

Chris: 20

Brian: 21

Chris: 100

Brian: 101

Chris: 1000

Brian: 1001

And then your brother makes the leap!

Chris: Infinity!

Brian: Infinity + 1!

Chris: That's just infinity.  There's nothing bigger than infinity.  I win!

No fair!  What's up with this?  Up to the point where we leap into the infinite, we are dealing with numbers.  As it happens, infinity is not a number.  Some people like to say, "Infinity is not a number, it's a concept."  But this isn't very helpful at all.  What good does it do to say it's a concept?  What is a concept anyway?  Or for that matter, what is a number?

Another of my Facebook friends relatively recently introduced me to a TED video that explores the cardinality idea of infinity.  That is, cardinality is about sizes of sets, and natural numbers (positive integers) or counting numbers are exactly the type of numbers that measure cardinality.  The idea of infinity in relation to cardinality corresponds to the idea that you have an unending number of elements in a set.

Bizarre things happen in the cardinality sense of infinity.  For example, if you take all of the positive integers (1, 2, 3, 4, ...) and double each number (2, 4, 6, 8, ...), we still have the same number of elements in the set (since we just manipulated each object).  But our new set happens to be a subset of the original.  When dealing with finite sets, a proper subset always has fewer objects than the original set.  But we just saw an example where a proper subset has an equal number of objects (infinitely many).  So here, the phrase "number of objects" does not represent an actual number, but represents the concept of cardinality.

Alternatively, we could have started with all of the positive integers (1, 2, 3, 4, ...) and then just deleted every other number in the list.  This also gives us (2, 4, 6, 8, ...) since the odd numbers were all removed.  This is one way of showing that an infinite set taken away from an infinite set can still be infinite.  This is an example where ∞ - ∞ = ∞.  On the other hand, if we start with (1, 2, 3, 4, ...) and then take away the infinite collection of numbers (11, 12, 13, 14, ....), we are left with (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).  This would be an example where ∞ - ∞ = 10.  In fact, by choosing how many numbers we want to leave and then just deleting all of the rest, it is easy to create examples where ∞ - ∞ = n for any integer represented by n.

The point here is that a cardinality interpretation of subtraction (removing elements from a set, like taking candy pieces out of a pile) reveals that the cardinality of ∞ (not a number) does not follow ordinary arithmetic rules.  The infinite does that; it breaks the ordinary rules we are comfortable with when dealing with finite things.

In mathematics, we say ∞ - ∞ is indeterminate because the result actually depends on how the subtraction takes place.

A cardinality interpretation of 0 - ∞ does not actually make sense.  Interpreting 0 - ∞ first requires an extension of the idea of numbers to negative numbers.  For example, my children want to tell me that 2 - 5 = 0 because if you start with 2 candies and try to take 5, you don't have any left.  But at first they don't realize that there are 3 candies that you never got to take.

We could introduce the idea of borrowing (loans).  Suppose that I have 2 candies in my bowl and my daughter wants to eat 5 candies.  If I want to give her 5 candies, then I can give her my 2 candies but then I'll need to get 3 more candies from someone else.  This puts me in debt for 3 candies (-3) which I might represent by little paper IOUs.  I have 3 IOUs.  I can pay off the IOUs when I obtain candies.  For each candy I receive, I pay off an IOU.

This is the idea of addition extended to all integers.  -5 + 2 corresponds to have 5 IOUs and 2 candies.  The 2 candies pay off 2 IOUs, leaving me the same as if I had only 3 IOUs to begin with.  So -5 + 2 = -3.  Subtraction is normally thought of as actually taking candies away.  That is 5 - 3 corresponds to having 5 candies and taking 3 away, leaving only 2.  The extended idea of subtraction is to think of having 5 candies and using 3 IOUs: 5 - 3 = 5 + -3.  Taking away the candies is equivalent to redeeming IOUs.

That is, once we start dealing with negative numbers, subtraction is really about adding negatives (redeeming IOUs).  The value -∞ simply means that we have an infinite number (there's that cardinality idea again) of IOUs.  So 0 - ∞ is really the idea 0 + -∞, which means we start with a pile of 0 candies and an infinite number of IOUs.  Since we can't redeem any of the IOUs, we still have infinitely many.  That is, 0 - ∞ = -∞.  By the same argument, 5 - ∞ = -∞ and ∞ - 5 = ∞.

In addition, this gives us a way of extending our earlier idea of subtraction for ∞ - ∞ to end with any possible answer from -∞ to ∞.  The extended cardinality approach to ∞ - ∞ would mean that we have infinitely many candies and infinitely many IOUs.  If we put the candies and IOUs each in some order (an interesting philosophical question is if this is always possible), then we can make a choice on how we redeem our candies.

We might redeem every IOU but occasionally (or regularly) skip some of the candies.  For example, the first IOU could take the 1st candy, the 2nd IOU takes the 3rd candy, the 3rd IOU takes the 5th candy, and so on, leaving candies 2, 4, 6, ....  This would correspond to ∞ - ∞ = ∞.  Or we could redeem only some of the IOUs but use all of the candies.  For example, the 1st candy redeems the 1st IOU, the second candy redeems the 3rd IOU, and so on, leaving infinitely many IOUs unredeemed.   This corresponds to ∞ -  ∞ = -∞.  Or we could leave the first 12 IOUs unredeemed and then redeem each subsequent IOU by the subsequent candies, corresponding to an example of ∞ - ∞ = -12.

It is essential to see that when we deal with arithmetic involving ∞ or -∞, we are not dealing with numbers.  For cardinality, we are actually dealing with correspondences between sets.  ∞ - ∞ does not adequately describe the correspondence.  But 12 - 5 is adequate because every finite correspondence results in the same final answer.  But for infinite correspondences, the nature of the correspondence itself makes a difference.  Indeterminate really is indeterminate. 

Edit: I had a colleague point out to me that there is a second sense of infinite using ordinal numbers. In this setting, it is possible to talk about "Infinity plus one" as having meaning that is actually distinct from "Infinity".  For now, you might want to look at the wikipedia page, but I'm working on my own response as well. (October 8, 2013)

Tuesday, November 20, 2012

Derivatives, Velocity, and Acceleration

Calculus can be viewed as the study of rates of change of quantities.  The most familiar rate of change in our ordinary experiences is velocity as the rate of change of position.  As we ride our skateboards, bicycles, and cars, we understand that high velocities mean our position is changing quite rapidly; and we understand that when our velocity is zero, we are standing still.

So imagine the experience of a perfect rocket-car that experiences no friction and has no brakes.  The only way to change its velocity is with rocket blasters that are installed on either end of the vehicle.  Consider the following trajectory, illustrated as an animation, and repeating in a loop.  A timer (16 seconds) is also shown to provide a measurable sense of time.

The following table describes when and which direction the rockets are firing.

Time Interval Direction of Rocket Magnitude
[0,1) None 0
(1,3) Right Moderate
(3,5) None 0
(5,7) Left Large
(7,8) None 0
(8,12) Right Small
(12,16] None 0

In addition, the rocket-car is driving along a track which has positions marked so that we can think of the position as a variable (in meters).  That is, we can think of examining the relation between the variables of time and position.  The following table provides the position of the rocket-car as recorded at each second.

time (s) 0 1 2 3 4 5 6 7
position (m)   0.0   0.0   1.0   4.0   8.0  12.0  14.0  12.0 
time (s) 8 9 10 11 12 13 14 15
position (m)   8.0   4.5   2.0   0.5   0.0   0.0   0.0   0.0 

A table is only useful for a coarse overview of the relation between the variables.  For example, we can see that between t=2 s and t=3 s, the position went from x=1.0 m to x=4.0 m.  We can compute an average velocity over this time interval:

So the average velocity on this interval was 3.0 m/s.  However, this was during one of the intervals the rocket was firing.  It would have been going slower at the beginning of the interval and faster at the end.  The table does not give enough information for us to estimate the actual velocities.

Another representation of the relation between time and position is with a graph.

The derivative defines a new variable, dx/dt.  Although this looks like a fraction, we should really think of the entire symbol as the name of our new variable, the derivative.  This variable measures the rate of change of position x with respect to time t.  On the graph, this corresponds to the slope of the graph at each point.

For example, if I were to consider the point at t=2 s and x=1 m, we might draw the tangent line and measure the slope.  This slope is the derivative, which for this point corresponds to dx/dt=2.0 m/s.  Notice that the derivative is a velocity, which is precisely what the derivative measures for position with respect to time.  So we will use the variables v and dx/dt interchangeably.

We could compute the derivative for every point of the original relation.  This new variable could be added to our table.

time (s) 0 1 2 3 4 5 6 7
position (m)   0.0   0.0   1.0   4.0   8.0  12.0  14.0  12.0 
velocity (m/s)   0.0   0.0   2.0   4.0   4.0  4.0  0.0  -4.0 
time (s) 8 9 10 11 12 13 14 15
position (m)   8.0   4.5   2.0   0.5   0.0   0.0   0.0   0.0 
velocity (m/s)   -4.0   -3.0   -2.0   -1.0   0.0   0.0   0.0   0.0 

But it would be better, just as with our original relation, to consider the relation as a graph.  The figure below illustrates both graphs one above the other.

Notice that the velocity graph itself also has a slope at (nearly) every point.  The slope of a velocity graph is also a rate of change, measuring how the rate of change of velocity (m/s) with respect to time (s).  Consequently, the derivative dv/dt, which is called acceleration has units (m/s)/s, or more directly m/s2. (If velocity was measured in miles per hour, then acceleration might be measured as mph/s.)  We can use the variable name a=dv/dt.

Because our velocity is defined as piecewise linear, the acceleration will be piecewise constant.  On intervals where the velocity was constant, the acceleration will be zero.

Notice that the acceleration is directly related to our original discussion of the rocket-blasters.  When the rockets are blasting to the right, the acceleration is positive; when the rockets are blasting to the left, the acceleration is negative.  In fact, this is exactly the idea behind Newton's second law of mechanics, F=ma.  The rocket's thrust corresponds to the force F.  Newton's law simply states that the acceleration is proportional to the force.  The mass, which measures inertia, provides the proportionality constant. If we had the exact same rockets and the car had twice the mass, then the acceleration would be cut in half.

Because our rocket-car does not have brakes, the only way to slow down is to turn on the rockets in the opposite direction of motion.  In the language of derivatives, the acceleration must have the opposite sign from the velocity.  If the acceleration is the same sign as the velocity, then the effect of acceleration is an increase in the speed (the magnitude of velocity).

We close this reading by considering the effect of acceleration on the graph of position.  Slowing down corresponds to making the slope closer to horizontal.  Speeding up corresponds to making the slope steeper.  So, let us look at the original graph of position as a function of time, but marking the graph with different colors, depending on how the car is accelerating.
When the velocity and acceleration are opposite, I have marked the graph in red.  Notice that this is when the graph is becoming closer to horizontal (left-to-right).  When the velocity and acceleration are the same direction, I have marked the graph in green.  Notice that this is when the graph is becoming steeper (left-to-right).  When there is no acceleration, I have marked the graph in purple (straight lines).

To visualize this relative to the original animation, I have color-coded the timer and have placed a colored-flag on the car.  Try to connect the information in the graph to the visualization of the moving rocket-car.

Concavity is the word that describes how a graph bends.  When the second derivative (acceleration) is positive, the graph will be concave up.  On our graph, this corresponds to the first green segment, ∈ (1,3), and the last red segment,  ∈ (8,12).  When the second derivative is negative, the graph will be concave down.  This corresponds to the middle red and green segments,  ∈ (5,7).

Tuesday, November 13, 2012

Great Circles and Non-Euclidean Geometry

A few weeks ago, a few of my children were playing on the computer with Google Earth.  They wanted to see the aerial photographs of some our previous residences.  Then one of them wanted to measure exactly how far away we moved, when we moved from Mountlake Terrace, Washington, to Harrisonburg, Virginia.  So he drew a path connecting our previous address to our current address.

What jumped out at me was not the distance, but the heading—90.4 degrees, or almost exactly due east.  I was puzzled.  Wasn't Virginia at a lower latitude than Washington? So how could we be exactly east of Virginia.  Then I realized that Google Earth is calculated distance using great circles.

Here is a picture of the path.  Notice how the path bends.

A great circle is a path on the surface of a sphere such that the center of the sphere is the center of the circle.  My first instinct when I read the heading was 90 degrees was that the path followed a line of constant latitude.  That is, I would follow the compass heading of "East", keeping the north pole always exactly to my left.  The image (coming from Wikimedia Commons) below illustrates these lines as being "parallel" to the equator.  It also illustrates lines of constant longitude, which go from pole to pole.

The equator and lines of constant longitude are examples of great circles.  But the lines of constant latitude (other than the equator) are not, because their centers are shifted away from the center of the sphere.

How can you get other great circles?  One way is to choose any point on the surface of the sphere.  Then take a line and go through the center of the earth to the diametrically opposite point.  (Etymology: The word "diametrically" comes from the word "diameter;" great circles always have the same diameter as the sphere on which they are constructed.)  These two points will be on longitude lines exactly 180 degrees apart, which together form a great circle.  You can then hold your two points constant and rotate that great circle around the axis you created by joining the two points.

Trivia (also known as a Cool Math Fact):  Any great circle containing a point must include its diametrically opposite point.

Non-Euclidean Geometry

Now, let's talk about distance.  If you take any two points on the surface of the sphere, we want to find the shortest path between the two points.  In the geometry of the Greek mathematician and geometer Euclid, the shortest distance between two points is a line.  But what happens when we are constrained to paths on the surface of a sphere?

Imagine that you actually want to measure distance.  You might do this by taking a piece of string (on the earth, this would be a very long string) and lay it down along your path.  On this string, you could mark units of measure (like a measuring tape). When you arrive at your destination, you can just read off the distance of your path.  Wildly wandering paths would clearly involve a longer distance.

Given your path, you might want to see if you can improve it.  You could do this by trying to pull your string tighter, and sliding it around on the surface to see if you can use any less string to reach your point.  We have to imagine that the surface offers no friction, so that the string naturally slides toward a better path if it is available.  A path that can not be improved corresponds to an optimal path.

If we did our experiment of minimal paths on a flat table, then the optimal paths would follow Euclid's prediction of what we think of as straight lines.  But on a sphere, optimal paths always follow great circles.   The idea of non-Euclidean geometry is to consider lines not by our usual sense of straightness (whatever we think that means) but in terms of minimal paths.

In geometry, points and lines are basic elements, meaning they can not be defined from more elementary objects.  Euclid's geometry, which is our classical geometry based on what we think of as straight lines, is based on the premise that given any two points, you can use a straight edge (ruler) to draw a straight line.  And given any finite line segment, you can extend this line indefinitely.  But where does this infinite straight edge come from?

In non-Euclidean geometry, all we ever get are short straight edges.  We can only extend a line a little bit at a time, although we mathematically assume that we can do this perfectly, without any errors in aligning the straight edge to the rest of the line.

In Euclidean geometry, using a short straight edge repeatedly is identical to using an infinite straight edge.  But if we do our geometry on a curved surface (instead of a perfectly flat infinitely large plane), then using our short straight edge follows a path that locally looks straight.  But it has to follow the curving of the surface so that it does not look anything like Euclid's sense of a straight line.  On the surface of a sphere, a straight line that is formed by extending a short line segment with our short straight edge over and over again will be a great circle.

Where is this used?

Great circles are used to plan shortest paths for airline flights.  Talking about distance "as a bird flies" corresponds to paths along a great circle.

But non-Euclidean geometry has an even more fundamental role in understanding physics and light.  One of the basic principles of how light works is that it follows a path of least time (not distance).  The idea of refraction (why a stick looks like it bends when immersed in water) is a direct consequence of the fact that light travels through different substances at different speeds (slower through water than through air).  The amount of bending of the light's path can be exactly calculated by finding the path over which light takes the least time to traverse.

Curiously, sometimes there are multiple paths over which the light minimizes its time.  For example, the speed of light is faster in less dense air.  If air is heated, it expands and decreases its density.  So there are times when light can find a path with a greater physical distance but less time by choosing a path through warmer, less dense air.  Ripples that form a mirage on a hot road in front of you are evidence of these multiple paths.

Thinking about how light travels actually led to Einstein's development of relativity.  His theory of general relativity considers how light would need to travel if it was in a box that was accelerating.  By imagining that such a box is indistinguishable from a box that is subject to gravity, Einstein realized that light's path needs to bend as it passes through a gravitational field.

But the best way to think about these paths is by considering that light is actually always traveling in a straight line, just that the space in which it travels has a constantly changing sense of what is straight.  Further, Einstein realized that space and time are inseparably connected.  That is, light is moving in what is called the space–time continuum.  Gravity distorts this space–time continuum, and the way this is studied is through non-Euclidean geometry.

By the way, the answer to our original question: 2,257 miles.

Wednesday, September 14, 2011

Proof by Induction and Summations

The two previous blog entries introduced the idea of the Principle of Mathematical Induction followed by a discussion of a typical example of a proof by induction.  Be sure you read those two entries before you look at this one.  This blog post extends these ideas by talking about how proof by induction applies to summations.

(I apologize right now about the formatting of summation notation, or sigma notation.  I do not know how to get the math to look like math in a blog setting.)

Recall that the second condition of the PMI is that an arbitrary statement S(n) in the chain of statements being proved will guarantee that the next statement S(n+1) in the chain is also true.  In a proof by induction, this step always involves using a recursive relation between something in the sentence S(n) with a corresponding object in the sentence S(n+1).

For summations, this recursive relation is always that a summation for statement S(n+1) exactly corresponds to the summation appearing in statement S(n) with some additional term(s).

For example, think back to our motivating example of a chain of statements:
S(1):    1 = 1(2)/2
S(2):    1+2 = 2(3)/2
S(3):    1+2+3 = 3(4)/2
S(4):    1+2+3+4 = 4(5)/2
S(5):    1+2+3+4+5 = 5(6)/2
Notice that the sum appearing on the left hand side of any given sentence appears in the sum on the very next sentence, but one more term has been added.  Using parentheses to emphasize where the sum from the previous sentence appears in the new sentence, here is the same chain:

S(1):    1 = 1(2)/2
S(2):    (1)+2 = 2(3)/2
S(3):    (1+2)+3 = 3(4)/2
S(4):    (1+2+3)+4 = 4(5)/2
S(5):    (1+2+3+4)+5 = 5(6)/2

Summation (Sigma) notation gives us a handy way to describe sequences of sums like we see in the chain above.  The terms of the sum follow a simple pattern.  In this example, the pattern is the sequence of terms (1, 2, 3, 4, 5, ...).  This sequence can be explicitly described as ak = k for k=1, 2, 3, ....  In the symbolism (notation) of summation notation, we write:

Σ1k=1(k) = 1
Σ2k=1(k)  = 1+2
Σ3k=1(k) = 1+2+3
Σnk=1(k) = 1+2+3+...+n
(Notice that the formula in the parentheses (k) is the explicit formula of the sequence of terms in terms of the index which is listed at the bottom of Σ along with the first index of the sequence used in the sum.  The index of the last term in the sum appears at the top of Σ.)

The pattern that we described above to illustrate the recursive relation between the sums of consecutive sentences in the chain of statements we wish to prove can also be written in terms of summation notation:
Σn+1k=1(k) = Σnk=1(k) + (n+1)
When writing a proof by induction, we use this recursive relation.

Example:  Prove that Σnk=1(k) = n(n+1)/2  for n=1, 2, 3, ...

In the proof, we will write S(n) to represent the sentence: Σnk=1(k) = n(n+1)/2.

(1) We first need to think about the sentence where n is replaced by 1.  That is, we need to prove that Σ1k=1(k) = 1(1+1)/2.
Σ1k=1(k) = 1   (Interpretation of summation notation)
(We are creating a true statement involving the summation symbol that appears on the left side of the equation we are proving. The sequence of terms is 1, 2, 3, 4, 5, ..., and the sum starts and ends with the 1st term.)
1(1+1)/2 = 1(2)/2 = 1   (Algebra)
(We now create an equation involving the formula on the right side of the equation we are proving, and we just showed it has the same value as the summation symbol.)

So Σ1k=1(k) = 1(1+1)/2.
(We obtained evidence above that the summation and the formula both represented the same value, so we conclude that they are equal.)

(2) We now need to show that S(n) implies S(n+1) for n=1, 2, 3, ....  We will assume S(n), i.e. Σnk=1(k) = n(n+1)/2.  Using the recursion connecting the sums, we will show S(n+1), i.e., Σn+1k=1(k) = (n+1)((n+1)+1)/2.

Assume    Σnk=1(k) = n(n+1)/2    for some n=1, 2, 3, ...
Σn+1k=1(k) = Σnk=1(k) + (n+1).     (Recursive relation on summation of terms ak = k)
Σn+1k=1(k) = n(n+1)/2 + (n+1)     (Substitution: summation replaced by formula)
Σn+1k=1(k) = n(n+1)/2 + 2(n+1)/2     (Find common denominator)
Σn+1k=1(k) = (n2+n+2n+2)/2     (Distribution and adding fractions)
Σn+1k=1(k) = (n2+3n+2)/2     (Combining terms)
(Notice that each equation was based on the previous equation.  The key step was when we replaced Σnk=1(k) by the formula n(n+1)/2 that was provided by the assumed hypothesis.  At this stage of the proof, we have a formula representing the value of the summation symbol with n+1 that is the left hand side of the equation in the sentence S(n+1).  We now need to check the formula that appears in the right hand side of our sentence.)
(n+1)((n+1)+1)/2 = (n+1)(n+2)/2    (Arithmetic: 1+1=2)
(n+1)((n+1)+1)/2 = [n(n+2)+1(n+2)]/2     (Distributive law)
(n+1)((n+1)+1)/2 = [n2+2n + n+2]/2     (Distributive law again.)
(n+1)((n+1)+1)/2 = [n2+3n+2]/2     (Combining terms)
(Notice that each equation was based on the previous equation.  Notice that the second and third steps (distributive law) are the formal steps in the idea commonly taught as FOIL when multiplying to binomial expressions together.  The key is that we took the formula with n replaced by n+1 of the right hand side and discovered that it equals the same formula that we found for the summation above.)
So we have Σn+1k=1(k) = (n+1)((n+1)+1)/2.
Consequently, S(n) implies S(n+1) for n=1, 2, 3, ....

By the PMI (since S(1) is true and S(n) implies S(n+1)), we have S(n) is true for n=1, 2, 3, ...

Mathematical Induction (part 2)

My previous blog post introduced the basic idea of what mathematical induction is about.  This post focuses more on the mechanics of writing a proof.

A proof is the mathematical form of argument or persuasion.  A proof consists of a sequence of logical statements, each of which is shown to be a true sentence based only on information previous to that sentence.  Examples of sentences that might appear in a proof are equations or inequalities for which there is clear reason that it is true, and never based on what we hope is true or will later show is true.

For example, suppose I needed to prove (x+1)2=x2+2x+1.  In a proof, I can not write down this equation as the first statement because it is not something I know (yet).  Instead, I can write down equations known to be true based on basic principles:
  • (x+1)2 = (x+1)(x+1)     (meaning of power 2)
  • (x+1)(x+1) = x(x+1) + 1(x+1)   (distributive law)
  • x(x+1) + 1(x+1) = x2+x + x+1   (distributive law again)
  • x2+x + x+1 = x2+2x+1   (collecting like terms)
Notice that each of the sentences is a statement of equality.  Although they to suggest that the "=" sign is being used in a computational sense (i.e., it looks like it is being used to say apply a rule), we really are seeing these sentences as declaring that the result of applying a rule demonstrates that the sentence is actually true.  (This is a subtle distinction that you need to fight your mind until it sinks in.)  Technically, our proof is not yet complete.  Each sentence on its own is a complete, true sentence.  However, we need to end by stating that the sentence we were trying to prove is actually true:
  • (x+1)2=x2+2x+1   (equivalence of equal quantities, or substitution)
Now, you should read the above paragraphs as illustrating the ideas of a proof in that it illustrates how sentences (equations) are listed as statements that are demonstrated to be true.  But so far, we have not dealt with the idea of implication.

Recall that the Principle of Mathematical Induction (PMI) involves verifying the two conditions:
  1. Show that the first statement in the chain, which we call S(1), is true.
  2. Show that if any single statement in the chain, which we call S(n), is true, then this implies the next statement, which will be S(n+1), is also true.
So the structure of every proof by induction involves two subproofs (showing S(1) is true; showing S(n) implies S(n+1)) followed by an application of the PMI.  The subproofs are the method we verify that the two conditions of the PMI have been satisfied.

Here is what to look for in the subproofs.
  1. S(1) is almost always very easy to show.  However, you still need to be clear that you follow the pattern of a proof described above.
  2. Showing S(n) implies S(n+1) will consist of assuming that S(n) is true for some n (don't forget that this is a specific but unspecified value, so you must leave n or k, depending on the label being used, as a symbol and not an actual number).  Then the proof will almost always rely on some type of recursive relation between a quantity involved in the statement S(n) and a similar quantity in the statement S(n+1).
  3. After the subproofs are complete, you must invoke the PMI and then declare that the statements are true for every n=1, 2, 3, ....
Example:  Given a sequence defined by x1=3 and the recursive relation xk+1=xk+2 for k=1, 2, 3, ..., prove that xk=1+2k for k=1, 2, 3, ....

Notice what the chain of statements we are trying to prove will be.   The sentence S(k) is the statement xk=1+2k, where xk is the value of the sequence defined recursively, and 1+2k is just a formula involving k.  It is very important to remember that the "=" sign in this equation is not defining the value of the sequence, but is simply stating that the two quantities (the sequence value and the formula value) happen to always be equal.

Everything in italics is not part of the proof.

(1) We first write a subproof that S(1) is true:  x1=1+2(1).  To do this, we need to create a sequence of equations that we know are true based on the symbols themselves, and not the equation above.

x1=3   (Given)
(We looked at the given information and saw this was provided.)
1+2(1) = 1+2 = 3   (Arithmetic)
(We needed an equation involving 1+2(1) so we wrote down an equation that was based on the rules of evaluating this formula.)
So x1=1+2(1).   (Equivalence) 
(Earlier, we showed x1=3 and then we showed 1+2(1)=3, so this means the two quantities are equal.  This ends the subproof since we just finished writing the statement corresponding to S(1) being true.)

(2) We next write a subproof that S(k) implies S(k+1) for k=1,2,3,...  To do this, we will start by assuming S(k) is true for some unspecified k.  Using the recursion to compute xk+1 we will demonstrate that S(k+1) is also true.  Note: S(k+1) is the statement xk+1=1+2(k+1).

Assume xk=1+2k is true for some k=1, 2, 3, ....
xk+1=xk+2   (Given recursive definition of sequence)
(The statement we are trying to prove involves the symbol xk+1 so we need to create equations based on that symbol.)
xk+1=(1+2k)+2   (Substitution:  xk=1+2k from assumed hypothesis)
(This is the key step:  we use the recursive connection between xk+1 and xk to establish a formula for xk+1.)
xk+1=3+2k    (Algebra)   
(We now have a simple formula for xk+1 and now we turn our attention to the other half of the statement S(k+1), namely the formula 1+2(k+1).)
1+2(k+1) = 1 + 2k + 2   (Distributive law)
(We are creating a true equation involving the formula by considering the results of applying mathematical laws.)
1+2(k+1) = 3+2k    (Algebra from above: 1+2=3)  
(Our target should always be that the formula with k+1 in place of k will match whatever we found by the recursion to compute a value for xk+1.)
So xk+1=1+2(k+1)    (Equivalence or Substitution)
That is, S(k) implies S(k+1) for k=1, 2, 3,....
(This ends the second subproof because we just finished writing the statement we were trying to prove in this part.  We are now ready to invoke the PMI.)

Since S(1) is true and S(k) implies S(k+1) for k=1, 2, 3, ..., the Principle of Mathematical Induction guarantees S(k) is true for every k=1, 2, 3, ....  That is, xk=1+2k for k=1, 2, 3, ...

    Mathematical Induction

    I actually posted on this very topic a few years ago.  However, I have refined how I think about doing proofs by mathematical induction.  And so I am writing one more time.

    First, you might find it interesting to look at the Wikipedia article on mathematical induction.

    The Principle of Mathematical Induction (PMI) is an axiom that describes the set of natural numbers, which is N = {1, 2, 3, 4, ...}.  From one point of view, the PMI says that if S is a set that (1) contains the number 1 and (2) guarantees that whenever any number n is in the set, then n+1 is also in the set, then the set S contains all of N.  From the perspective of proofs, however, we are really interested in showing that an infinite chain of logical statements is true.

    Before we proceed with the idea of the PMI, let me be clear about a logical statement.  A logical statement is a well-constructed sentence that is definitively true or false. Two well-known but easily misunderstood examples of logical statements are equations and inequalities.  An equation "A=B" is a logical statement that declares two quantities (A and B) are equal (have the same value).  An inequality "Afalse, but the statement itself is logically complete.  (Here logical does not mean `makes sense' but it means `can be decided between True or False'.)

    Related to this issue is a common misunderstanding by students that "=" is like an operation that means "has the value" or "find the answer" as it is often used on a calculator.  This is especially important in a proof, where each statement (usually an equation) must be clearly true rather than a statement of what you hope or what you are trying to calculate.

    Okay, back to the Principle of Mathematical Induction.  This principle is about dealing with an infinite chain of logical sentences.  For example, consider the following chain of statements.  S(1) is the label for the first sentence, S(2) is the label for the second sentence, and so on:

    S(1):     1 = 1(2)/2
    S(2):     1+2 = 2(3)/2
    S(3):     1+2+3 = 3(4)/2
    S(4):     1+2+3+4 = 4(5)/2
    S(5):     1+2+3+4+5 = 5(6)/2
    S(6):     1+2+3+4+5+6 = 6(7)/2
    (the pattern continues so that if n is any number n=1,2,3,..., we have):
    S(n):     1+2+3+...+n = n(n+1)/2

    On the left of each sentence is a summation.  On the right of each sentence is a simple formula involving n.  Using summation notation, we would have written this sentence:
    S(n):    Σnk=1 k = n(n+1)/2

    The pattern described by S(n) looks like a single sentence, but it is important to remember that it is describing the entire infinitely long chain of logical sentences.  If you add up the values on the left side of any single sentence and compare it to the value of the formula on the right, you will see that the answers are the same.  But this only verifies the formulas that you actually check.

    The Principle of Mathematical Induction is a tool to prove that the entire chain of sentences are all true.  The PMI is much like an infinite chain of dominoes (see the earlier linked wikipedia article).  To knock over a chain, if you knock them down one at a time, you'll never finish.  But if you show that knocking any single domino down will guarantee the next domino also falls and you show that you can knock down the first domino, then this is enough to guarantee that the entire chain will fall down.

    So here is the PMI:
    Suppose S(n), n=1, 2, 3, ..., is a chain of logical statements and that (1) S(1) is true and (2) S(n) implies S(n+1) for any n=1, 2, 3, ..., then S(n) is true for every n=1, 2, 3, ....
    Notice that there are two conditions to use PMI:
    1. We must verify that S(1) (the first sentence) is true.
    2. We must show that the sentence S(n+1) is true whenever the previous sentence S(n) is true.
    A proof by induction is the process whereby we verify these two conditions and then apply the PMI to conclude that the entire chain is true.

    To learn more about writing the proofs and an example with commentary, please continue by reading the next blog post.

    Tuesday, March 8, 2011

    How We Learn Mathematics

    I was reading the following paper: D. Breidenbach, E. Dubinsky, J. Hawks, and D. Nichols, "Development of the Process Conception of Function," Educational Studies in Mathematics, 23: 247-285, 1992.

    Quote Dubinsky (1989): "A person's mathematical knowledge is her or his tendency to respond to certain kinds of perceived problem situations by constructing, reconstructing and organizing mental processes and objects to use in dealing with the situations."

    "Applying this point of view to mathematics (or any other subject) consists of determining the nature of the specific processes and objects that are constructed and how they are organized when one studies mathematics"

    Ways of thinking about functions:
    • prefunction - does not understand any real ways of using function concepts
    • action - repeatable mental or physical manipulation (e.g., plug in numbers and calculate); static; one step at a time
    • process - think of function as a single dynamic transformation
    I then found another article: A. Sfard and L. Linchevski, "The Gains and the Pitfalls of Reification: The Case of Algebra," Educational Studies in Mathematics, 26 (2/3), 191-228, 1994 [Learning Mathematics: Constructivist and Interactionist Theories of Mathematical Development]

    This article proceeds with the view that in mathematics, there is a duality in mathematical constructs being a process or an object. That is, conceive of things operationally (process) or structurally (object). Historical examples include the expansion of number systems: positive to negative (operational: subtraction as adding a negative to structural: negative numbers as objects), and real to complex (i=sqrt(-1) as an operational convenience to an actual object)

    An included reference suggests finding another article: Kieran, C.: 1992, 'The learning and teaching of school algebra', in D. A. Grouws (ed.), The Handbook of Research on Mathematics Teaching and Learning, Macmillan, New York, pp. 390-419. I'll have to see if I can find this one, as it is cited for the sentence, "[reification] was also used to introduce some order into the quickly growing bulk of findings about algebraic thinking."

    Interesting phrase: "the ability to grasp the structural aspect is not easy to achieve" and "those crucial junctions in the development of mathematics where a transition from one level to another takes place are the most problematic."

    Another interesting way to think about how mathematics is organized: (1) Logical, or the way it fits together; (2) Historical, or the way in which it was developed; and (3) Cognitive, or the processes in which people learn.

    Modes of Algebra
    1.1) Algebra as Generalized Arithmetic: The Operational Phase
    -- solve for the unknown, but not using symbols (grade school algebraic thinking)
    -- rhetoric algebra
    -- principally reversing processes
    1.2) Algebra as Generalized Arithmetic: The Structural Phase
    1.2.1) algebra of a fixed value (unknown)
    -- Notational convenience, but treat variable as a fixed value
    :::: becomes a mental challenge to think of formula as both a process and result
    :::: example given: 2+3 represents process, 5 represents result. But x+3 represents both, no separate "result"
    :::: compare to the challenges of new number types required to think about division, subtraction, and extracting square roots
    ** Nice comment: "Once we manage to overcome this difficulty, it is quickly forgotten. ... Our eyes are easily blinded by habit and by our own ontological beliefs. Nevertheless, much evidence for the difficulty of reification may also be found in today's classroom, provided those who listen to the students are open-minded enough to grasp the ontological gap between themselves and the less experienced learners."
    1.2.2) Functional algebra (of a variable)
    :::: View formula as object
    :::: Parameters represented as symbols not numbers.
    2) Abstract Algebra

    Give examples of interview questions. Students at early stages of thinking think about formulas as recipes for computations (process) but do not perceive them as valid objects. "The equality sign is interpreted as a 'do something signal' (Behr et al 1976; Kieran 1981)"

    Here's something I see all the time in calculus classes: "It [the = symbol] serves here as a 'run' command. When treated in this way, the equality symbol looses [sic] the basic characteristics of an equivalence predicate: it stops being symmetrical or transitive. Indeed, young children seem to have no qualms about solving word problems with the help of a chain of non-transitive equalities. For instance, when asked 'How many marbles do you have after you win 4 marbles 3 times and 2 marbles 5 times?', the child would often write: 3*4=12+5*2=12+10=22."

    Equations of the form 2x-3 = 11 can be interpreted as a formula whose result is 11 (which can be solved by inverse operations); equations of the form 2x-3=5x-9 appear to be two different formulas, and inverse operations do not make sense.