


During the month of November or December, graduate students in my multivariate analysis class traditionally pay special homage to the celebrated Cayley-Hamilton theorem. It is accorded this high honor by Professor Maurice Tatsuoka in the chapter on linear transformations, axis rotation, and eigenvalues in his excellent textbook. The role of this theorem in the textbook is rather obscure. It is not, to my knowledge, applied or used in any multivariate technique or employed in the proof of any other theorem or formula in the entire text! It is presented as a stand-alone pillar of mathematical splendor. It is shocking to many students that a theorem can have absolutely no practical application whatsoever and yet be intrinsically elegant in and of itself. However, one can reflect on innumerable things in real life that are of this very same nature.
In my graduate-level statistics classes I have three-, four-, and five-star handouts graded according to importance. The one-page handout on Cayley-Hamilton is a SIX-STAR handout and is the only one I have ever given this highest distinction. This handout is printed in limited quantities and serially numbered to ensure its value as a collector's item. My students are admonished not to discard or in any way bend, fold, or mutilate this work of art. Remember, I tell them: "ASK NOT WHAT THE CAYLEY-HAMILTON THEOREM CAN DO FOR YOU BUT WHAT YOU CAN DO FOR THE CAYLEY-HAMILTON THEOREM."
Here is a spectacular demonstration of how like its eigenvalues a matrix behaves. I am proud to present in gif format the celebrated Cayley-Hamilton theorem. ENJOY!!!

The Above Was Archived on 21 September 1999
This month I would like to focus on perhaps one of the most significant news events that has occurred in my lifetime. It makes the discovery of the incandescent light bulb or man's landing on the moon seem inconsequential. I am of course referring to Mark McGwire of the St. Louis Cardinals and his shattering of the single season home run record on Tuesday, September 8, 1998 at 9:18 PM EDT. This record is the most revered one in all of sport both in the USA and many foreign countries. All eyes were fixed on Busch Stadium that evening when Mr. McGwire in the wink of an eye rifled his 62nd home run over the left field wall.
What does this feat have to do with the field of statistics you ask? I maintain it has everything to do with statistics. Baseball is a game whose very objective and rich heritage is vitally dependent on the art of record keeping and the meaningful manipulation of these records. There is no other sport in the world that breeds the thousands upon thousands of numbers and summaries that baseball does year after year. Indeed, each season there are many new records contrived to fit the particular accomplishments and combinations of skills of certain ballplayers and/or the teams that employ them. Observe the emergence in recent years of the 30-30 or the 40-40 player or the manager's detailed charting of pitches thrown.
My purpose here is not to discuss all these newfangled indices. I will leave that task to the writers and news media who scramble to produce these tidbits to justify their existence. I simply want to capture the wonder of that magical September night and relate to you my observations of what were the important coincidences and facts about that historic day. Here they are:
You can see how a statistician can easily become obsessed with facts and figures like the above set, particularly when that statistician happens to be a Cardinals fan. However, there is much more to this story. I would truly like to thank both Sammy Sosa of the Cubs and Mark McGwire for the great show that they have put on this year (and the home run derby is not over as of this writing). I think these two fine role models have shown baseball and this country that their close friendship and encouragement of one another is what life is all about! All other athletes should take special note of this relationship.

This month we will present another neat probability experiment that can easily be conducted by students as a short assignment or an in-class project. If you have been a regular reader of my home page you may recall the penny-spinning problem whose discussion now resides in my Archives of Statistics Fun. That experiment and the current one represent excellent opportunities for an instructor to highlight the value of the Monte Carlo method in estimating probabilities for unusual variations in experiments that don't lend themselves to the usual formulas.
Let's state the current question in simple language:
If a penny is flipped until a head first appears, what is the probability that this first head occurs on an odd-numbered trial (i.e., first, third, fifth, etc.)?
At first blush, a typical student would reason that since the first head is just as likely to occur on an odd trial as it is on an even trial (second, fourth, sixth, etc.), the probability is obviously .5. But wait! Another student mentions that maybe the probability should be somewhat greater than .5 since the first opportunity for a head to pop up is on the very first trial, and an odd trial continues to precede an even trial after the first two. At this point the bandwagon effect sets in and students begin to incrementally up their estimates slightly from .5. But after many values are offered, a hush settles over the room and students begin to look at one another and shrug their shoulders. No one is really sure!
Enter Captain Sigma (the instructor)! With a flourish of his cape and a wink of his eye, he quietly suggests that this is a problem that just begs for empirical data. He urges each student to take about five minutes at home and repeat the experiment 10 times, tally how many times the first head appears on an odd trial, and bring the data to the next meeting. The students happily consent to this simple task (someone in the back of the room asks, "How many points is it worth?") and they all eagerly await the pooling of their data at the next meeting.
Two days later the instructor rushes into the classroom and puts all the results from 35 students on the board. The students sit on the edge of their seats in awe as the numbers accumulate. The final tally results in 245 out of 350 replications ending on an odd trial. Zowie! THAT IS 70%! Something is wrong. The pennies must have been seriously flawed.
The instructor, showing no emotion on his face, allows the buzzing and chattering to go on for several minutes. Finally he cracks a grin and informs the students that this result is a very good estimate although it is a tad too high. He proudly states that the actual answer is P=2/3 or 67%. The students are dumbfounded and become quite excited. They actually all cheer for the instructor and demand a formal proof (Did I say "cheer" in a stat class? I must be delirious from a high fever!).
Here is what the instructor wrote on the board:
The solution involves the sum of the first n terms of a geometric series expressed as:
$S = a + ar + ar^2 + \dots + ar^{n-1}$
Where
a = first term of the series
n = number of terms
r = the common ratio
S = the sum of the first n terms calculated by
$S = a(1 - r^n)/(1 - r)$
In our case, a = 1/2 = .5 and r = (1/2)(1/2) = $.5^2$ or .25 and using the first expression for S we have:
$S = .5 + .5^3 + .5^5 + \dots$
In words, the above is stating that the probability of getting the first head on an odd trial is the probability of getting a head on the first trial (.5) plus the probability of getting the first head on the third trial (.5)(.5)(.5) plus the probability of getting the first head on the fifth trial (.5)(.5)(.5)(.5)(.5) plus etc., etc. for n terms.
Now to compute what this sum would be for n terms we calculate using the second formula:
$S = .5(1 - .25^n)/(1 - .25)$
Finally, taking the limit of this calculation as n approaches infinity (so that $.25^n$ shrinks to 0), we arrive at
S = (.5) / (1 - .25) = (1/2) / (3/4) = 2/3 or .67
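For readers who would rather let a computer flip the pennies, here is a minimal Python sketch of the classroom experiment next to the partial-sum formula from the board. The function names, the seed, and the number of replications are my own illustrative choices, and the simulation assumes a perfectly fair coin.

```python
import random


def first_head_trial(rng):
    """Flip a fair coin until the first head; return the trial it lands on."""
    trial = 1
    while rng.random() >= 0.5:  # a draw below 0.5 counts as a head
        trial += 1
    return trial


def estimate_odd_probability(replications, seed=0):
    """Monte Carlo estimate of P(first head occurs on an odd-numbered trial)."""
    rng = random.Random(seed)
    odd = sum(first_head_trial(rng) % 2 == 1 for _ in range(replications))
    return odd / replications


def partial_sum(n_terms):
    """Sum of the first n_terms of .5 + .5^3 + .5^5 + ... (a = .5, r = .25)."""
    a, r = 0.5, 0.25
    return a * (1 - r ** n_terms) / (1 - r)


if __name__ == "__main__":
    print("simulated estimate:", estimate_odd_probability(100_000))
    print("partial sum, 10 terms:", partial_sum(10))
```

With a large number of replications the simulated proportion should settle near 2/3, and the partial sum gets there after only a handful of terms because each term is a quarter of the one before it.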
Truly Remarkable! The students all give the instructor a standing ovation and shout "QED" "QED" "QED".... The instructor smiles sheepishly while taking a bow and thinks to himself how rewarding it is to be a statistics professor.

The Above Was Archived on 13 September 1998
Here are the answers to last month's crossword puzzle. As warned previously, some of these statisticians are not exactly household words. Use the following scale to grade your performance as a Statistician Trivialist:
CROSSWORD PUZZLE SOLUTION: Statisticians

| Across | Down |
|---|---|
| 4. Likes a good match | 1. Inventor of cotton gin |
The Above Was Archived on 10 July 1998
One of the most intriguing and frequently mentioned probability questions of all time is the so-called "Birthday Problem." I really do not remember when I first encountered this question but I know it has been around for decades and pops up in many treatments of probability in basic statistics textbooks. Let us revisit this interesting problem and hopefully shed some new light on this time-honored topic.
First, for those readers who are not familiar with this problem, let us pose the question in its simplest form:
Given a room with a random collection of N people, what is the minimum N needed for an observer to state there is greater than a 50-50 chance of at least two people having identical birthdays?
Responses to this question are many and varied depending on a person's exposure to probability topics. However, the three most frequently offered answers are (a) 183 (b) 20 and (c) 23. The correct answer, of course, is (c) 23, which may shock some of you and prompt you to immediately head out and bet some of your buddies on a coincidence of birthdays in rooms with this few people present. Before you make this rash decision, read the remainder of this discussion.
Note: We shall assume in our discussion that a year has 365 days rather than the 366 in a leap year. We shall also assume that by "identical birthday" or "birthday coincidence" or "duplicate birthday" we mean the same month and day disregarding the year of birth.
The (a) response of 183 has much intuitive appeal for the ordinary person on the street. He or she would reason that in order to be absolutely certain that two birthdays coincide, 366 people would be needed in the room. Now since a probability just greater than .50 of a duplicate is all that is wanted, simply take 1/2 of 366 and arrive at 183. This seems logical but the laws of probability tell us the correct N is dramatically smaller than 183! Just how much smaller?
Many people who have studied a little probability would give the (b) response of 20.
Wow! This intuitively seems way too small to give us even a slight chance of a
coincidence of birthdays let alone a better than even chance. But the reasoning merits
close examination and goes something like this:
Check the birthdays in the room one by one. After the first person has given his or her
birthday, the second person will have one chance in 365 of having the same birthday as
the first. The third person could have the same birthday as the first or second person,
so the third person has 2 chances in 365. Added to the chance from the second person,
there are a total of 3 chances in 365. By the same logic, the fourth person has 3
chances of having the same birthday as any of the first three so this needs to be
added to the previous 3 to get 6 chances out of 365, and so on. By the time we exceed
183 chances in 365, which is just greater than our 50-50 probability, we will have
checked just 20 people. Mathematically, this is more concisely expressed as follows:
We want the smallest integer N-1 such that
(0)(1/365)+(1)(1/365)+(2)(1/365)+(3)(1/365)+...+(N-1)(1/365) > 1/2 or
(1 + 2 + 3 +...+(N-1))/365 > 1/2
The required N-1 is 19 since 1 + 2 + 3 +...+ 19 = 190, but don't forget to add one for the first person checked even though a match cannot occur with just one person. Thus N = 20 people are required according to this line of reasoning.
But hold the phone! We have a serious flaw in this argument. The number (1 + 2 + 3 + ...+(N-1))/365 is NOT a probability but the EXPECTED VALUE or MEAN NUMBER of birthday coincidences (pairs of people sharing a birthday) for N people in a room. Thus, if many rooms of 20 randomly assembled people were examined, the mean number of coincidences would be just greater than 1/2. This is not particularly reassuring to a shrewd betting person!
Although N=20 is an incorrect answer to the original problem it does suggest an alternate approach for betting purposes. Suppose we wanted the expected value of coincidences to be just greater than one. We could continue the above computation for several more terms until the ratio just exceeded one. With a calculator it is easy to see that we must only go out to N-1=27 or N=28 for this to occur (1 + 2 + 3 +...+ 27 = 378, which exceeds 365). Thus with many rooms of N=28 people we would have a mean number of coincidences just greater than one and many bettors would take greater comfort in this value.
Now let us explain the correct answer (c) N=23 for the original problem. The easiest approach is to find the probability of NO duplicate birthdays in a sample of size N and then subtract this result from ONE to get the probability of at LEAST ONE duplicate. Again we shall check the people one by one. After the first person establishes a birthday (P=365/365), the probability of the second person's birthday not duplicating the first is (365/365)(364/365). The probability of the third person not duplicating the first two is (365/365)(364/365)(363/365). This multiplicative process goes on and on and for a given N this product becomes (365/365)(364/365)(363/365)...((365-N+1)/365), one factor for each of the N people. Since this term is the probability of no duplicates, our task is to determine the smallest value of N for which this probability drops below .50. Then when this probability is subtracted from one the probability of at least one duplicate will just exceed .50. With a hand calculator it is easy to show that when N=22 this product is .5243 and 1 - .5243 = .4757 but when N=23 the product is .4927 and 1 - .4927 = .5073. We can thus state that if a room contains 23 randomly assembled people, we stand a slightly better than 50-50 chance of finding a duplicate birthday.
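The expected-value reasoning above is easy to check by machine. The following is a small Python sketch under the same 365-day assumption; the function names are hypothetical ones chosen for illustration.

```python
def expected_matching_pairs(n_people, days=365):
    """Mean number of birthday coincidences: (1 + 2 + ... + (N-1)) / days."""
    return n_people * (n_people - 1) / 2 / days


def smallest_n_for_expected(threshold, days=365):
    """Smallest N whose expected number of coincidences exceeds threshold."""
    n = 1
    while expected_matching_pairs(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_n_for_expected(0.5))  # the N = 20 answer
    print(smallest_n_for_expected(1.0))  # the N = 28 answer
```

Note that this counts expected coincidences, not the probability of finding at least one, which is exactly the distinction the argument turns on.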
If you are a conservative bettor and flinch at the above chances, I have developed the following table that allows you to read in the desired probability level for a duplicate birthday and read out the required N in your room. Note that the probability levels should be interpreted as actually those just greater than the listed value.
Required Numbers of People for Selected Probabilities of a Birthday Coincidence

| Probability | N |
|---|---|
| .50 | 23 |
| .60 | 27 |
| .70 | 30 |
| .75 | 32 |
| .80 | 35 |
| .90 | 41 |
| .95 | 47 |
| .99 | 57 |
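If you would like to check the table without a hand calculator, here is a minimal Python sketch. It assumes, as the discussion does, 365 equally likely birthdays; the function names are my own.

```python
def prob_duplicate(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    p_none = 1.0
    for k in range(n_people):
        # Person k+1 must avoid the k birthdays already taken.
        p_none *= (days - k) / days
    return 1.0 - p_none


def required_n(level, days=365):
    """Smallest N whose duplicate probability just exceeds the given level."""
    n = 1
    while prob_duplicate(n, days) <= level:
        n += 1
    return n


if __name__ == "__main__":
    for level in (0.50, 0.60, 0.70, 0.75, 0.80, 0.90, 0.95, 0.99):
        n = required_n(level)
        print(f"{level:.2f}  N = {n}  (P = {prob_duplicate(n):.4f})")
```

The loop multiplies one factor per person, so for N people the last factor is (365-N+1)/365.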
Thus if you are a real gambler, when the situation presents itself, you go with N=23 and impress the pants off everyone in the room by hopefully finding a duplicate. If you don't feel that you are an exceptionally lucky person, then you might select the comfortable 75-25 chance of a duplicate and use N=32. On the other hand, if you fall at the other end of the continuum and only bet on close to sure things, then pick the .99 level and go with N=57. Here you are almost certain to find a duplicate but the people will not be that impressed and you won't elicit that wonderful "WOW!" effect.
I tried this experiment last semester in my Statistics I class with N=26 students in attendance that day. I knew my chances were below .60 but I put on an air of absolute certainty with my pronouncement. I confidently started around the room with students stating their birthdays. When I got to the 12th person I had a duplicate. The reaction was electrifying. You would have thought that I had just floated an elephant in mid-air! My student ratings skyrocketed for at least one day!
I hope you have enjoyed my presentation of the above topic and, if you are an instructor, I hope you can have some fun and try this in your class. I must fess up to one other assumption that was not mentioned earlier. Not only must you assume a random sample of people are assembled in the room but theoretically you must assume that birthdays are uniformly distributed throughout the 365 days of the year. This is probably not satisfied in any strict sense but that is a question involving a whole different ballgame. If you are turned on by the concept of chance and how pervasive it is in our society, check out Chance News, a bimonthly newsletter published at Dartmouth College.
The Above Was Archived on 5 April 1998
This is the season of good cheer and merriment. If you know of a lonely statistician
please tell him that you love him and truly appreciate all the wonderful methodologies
that he has perpetuated and enhanced throughout his career. I am sure any kind remarks
directed his way will warm his heart and point him toward the new year with a renewed
sense of vitality and dedication.
HAPPY HOLIDAYS to all my readers! May all your Summation
Sigmas be operational and all your means be true μ's.
We at RAMO PRODUCTIONS are particularly thankful for
this Holiday Season. In fact, we are ecstatic and even giddy over the honor that was
recently bestowed on this Web Site. At the 1997 Annual Conference of the Society for the
Preservation of Humor In Statistics (SPOHIS) held November 20-22 in Las Vegas, this
Home Page was awarded "The Golden Sigma Cup." This highly coveted award signifies the BEST
contribution of any Site on the WWW toward the promotion of statistics as a humorous
subject. The acceptance of this award was truly a defining emotional moment in my career.
I would like to thank all the members of the SPOHIS Academy for the necessary and
sufficient consideration given all the nominees for this award and the unbiased
selection of this particular site. I will try to be a worthy recipient of this magnificent
cup and redirect my energies toward uncovering new tidbits of humor that make
statistics the enchanting field that it has now become.
The Above Was Archived on 7 February 1998
This month all my readers will be given a real treat. The World Famous Three Step
Method (WFTSM) will be revealed. I have had many requests and pleadings through my
guestbook and other personal email to present this marvelous technique to the World
Wide Web. This procedure which I consider the Holy Grail of statistical methodology
(just ask my students) is a three-step sequence for calculating the sample standard
deviation. Some textbooks emphasize a multi-step, direct or brute force procedure (see formula on right)
which for most situations is very painful and tedious. That is, given a set of
N raw scores (X-scores) you are advised to (a) compute the mean
(b) subtract the mean from each score (c) square each
of these deviations (d) sum the squared deviations (e) divide the sum of squared deviations by
the number of scores N and
finally (f) extract the square root of this result. While this technique works rather
well when the number of scores is small and the mean is a nice whole number, it is a
nightmare in other situations. When the number of scores is say 15 or more and the
mean is a decimal (In practice this will be true about 95% of the time), this procedure
involves repeated subtracting and squaring of decimals and gets extremely messy even
when using a calculator. A better method is needed!
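To make the pain concrete, here is a minimal Python sketch of the brute force steps (a) through (f) exactly as listed above, dividing by N. This is the direct method only, not the World Famous Three Step Method, and the sample scores in the usage line are hypothetical.

```python
import math


def brute_force_sd(scores):
    """Standard deviation by the direct method, dividing by N."""
    n = len(scores)
    mean = sum(scores) / n                    # (a) compute the mean
    deviations = [x - mean for x in scores]   # (b) subtract the mean from each score
    squared = [d * d for d in deviations]     # (c) square each deviation
    sum_of_squares = sum(squared)             # (d) sum the squared deviations
    variance = sum_of_squares / n             # (e) divide by the number of scores N
    return math.sqrt(variance)                # (f) extract the square root


if __name__ == "__main__":
    # Hypothetical scores chosen so the mean is a nice whole number.
    print(brute_force_sd([2, 4, 4, 4, 5, 5, 7, 9]))
```

A computer does not mind the decimals, of course; the tedium described above is a hand-calculation problem.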
To illustrate this new calculation consider a simple example. Suppose we are given
the following set of 15 raw scores (X's):
Now substituting the above results and applying WFTSM:
Thanks for reading the development of my favorite statistical procedure. Who says
statistics has to be dull when it embraces world renowned technology cradled in
elegant blue boxes!
The Above Was Archived on 20 December 1997
October is the month of goblins and ghoulies. Unfortunately, students of basic
statistics experience far too many of these creatures on days other than Halloween
night. As promised last month, I will offer some general suggestions for teaching
the course in basic statistics. Several caveats are in order. First, these ideas have
worked for me over several decades of teaching but I make no warranties they will work
for other instructors. Secondly, these techniques have been employed in classes with
enrollments of between 30 and 40 students and therefore are probably not appropriate
for large lecture sections. With this in mind, I present this short list of hints to
help rid the statistical learning environment of goblins and ghoulies:
The Above Was Archived on 11 November 1997
The fall semester has now begun at most universities across this great country. This means
that many students are experiencing for the first time an encounter with a basic applied
statistics course. Whether the course is taken in business, psychology, education, biology,
economics or some other discipline really does not matter. The frequency of application of
certain techniques will vary from field to field but the basic concepts remain amazingly the
same. If you are an upperclassman, this is the course you have postponed for several years
and with great trepidation must now meet head on. If you are an underclassman, the fear is
no less since the horror stories already hit the moment you arrived on campus. You must cope
with this perceived encirclement by dragons. Your mental outlook and approach to this course
will become the single most important determinant of a meaningful positive experience with
beginning statistics. I have attempted to put together, from several decades of teaching,
a short list of helpful suggestions for the student. I make no warranties that these will
work with everyone. I do know, however, that many students over the years have given me
feedback that these hints are quite useful. Here they are:
If you are a student, I hope the above suggestions prove useful. Next month I will
present some tips for the instructor of basic statistics.
The Above Was Archived on 7 October 1997
Hope you enjoyed the above. Have a happy holiday season! If you are a
statistician don't take yourself seriously and laugh at yourself. If you are a
student make a New Year's resolution to attempt to understand the poor
statisticians of this world who are only trying to eke out a living.
The Above Was Archived on 28 February 1997
The Above Was Archived on 20 December 1996
Here are the basic concepts of a random sample and bias. A sample is considered
random if each member of the population from which it is drawn has an equal chance
of being selected. I like to think of a random sample as an equal opportunity
employer. A table of random numbers or a computer is usually employed to draw a random
sample. A sample becomes biased when certain members of the population have a greater
chance of being selected than others. The sample then tends to systematically
overestimate or underestimate a certain characteristic of the population such as the
percentage of a particular class of individuals. Thus, serious inferential errors may
occur.
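A small simulation can make the difference between a random and a biased sample visible. The Python sketch below uses an entirely made-up population: the 40% telephone ownership rate and the 60% versus 30% support rates are assumptions for illustration only, not historical figures, and the function names are my own.

```python
import random


def build_population(size, rng):
    """Hypothetical voters as (has_phone, supports_candidate) pairs."""
    population = []
    for _ in range(size):
        has_phone = rng.random() < 0.40            # assumed ownership rate
        p_support = 0.60 if has_phone else 0.30    # assumed support rates
        population.append((has_phone, rng.random() < p_support))
    return population


def share_supporting(sample):
    """Proportion of the sample that supports the candidate."""
    return sum(supports for _, supports in sample) / len(sample)


rng = random.Random(1936)
population = build_population(100_000, rng)
true_share = share_supporting(population)

# Random sample: every member has an equal chance of selection.
random_sample = rng.sample(population, 2_000)

# Biased sample: only telephone owners can be selected.
phone_owners = [voter for voter in population if voter[0]]
biased_sample = rng.sample(phone_owners, 2_000)

print("population:   ", round(true_share, 3))
print("random sample:", round(share_supporting(random_sample), 3))
print("biased sample:", round(share_supporting(biased_sample), 3))
```

Under these assumptions the random sample should land close to the population figure, while the telephone-only sample systematically overestimates support no matter how large it is made.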
Now go back to the year 1936. This was the year that pitted the Republican, Alf
Landon, against the Democratic incumbent, Franklin Roosevelt. This year was during
the Great Depression. It also should be remembered for the record extreme
temperatures and dust storms that hit the Midwest. In fact, it was so hot that a
statistician could not even calculate a standard deviation without working up a sweat.
Moreover, the high humidity that year forced the Goudey Baseball Card Company to print
only black and white cards and many of the cards came off the printing press
hopelessly bowed.
A prestigious weekly news periodical called The Literary Digest continued
that year a tradition of conducting a national presidential poll through the mail.
Supreme faith was placed in a humongous sample of 10,000,000 prospective voters
drawn primarily from telephone directories. When the returns were tallied, an easy win
for Landon was indicated. This highly respected periodical staked its reputation on
this outcome. With such a huge sample how could anything go wrong?
Well, as strange as it may seem, the poll was a miserable failure. Two sources of
bias that were unfortunately in the same direction raised havoc with the results.
First, the poll excluded non-telephone owners and hence also included a
disproportionate number in the older age groups. Since many more Republicans (the
wealthy) in these depression years owned phones than Democrats (the poor), it is
not surprising that the returned ballots would favor the GOP candidate. Secondly,
it is a fairly well known fact that members of the party out of power are far more
likely to return ballots through the mail than members of the "in" party. This again
supported more Republican ballots. These two sources of bias formed a potent
combination that pointed toward a Landon victory. It is interesting to note that in
the actual election, Roosevelt swept all states except two and won in a landslide.
It is also of historical note that several years after this debacle The Literary
Digest went out of business.
As an ironic footnote to the above story, The Literary Digest, using the
same sampling technique, correctly called the outcome of the 1932 election. Recall
that Roosevelt ran against the incumbent Republican, Hoover. Again, economic problems
were the prime issues in this campaign. However, the above two sources of bias were
in opposite directions and tended to cancel one another out. That is, the use of
telephone directories favored Republican returns but the "out of office" phenomenon
favored Democratic returns. Thus, through sheer luck, The Literary Digest
correctly predicted a win for FDR.
Here are some important lessons from these historic presidential polls:
The Above Was Archived on 10 November 1996
If you have taken a basic statistics course, when the topic of probability was
introduced you no doubt heard the instructor mention the time-honored coin
flipping example. Flip a penny and the chance of getting a tail (or head) is 1/2.
No problem: this is a concept that a primary-aged child can understand.
But change the scenario slightly. Suppose a penny held vertically on a table by
the index finger of one hand is spun vigorously with a flick of the other index
finger and allowed to come to rest flat on the table. Is the probability still
1/2 of either a head or a tail facing up? Well, the head-nodders in the first
few rows generally smile warmly and shake their heads up and down in agreement
(By the way, all you aspiring statistics instructors should enlist at least five
head-nodders prior to the second week of class to offer you constant supportive
feedback during the entire semester :-)). However, the more the students reflect
on this situation, the more uncertain they become. Is spinning really the same as
flipping? Finally, a carefully planted confederate toward the back of the room
timidly suggests that maybe we should replicate the experiment a number of times
and see what happens. Yes! Yes! Yes! Just what you want as an instructor. You
quickly seize this opportunity to introduce the class to Monte Carlo type
probability. You announce an extra credit assignment for everyone in the class.
Each student is instructed to select a relatively shiny penny without noticeable
wear, spin the penny on a table 100 times, and record the number of tails that
face up. A deadly silence settles over the classroom! The students now
realize they have been hoodwinked into performing a rather embarrassing act,
particularly if their dorm roommates are watching that evening. Spin a penny 100
times on a table and watch it land: what type of looniness is this? Before any
student utters another word, you quickly remind them that all the results will be
posted and discussed next meeting, and then you grudgingly dismiss them two
minutes early.
Next meeting the fruits of your well-planned operation are realized. Thirty of your 35
students complete the experiment. Wow, you gleefully think to yourself. That is
3000 replications of the penny-spinning experiment. Methodically, you start around
the room asking each student to report the number of tails he or she obtained on
the 100 trials. The numbers roll in and you write them on the chalkboard: 65, 59,
57, 64, 52, 70,... The students sit in utter amazement as a definite pattern
unfolds. Overall, the numbers appear to be much larger than 50! When all the
numbers are collected, a mystified student in the rear of the room suggests that we
average the 30 results. Obligingly, you ask a student with a fancy calculator and a
pocket protector in the front row to add up the results and find the mean. In a
wink of the eye, the student blurts out 62.12. This is totally unreal! Can we place
any faith at all in this finding? Does this mean that if we spin a penny on a table
many times the coin will fall with tails facing up about 62% of the time?
*DISCUSSION*
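One rough way to judge whether 62.12% tails in 3000 spins could be a fluke is to ask how many standard errors it sits from the 50% that flipping would lead us to expect. The Python sketch below does this with the usual normal approximation; it treats all 3000 spins as independent trials with a common probability, which is an assumption (different students, pennies, and tables were pooled), and the function name is my own.

```python
import math


def z_for_proportion(observed, n_trials, p_null=0.5):
    """Standard errors between an observed proportion and the null value."""
    standard_error = math.sqrt(p_null * (1 - p_null) / n_trials)
    return (observed - p_null) / standard_error


if __name__ == "__main__":
    z = z_for_proportion(0.6212, 3000)
    print(f"z = {z:.2f}")
```

A result more than 13 standard errors from .50 is far beyond anything chance alone would plausibly produce under these assumptions, so the pooled data strongly suggest that spinning is not the same as flipping.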
The Above Was Archived on 4 October 1996
The field of statistics is replete with technical terms or jargon that I
prefer to call "club words" in my classes. We have a lot of fun with these since
I tell my students that they can derive much satisfaction from mastering these
and joining a unique club. They are then able to throw these terms around
in casual conversation and blow the socks off of their friends who are not in
the "club". Let me give you a few examples of some real humdingers.
The Above Was Archived on 4 September 1996.
Now I shall move to a more generalized definition of degrees of freedom. The degrees of freedom of a statistic is the number of observations minus the
number of necessary auxiliary values, which are themselves based on the observations. This is kind of a nasty statement and somewhat flaky but
don't panic. The rule works for 95% of the situations and that isn't bad statistically, is it? In the last example the variables are the observations and
the auxiliary value is the mean (note it is based on the four observed values), and therefore the df = 4 - 1 = 3. Finally, one more example using this
better rule and I shall close up shop for the month. When correlated pairs are present, what is the df for that situation? If I give you the correlated pairs
(X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4) and again allow you to assign
the values, the number of observations is not 8 but 4 because a correlated pair is an observation (now don't be rigid and not allow this). Also if you set the X mean and
the Y mean, these count as two auxiliary values and the df = 4 - 2 = 2. In other words, in a correlational situation, the df is N-2 where N is the
number of paired scores. With this definition you must expand your notion of an observation and be cautious about auxiliary values. Next month I will use
this new rule to explain the df for the standard errors of some test statistics.
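The counting rule can be sketched in a few lines of Python. The helper names and the numbers in the usage lines are hypothetical; the point is only that once the mean is fixed, the last score is no longer free to vary.

```python
def last_value_given_mean(free_values, mean, n):
    """With the mean of n scores fixed, the final score is forced."""
    return mean * n - sum(free_values)


def degrees_of_freedom(n_observations, n_auxiliary):
    """Observations minus auxiliary values computed from those observations."""
    return n_observations - n_auxiliary


if __name__ == "__main__":
    # Three scores assigned freely, mean of four scores fixed at 6.
    print(last_value_given_mean([3, 8, 5], mean=6, n=4))
    print(degrees_of_freedom(4, 1))  # four scores, one mean
    print(degrees_of_freedom(4, 2))  # four pairs, X mean and Y mean
```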
Several interesting observations are in order. Last week we determined that the sample standard deviation s had df=N-1 which is the same df as the
estimated standard error of the mean in this test. Also if you suddenly go brain dead and forget the df-value for this test, it is staring you right
in the face in the denominator of the formula. This is not by chance but occurs quite frequently with t-tests. Pretty nifty huh?
Well that concludes my ramblings for January. I hope you are realizing that statistics has many recurring themes. Certainly the principles for
counting degrees of freedom are among them. You all now should be experts in counting degrees of freedom at least when you perform William Sealy
Gosset's celebrated t-test.
Well I understand where you are coming from in the moderate to large sample situation, but these
two quantities have caused students of statistics more problems and confusion than a barrel of monkeys
particularly when the students have used several textbooks in a course or in different statistics courses.
The crux of the issue which generally an author makes no mention of is that the sample variance and
standard deviation can be defined two different ways. This in turn makes subsequent formulas such as estimated
standard errors (or error variances) look "seemingly" different depending on the definition when in reality the formulas
are equivalent. The reason I feel that this issue requires discussion is that, to my knowledge, I have not seen a good
explanation of this problem in any statistics textbook and it will save you questioning whether there are typos on many
pages of the book. Let us then examine each definition, see where it takes us, and talk about the
positives and negatives of each choice. Here you are going to see an issue that many statisticians are split on. If I were
to guess I would say that the statistical community is about 50/50 on this one!
Now the two methods part company. In (A) we divide the
∑x² by the sample size N and this produces the sample variance s². Since
this index is in squared units, if we want an index in the original score units we extract the square root and have
the sample standard deviation s. Division by N in this process makes sense logically because then we are able to state that
the sample variance is the average squared deviation of each of the scores in the sample about the mean. This just
shouts that it is measuring variation and it also just feels like a meaningful way of getting at the spread of a set
of scores. Also it is valid when you have an N of 1 since the variance and standard deviation would be 0 which upon
reflection is exactly what it should be.
Turning to method (B), we divide ∑x² by N-1 to get s² and then
take the square root if desired to obtain s. However, the N-1 just seems nonintuitive. You cannot now neatly interpret the
sample variance as an average and the two formulas seem to lose their logical appeal. In addition, if N is 1 then the
variance and standard deviation are both undefined because you are dividing by 0. Why then would anyone employ (B) to define
the sample variance and standard deviation? I am going to whisper this but there is one slight advantage of (B). The reality
of the matter is that with (B) you really have calculated an unbiased estimate of the population variance and very
close to the same for the population standard deviation. So some authors feel that this method bypasses the sample
index and moves directly to the population estimate. Thus, when authors label
s² = ∑x²/(N-1) and the subsequent square root as the sample variance and
sample standard deviation, they are really somewhat disingenuous in doing so.
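For readers who like to see the two definitions side by side, here is a minimal Python sketch. The five scores are made up purely for illustration, and the function names are my own labels, not anything taken from a textbook.

```python
from math import sqrt

def sum_sq_dev(scores):
    """Sum of squared deviations about the sample mean (the ∑x² of the text)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def variance_a(scores):
    """Method (A): divide the sum of squared deviations by N."""
    return sum_sq_dev(scores) / len(scores)

def variance_b(scores):
    """Method (B): divide by N - 1 (this fails when N is 1)."""
    return sum_sq_dev(scores) / (len(scores) - 1)

scores = [4, 6, 8, 10, 12]  # hypothetical data
print(variance_a(scores), sqrt(variance_a(scores)))
print(variance_b(scores), sqrt(variance_b(scores)))
```

Python's standard `statistics` module carries both stances as well: `pvariance` divides by N and `variance` divides by N-1.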
I will illustrate the confusion that the two definitions can create when you are reading different books. If the author uses
(A), the estimated error variance of the mean is given by s²/(N-1) whereas if the author prefers (B) the same
estimated error variance of the mean is s²/N...Two seemingly different results! But wait, two different definitions have been
used for s². REALLY THE TWO RESULTS ARE IDENTICAL! To show this, using the former result and substituting (A) for
s², we have ∑x²/[N(N-1)]. Now using the latter result and substituting (B) for
s², we have ∑x²/[(N-1)N]...precisely identical results. Sooo...(what
Steve Jobs would utter) what does all this mean? THE FIRST THING A PERSON SHOULD CHECK UPON OPENING A STATISTICS TEXTBOOK
IS SEE WHAT STANCE THE AUTHOR TAKES ON THE N VS N-1 ISSUE IN DEFINING THE SAMPLE VARIANCE AND STANDARD DEVIATION. My opinion
favors division by N but about half of the textbooks use division by N-1 so be prepared to make adjustments in your thinking.
Statisticians end up at the same place on this one but sure create some illusions along the way. Thanks for reading my blurb and
see you next month.
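For the record, the substitution described above can be written out in one line per definition. The subscripts A and B are my own labels for the two definitions of the sample variance; nothing else is new.

```latex
\[
s_A^2 = \frac{\sum x^2}{N}
\;\Longrightarrow\;
\frac{s_A^2}{N-1} = \frac{\sum x^2}{N(N-1)},
\qquad
s_B^2 = \frac{\sum x^2}{N-1}
\;\Longrightarrow\;
\frac{s_B^2}{N} = \frac{\sum x^2}{(N-1)N}.
\]
```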
Next, moving to another 3-Step Procedure displayed to the right (notice: never 2, never 4, always 3 steps for nice psychological
closure), draw a random sample of size N from a population with known σ. Then convert the sample mean to a z in the previous
probability statement and get statement (1) for a result. Then solving this three-way inequality with some simple algebra and getting
μ smack dab in the middle by itself and everything else on the ends we arrive exactly where we want to be with statement (2).
These end expressions are indeed the formulas for the lower and upper limits of a 95% confidence interval for μ. They are pulled out
and stated for emphasis in statements (3). To cement these formulas in our minds let's do a simple example. Suppose we have a population
of IQ scores with an unknown μ and σ = 16. We want to generate a 95% confidence interval for the population mean μ. If a random
sample of N=64 is drawn and the sample mean is computed to be 98.7, the lower limit from statements (3) is:
LL = 98.7 - 1.96(16/√64) = 98.7 - 1.96(2) = 98.7 - 3.92 = 94.78
Now the fun begins folks when we try to interpret these results. But you say, "This is a snap. We simply say the probability that the
population mean μ is between 94.78 and 102.62 is .95." But wait, I hate to inform you that the population μ is a fixed parameter
and it is either between 94.78 and 102.62 ahead of time in which case the probability is one or the population
μ is not between 94.78 and 102.62 ahead of time in which case the probability is zero. Keep in mind probabilities refer to random variables
and the mean μ is a fixed constant even though we don't know what it is. In other words, we cannot associate a probability with any single
pair of limits. This seems like a minor problem, but to many experts it is a real deterrent to using confidence intervals. Now we could replicate
the experiment and obtain several sets of limits. Would this add any information? Certainly it would, but each pair of limits would be subject to
the same criticism. But if I did collect an infinity of limits from N's of 64, 95% of the limits would contain the true value of μ. This is a true
statement but many would deem this fact essentially useless.
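We cannot collect an infinity of limits, but a computer can collect a few thousand. The following is a rough simulation sketch, not a proof. It assumes normally distributed scores, and the "true" mean of 100 is an assumption of mine so that we can check each interval against it; σ = 16 and N = 64 follow the IQ example.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, reps, seed=0):
    """Fraction of 95% intervals (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and compute its mean.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / reps

# mu = 100 is an assumed value, used only so the simulation has a target.
print(coverage(mu=100, sigma=16, n=64, reps=4000))
```

The printed fraction should land near .95, which is the long-run statement about the procedure and says nothing about any single pair of limits.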
The big advantage of a hypothesis test where an H0 is tested against a two-tailed alternative is that you do end up with an observed
test statistic that has a probability associated with it when you reject or retain the null hypothesis. This method appears to appeal to many
researchers even though we all know one hypothesis test does not prove anything. It is my speculation that the language itself with hypothesis testing
has a certain degree of strength and finality associated with it. Expressions such as "Reject H0: μ1 - μ2 = 0 at the
.05 level of significance and Accept the alternative that H1: μ1 > μ2" have a ring of authority linked to them.
Recall also, thanks to Pearson and Neyman, we have our dear old friends Type I Error, Type II Error, and the Power of the test. It is indeed sad that the
confidence interval approach has no such counterparts. In addition, the terminology of reject or retain H0 seems to mesh with complex ANOVA's
where multiple comparisons are performed following a significant overall test. For these reasons and perhaps others that I have overlooked, hypothesis testing
currently is the KING of the HILL with statisticians.
I would like to give you one advantage for the lonely confidence interval before I close shop. Assume the limits of the previous example where Δ = .95.
If another reader reads these results and desires to hypothesis test instead, the results can be predicted very easily. Remember confidence intervals by nature
are two-tailed and must be compared with a two-tailed hypothesis test. If the reader wants to test H0: μ = 100 with .05 as level of significance
against two alternatives, retention of H0: μ = 100 would be predicted because 100 is contained between the limits of 94.78 and 102.62. If the
reader desires to test H0: μ = 104 against two alternatives, rejection of H0: μ = 104 would be predicted and acceptance of
H1: μ < 104 would be supported since 104 is above both limits of 94.78 and 102.62. This may be continued on and on. Thus, the reader may very
quickly and easily test any null hypotheses that his heart desires with the single set of data and limits given. Mathematicians have always thought this was
pretty neat. However it has not caught on in other disciplines and this interpretation has not helped the cause for confidence intervals.
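The prediction trick is easy to check by machine. Here is a small sketch, assuming the known-σ z-test that goes with the IQ example (sample mean 98.7, σ = 16, N = 64). The function names are mine and purely illustrative.

```python
from math import sqrt

def two_tailed_decision(sample_mean, sigma, n, mu0, z_crit=1.96):
    """Return 'retain' or 'reject' for H0: mu = mu0 at the .05 level (two-tailed z-test)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return "reject" if abs(z) > z_crit else "retain"

def inside_interval(mu0, lower, upper):
    """True when the hypothesized mean lies between the confidence limits."""
    return lower <= mu0 <= upper

for mu0 in (100, 104):
    print(mu0, inside_interval(mu0, 94.78, 102.62),
          two_tailed_decision(98.7, 16, 64, mu0))
```

A hypothesized value inside the limits goes with "retain" and one outside the limits goes with "reject", exactly as the limits predict.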
Thus, we conclude our cases for both methodologies of inference. I must admit I also favor hypothesis testing but who knows where we will be in ten years.
Maybe we will turn to Tukey's Exploratory Data Analysis and refine sampling procedures to such a point where we do not even have to use inferential statistics.
Now that would be a monumental advance. Meanwhile, thanks again for reading this presentation and HAPPY INFERRING!
For more statistics humor please visit the First Internet
Gallery of Statistics Jokes.
Please email comments about this page to
gcramsey@ilstu.edu
Member of the Science Humor Net Ring
Never fear. A white knight is waiting in the wings. Let us apply some finesse and
demonstrate an elegant substitution for all but the last two steps in the above
procedure. Please study the animated gif on the right. Observe that Step One is
the key to the entire computation. It is the mathematical equivalent of steps
(a) through (d) in the "brute force" method.
It requires only two basic
calculations: ∑X (the sum of the raw scores) and
∑X2 (the sum of the squares of the raw scores).
Once Step One is computed, school is almost out and Steps Two and Three roll out very
easily. Note also that Steps Two and Three here are exactly the same as the earlier
steps (e) and (f) respectively.
5, 6, 8, 8, 10, 12, 12, 12, 14, 16, 16, 18, 18, 19, 20
For our data ∑X = 5 + 6 + 8 +...+ 20 = 194 and
∑X² = 5² + 6² + 8² +...+ 20² = 2838
VOILA! There you have it ladies and gentlemen. This is the formula
that has taken the world by storm all the way from El Paso, Illinois to Paris, France!
You ask - Why is it so gosh darn great and globally famous?
∑x² = 2838 - (194)²/15
∑x² = 2838 - 2509.0667 = 328.9333
s² = 328.9333 / 15 = 21.9289
s = √21.9289 = 4.68
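Running the three steps in code is a handy check on the hand arithmetic. This is only a sketch. It assumes the fifteen scores implied by N = 15 and the stated sums of 194 and 2838, which requires the score 18 to appear twice, and it divides by N in Step Two as the calculation above does.

```python
from math import sqrt

# Fifteen scores; the second 18 is an assumption needed to match N = 15,
# the sum of 194, and the sum of squares of 2838.
scores = [5, 6, 8, 8, 10, 12, 12, 12, 14, 16, 16, 18, 18, 19, 20]

n = len(scores)
sum_x = sum(scores)                      # sum of the raw scores
sum_x_sq = sum(x * x for x in scores)    # sum of the squared raw scores

ss = sum_x_sq - sum_x ** 2 / n           # Step One: sum of squared deviations
variance = ss / n                        # Step Two: sample variance (division by N)
sd = sqrt(variance)                      # Step Three: sample standard deviation

print(sum_x, sum_x_sq, round(ss, 4), round(variance, 4), round(sd, 2))
```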
Let me enumerate its many advantages and virtues:
OK now that we have proven WFTSM is the greatest thing since
sliced bread, how do we accord it high distinction in the folklore of statistical
methodology? This is my suggestion to all students: Get a calligrapher to write WFTSM
on fine parchment paper. Then roll and tie the scroll and place it in a soft bed of
puffed pima cotton. Place the cotton and scroll in a box wrapped in blue holographic
foil. Finally have an expert gift-wrapper surround the box with white silk ribbon
topped with an elegant bow. Finally go to your dining room table and replace the
flower arrangement as the center piece with the blue box. Now we have given WFTSM its
proper respect. It will be the center of attraction for all guests invited to your
house for dinner. Just think of the thrills you will experience explaining to your
best friends the story of the blue box and WFTSM.
Teaching Tips for the Instructor of Basic Statistics
Thanks everyone for reading my tips for the instructor. Hopefully, some of the above
will promote some lively discussion. Let me hear from you!
Study Tips for the Student of Basic Statistics
Merry Christmas everyone! Contrary to popular belief statisticians also believe
in Santa Claus and have their wish lists. I thought you might want to see what
desires I have had for many years. These are far out so be prepared!
CHRISTMAS WISH LIST
For the month of November we have a very special report for you! From our home
office high atop the grain elevator in Fooseland, Illinois we are proud to bring
you: THE TOP TEN REASONS WHY STATISTICIANS ARE MISUNDERSTOOD. These are not listed in
any particular order of importance but represent all those nagging suspicions you have
always harbored against statisticians but were always afraid to ask about. Fasten your
seat belts and here we go!
We are quickly approaching election day and throughout the entire month of
October you can expect to be bombarded with the results of many presidential polls.
The pollsters of today (Gallup, Roper, etc.) use highly sophisticated techniques that
employ samples of about 1600 or less registered voters who are likely to vote. If
these samples are drawn at random, the public can expect the percentages that favor
the candidates to fall within a 3% or 4% margin of error. This all sounds great to
the typical citizen (except if your candidate is trailing). What happens, however, if
the sample is biased or in some pernicious way, nonrandom? In short, incorrect
inferences may be drawn and widely disseminated, the public may lose faith, and entire
polling organizations or their sponsors may go out of business! Following is what I
consider to be the worst case in history of a biased presidential poll which resulted
in such a devastating effect (No folks, I am not going to rehash the Truman-Dewey
election of 1948).
This is no aberration. Experts refer to this phenomenon as the "pop bottle cap
effect". Find a cap from an old 16 oz. bottle of Pepsi or Coke and spin it in the
same fashion we did the penny. About 90% of the time or more, the cap will fall
with its top facing down and the sides facing up. Now how does this relate to the
penny? If you examine closely a relatively new shiny penny, you will observe that
the edge around the penny protrudes further on the tail's side than on the head's
side. Thus, the extra edge on the tail's side simulates the side of the pop bottle
cap although certainly not as pronounced visibly. The experts proclaim that the
extra edge produces results that in the long run converge on 60% tails facing up.
Of course, if you use a worn penny, this advantage in favor of tails disappears.
I can indeed attest to these results. In the four or five years that I have used
this experiment in class, the results have hovered right around 60%. Amazing but
true. I then tell my students they have a sure way of winning some money. Engage
a friend (maybe your nosey roommate from last night) in a four or five hour
penny-spinning game and bet on tails each time!
Gosh Henry! It really does work!
Equal population variances.
The linear approximation of an unlisted value in a statistical table by using
two listed values.
The degree of peakedness in a graph of a distribution of scores.
Retaining a false null hypothesis in inferential statistics.
The distribution of the standard normal curve.
Geez Albert! It wasn't that bad was it?





















OUR MONTHLY SERIOUS BUSINESS
DEC 2007........DEGREES OF FREEDOM
Degrees of Freedom is a very slippery concept that always seems on the verge of being mastered only to slither through the fingers
of the beginning student. I will attempt to give it a very simple conceptualization and then present a working rule of thumb
for counting the degrees of freedom in a variety of situations. In its simplest form the degrees of freedom (df-value) of a situation
is the number of variables that you are allowed to vary freely without restriction. Thus, if I tell you X1 is a variable and you
are perfectly free to assign it any number in our number system, you would have 1 df (not an infinity as you might think). Remember
it is the number of variables not the number of values you can give a variable. Ok now suppose I expand the situation slightly.
If I now give you the variables X1, X2, X3, and X4 and I again tell you that you can assign each of
the four variables any number in our number system, you now have increased your df-value to 4. Get the idea? Now one more example will
set this basic definition in stone (you can pick marble if you so desire). Suppose I again give you the same four variables of the
previous example and again offer you the opportunity to assign each variable any value until your heart is content, but there is one
catch: the mean of the four numbers must be 8 when you end up. So you go on your merry way and give X1 the value 10, X2 the value
5, X3 the value 8, and without hesitation you go with the value 12 for X4. But whoa, the mean of your four numbers is
8.75 not 8! Now you suddenly realize you can't just give that 4th variable any value. You must give it a value that makes the sum
of the four values 32 so that when you divide by 4 you get 8. Oh my gosh you are locked into the value 9 for X4! YOU HAVE LOST
A DEGREE OF FREEDOM and really only have df=3 in this situation. In other words, giving you the opportunity to assign four variables any value but
forcing the mean to be 8 is tantamount to losing a df. Knowledge of the mean counts as a restriction and subtracts one from the total df.
HAPPY HOLIDAYS
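For anyone who wants to see the lost degree of freedom in action, here is a tiny sketch of the four-variable example. The function name is mine and purely illustrative.

```python
def forced_last_value(free_values, required_mean):
    """Given all but one value and the required mean, return the one value left to choose."""
    n = len(free_values) + 1
    return required_mean * n - sum(free_values)

# X1 = 10, X2 = 5, X3 = 8 were chosen freely; the mean must be 8.
print(forced_last_value([10, 5, 8], 8))  # prints 9
```

Three values can be anything you like, but the fourth is dictated by the mean, which is why df = 3.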
JAN 2008........COUNTING DEGREES OF FREEDOM FOR A TEST STATISTIC
Now that we all know something about the concept of degrees of freedom (df), I will show you how it plays a critical role
in many statistical hypothesis tests. As you will soon see the df-value is usually a function of the sample size N. Many test statistics like t,
F, and Chi Square have distinct df-values associated with them that must be determined in a given situation (in the case of F, it even has TWO
distinct df-values...can you imagine that!). The df-value(s) is then entered into a table in the appendix of your book along with the level of
significance to determine what critical value is needed to declare statistical significance. I shall use the dear old Student t given to the
right to illustrate how you count this df-value for a few tests. The key with any t-test is to count the df-value of the ESTIMATED standard error
which is the denominator of the ratio given to the right (indicated by the tilde sign). This then becomes the df-value for the test itself.
Recall that a standard error is nothing more than a highly specialized standard deviation of the sampling distribution of the test statistic.
Think of this formula as a generic template for ANY t-test with T standing for the test statistic in any given situation. In words, this formula
is saying to calculate a t-ratio, take the observed value of the test statistic T and subtract the hypothesized population mean of the test
statistic T and then divide this result by the ESTIMATED standard error of the test statistic T. Notice the emphasis on "ESTIMATED". If this
were the EXACT standard error (no tilde), the statistic would become the well known standard normal z-test. Remember again for a test to be
considered a t-test, it must be capable of being placed in the general format of this formula and the denominator must be an ESTIMATED standard
error. As you might guess, the t-ratio at first glance could easily be mistaken for a z-ratio except for that little wiggle above the
standard error indicating ESTIMATION. In fact, the distribution curves of both t and z are very similar (bell-shaped) except the t curve has more
area in the tails as a function of the df-value. The greater the df-value the more alike the two curves become. Don't tell anyone this but a
standard normal z-curve is really a special case of a t-curve with df=infinity. Now isn't that the cat's meow!
I shall now turn to the most basic t-test of them all displayed to the right, the t for testing a hypothesis about a single population mean.
Recall that basically degrees of freedom is the number of variables that are allowed to vary freely without restriction. In hypothesis testing
we usually work with a random sample(s) of scores of some sort. Here think of each score in the sample as a variable that is capable of taking
on any value. Thus, each score becomes an observation and the total number of observations is the sample size N. Now the only necessary auxiliary
value in this case is the sample mean. Hence invoking the principle from last month that the df-value of a statistic is the number of observations
minus the number of necessary auxiliary values, the df-value of the estimated standard error in the denominator of this ratio is N-1 which becomes
the df-value for this basic test. This ratio for example might be used to test the null hypothesis that a population mean of IQ scores is 100 which
would be substituted on the right in the numerator. Of course, the sample mean and standard deviation s would be calculated from the data and
plugged in also.
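As a rough illustration of the single-mean t, here is a sketch in Python. The eight IQ scores are invented. The estimated standard error is computed straight from the sum of squared deviations as the square root of ∑x²/[N(N-1)], which is my way of sidestepping which definition of s a given book uses.

```python
from math import sqrt

def one_sample_t(scores, mu0):
    """t-ratio and df for testing H0: population mean = mu0."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations
    est_se = sqrt(ss / (n * (n - 1)))           # estimated standard error of the mean
    return (mean - mu0) / est_se, n - 1         # df = N observations minus 1 sample mean

iq_sample = [96, 104, 110, 99, 108, 113, 101, 105]  # hypothetical IQ scores
t, df = one_sample_t(iq_sample, 100)
print(round(t, 3), df)
```

The df returned is N-1, the same N-1 that sits in the denominator of the estimated standard error.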
OK since things are going so smoothly, I next want to discuss the most widely used t-test in the literature...the so-called independent samples t.
It is used to test the hypothesis that there is no difference in the means of two distinct populations (i.e., the hypothesized difference of 0 is plugged into
the right side of the numerator). The formula to the right admittedly looks a little scary but again it is nothing more than an iteration of the
basic t template with the difference in the sample means serving as the test statistic. Here the observations are the scores in both samples and
the auxiliary values are the two separate sample means. Thus, by our rule the df-value for the estimated standard error and the test is
N1+N2-2. Now that is pretty slick! The ingredients you need to calculate this t are the two sample means and the two sample
variances and of course the two sample sizes. Again popping out like a neon light from the formula is the df-value from the denominator to jog
your memory. This test finds many applications. When you have two separate random samples of scores as in Experimental and Control groups or two
different treatment groups and you desire to test the significance of the difference in the two means, this t becomes the star of your stat world.
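Here is a sketch of the independent samples t, assuming the usual pooled-variance form of the estimated standard error (the formula itself is not reproduced in the text, so treat that as my assumption). Both groups of scores are invented for illustration.

```python
from math import sqrt

def independent_t(sample1, sample2):
    """Pooled-variance t-ratio and df for H0: no difference in population means."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    df = n1 + n2 - 2                       # all observations minus two sample means
    pooled_var = (ss1 + ss2) / df
    est_se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2 - 0) / est_se, df      # 0 is the hypothesized difference

experimental = [14, 17, 15, 19, 20]   # hypothetical scores
control = [12, 13, 16, 11, 13]        # hypothetical scores
t, df = independent_t(experimental, control)
print(round(t, 3), df)
```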
Finally, another relatively important t-test is presented that tests the hypothesis that there is no difference in the means of two correlated
populations. To conduct this test, we must use the framework of the correlated pairs of (X1, X2) scores which were discussed
last month. Fortunately in this situation you are allowed to compute a difference score (D) for each related pair in the sample and subsequently work
with the sample D's from then on out. In essence you have reverted back to the simple t test with D's taking the place of X's. Thank God for little
favors. Without this simple move, you must use an alternate method which requires that you compute the correlation coefficient and treat the
X1's and X2's separately. Believe me unless you do this on a computer it is a statistician's nightmare and requires three times the
work. Returning to the main problem of getting the df-value here, an observation becomes a D-value of which we have N and we have one auxiliary
value which is the mean D. Thus the df for the estimated standard error is N-1 which becomes the df-value for the test. Beware of something with
the calculation of t. You are working with a sample of D's so a difference is computed in the same order and you will probably end up with positive
and negative D's which must be accounted for. The sample mean D and the sample standard deviation of D along with 0 for the hypothesized value of the
population mean D are substituted in the formula and the value of t rolls out. This test is employed when you have a pre-test and post-test situation
for a number of subjects or when you have subjects that are matched on another variable prior to administering two treatments. A common mistake with
this test is to treat the X1's and the X2's as independent samples and use N+N-2 or 2N-2 as the df-value (too large) and employ
the independent samples t-test above. This would be a positively biased test and result in too many Type I errors.
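The difference-score route is short enough to sketch in a few lines. The pre-test and post-test scores below are invented, and the function name is mine. Note that every D is computed in the same order (post minus pre), so negative D's keep their sign.

```python
from math import sqrt

def correlated_t(pre, post):
    """t-ratio and df for paired scores, computed from the difference scores D."""
    d = [b - a for a, b in zip(pre, post)]   # same order for every pair; signs kept
    n = len(d)
    mean_d = sum(d) / n
    ss_d = sum((x - mean_d) ** 2 for x in d)
    est_se = sqrt(ss_d / (n * (n - 1)))      # estimated standard error of the mean D
    return (mean_d - 0) / est_se, n - 1      # 0 is the hypothesized mean D; df = N - 1

pre = [10, 12, 9, 14, 11, 13]     # hypothetical pre-test scores
post = [13, 12, 11, 17, 10, 16]   # hypothetical post-test scores
t, df = correlated_t(pre, post)
print(round(t, 3), df)
```

With six pairs the df is 5, not the 2N-2 = 10 that the independent samples t would use.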
FEB 2008........ N VS. N-1
What you say! You have to be kidding. You are making an issue of the number of scores in the sample
and ONE LESS THAN THE NUMBER OF SCORES? How can that make a penny's worth of difference except in
situations where the sample size is extremely small? How in the world can this be classified as a
Sticky Wicket?
Now look at the two methods that are labeled (A) and (B) to the right.
One thing both methods have in common is the sum of the squared deviations of the scores about the mean
(∑x²). Three Cheers! In other words, statisticians pretty much agree
that in most situations in order to measure how variable a set of scores is, you first must take into
account each and every score in the sample. That is, you find out how far each score is above or below the mean
of the sample (a deviation score). Then you square each of these deviation scores and summate the squared deviations.
This is the direct or "brute force" method of computing this quantity and it involves far too many messy decimals. It is
far easier to make this computation with only the raw scores and not fuss with the mean. You get ∑X and ∑X²
and employ STEP ONE of the World Famous Three Step Method (i.e., ∑x² = ∑X² -
(∑X)²/N). See Step One WFTSM for an example of this calculation.
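In case anyone wonders why Step One gives the same quantity as the brute force method, the algebra takes only a couple of lines. Here lowercase x is the deviation score and the bar denotes the sample mean, which is my notation for what the text describes in words.

```latex
\[
\sum x^2 = \sum (X - \bar{X})^2
         = \sum X^2 - 2\bar{X}\sum X + N\bar{X}^2,
\qquad \bar{X} = \frac{\sum X}{N},
\]
\[
\sum x^2 = \sum X^2 - \frac{2(\sum X)^2}{N} + \frac{(\sum X)^2}{N}
         = \sum X^2 - \frac{(\sum X)^2}{N}.
\]
```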
MAR & APR 2008........THE DEMISE OF THE CONFIDENCE INTERVAL
In inferential statistics, there have been two primary methodologies for gaining knowledge about
population parameters. However, hypothesis testing has become the dominant force over confidence
intervals throughout the latter half of the 20th century and into the 21st century. In fact in most
disciplines, testing null hypotheses has become the exclusive method of choice in almost all of the
research literature. The current textbooks have very little to say about confidence intervals. If they
do it is in the form of a token short discussion or footnote. What has happened to a procedure that once
was favored by mathematical statisticians and had an entire chapter devoted to it? Let us take a look at
this procedure and see what difficulties have caused it to fall out of favor.
We will present a simple example of calculating upper and lower limits of a 95%
confidence interval for a population mean μ. The figure at the right displays a standard normal curve of z-scores
with two examples of useful percentiles that would be needed to obtain a 95% confidence interval. The first is called
z.025 = -1.96 and by definition is the point on the z-scale such that 2.5% (.025) of the area falls below it
(Remember the total area under this curve is 1 so areas correspond to probabilities). Now at the upper end we have
z.975 = +1.96 or the point on the z-scale such that 97.5% (.975) of the area falls below it (upper blue
area is therefore .025). The -1.96 and +1.96 come from the standard normal curve table and were perhaps memorized by
some of you. Also the middle white area (called Δ or the confidence coefficient) then becomes 95% or .95.
Note that in building a confidence interval, Δ is selected first and the tail-areas are always equal. Some other
commonly used percentiles that may be dear to your heart from the tables are z.005 = -2.58 and z.995 = 2.58
with a middle area of 99% or Δ = .99. Also z.05 = -1.64 and z.95 = 1.64 with a middle area of 90% or
Δ = .90. Great memories, huh? Now returning to the pictured example: If a random z is drawn from this distribution,
the probability that a z will fall between -1.96 and +1.96 is .95 or mathematically, P(-1.96 ≤ z ≤ +1.96) = .95.
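If the table in the back of the book has gone missing, the percentiles can be looked up by machine. This is a small sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The loop over three confidence coefficients is my own choice of illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for delta in (0.90, 0.95, 0.99):
    tail = (1 - delta) / 2               # equal area in each tail
    lower, upper = z.inv_cdf(tail), z.inv_cdf(1 - tail)
    print(delta, round(lower, 3), round(upper, 3))

# Middle area between -1.96 and +1.96
print(round(z.cdf(1.96) - z.cdf(-1.96), 3))
```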
With the sample mean computed to be 98.7, substituting into statements (3) gives the upper limit:
UL = 98.7 + 1.96(16/√64) = 98.7 + 1.96(2) = 98.7 + 3.92 = 102.62
Page last revised on 1 MAY 2008
RETURN to Home Page of Gary
Ramseyer.