Transformations
At the end of the last lab you had to add a constant to
every score in the distribution. If you change every score in the
distribution in the same way, this is called a transformation.
Essentially what you do with a transformation is change the scale of
the distribution. This will typically change some of the properties of
the overall distribution (e.g., center and spread), but within the
distribution all of the points remain in the same location relative to
each other.
Consider the following example:
Suppose that you have a distribution of heights for 10 individuals
measured with a metric ruler. You can transform this distribution into
a different measure, feet for example. So now the mean will change
(it'll be in terms of feet instead of meters), as will the standard
deviation (again in terms of feet rather than meters). However, what
doesn't change are the heights of any of the 10 people in the
distribution. Everybody stays the same height!
One of the most common transformations performed on
distributions is to convert "raw scores" into z-scores.
Z-scores are measured in standard deviation units. That is, the
transformation removes measures like feet or meters, and replaces them
with a unit that can be interpreted as "how many standard deviations
away from the mean is this point." The transformation is performed by
using the z-score formula:
z = (X - μ) / σ
This formula computes the deviation between a score and
the mean of the
distribution and divides it by the average deviation in the
distribution (which is the standard deviation).
Consider the following example:
Suppose that you have recently taken the SAT test and you get a score
of 540. In the information pack that came with your score, it states
that the mean SAT score is μ = 500, with
a standard deviation of σ = 100. Suppose
that you would like to convert your score into a z-score (the rest of
the lab will give you some idea why you might want to do this).
z-score = (540-500)/100 = 40/100 = .4
This means that your score is .4 standard deviations above the mean.
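If you would like to check this kind of calculation outside of a calculator, the short Python sketch below does the same arithmetic. The function name z_score is just a label chosen for this illustration; it is not part of SPSS, Excel, or the lab files.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# SAT example from the text: score of 540, mean 500, standard deviation 100
sat_z = z_score(540, 500, 100)
print(sat_z)  # 0.4
```

A negative result would mean the score is below the mean.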
Blackboard 1) Answer
the
question about reaction time (Blackboard will generate different
numbers for
each student. Remember to round to 2 decimals, if necessary. Don't
forget the negative sign!)
Blackboard 2) Answer the
question about memory for words.
Converting from z-scores to raw scores is simple. If we multiply both
sides of the z-score formula by σ and add μ to both sides, we get the
following formula:
zσ + μ = X
Thus, if we know that σ = 5, μ = 10, and that z = 2, we would know that
the original score (X) would be: 2 * 5 + 10 = 20
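The reverse conversion can be sketched the same way in Python. The function name raw_score is made up for this illustration.

```python
def raw_score(z, mean, sd):
    """Convert a z-score back to the original scale: X = z * sd + mean."""
    return z * sd + mean


# Example from the text: z = 2, mean = 10, standard deviation = 5
x = raw_score(2, 10, 5)
print(x)  # 20
```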
Blackboard 3) Answer the
question about the special ops qualifying test.
You can use SPSS to compute z-scores. Go to the
Descriptive Statistics menu and select descriptives.
Then click on the save as standardized values
box. 
This will create a new column in the dataset that
includes the z-scores for each data point.
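To see what a saved column of standardized values contains, here is a rough Python sketch that standardizes a whole list of scores. The quiz scores are invented for illustration (they are not the values in students.sav), and the sketch assumes the sample standard deviation (the n - 1 version) is the one used for standardizing.

```python
from statistics import mean, stdev

# Made-up quiz scores, for illustration only (not from students.sav)
quiz_scores = [7, 9, 6, 10, 8]

m = mean(quiz_scores)
s = stdev(quiz_scores)  # sample standard deviation (divides by n - 1)

# Apply the z-score formula to every score in the list
z_scores = [(x - m) / s for x in quiz_scores]

for x, z in zip(quiz_scores, z_scores):
    print(x, round(z, 2))
```

After standardizing, the new column has a mean of 0 and a standard deviation of 1, and the largest raw score still has the largest z-score.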
Open the students.sav
file again and create z-scores for the Quiz1 variable.
Blackboard 4) What
is
the highest z-score for Quiz1 in the dataset? Hint: Rather than
wasting time looking at each and every score, have SPSS create a
frequency table (Analyze->Descriptive Statistics->Frequencies) with the new
variable Zquiz1. You'll find Zquiz1 at the end of the variable list.
Its label in the list will be "Zscore(quiz1)." Round your answer to 2
decimal points.
The Normal Distribution
One of the most commonly occurring distributions is the
Normal
Distribution.
Let's examine the Normal Distribution and see how we
work with probabilities to find the area under the curve for different
ranges of scores. If a distribution is normally distributed
then it is symmetrical and unimodal. A graph of a
normal distribution is shown below.
A few things to note about Normal Distributions.
- Not all unimodal, symmetrical curves are normal, but
a lot are
- For this class, we won't worry about how close a
distribution is to normal, in fact for most of the course we'll assume
that the distribution is normal
- A smooth curve like that above is referred to as a
density curve (rather than a frequency curve)
- The area under any density curve must sum to 1.
Why? Remember that the area under the curve refers to the probabilities
(or proportions) and the total probability must equal 1.
- The normal distribution is often transformed into
z-scores.
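- For a normal distribution:
34.13% of the scores will fall between the mean (μ) and 1 standard deviation above it.
13.59% of the scores will fall between 1 and 2 standard deviations above the mean.
2.14% of the scores will fall between 2 and 3 standard deviations above the mean.
Because the curve is symmetrical, the same percentages apply below the mean.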

This relationship is sometimes referred to as the
68-95-99.7 rule.
In the normal distribution with mean μ and
a standard deviation σ:
- About 68% of the observations fall within 1σ of the mean μ.
- About 95% of the observations fall within 2σ of the mean μ.
- About 99.7% of the observations fall within 3σ of the mean μ.
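The percentages in the 68-95-99.7 rule can be checked with a few lines of Python, using the NormalDist class from Python's standard statistics module (available in Python 3.8 and later). This is only a sketch for checking the numbers; the variable names are my own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

within = {}
for k in (1, 2, 3):
    # Area between -k and +k standard deviations from the mean
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {within[k]:.4f}")
```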
What is the probability of having an IQ of 85 or
less?
A more compact way of asking this question uses probability notation
like this:
What is p(IQ < 85)?
For IQ scores, μ = 100 and σ = 15.
z = (IQ - μ) / σ =
(85 - 100) / 15 = -1
Thus, 85 is 1 standard deviation below the mean.
However, we don't know how much of the figure at
the right is shaded.
In the old days, we had to look up the answer in a large and cumbersome
statistical table. Fortunately, we can now use Excel (or other
spreadsheet
programs like Corel Quattro or OpenOffice Calc) to get the answer. We
will use
Excel's NORMDIST function.
The NORMDIST
function tells you how much of the normal curve is less than
the value you look up.
Open Excel and in
cell A1 type "=NORMDIST(85,100,15,TRUE)" (without the quotes).
Press the Enter
key.
Cell A1 should now display a value close
to 0.1587. This means that about
15.87% of the population has an IQ of 85 or lower.
In the NORMDIST function, the first value
is the one you want to look up (85
in this case).
The second value is the mean of the population or sample (100, in this case).
The third is the standard deviation of the population or sample (15, in this case).
The fourth value ("TRUE") tells the function to calculate the
cumulative
proportion rather than the density (i.e., the height or frequency of
the
normal curve).
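If you do not have a spreadsheet handy, the same cumulative lookup can be sketched in Python. The cdf method of NormalDist (from Python's standard statistics module) plays the role that NORMDIST with TRUE plays above; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

# Proportion of the curve at or below an IQ of 85
p_85_or_less = iq.cdf(85)
print(round(p_85_or_less, 4))  # 0.1587
```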
What if we know the percentile and wish
to find the score associated with it?
For example, what IQ do you need to be in the 75th percentile? To find
a
score associated with a percentage, use the NORMINV function.
In
cell A2 type "=NORMINV(.75,100,15)" (without the quotes)
Cell A2 should now display 110.12 or
something close (I rounded). This means that
you need an IQ of about 110 to be at the 75th percentile.
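The percentile-to-score lookup can also be sketched in Python. Here the inv_cdf method of NormalDist stands in for NORMINV; as before, the variable names are only illustrative.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

# Score at the 75th percentile (proportion .75 of the curve lies below it)
score_75th = iq.inv_cdf(0.75)
print(round(score_75th, 2))  # 110.12
```

Note that inv_cdf expects a proportion strictly between 0 and 1, so a percentile must be divided by 100 first.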
Blackboard 5) What is the
probability of
having an IQ score of 78 or less? (Hint: This is not a percentage
question, so don't multiply by 100. Probability has a range from 0 to
1. Round your answer to 2 decimal points.)
Blackboard 6) On a test with a mean of 34 and a standard deviation of
3,
which score falls at the 94th percentile? (Hint: Since this is a
percentile
question, divide 94 by 100 to convert it to a proportion first. Round
your answer to 2 decimal points.)
Although I do want you to be aware of the fact that most
of the statistical
tables we used to have to use can be replaced with Excel, I don't
expect you to become an "Excel Master" in this course. I think that a
little time invested in learning Excel will pay large
dividends in many aspects of your life (not just in statistics).
People who know Excel extremely well
get raises at work because they can be many times more productive than
their peers in a wide variety of tasks. That being said, some of the
questions I would like you to be
able to answer in this course can become unnecessarily complicated
using
a blank Excel sheet to start with. Therefore, because I wish to spare
you needless headaches, I
made an Excel spreadsheet tool that makes this whole
process easier.
Download this file to your datastore (or
somewhere else you can find it later) and open it in Excel.
To use this
spreadsheet:
1. Select "Score to
Proportion" if you know the score(s) and wish to calculate proportions
or probabilities. Select "Proportion to Score" if you wish to know a
raw score when you already know the probability or proportion.
2. Select "Less Than",
"More Than", "Between", or "Exclude Between" depending on what you
wish to do.
3. Enter the mean in the
dark box at the top left.
4. Enter the standard
deviation in the dark box at the top right.
5. Enter the raw score(s) or proportion(s) that are known in the dark
boxes below the mean and standard deviation boxes. Remember that
proportions MUST range from 0 to 1. Any value outside this range will
result in an error.
Example:
Suppose you wish to know what proportion of scores are less than 5 when
μ = 10 and σ = 3.
1. You know the score (i.e., 5) and you want to know a proportion so
you select "Score to Proportion."
2. You want to know what proportion of the scores are LESS THAN 5 so select
"Less Than."
3. Enter 10 as the mean
4. Enter 3 as the standard deviation.
5. Enter 5 as the raw score.
You should now see the answer (.05) in the "Proportion Under Curve" box.
Blackboard 7) What
proportion of scores are less than 50 when μ = 50 and σ = 10?
Blackboard 8)
What
proportion of scores are more than 60 when μ = 50 and σ = 10? Hint:
Select "More than" instead of "Less than" in the listbox.
Blackboard 9) Approximately which score is in the top 25% of scores
(i.e.,
higher than 75% of scores) in a distribution in which μ = 100 and σ = 10? Hint: Select
"Proportion to Score." Select "More Than" and enter .75 in the
cumulative proportion box. The answer
will appear in the "Raw Score" box. Round answer to 2 decimals.
Sometimes we need to find the probability that X will
fall between two scores rather than simply above a score or below a
score.
The spreadsheet tool looks up the cumulative proportions
for both z-scores and computes the difference between them.
Example:
What is the
proportion of the population scores between 22 and 28
on the ACT?
Assume
that for the ACT: μ = 21, σ = 5
Before computers did this task for us, we had to calculate the
z-scores, look up the cumulative proportions associated with
those z-scores in a table, and then subtract one from the other. The
process used to look like this:
The z-score of 22 is (22 - 21) / 5 = .20
The z-score of 28 is (28 - 21) / 5 = 1.40
The cumulative proportion associated with a z-score of .20 is .5793 (value obtained from a standard normal table). The cumulative proportion associated with a z-score of 1.40 is .9192. The difference between these proportions is .9192 - .5793 = .3399, or about .34.
To do all this with the spreadsheet is simple:
1. Make sure that the "Score to Proportion" option is selected.
2. Make sure that the "Between" option is selected.
3. Enter 21 as the mean in the dark box at the top left.
4. Enter 5 as the standard deviation in the dark box at the top right.
5. Enter 22 and 28 as the two raw scores.
The box labeled "Proportion Under Curve" should now say .34, which is the same answer obtained above.
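The "between" calculation for the ACT example can be sketched in Python as the difference of two cumulative proportions. This is not how the spreadsheet tool is implemented (I am only illustrating the idea), and the variable names are my own.

```python
from statistics import NormalDist

# ACT scores as assumed in the example: mean 21, standard deviation 5
act = NormalDist(mu=21, sigma=5)

# Proportion below 28 minus proportion below 22 leaves the area in between
p_between = act.cdf(28) - act.cdf(22)
print(round(p_between, 2))  # 0.34
```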
Blackboard 10) If μ =
25
and
σ = 10, what proportion is between 34 and 41? (Hint:
This is not a percentage
question, so don't multiply by 100. Proportion has a range from 0 to
1. Round your answer to 2 decimal points.)
And finally, you might want to know what proportion lies
outside two points (essentially the opposite of the
last situation). If you did not have the spreadsheet tool, you would
solve the problem just like you did with the between type of question
but then you would subtract your answer from 1. Thus, in the previous
example, 34% of the ACT scores were between 22 and 28. This means that
100% - 34% = 66% of the scores fell outside this range.
Example:
What is the probability of scoring lower than 300 or
higher than 650 on the SAT?
Assume: μ = 500, σ = 100
p(z < (650 - 500) / 100) = p(z < 1.5) = .9332
p(z < (300 - 500) / 100) = p(z < -2.0) = .0228
The difference between these numbers is .9104. This is the area between
the scores. Subtracting .9104 from 1 gives us the area outside the
scores.
Thus, 1 - .9104 = .0896
Rounding gives us .09.
To answer the problem with the spreadsheet tool, select "Score to
Proportion", select "Exclude Between", enter 500 and 100 as the mean
and standard
deviation, and enter 300 and 650 as the 2 raw scores.
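The "outside" calculation for the SAT example can be sketched in Python by finding the area between the two scores and subtracting it from 1. As with the other sketches, the variable names are only for illustration.

```python
from statistics import NormalDist

# SAT scores as assumed in the example: mean 500, standard deviation 100
sat = NormalDist(mu=500, sigma=100)

# Area between 300 and 650, then everything that is left over
p_between = sat.cdf(650) - sat.cdf(300)
p_outside = 1 - p_between
print(round(p_outside, 2))  # 0.09
```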
Blackboard 11) If μ =
47
and
σ = 6, what proportion is outside of 40 and 44?
Comparing Distributions
Consider the following example:
The distribution is of SAT scores.
The mean (μ) is 500.
The standard deviation (σ) is 100.
If you got a score of 650 on the SAT, what is
the corresponding z-score?
z = (X - μ) / σ = (650 - 500) / 100 = 150 / 100 = 1.5
So your score (650) is 1.5 standard deviations
above the mean.
Now let's think about why making the transformation into z-scores is
important. Suppose that we want to compare two scores from two
distributions. If these distributions each have a different mean and
standard deviation, then this task can be difficult. However, if we
transform each distribution into z-scores (standardize the
distribution,
like SPSS did in the earlier section), then we can compare the
distributions more easily.
Consider the following situation. You take the ACT
test and the SAT test. You get a 26 on the ACT and a 620 on the SAT.
The college that you apply to only needs one score. Which do you want
to send them (that is, which score is better, 26 or 620?).
It is hard to do a direct comparison here because the
two distributions
have different properties: different means, and different
variabilities.
How might we go about it?
Step 1: look at the distribution graphs, locate the scores and compare
-- still hard to tell
Step 2: think about cumulative percentiles and percentile ranks -- this
will work
Step 3: try and take the deviations and standard deviations into
account by converting all the scores to z-scores
e.g., ACT mean = 18, SD = 6, deviation = 26 - 18 = 8
so a 26 is 1.33 SD above the mean (8 / 6)
SAT mean = 500, SD = 100, deviation = 620 - 500 = 120
so a 620 is 1.2 SD above the mean (120 / 100)
- so the ACT score is better than the SAT score
So to be able to make a comparison, one approach would
be to transform both distributions into a standardized distribution.
We can transform any and all observations or values
from a distribution to a z-score if we know either the population mean and standard deviation (μ & σ) or the sample mean and standard deviation.
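The ACT versus SAT comparison can be sketched in Python by standardizing both scores and comparing the results. The function name z_score and the variable names are made up for this illustration.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Scores and distribution values from the example in the text
act_z = z_score(26, 18, 6)      # ACT: mean 18, SD 6
sat_z = z_score(620, 500, 100)  # SAT: mean 500, SD 100

# The score with the larger z-score sits higher in its own distribution
# (in this simple sketch, an exact tie would be reported as "SAT")
better = "ACT" if act_z > sat_z else "SAT"
print(round(act_z, 2), round(sat_z, 2), better)
```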
Blackboard 12) Suppose that you
got a
540
on the SAT (μ = 500, σ = 100) and a 20 on the ACT (μ = 18, σ = 6).
Which score is better?
Blackboard 13) Suppose instead
that
you
got a 600 on the SAT (μ = 500, σ = 100) and a 24 on the ACT (μ =
18, σ = 6). Now which score is better?