SPSS is one of many software packages that are useful for data analysis. SPSS
is, by far, the most popular program among psychologists. We will use Version
16.0. I started with version 6.0 in 1996. The program is considerably better now
and continues to improve, but the basics are the same and are likely to remain so
for a long time.
For whatever reason, approximately 75% of my students think that saying "SPSS"
is a mouthful and shorten the name to "SPS" (or SSP, PSP, SSPS, SP, and many
other variants). Be among the 25% to get it right and I will think that you are
a very bright and conscientious student with a special flair for statistics!
Before you can learn to read, you need to work on your ABCs. Before you can
learn to analyze data, you need to learn to create a data file. "Data file"
refers to a computer file with organized data (by the way, the word "data" is the
plural of "datum," and thus it is proper to use the word like this: "The data are
inconclusive." or "There are no data to support your conclusion." Fuddy-duddies
like me talk like this, but recently usage notes in dictionaries have begun to
allow sentences like "The data is inconclusive." and "There is no data to
support your conclusion." I am not bothered by this usage, but many of my fellow
fuddy-duddies will think you uninformed and possibly unintelligent if you talk
like that. So if you want to avoid embarrassment in professional settings, stick
to the traditional rule of using "data" as a plural noun. Of course, then you'll
sound like a fuddy-duddy. Thus, it is safer to circumvent the problem by saying
things like, "The results are inconclusive, given the data we have." or "I
haven't seen any data that would support your conclusion." Better yet, maintain
your smart and cool image by avoiding the word entirely and saying, "I'm not sure
if we can conclude anything yet." or "I don't think you're right. Do you have
any evidence?").
Go ahead and open SPSS. There should be a shortcut icon for SPSS on your
desktop. Sometimes it is slow to load (it is a big, powerful program). Click
"Cancel" (or "Type in data") if you see this:
You should see an empty data file that looks like this:
Click on the "Variable View" tab at the bottom left (I circled it in red in the
image above). Now your screen should look like this:
You are now ready to create a dataset.
Naming your variables
Let's say we have data from participants in a clinical trial for a new medical
treatment. We have information about the person's name, sex, age, and annual
income. In most datasets, each person has a unique identification number of some
sort (e.g., Social Security number). This prevents problems that arise when
people have the exact same name. Let's call the first variable ParticipantID.
Go ahead and type "ParticipantID" (without
the quotes) on the first row in the "Name" column, change the "Decimals" value
from 2 to 0, and type "Participant ID" in the "Label" column. It should
now look like this:
The ParticipantID variable will be a numeric variable (i.e., a variable that
holds numbers instead of text, dates, or other kinds of values). I had you set
the "Decimals" column to 0 so that values would be displayed as integers rather
than more precise values like 2.14. The "Label" column lets you describe what
the variable is, and unlike the "Name" column, it allows spaces and other
characters. Labels
can be up to 255 characters long. I recommend labeling all variables. It takes
extra time at first but will save you time during data analysis because you
won't have to figure out what your variables are each time.
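To make the "displayed as integers" idea concrete, here is a small Python sketch (Python is not part of this course; it is only an illustration). It assumes, as the paragraph above implies, that the "Decimals" setting changes how a value is shown, not the value that is stored. The `display` function is a hypothetical stand-in for that setting.

```python
def display(value: float, decimals: int) -> str:
    """Show a value with a fixed number of decimal places, leaving the value itself alone."""
    return f"{value:.{decimals}f}"


stored = 2.14                  # the value kept in the data file
print(display(stored, 0))      # 2     (what you would see with Decimals = 0)
print(display(stored, 2))      # 2.14  (what you would see with Decimals = 2)
print(stored)                  # 2.14  (the stored value is unchanged)
```

Python's rounding of exact halves such as 2.5 may differ from what SPSS shows, so treat this only as an analogy.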
Here are some restrictions on variable names (reading these might save you
future heartache, but there is no need to memorize them):
1. Variable names typically must begin with a letter (there are some exceptions
to this rule but that topic is more advanced than anything you'll need for this
class).
2. Variable names can have uppercase letters, lowercase letters, numbers, and
these characters: _ . $ # @. For example, A._$@#1 is a valid variable name (this
might be a good name for a variable that contains a curse word!).
3. Variable names cannot have spaces.
4. Variable names cannot be longer than 64 characters. In the old days,
variable names could not be longer than 8 characters. Such are the fruits of
progress!
5. The following cannot be variable names: ALL, AND, BY, EQ, GE, GT, LE, LT, NE,
NOT, OR, TO, WITH. However, almost any change to these words makes them usable
as variable names. For example, "all" is not allowed (capitalization doesn't
matter), but "all1" is perfectly fine.
6. It is possible but not recommended to have variable names that end in a
period (.) or an underscore (_). This can cause problems with syntax ("syntax"
refers to an SPSS-specific programming language that many people, including me,
use to run SPSS more efficiently. We won't be using syntax in this class,
though).
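The rules above are mechanical enough to write down as a checker. The Python sketch below is a hypothetical helper, not something SPSS provides. It covers only the rules listed here, assumes plain ASCII letters, ignores the advanced exceptions mentioned in rule 1, and reports rule 6 alongside the hard errors but marks it as "allowed but not recommended."

```python
import re

# Rule 5: reserved keywords. Comparison ignores case, since "all" is rejected too.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# Rule 1: the first character is a letter (ASCII letters only in this sketch).
# Rules 2 and 3: the remaining characters are letters, digits, or _ . $ # @,
# so spaces are rejected automatically.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.$#@]*")

MAX_LENGTH = 64  # Rule 4


def check_variable_name(name: str) -> list[str]:
    """Return the problems found with a proposed variable name (empty list if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter and use only letters, digits, "
                        "and _ . $ # @ (no spaces)")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if name.upper() in RESERVED:
        problems.append("reserved keyword")
    if name.endswith((".", "_")):
        problems.append("ends in a period or underscore (allowed but not recommended)")
    return problems


for candidate in ["ParticipantID", "A._$@#1", "all", "all1", "Participant ID", "height."]:
    print(f"{candidate!r}: {check_variable_name(candidate) or 'OK'}")
```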
Suggestions about variable names
1. The variable name should be a description of the variable, if possible. If
the variable can't be succinctly described in the name, be sure to describe it
in the "Label" column. I promise you that otherwise you will forget what the
variable is if you put the dataset away and then return to it months or years
later. If you have a longitudinal study of couples, one of your variables might be
"DepressionWifeTime1" and you can label the variable "Wife's level of depression
at the beginning of the study".
2. Periods and underscores are useful to show that some variables are grouped
together. For example, if you measure the height of children twice a year for 3
years starting in 2007, you could name the variables like this:
height.2007.1
height.2007.2
height.2008.1
height.2008.2
height.2009.1
height.2009.2
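Because these names follow a regular pattern (variable, year, measurement number), they can be generated rather than typed one by one. A short Python illustration, assuming 2 measurements per year for 2007 through 2009 as in the example:

```python
# Build names of the form height.<year>.<measurement> for 3 years
# starting in 2007, with 2 measurements per year.
names = [f"height.{year}.{occasion}"
         for year in range(2007, 2010)
         for occasion in (1, 2)]

for name in names:
    print(name)
```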
Variable Types:
The 4 variable types used in this course are:
1. Numeric. A variable whose values are numbers and are displayed in standard
format.
2. String. A variable whose values are text. Uppercase and lowercase letters are
considered distinct. String variables are also known as alphanumeric variables.
3. Date. A numeric variable whose values are displayed as a date. There are many
different date formats available.
4. Dollar. A numeric variable displayed with a leading dollar sign ($). When
entering data, you don't need to type the dollar sign.
There are other variable types, including scientific notation, custom currency,
comma, and dot. Check the Help menu in SPSS for explanations if you think you
might need them (you won't in this course).
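Here is a rough Python analogy for three of these types (an illustration only; SPSS stores and formats values in its own way). The point is that Dollar and Date values get extra decoration when displayed, while String values are compared character by character, with case mattering. `show_dollar`, `last_name`, `same`, and `visit` are hypothetical names, and the date is made up.

```python
from datetime import date


def show_dollar(amount: float) -> str:
    """Dollar: the stored value is a plain number; the $ and commas are display only."""
    return f"${amount:,.0f}"


print(show_dollar(48000))      # $48,000 (you type 48000, not $48,000)

# String: text values, where uppercase and lowercase letters are distinct.
last_name = "Ardle"
same = last_name == "ardle"
print(same)                    # False

# Date: one underlying value that can be displayed in different formats.
visit = date(2008, 1, 16)
print(visit.strftime("%m/%d/%Y"))   # 01/16/2008
print(visit.strftime("%Y-%m-%d"))   # 2008-01-16
```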
Let's make another numeric variable.
On line 2, enter "Age" in the "Name" column. In the "Label" column, enter
"Participant Age (in years)". For adults, we would probably enter age as an integer
but we would want more precision for children, especially infants. So 2.50 would
mean that the person is 2 and a half years old. Let's leave the "Decimals" as 2,
which is the default for all numeric variables.
Let's make 2 string variables.
On line 3,
enter "LastName" and on line 4 enter "FirstName". These contain the last
and first names of the participant in the study. By default, new variables are
assumed to be numeric.
Change the variable
type of LastName by selecting the cell in the "Type" column. Click the gray box
with the 3 dots that appears on the right side of the cell. Now select String and
enter 30 in the "Characters" box. The "Characters" box specifies how many
characters can fit in the variable. If you have a name with more than 30
characters, you would need to enter a higher number.
Repeat this process for the FirstName
variable.
Let's make a dollar variable.
On line 5,
enter "Income". Change the variable type to "Dollar". The display can be
formatted in several ways but you don't need to select any of the options.
Participant gender could be coded as a string variable and you would just enter
"Male" and "Female" for each person. This would be okay for small datasets but
it is generally a better idea to have a code number for male and another code
number for female. This saves time during data entry, makes the size of the file
smaller, and makes the analyses run faster. Most researchers use either 1 and 2
or 0 and 1 for categorical variables like gender. However, this is merely a
convention. You could choose anything you like when you create your own research
data. What is important is that you remember which number corresponds to which
sex. You do this by using the "Values" column.
Enter "Sex" on line 6. Change the
"Decimals" to 0. Enter "Participant Sex" in the "Label" column. Select the
"Values" cell on line 6. Click the gray box on the right side of the cell. Enter
1 in the "Value" box. Enter "Male" in the "Label" box. Click Add. Enter 2 in the
"Value" box. Enter "Female" in the "Label" box. Click Add. Click OK.
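The "Values" column works like a lookup table from codes to labels. Here is a minimal Python sketch of that idea, using the 1 = Male, 2 = Female coding above; `SEX_LABELS`, `SEX_CODES`, `decode`, and `encode` are hypothetical names for illustration, not SPSS features.

```python
# Value labels for Sex: the number stored in the data -> the label shown.
SEX_LABELS = {1: "Male", 2: "Female"}

# Reverse lookup: label -> code, handy when turning words into codes for entry.
SEX_CODES = {label: code for code, label in SEX_LABELS.items()}


def decode(codes):
    """Turn stored codes into their labels."""
    return [SEX_LABELS[code] for code in codes]


def encode(labels):
    """Turn labels into the codes you would type in Data View."""
    return [SEX_CODES[label] for label in labels]


print(decode([1, 2, 2]))           # ['Male', 'Female', 'Female']
print(encode(["Female", "Male"]))  # [2, 1]
```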
This is what you should have so far.
Add in
the missing labels as shown below.
If all I had were these variables, I probably wouldn't have written
"Participant" in each of the labels because it is obvious what the variables
are. However, in some datasets it is good to specify. For example, if you were
to also measure the age of the participants' spouses and children, you wouldn't
want there to be any ambiguity in any of your printouts. In general, it is
better to be too detailed than to be not detailed enough.
Saving your dataset
1. Save your dataset often. I have had heartbreaking events happen because I
failed to save my data often. I've had power outages, software crashes, computer
failures, roommate interference, pet interference, and my own general stupidity
cause me to have to redo hours of work. Save often. A lot. Frequently. Really. I'm
not kidding. Pressing the Ctrl+S shortcut keys is a quick and easy way to save
data in SPSS (and most other programs).
2. It is a good idea to put the date in the name of the file. In this course,
your data will be neat and clean. In real data analysis, you often have multiple
copies of similar datasets so it is helpful to know which is the most recent
one. Note that file names cannot contain dates with slashes (e.g., 1/16/2008)
because Windows treats slashes as folder separators; use hyphens instead.
3. Name your dataset something descriptive rather than "Data" or something like
that. If the study is about the effect of journaling on stress-related
illnesses, call it "Journaling and Stress 1-16-2008".
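If you ever generate file names automatically, the same advice applies: build the date with hyphens instead of slashes. A minimal Python sketch, assuming the month-day-year style used in the example above; `dated_filename` is a hypothetical helper.

```python
from datetime import date


def dated_filename(description: str, when: date) -> str:
    """Append a month-day-year date, using hyphens because slashes are not allowed."""
    # Build the date by hand to avoid leading zeros (the "%-m" strftime
    # code that would do this is not available on Windows).
    stamp = f"{when.month}-{when.day}-{when.year}"
    return f"{description} {stamp}"


print(dated_filename("Journaling and Stress", date(2008, 1, 16)))
# Journaling and Stress 1-16-2008
```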
Click File. Click Save (or click the Save
button or press the Ctrl+S shortcut keys). Name your dataset "Lab 2" followed by
your section number, your last name, and today's date. If your name is
Jones and you are in section 2, save the file as "Lab 2 Section 2 Jones
1-16-2008". This will help your GA know whose file is whose. Note section
meeting times:
12:00 Section 1
1:00 Section 2
2:00 Section 3
3:00 Section 4
Enter your data.
In real data analysis, datasets are often very large. We will start
small with only 5 people.
Click the "Data View" tab at the bottom.
In the "Variable View" page, each row is a variable. In the "Data View" page,
each row is a person and each column is a variable. This is a little tricky at
first but you'll get used to it soon.
Here are the data:
Participant 1: Franz Ardle, Age 34, makes $48,000 per year, male
Participant 2: Julie Barnes, Age 50, makes $79,000 per year, female
Participant 3: Maria Chamorro, Age 22, makes $26,000 per year, female
Participant 4: Wynona David, Age 18, makes $5,600 per year, female
Participant 5: Zachary Franco, Age 41, makes $40,000 per year, male
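The Data View layout (one row per person, one column per variable) can be sketched in Python as a list of rows. This assumes each participant's number serves as the ParticipantID and uses the 1 = male, 2 = female coding from the "Values" column; `COLUMNS`, `rows`, and `column` are hypothetical names.

```python
# One column per variable, in the order they were created in Variable View.
COLUMNS = ["ParticipantID", "Age", "LastName", "FirstName", "Income", "Sex"]

# One row per participant (Data View). Sex is coded 1 = Male, 2 = Female;
# Income is typed without the dollar sign or commas.
rows = [
    [1, 34, "Ardle",    "Franz",   48000, 1],
    [2, 50, "Barnes",   "Julie",   79000, 2],
    [3, 22, "Chamorro", "Maria",   26000, 2],
    [4, 18, "David",    "Wynona",   5600, 2],
    [5, 41, "Franco",   "Zachary", 40000, 1],
]


def column(name):
    """Return every participant's value for one variable (one Data View column)."""
    index = COLUMNS.index(name)
    return [row[index] for row in rows]


print(column("Age"))  # [34, 50, 22, 18, 41]
print(column("Sex"))  # [1, 2, 2, 2, 1]
```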
Remember that "male" and "female" are entered as 1 and 2. Your data should look
like this:
Save your file again and email it as an
attachment to your GA.
Put "Lab 2", your name, and your section number
in the subject line. If you are in Section 4 and your name is Fred Jones,
your subject line should be "Lab 2 Fred Jones Section 4".
You've made your first dataset in SPSS! Congratulations!