Lab 2
Psychology 138

Creating a data file with SPSS

SPSS is one of many software packages that are useful for data analysis. SPSS is, by far, the most popular program among psychologists. We will use Version 16.0. I started with Version 6.0 in 1996. The program is considerably better now and continues to improve, but the basics are the same and are likely to remain so for a long time.

For whatever reason, approximately 75% of my students think that saying "SPSS" is a mouthful and shorten the name to "SPS" (or SSP, PSP, SSPS, SP, and many other variants). Be among the 25% who get it right and I will think that you are a very bright and conscientious student with a special flair for statistics!

Before you can learn to read, you need to work on your ABC's. Before you can learn to analyze data, you need to learn to create a data file. "Data file" refers to a computer file with organized data. (By the way, the word "data" is the plural of "datum," and thus it is proper to use the word like this: "The data are inconclusive." or "There are no data to support your conclusion." Fuddy-duddies like me talk like this, but usage notes in dictionaries have recently begun to allow sentences like "The data is inconclusive." and "There is no data to support your conclusion." I am not bothered by this usage, but many of my fellow fuddy-duddies will think you uninformed and possibly unintelligent if you talk like that. So if you want to avoid embarrassment in professional settings, stick to the traditional rule of using "data" as a plural noun. Of course, then you'll sound like a fuddy-duddy. Thus, it is safer to circumvent the problem by saying things like, "The results are inconclusive, given the data we have." or "I haven't seen any data that would support your conclusion." Better yet, maintain your smart and cool image by avoiding the word entirely and saying, "I'm not sure if we can conclude anything yet." or "I don't think you're right. Do you have any evidence?")

Go ahead and open SPSS. There should be a shortcut icon for SPSS on your desktop. Sometimes it is slow to load (it is a big, powerful program). Click "Cancel" (or "Type in data") if you see this:



You should see an empty data file that looks like this:


Click on the "Variable View" tab at the bottom left (I circled it in red on the image above). Now your screen should look like this:


You are now ready to create a dataset.

Naming your variables
Let's say we have data from participants in a clinical trial for a new medical treatment. We have information about the person's name, sex, age, and annual income. In most datasets, each person has a unique identification number of some sort (e.g., social security number). This prevents problems that arise when people have the exact same name. Let's call the first variable ParticipantID. Go ahead and type "ParticipantID" (without the quotes) on the first row in the "Name" column, change the "Decimals" value from 2 to 0, and type "Participant ID" in the "Label" column. It should now look like this:



The ParticipantID variable will be a numeric variable (i.e., a variable that holds numbers instead of text, dates, or other kinds of values). I had you set the "Decimals" column to 0 so that values would be displayed as integers rather than more precise values like 2.14. The "Label" column helps you describe what the variable is and it allows characters that the "Name" column won't. Labels can be up to 255 characters long. I recommend labeling all variables. It takes extra time at first but will save you time during data analysis because you won't have to figure out what your variables are each time.

Here are some restrictions on variable names (reading these might save you future heartache, but there is no need to memorize them):
1. Variable names typically must begin with a letter (there are some exceptions to this rule but that topic is more advanced than anything you'll need for this class).
2. Variable names can have uppercase letters, lowercase letters, numbers, and these characters: _ . $ # @. For example, A._$@#1 is a valid variable name (this might be a good name for a variable that contains a curse word!).
3. Variable names cannot have spaces.
4. Variable names cannot be longer than 64 characters. In the old days, variable names could not be longer than 8 characters. Such are the fruits of progress!
5. The following cannot be variable names: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH. However, any variation on these words can be used as a variable name. For example, "all" is not allowed but "all1" is perfectly fine.
6. It is possible but not recommended to have variable names that end in a period or an underscore. This can cause problems with syntax ("syntax" refers to an SPSS-specific programming language that many people, including me, use to run SPSS in a more efficient manner. We won't be using syntax in this class, though).
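
The restrictions above can be sketched as a small checker. This is only an illustration written in Python, not something SPSS provides, and nothing in this lab requires programming. The function name check_variable_name is hypothetical, and the sketch covers only the rules listed here (it ignores the advanced exceptions to the begins-with-a-letter rule).

```python
import re

# Reserved words from the list above. The comparison ignores case,
# which is why "all" is rejected while "all1" is accepted.
RESERVED_WORDS = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
                  "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then letters, digits, or the characters _ . $ # @
ALLOWED_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.$#@]*")

MAX_LENGTH = 64


def check_variable_name(name):
    """Return a list of problems with a proposed name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 64 characters")
    if " " in name:
        problems.append("contains a space")
    elif not ALLOWED_PATTERN.fullmatch(name):
        problems.append("must begin with a letter and use only "
                        "letters, digits, or _ . $ # @")
    if name.upper() in RESERVED_WORDS:
        problems.append("is a reserved word")
    if name.endswith((".", "_")):
        problems.append("ends in a period or underscore "
                        "(allowed but not recommended)")
    return problems


for candidate in ["ParticipantID", "A._$@#1", "Participant ID",
                  "all", "all1", "height."]:
    print(candidate, "->", check_variable_name(candidate) or "OK")
```

An empty list means the name passes every rule in the list. A name such as "height." is reported only because of the recommendation about trailing periods, not because SPSS would refuse it.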

Suggestions about variable names
1. The variable name should be a description of the variable, if possible. If the variable can't be succinctly described in the name, be sure to describe it in the "Label" column. I promise you that otherwise you will forget what the variable is if you put the dataset away and then return to it months or years later. If you have a longitudinal study of couples, one of your variables might be "DepressionWifeTime1" and you can label the variable "Wife's level of depression at the beginning of the study".
2. Periods and underscores are useful to show that some variables are grouped together. For example, if you measure the height of children twice a year for 3 years starting in 2007, you could name the variables like this:
height.2007.1
height.2007.2
height.2008.1
height.2008.2
height.2009.1
height.2009.2
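
The naming pattern above is regular enough to generate rather than type by hand. The following Python sketch is only an illustration of the pattern (variable, then year, then measurement occasion); the names first_year, number_of_years, and measurements_per_year are made up for this example.

```python
first_year = 2007
number_of_years = 3
measurements_per_year = 2

# Build one name per measurement: height.<year>.<occasion>
names = [
    f"height.{year}.{occasion}"
    for year in range(first_year, first_year + number_of_years)
    for occasion in range(1, measurements_per_year + 1)
]

print("\n".join(names))
```

Because the year comes before the occasion in each name, the variables sort in the order in which the measurements were taken.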



Variable types
The 4 variable types used in this course are:
1. Numeric. A variable whose values are numbers and are displayed in standard format.
2. String. A variable whose values are text. Uppercase and lowercase letters are considered distinct. String variables are also known as alphanumeric variables.
3. Date. A numeric variable whose values are displayed as a date. There are many different date formats available.
4. Dollar. A numeric variable displayed with a leading dollar sign ($). When entering data, you don't need to type the dollar sign.
There are other variable types, including scientific notation, custom currency, comma, and dot. Check the Help menu in SPSS for explanations if you think you might need them (you won't in this course).

Let's make another numeric variable. On line 2, enter "Age" in the "Name" column. In the "Label" column enter "Participant Age (in years)". For adults, we would probably enter age as an integer, but we would want more precision for children, especially infants. So 2.50 would mean that the person is 2 and a half years old. Let's leave the "Decimals" as 2, which is the default for all numeric variables.

Let's make 2 string variables. On line 3, enter "LastName" and on line 4 enter "FirstName". These contain the last and first names of the participant in the study. By default, new variables are assumed to be numeric. Change the variable type of LastName by selecting the cell in the "Type" column. Click the gray box with the 3 dots that appears on the right side of the cell. Now select String and enter 30 in the "Characters" box. The "Characters" box specifies how many characters can fit in the variable. If you have a name with more than 30 characters, you would need to enter a higher number. Repeat this process for the FirstName variable.

Let's make a dollar variable. On line 5, enter "Income". Change the variable type to "Dollar". The display can be formatted in several ways but you don't need to select any of the options.

Participant gender could be coded as a string variable and you would just enter "Male" and "Female" for each person. This would be okay for small datasets but it is generally a better idea to have a code number for male and another code number for female. This saves time during data entry, makes the size of the file smaller, and makes the analyses run faster. Most researchers use either 1 and 2 or 0 and 1 for categorical variables like gender. However, this is merely a convention. You could choose anything you like when you create your own research data. What is important is that you remember which number corresponds to which sex. You do this by using the "Values" column.

Enter "Sex" on line 6. Change the "Decimals" to 0. Enter "Participant Sex" in the "Label" column. Select the "Values" cell on line 6. Click the gray box on the right side of the cell. Enter 1 in the "Value" box. Enter "Male" in the "Label" box. Click Add. Enter 2 in the "Value" box. Enter "Female" in the "Label" box. Click Add. Click OK.
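
Conceptually, the "Values" column is a lookup table from code numbers to labels. The Python sketch below illustrates the idea only; it is not how SPSS stores value labels, and the names SEX_VALUE_LABELS, label_for, and entered_codes are hypothetical.

```python
# The two value labels defined in the steps above
SEX_VALUE_LABELS = {1: "Male", 2: "Female"}


def label_for(code):
    """Look up the label for a code; flag codes that were never labeled."""
    return SEX_VALUE_LABELS.get(code, f"Unlabeled value: {code}")


# Hypothetical codes, as they might be typed during data entry
entered_codes = [1, 2, 2]

print([label_for(code) for code in entered_codes])
```

The data file holds only the short code numbers, and the labels are attached once, which is why coding saves typing compared with entering "Male" or "Female" for every person.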

This is what you should have so far. Add in the missing labels as shown below.



If all I had were these variables, I probably wouldn't have written "Participant" in each of the labels because it is obvious what the variables are. However, in some datasets it is good to specify. For example, if you were to also measure the age of the participants' spouses and children, you wouldn't want there to be any ambiguity in any of your printouts. In general, it is better to be too detailed than to be not detailed enough.

Saving your dataset
1. Save your dataset often. I have had heartbreaking events happen because I failed to save my data often. I've had power outages, software crashes, computer failures, roommate interference, pet interference, and my own general stupidity cause me to have to redo hours of work. Save often. A lot. Frequently. Really. I'm not kidding. Hitting the ctrl-S shortcut key is a quick and easy way to save data in SPSS (and most other programs).
2. It is a good idea to put the date in the name of the file. In this course, your data will be neat and clean. In real data analysis, you often have multiple copies of similar datasets so it is helpful to know which is the most recent one. Note that dates in file names cannot have slashes (e.g., 1/16/2008) because Windows treats slashes as separators between folders.
3. Name your dataset something descriptive rather than "Data" or something like that. If the study is about the effect of journaling on stress-related illnesses, call it "Journaling and Stress 1-16-2008".
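
The naming suggestions above can be sketched in a few lines of Python. This is an illustration only, and the function name dated_filename is made up for this example. The set of characters it rejects is the set that Windows does not allow in file names.

```python
from datetime import date

# Characters that Windows does not allow in file names
FORBIDDEN_IN_WINDOWS_NAMES = set('\\/:*?"<>|')


def dated_filename(description, when):
    """Combine a descriptive title with a date written with hyphens."""
    stamp = f"{when.month}-{when.day}-{when.year}"
    name = f"{description} {stamp}"
    bad = sorted(set(name) & FORBIDDEN_IN_WINDOWS_NAMES)
    if bad:
        raise ValueError(f"not allowed in a file name: {' '.join(bad)}")
    return name


print(dated_filename("Journaling and Stress", date(2008, 1, 16)))
# prints: Journaling and Stress 1-16-2008
```

Writing the date with hyphens keeps the name valid, while a title that itself contains a slash (for example "Data 1/16/2008") is rejected before any date is added.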

Click File. Click Save (or click the Save button or press the ctrl-S shortcut keys). Name your dataset "Lab 2" followed by your section number, your last name, and today's date. If your name is Jones and you are in section 2, save the file as "Lab 2 Section 2 Jones 1-16-2008". This will help your GA know whose file is whose. Note section meeting times:
12:00 Section 1
1:00 Section 2
2:00 Section 3
3:00 Section 4

Enter your data
With real data analysis, datasets are often very large. We will start small with only 5 people.
Click the "Data View" tab at the bottom. In the "Variable View" page, each row is a variable. In the "Data View" page, each row is a person and each column is a variable. This is a little tricky at first but you'll get used to it soon.

Here are the data:
Participant 1: Franz Ardle, Age 34, makes $48,000 per year, male
Participant 2: Julie Barnes, Age 50, makes $79,000 per year, female
Participant 3: Maria Chamorro, Age 22, makes $26,000 per year, female
Participant 4: Wynona David, Age 18, makes $5,600 per year, female
Participant 5: Zachary Franco, Age 41, makes $40,000 per year, male
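
As a cross-check on what goes in each cell, here is a Python sketch that lays out the five participants as rows, with one entry per variable in the order the variables were created in "Variable View". It is an illustration only; the names SEX_CODES, participants, and rows are made up for this example, and the sex codes are the ones defined in the "Values" column earlier.

```python
# Codes defined earlier in the "Values" column: 1 = Male, 2 = Female
SEX_CODES = {"male": 1, "female": 2}

# (ID, first name, last name, age, income, sex) as written in the list above
participants = [
    (1, "Franz", "Ardle", 34, 48000, "male"),
    (2, "Julie", "Barnes", 50, 79000, "female"),
    (3, "Maria", "Chamorro", 22, 26000, "female"),
    (4, "Wynona", "David", 18, 5600, "female"),
    (5, "Zachary", "Franco", 41, 40000, "male"),
]

# One row per person, one entry per variable
rows = [
    {
        "ParticipantID": pid,
        "Age": age,
        "LastName": last,
        "FirstName": first,
        "Income": income,
        "Sex": SEX_CODES[sex],
    }
    for pid, first, last, age, income, sex in participants
]

for row in rows:
    print(row)
```

Note that income is typed without the dollar sign or comma, and that last names go in the LastName column even though the list above gives first names first.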

Remember that "male" and "female" are entered as 1 and 2. Your data should look like this:


Save your file again and email it as an attachment to your GA.


Put "Lab 2", your name, and section number in the subject line. If you are in Section 4 and your name is Fred Jones, your subject line should be "Lab 2 Fred Jones Section 4".

You've made your first dataset in SPSS! Congratulations!