Dear King,
I said I'd send you a problem I'd recently found difficult. As is common, the difficulty wasn't technical; it was quite easy, technically, to get the desired result. The difficulty was layers of misunderstanding about what was really wanted, and what it meant.
Behold, a thread. You'll see that, as is common for long and difficult ones, it starts on-list and goes off.
Richard
Date: Thu, 10 Nov 2005 11:21:44 -0500
Subject: Person Mean Substitution
To: SPSSX-L@LISTSERV.UGA.EDU
Hi all,
I want to use Person Mean Substitution for dealing with my missing data. I have searched the archives here and have found some really useful stuff. I have written some syntax which instantly creates my scale score by inserting the Person Mean where appropriate. However, I would like some syntax which just inserts the individual Person Mean for all missing data, so I can then run some analyses on my item-level data, prior to computing scale scores. I found something in the archives here which explained how to do this, but it didn't seem to work for me.
I have an extremely large data set with up to 45 subscales I will need to use the person mean for. Can anyone please provide me with some syntax or advise on a method for simply inserting the Person Mean for scales which have missing data? I hope someone can advise. Any help would be very much appreciated. :-)
Many thanks.
Comment: This solves the problem as presented; I added one comment warning about the appropriateness of the technique.
At 11:21 AM 11/10/2005, Anonymous wrote:
>I want to use Person Mean Substitution for dealing with my missing
>data.
First, caution: this post is about how to do the calculations, not about whether the technique is statistically appropriate. About the latter, I don't know.
>I have written some syntax which instantly creates my scale score by
>inserting the Person Mean where appropriate.
I don't know what you came up with, but if the items for scale 1 are SCL01Q01 through, say, SCL01Q22, simple syntax to do that is
COMPUTE SCL01 = 22*MEAN(SCL01Q01 TO SCL01Q22).
>However, I would like some syntax which just inserts the individual
>Person Mean for all missing data, so I can then run some analyses on
>my item-level data.
Well, this is untested, but try
COMPUTE #PRSN_MN = MEAN(SCL01Q01 TO SCL01Q22).
DO REPEAT ITEM = SCL01Q01 TO SCL01Q22.
. IF MISSING(ITEM) ITEM = #PRSN_MN.
END REPEAT.
Notes:
A. "#PRSN_MN" is a scratch variable, so it won't be in the final output.
B. This uses DO REPEAT logic, which is the easiest to write. You could also declare a VECTOR over SCL01Q01 TO SCL01Q22 and use a LOOP.
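For what it's worth, the LOOP alternative in note B might look roughly like the sketch below. It is untested, like the DO REPEAT version, and assumes the 22 items are contiguous numeric variables and that no existing variable is named ITEM (the vector name here is just an illustration).

```spss
* Sketch only: same substitution as the DO REPEAT version, using a vector.
* Assumes SCL01Q01 TO SCL01Q22 are 22 contiguous numeric variables.
COMPUTE #PRSN_MN = MEAN(SCL01Q01 TO SCL01Q22).
VECTOR ITEM = SCL01Q01 TO SCL01Q22.
LOOP #I = 1 TO 22.
.  IF MISSING(ITEM(#I)) ITEM(#I) = #PRSN_MN.
END LOOP.
EXECUTE.
```

The DO REPEAT form is probably still the easier one to read; the vector form mainly pays off when the index itself is needed in the calculation.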
And here we go off-list, and the plot thickens. I had my pride pricked by being told that my "code ... hasn't worked," especially when I thought it was right (as, it turned out, it was):
Hi Richard,
Many thanks for your e-mail. I appreciate the help. I've tried using the code you provided but it hasn't worked. I suspect that it may be something that I am doing wrong as I am new to using syntax. I've tried changing the code in various ways but still the syntax won't run.
I've actually now managed to find a way to do it in excel. It's a bit of a pain but it works.
Thanks again for your help Richard! I really appreciate it!!
Best wishes
Still off-list, as it stayed. Notice my pride being pricked, though I think I was polite. ESPECIALLY at the suggestion that Excel would do it better, which seemed most unlikely to me. It seemed all wrong to abandon SPSS and go to an Excel solution, when fixing the SPSS would almost certainly be much easier.
From: Richard Ristow <wrristow@mindspring.com>
At 09:30 AM 11/16/2005, you wrote:
>Many thanks for your e-mail. I appreciate the help. I've tried using
>the code you provided but it hasn't worked. I suspect that it may be
>something that I am doing wrong as I am new to using syntax. I've tried
>changing the code in various ways but still the syntax won't run.
>
>I've actually now managed to find a way to do it in excel. It's a bit
>of a pain but it works.
Of course it bothers me when my code doesn't work. And it sounds like, since it is "a bit of a pain", it would be worth getting it working in SPSS.
If you send me some test data, I'll try it. (By the way, I'll be out of action for a week starting this next Monday.)
-Onward, Richard
So, this is what she responded. I'm including the attached files she sent with this note. The syntax is simple to the point of being brainless, or at least poorly thought out; the dataset is very wide (58 variables or so) and also poorly thought out. I started to get annoyed, I suppose at having my code criticized by somebody who would do THIS:
Hi again Richard,
Using excel is OK for small questionnaires with missing data, but for large questionnaires it's a really time consuming & tedious process.
It would be great if you could look at the SPSS code again for me. As I said, i'm a novice with syntax. I've only just started to learn how to use it and can only do simple things at the moment.
I've attached some sample data from one of my questionnaires which has missing data. The data file contains a 50 item questionnaire with 4 cases who have missing data on particular items. The 5 columns at the end of the data file are the summed scale scores for that questionnaire. I've attached included some syntax which scores this questionnaire.
The problem I have is that I need to insert the person mean per subscale, not just the person mean of the 50 items on the questionnaire. So in the sample data i've given you, there is missing data on 3 of the questionnaire's 5 subscales (ipip_op, ipip_co and ipip_ne). So if we look at User ID 914 we can see that the missing data on items 40 and 43, means that I need to compute the person mean based on the ipip_op (Q40) and ipip_co (Q43) subscales. So, I want some code that can simply insert the person mean at the item level based on a combination of certain items from that questionnaire.
In excel i've been having to individually cut and paste each column of data, e.g., item 1, item 6, item 15, etc., into the excel file until I have all the items which make up a particular subscale. I then compute the person mean. As I said, with my larger questionnaires of 160 items and 12 subscales, this is a tedious process.
I'll look forward to hearing from you Richard.
Thanks again!
And here, you see, I started to get genuinely annoyed, an annoyance I expressed partly by writing quite a bit of code, solving the problem from several points of view:
From: Richard Ristow <wrristow@mindspring.com>
At 06:21 AM 11/17/2005, you wrote:
>It would be great if you could look at the SPSS code again for me.
>
>I've attached some sample data from one of my questionnaires which has
>missing data. The data file contains a 50 item questionnaire with 4
>cases who have missing data on particular items.
And you have five scales, where the items are, alas, not contiguous.
>The 5 columns at the end of the data file are the summed scale scores
>for that questionnaire. I've attached included some syntax which
>scores this questionnaire.
And your syntax (I've reformatted it for readability) is
COMPUTE IPIP_EX = q1ipip  + q6ipip  + q11ipip + q16ipip + q21ipip
                + q26ipip + q31ipip + q36ipip + q41ipip + q46ipip.
COMPUTE IPIP_AG = q2ipip  + q7ipip  + q12ipip + q17ipip + q22ipip
                + q27ipip + q32ipip + q37ipip + q42ipip + q47ipip.
COMPUTE IPIP_CO = q3ipip  + q8ipip  + q13ipip + q18ipip + q23ipip
                + q28ipip + q33ipip + q38ipip + q43ipip + q48ipip.
COMPUTE IPIP_NE = q4ipip  + q9ipip  + q14ipip + q19ipip + q24ipip
                + q29ipip + q34ipip + q39ipip + q44ipip + q49ipip.
COMPUTE IPIP_OP = q5ipip  + q10ipip + q15ipip + q20ipip + q25ipip
                + q30ipip + q35ipip + q40ipip + q45ipip + q50ipip.
And you've then converted missing scale scores to -99. You don't say how, but this is simple code for the purpose:
RECODE IPIP_EX TO IPIP_OP (MISSING=-99).
MISSING VALUES IPIP_EX TO IPIP_OP (-99).
>The problem I have is that I need to insert the person mean per
>subscale, not just the person mean of the 50 items on the
>questionnaire. So in the sample data i've given you, there is missing
>data on 3 of the questionnaire's 5 subscales (ipip_op, ipip_co and
>ipip_ne). So if we look at User ID 914 we can see that the missing
>data on items 40 and 43, means that I need to compute the person mean
>based on the ipip_op (Q40) and ipip_co (Q43) subscales.
I've REALLY lost track of what you're saying, here. First, 'subscale': that's the five, IPIP_EX to IPIP_OP?
>if we look at User ID 914 we can see that the missing data on items 40
>and 43, means that I need to compute the person mean based on the
>ipip_op (Q40) and ipip_co (Q43) subscales.
OK: WHAT person mean? Mean of whatever values you have on the five subscales?
Then, in your calculation of the subscales, the subscale score is missing if any item going into it is missing. I'm not at all sure whether you want this or not. You write,
>I want some code that can simply insert the person mean at the item
>level
"Person mean at the item level" means "mean value of a set of items" for that person?? And insert it where? Into the calculation of the five subscales?
>In excel i've been having to individually cut and paste each column of
>data, e.g., item 1, item 6, item 15, etc., into the excel file until I have
>all the items which make up a particular subscale. I then compute the
>person mean.
Now: HOW do you compute the person mean? The mean of the columns? If so, what happens to the "-99" values? You can't want those included in the mean calculation. And, when you have the person mean, what do you do with it? Do you hand-enter it into your SPSS file, or what?
You see that I'm badly confused.
MAYBE you want to compute the subscales using the person mean, for that subscale, for the missing items. MAYBE you want to compute an overall mean from the valid values of the subscales.
Here's a try. Since I'm confused, I'm trying several things several ways, which may make YOU confused. So ask again, if you need to.
* Original
* "I want to use Person Mean Substitution for dealing with .
* my missing data. I have some syntax which creates my scale.
* score by inserting the Person Mean where appropriate. .
* However, I would like some syntax which just inserts the .
* individual Person Mean for all missing data. .
* Follow-up
* Clicked-up and edited syntax to open file and do some .
* cosmetic clean-ups .
* This loads your file. The "/KEEP =" list puts the subscale.
* scores near the beginning, instead of at the end.
GET FILE = "C:\A_WRR\E-mail\WRR_Desk\Attachments\test PMS.sav"
/KEEP = USERID ipip_ex TO ipip_op ALL.
* These are some odds and ends to make the displayed file .
* more compact and readable.
VARIABLE WIDTH q1ipip TO q50ipip (5) ipip_ex TO ipip_op (6) var00001 (7) .
FORMATS q1ipip TO q50ipip ipip_ex TO ipip_op var00001 (F4).
VARIABLE WIDTH userid (6) .
FORMATS USERID (F6).
* This is how you're calculating the subscales .
COMPUTE IPIP_EX = q1ipip  + q6ipip  + q11ipip + q16ipip + q21ipip
                + q26ipip + q31ipip + q36ipip + q41ipip + q46ipip.
COMPUTE IPIP_AG = q2ipip  + q7ipip  + q12ipip + q17ipip + q22ipip
                + q27ipip + q32ipip + q37ipip + q42ipip + q47ipip.
COMPUTE IPIP_CO = q3ipip  + q8ipip  + q13ipip + q18ipip + q23ipip
                + q28ipip + q33ipip + q38ipip + q43ipip + q48ipip.
COMPUTE IPIP_NE = q4ipip  + q9ipip  + q14ipip + q19ipip + q24ipip
                + q29ipip + q34ipip + q39ipip + q44ipip + q49ipip.
COMPUTE IPIP_OP = q5ipip  + q10ipip + q15ipip + q20ipip + q25ipip
                + q30ipip + q35ipip + q40ipip + q45ipip + q50ipip.
* This makes the subscales a little easier to read .
FORMATS IPIP_EX TO IPIP_OP (F5).
* You seem to do something like this for your missing data .
RECODE IPIP_EX TO IPIP_OP (MISSING=-99).
MISSING VALUES IPIP_EX TO IPIP_OP (-99).
* And this is what you get .
LIST USERID IPIP_EX TO IPIP_OP.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:48     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP

   781      37      38     -99     -99      36
   914      38      38     -99      34     -99
  1238      36      46      36     -99      43
  1254      34      46     -99     -99     -99

Number of cases read:  4    Number of cases listed:  4
* Here's the mean of the non-missing values of the subscales .
COMPUTE OVRL_MN = MEAN(IPIP_EX TO IPIP_OP).
FORMATS OVRL_MN (F6.2).
* And this is what you get .
LIST USERID IPIP_EX TO IPIP_OP OVRL_MN.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:48     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP OVRL_MN

   781      37      38     -99     -99      36   37.00
   914      38      38     -99      34     -99   36.67
  1238      36      46      36     -99      43   40.25
  1254      34      46     -99     -99     -99   40.00

Number of cases read:  4    Number of cases listed:  4
* Now, get rid of the old values of the subscales: .
COMPUTE IPIP_EX = $SYSMIS.
COMPUTE IPIP_AG = $SYSMIS.
COMPUTE IPIP_CO = $SYSMIS.
COMPUTE IPIP_NE = $SYSMIS.
COMPUTE IPIP_OP = $SYSMIS.
* ............................................................. .
* If you want to compute the subscales as if the mean of the .
* non-missing items in that subscale is substituted for the .
* missing items, here's how: .
* First, calculate the subscales as the MEAN (not sum) that you .
* get if you ignore missing values. That's the same as you get .
* if you substitute the mean of the non-missing values for the .
* missing values .
COMPUTE IPIP_EX = MEAN(q1ipip, q6ipip, q11ipip, q16ipip, q21ipip, q26ipip, q31ipip, q36ipip, q41ipip, q46ipip).
COMPUTE IPIP_AG = MEAN(q2ipip , q7ipip , q12ipip, q17ipip, q22ipip, q27ipip, q32ipip, q37ipip, q42ipip, q47ipip).
COMPUTE IPIP_CO = MEAN(q3ipip , q8ipip , q13ipip, q18ipip, q23ipip, q28ipip, q33ipip, q38ipip, q43ipip, q48ipip).
COMPUTE IPIP_NE = MEAN(q4ipip , q9ipip , q14ipip, q19ipip, q24ipip, q29ipip, q34ipip, q39ipip, q44ipip, q49ipip).
COMPUTE IPIP_OP = MEAN(q5ipip , q10ipip, q15ipip, q20ipip, q25ipip, q30ipip, q35ipip, q40ipip, q45ipip, q50ipip).
* Format so you can see the decimals .
FORMATS IPIP_EX TO IPIP_OP (F5.2).
* And this is what you have .
LIST USERID IPIP_EX TO IPIP_OP.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:50     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP

   781    3.70    3.80    4.00    2.33    3.60
   914    3.80    3.80    4.00    3.40    4.56
  1238    3.60    4.60    3.60    3.44    4.30
  1254    3.40    4.60    3.33    4.33    3.89

Number of cases read:  4    Number of cases listed:  4
* To change from average to sum, multiply by the number of items.
* in the subscale, which is 10 in each case. .
COMPUTE IPIP_EX = 10*IPIP_EX.
COMPUTE IPIP_AG = 10*IPIP_AG.
COMPUTE IPIP_CO = 10*IPIP_CO.
COMPUTE IPIP_NE = 10*IPIP_NE.
COMPUTE IPIP_OP = 10*IPIP_OP.
* Format without the decimals .
FORMATS IPIP_EX TO IPIP_OP (F5).
* And this is what you have .
LIST USERID IPIP_EX TO IPIP_OP.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:51     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP

   781      37      38      40      23      36
   914      38      38      40      34      46
  1238      36      46      36      34      43
  1254      34      46      33      43      39

Number of cases read:  4    Number of cases listed:  4
* Here's the mean of the non-missing values of the subscales .
* (but I don't think you'll have any missing subscale values) .
COMPUTE OVRL_MN = MEAN(IPIP_EX TO IPIP_OP).
FORMATS OVRL_MN (F6.2).
* And this is what you get .
LIST USERID IPIP_EX TO IPIP_OP OVRL_MN.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:52     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP OVRL_MN

   781      37      38      40      23      36   34.87
   914      38      38      40      34      46   39.11
  1238      36      46      36      34      43   39.09
  1254      34      46      33      43      39   39.11

Number of cases read:  4    Number of cases listed:  4
And here we're BOTH starting to get frustrated. She doesn't think I'm paying attention to her problem; I don't think she's read what I sent her (as, indeed, she hasn't):
Hi Richard,
Sorry if I confused you. I understand everything in your e-mail and syntax, and I think you have misunderstood what I'm asking. Right, let's try to put things more simply.
Say I have a case with a missing value (missing values are coded -99 in my data file). What I want to do is insert the person mean in place of the -99. In the example below I want to insert the person mean for Question 9 (Sorry if I confused you by saying "person mean at the item level". I meant that I would like to insert the person mean for say item/question 9, as opposed to computing a scale score using the person mean). So...in the example below you'll see that I have a certain set of Qs which make up this particular scale, which i've called the physical aggression subscale. So I want some code to insert a person mean in place of the -99. That is, I want to work out the mean score for that particular case on that subscale. If we work this out manually it would be 3 + 5 + 4 + 2 / the number of complete items which is 4 = 3.5. So for case 15 below, a person mean of 3.5 would be inserted in Q9. Once I have inserted all of my person mean scores this way, I can then compute the actual scale scores as normal e.g., Q1 + Q6 etc etc...
Physical aggression subscale
Case   Q1   Q6   Q9   Q11   Q14
  15    3    5  -99     4     2
Does this make things any clearer? Please tell me if not.
Thanks
And here am I, doing my best, but a little testy. This, once again, solves the problem as she posed it. It is, in fact, exactly the logic of my original response to her query, though 'decorated' with a lot more comments and the like.
From: Richard Ristow <wrristow@mindspring.com>
At 02:02 PM 11/17/2005, you wrote
>Sorry if I confused you. I understand everything in your e-mail and
>syntax, and I think you have misunderstood what I'm asking.
Yeah, I was more than a little afraid of that.
>Say I have a case with a missing value (missing values are coded -99
>in my data file).
So far, so good.
>What I want to do is insert the person mean in place of the -99. In
>the example below I want to insert the person mean for Question 9
>(Sorry if I confused you by saying "person mean at the item level". I
>meant that I would like to insert the person mean for say
>item/question 9, as opposed to computing a scale score using the
>person mean). So...in the example below you'll see that I have a
>certain set of Qs which make up this particular scale, which i've
>called the physical aggression subscale. So I want some code to insert
>a person mean in place of the -99. That is, I want to work out the
>mean score for that particular case on that subscale. If we work this
>out manually it would be 3 + 5 + 4 + 2 / the number of complete items
>which is 4 = 3.5. So for case 15 below, a person mean of 3.5 would be
>inserted in Q9. Once I have inserted all of my person mean scores this
>way, I can then compute the actual scale scores as normal e.g., Q1 +
>Q6 etc etc..
That is, as you have it, for
Case   Q1   Q6   Q9   Q11   Q14
  15    3    5  -99     4     2
you want
  15    3    5  3.5     4     2
>Does this make things any clearer? Please tell me if not.
It makes clearer what you want to do. The reasoning behind it still worries me.
A. In itself, it isn't very difficult; see below
B. HOWEVER, it's dangerous. You then have a 'value' for an item that's calculated, not input, and with no indication that's the case. Of course, a non-integer value is an indication; still, it's dangerous. At the least, you must NEVER overwrite original files; you must ALWAYS be able to get the original data, in exactly its original form. Even so, there are serious risks.
C. You write, "Once I have inserted all of my person mean scores this way, I can then compute the actual scale scores as normal." True. But if what you want is to compute the subscale scores that way, you needn't insert the person mean scores. Really. As I keep telling you, the code I've given you, though it does not substitute the person means, does calculate the subscales as though that substitution had been made. So I hope you've got some other reason for wanting the person-means substitutions. AND, whatever reason it is, that you're very, very, very careful. Analyzing individual items where some values are real and others are person-mean of OTHER items, can be pretty dubious. (The general technique is called "missing-value imputation," and it isn't to be rushed into without knowing how it's done and the pitfalls. See MVA in SPSS.)
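The claim in point C can be checked by hand. If k of the n items are valid and sum to S, the person mean is m = S/k; putting m into the n-k empty slots gives a total of S + (n-k)*m = k*m + (n-k)*m = n*m, which is exactly n times the mean of the valid items. A small, self-contained sketch using the five-item example from the quoted note follows. It has not been run here, and CASEID, SUM_A and SUM_B are names made up for the illustration.

```spss
* Sketch: the five-item example, with -99 declared as user-missing.
DATA LIST FREE / CASEID Q1 Q6 Q9 Q11 Q14.
BEGIN DATA
15 3 5 -99 4 2
END DATA.
MISSING VALUES Q1 TO Q14 (-99).
* Route 1: sum as if the person mean had been substituted.
COMPUTE SUM_A = 5 * MEAN(Q1 TO Q14).
* Route 2: substitute the person mean, then sum as usual.
COMPUTE #PM = MEAN(Q1 TO Q14).
DO REPEAT ITEM = Q1 TO Q14.
.  IF MISSING(ITEM) ITEM = #PM.
END REPEAT.
COMPUTE SUM_B = Q1 + Q6 + Q9 + Q11 + Q14.
LIST CASEID SUM_A SUM_B.
```

By the arithmetic above, both routes should come to 5 * 3.5 = 17.5 for case 15. Only Route 2 changes the item values themselves.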
HOWEVER, since you want it, here goes. This uses the test data you sent before. I am NOT computing the subscale scores. Again, if you wanted the subscale scores, this is not necessary. As to any other use of the mean-substituted values, don't say you weren't warned.
This code is tested. As a bonus, it reloads the original records and interleaves them with the changed ones, so you can see the changes.
Good luck, Richard
* Original
* "I want to use Person Mean Substitution for dealing with .
* my missing data. I have some syntax which creates my scale.
* score by inserting the Person Mean where appropriate. .
* However, I would like some syntax which just inserts the .
* individual Person Mean for all missing data. .
*
* Follow-up Thu, 17 Nov 2005 19:02:03 +0000 .
* She really wants to substitute subscale means for missing .
* values. .
* This loads your file. I'm dropping the subscale scores .
* that you had. .
GET FILE "C:\A_WRR\E-mail\WRR_Desk\Attachments\test PMS.sav" /DROP = ipip_ex TO ipip_op.
* These are some odds and ends to make the displayed file .
* more compact and readable. .
VARIABLE WIDTH q1ipip TO q50ipip (5) var00001 (7) .
FORMATS q1ipip TO q50ipip var00001 (F5.2).
VARIABLE WIDTH userid (6) .
FORMATS USERID (F6).
* Mark the changed copy of the file .
STRING STATUS(A4).
COMPUTE STATUS = 'Subs'.
* And person means, here we come ...
* First, calculate the person means for the subscales.
COMPUTE PSNMN_EX = MEAN(q1ipip, q6ipip, q11ipip, q16ipip, q21ipip, q26ipip, q31ipip, q36ipip, q41ipip, q46ipip).
COMPUTE PSNMN_AG = MEAN(q2ipip , q7ipip , q12ipip, q17ipip, q22ipip, q27ipip, q32ipip, q37ipip, q42ipip, q47ipip).
COMPUTE PSNMN_CO = MEAN(q3ipip , q8ipip , q13ipip, q18ipip, q23ipip, q28ipip, q33ipip, q38ipip, q43ipip, q48ipip).
COMPUTE PSNMN_NE = MEAN(q4ipip , q9ipip , q14ipip, q19ipip, q24ipip, q29ipip, q34ipip, q39ipip, q44ipip, q49ipip).
COMPUTE PSNMN_OP = MEAN(q5ipip , q10ipip, q15ipip, q20ipip, q25ipip, q30ipip, q35ipip, q40ipip, q45ipip, q50ipip).
FORMATS PSNMN_EX TO PSNMN_OP (F5.2).
* Second, substitute.
* PSNMN_EX.
DO REPEAT ITEM = q1ipip q6ipip q11ipip q16ipip q21ipip q26ipip q31ipip q36ipip q41ipip q46ipip.
IF MISSING(ITEM) ITEM = PSNMN_EX.
END REPEAT.
* PSNMN_AG.
DO REPEAT ITEM = q2ipip q7ipip q12ipip q17ipip q22ipip q27ipip q32ipip q37ipip q42ipip q47ipip.
IF MISSING(ITEM) ITEM = PSNMN_AG.
END REPEAT.
* PSNMN_CO.
DO REPEAT ITEM = q3ipip q8ipip q13ipip q18ipip q23ipip q28ipip q33ipip q38ipip q43ipip q48ipip.
. IF MISSING(ITEM) ITEM = PSNMN_CO.
END REPEAT.
* PSNMN_NE.
DO REPEAT ITEM = q4ipip q9ipip q14ipip q19ipip q24ipip q29ipip q34ipip q39ipip q44ipip q49ipip.
. IF MISSING(ITEM) ITEM = PSNMN_NE.
END REPEAT.
* PSNMN_OP.
DO REPEAT ITEM = q5ipip q10ipip q15ipip q20ipip q25ipip q30ipip q35ipip q40ipip q45ipip q50ipip.
. IF MISSING(ITEM) ITEM = PSNMN_OP.
END REPEAT.
EXECUTE
/* So values display in the Data Editor */.
* Now, reload the file with old values, to compare.
ADD FILES /FILE = "C:\A_WRR\E-mail\WRR_Desk\Attachments\test PMS.sav"
/FILE = * /BY USERID /DROP = ipip_ex TO ipip_op
/KEEP = STATUS USERID PSNMN_EX TO PSNMN_OP ALL.
RECODE STATUS (' ' = 'Orig').
EXECUTE.
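One way to soften the danger raised in point B of the note, that a substituted value carries no indication it was calculated, is to set a flag variable for each item before substituting and to save the result under a different file name. The sketch below is untested and uses the small five-item example rather than the IPIP file. The flag names Q1_SUB through Q14_SUB and the output file name are hypothetical.

```spss
* Sketch: flag each substituted value, then save under a new name.
* The flag variables and the output file name are made up for illustration.
DATA LIST FREE / CASEID Q1 Q6 Q9 Q11 Q14.
BEGIN DATA
15 3 5 -99 4 2
END DATA.
MISSING VALUES Q1 TO Q14 (-99).
COMPUTE #PM = MEAN(Q1 TO Q14).
DO REPEAT ITEM = Q1 Q6 Q9 Q11 Q14
         /FLAG = Q1_SUB Q6_SUB Q9_SUB Q11_SUB Q14_SUB.
.  COMPUTE FLAG = MISSING(ITEM).
.  IF MISSING(ITEM) ITEM = #PM.
END REPEAT.
SAVE OUTFILE = 'substituted copy.sav'.
```

Each flag is set to 1 where the item was missing and 0 otherwise, and it has to be computed before the substitution, since afterwards the item is no longer missing.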
OK. She's on philosophically OK ground, or at least she's thought out what she's doing. However, she's still not aware that I've solved her problem. (Twice, now, in fact.)
Hi Richard,
Thanks again for your e-mail. I'm glad things were clearer for you this time. Regarding the rationale for using PMS, I have just spent 2 months researching the missing value imputation literature. I've also been ploughing my way through SPSS's MVA for the past 2 months. So I am very up-to-speed with the missing value imputation literature, just not the syntax. I do therefore have a research-based theoretical justification for all the analyses I am using, as I will of course need to provide strong rationales in my PhD thesis. Thank you anyway for steering me in the right direction.
You said "Again, if you wanted the subscale scores, this is not necessary. As to any other use of the mean-substituted values..." Yes, I understand this. I have already produced syntax to compute scale scores as though the person mean has been substituted.
My main reason for wanting individual person means inserted is because all but one of my scales don't use means as overall scale scores[emphasis added]. For example, the sample IPIP data I gave you simply sums the items to get the subscale score, as opposed to summing then averaging the items to produce a mean scale score. I wasn't sure whether it would be appropriate to compute person mean scale scores for ALL cases (even those without missing data), because it wouldn't make sense to have summed scale scores for cases without missing data, and person mean scale scores for those with missing data. I thought it most appropriate to insert the individual person means and then compute the scale scores as normal. Perhaps I am wrong in my approach?? Perhaps using all person mean computed scale scores is OK?
There may also be some data screening analyses I would like to run on item-level data with the person means substituted. I do realise that I need to be careful when analysing individual imputed items.
I've had a look at the code you sent and it looks straightforward enough. I'm sure that I won't have any problems seeing as it's tried and tested. I'll be giving it a go at some point this evening or tomorrow.
I want to thank you again for your time and help Richard.
And, flat-out impatient. Pointing out that she had not read the solutions I'd sent her, which solved her problem from two perspectives:
A. Computing summed scales as if person means had been substituted for missings, without doing the actual substitution
B. Substituting person scale means for missing values.
I express the impatience by reminding her that I'm sending her what I'd sent her before. Good grief.
From: Richard Ristow <wrristow@mindspring.com>
At 05:45 AM 11/18/2005, you wrote:
>Thanks again for your e-mail. I'm glad things were clearer for you
>this time. Regarding the rationale for using PMS, I have just spent 2
>months researching the missing value imputation literature. I've also
>been ploughing my way through SPSS's MVA for the past 2 months. So I
>am very up-to-speed with the missing value imputation literature, just
>not the syntax. I do therefore have a research-based theoretical
>justification for all the analyses I am using, as I will of course
>need to provide strong rationales in my PhD thesis. Thank you anyway
>for steering me in the right direction
Good show, then, and good luck.
>My main reason for wanting individual person means inserted is because
>all but one of my scales don't use means as overall scale scores. For
>example, the sample IPIP data I gave you simply sums the items to get
>the subscale score, as opposed to summing then averaging the items to
>produce a mean scale score. I wasn't sure whether it would be
>appropriate to compute person mean scale scores for ALL cases (even
>those without missing data), because it wouldn't make sense to have
>summed scale scores for cases without missing data, and person mean
>scale scores for those with missing data. I thought it most
>appropriate to insert the individual person means and then compute the
>scale scores as normal. Perhaps I am wrong in my approach?? Perhaps
>using all person mean computed scale scores is OK?
OK. You heard me get impatient with you; this is where. No, "It wouldn't make sense to have summed scale scores for cases without missing data, and person mean scale scores for those with missing data," not by a long, long way. But, one more time: it is possible, not very difficult, to compute sum scale scores with the person means inserted for missing values, without inserting the person means for those values. I got impatient because I'd sent you code to do that, and you kept saying you had to insert person means to do it. See the last code in my note of Thu, 17 Nov 2005 11:36:35 -0500; I'm appending it again at the end of this note.
>Perhaps I am wrong in my approach?? Perhaps using all person mean
>computed scale scores is OK?
I can think of two things you're likely to be doing. One is using the scale scores as dependent variables for ANOVA, ANCOVA or regression; or as independents for regression, or covariates for ANCOVA. For those, the mean score is as good as the sum score; they're related by a constant factor, which doesn't matter for those methods. Except that you have to be careful in interpreting effect sizes: they're divided by the number of items if the mean scale is a dependent variable, and multiplied by the same factor if it's an independent variable or covariate.
The other thing you're likely to be doing is putting respondents into categories using tables given with your scale instruments. Those are always in terms of summed scores; if you used average scores, it would raise Cain with your trying to do it, because you'd have to hand-calculate back to totals.
Anyhow, if you want summed scores with person means inserted for missing values, you can have them.
One warning: if you are classifying respondents from their scale scores with RECODE or some such, some subjects may be misclassified if the score should be an integer, but isn't calculated as an integer precisely. That's a limitation of machine arithmetic, and applies whether you insert the computed means and then add, or proceed as I am suggesting. It can cause misclassification at a boundary: if >=10 is in class B and the calculated sum is below 10 by a very tiny fraction, that person will be put in the next lower class. That isn't you or your methods; it's how machine arithmetic works.
Probably (I'm not testing) the following will guard against this:
COMPUTE IPIP_EX = 10*IPIP_EX.
COMPUTE IPIP_AG = 10*IPIP_AG.
COMPUTE IPIP_CO = 10*IPIP_CO.
COMPUTE IPIP_NE = 10*IPIP_NE.
COMPUTE IPIP_OP = 10*IPIP_OP.
DO REPEAT SCALE = IPIP_EX TO IPIP_OP.
. IF (ABS(SCALE - RND(SCALE) < .001) SCALE = RND(SCALE).
END REPEAT.
* If you need inserted person means for reasons other than computing sum scales, you do need to insert them.
* If you want to compute the subscales as if the mean of the .
* non-missing items in that subscale is substituted for the .
* missing items, here's how: .
* First, calculate the subscales as the MEAN (not sum) that you .
* get if you ignore missing values. That's the same as you get .
* if you substitute the mean of the non-missing values for the .
* missing values .
COMPUTE IPIP_EX = MEAN(q1ipip, q6ipip, q11ipip, q16ipip, q21ipip, q26ipip, q31ipip, q36ipip, q41ipip, q46ipip).
COMPUTE IPIP_AG = MEAN(q2ipip , q7ipip , q12ipip, q17ipip, q22ipip, q27ipip, q32ipip, q37ipip, q42ipip, q47ipip).
COMPUTE IPIP_CO = MEAN(q3ipip , q8ipip , q13ipip, q18ipip, q23ipip, q28ipip, q33ipip, q38ipip, q43ipip, q48ipip).
COMPUTE IPIP_NE = MEAN(q4ipip , q9ipip , q14ipip, q19ipip, q24ipip, q29ipip, q34ipip, q39ipip, q44ipip, q49ipip).
COMPUTE IPIP_OP = MEAN(q5ipip , q10ipip, q15ipip, q20ipip, q25ipip, q30ipip, q35ipip, q40ipip, q45ipip, q50ipip).
* Format so you can see the decimals .
FORMATS IPIP_EX TO IPIP_OP (F5.2).
* And this is what you have .
LIST USERID IPIP_EX TO IPIP_OP.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:50     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP

   781    3.70    3.80    4.00    2.33    3.60
   914    3.80    3.80    4.00    3.40    4.56
  1238    3.60    4.60    3.60    3.44    4.30
  1254    3.40    4.60    3.33    4.33    3.89

Number of cases read:  4    Number of cases listed:  4
* To change from average to sum, multiply by the number of items.
* in the subscale, which is 10 in each case.
COMPUTE IPIP_EX = 10*IPIP_EX.
COMPUTE IPIP_AG = 10*IPIP_AG.
COMPUTE IPIP_CO = 10*IPIP_CO.
COMPUTE IPIP_NE = 10*IPIP_NE.
COMPUTE IPIP_OP = 10*IPIP_OP.
* Format without the decimals .
FORMATS IPIP_EX TO IPIP_OP (F5).
* And this is what you have .
LIST USERID IPIP_EX TO IPIP_OP.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:51     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP

   781      37      38      40      23      36
   914      38      38      40      34      46
  1238      36      46      36      34      43
  1254      34      46      33      43      39

Number of cases read:  4    Number of cases listed:  4
* Here's the mean of the non-missing values of the subscales .
* (but I don't think you'll have any missing subscale values) .
COMPUTE OVRL_MN = MEAN(IPIP_EX TO IPIP_OP).
FORMATS OVRL_MN (F6.2).
* And this is what you get .
LIST USERID IPIP_EX TO IPIP_OP OVRL_MN.
|---------------------------|-----------------------|
|Output Created             |17 Nov 05 11:29:52     |
|---------------------------|-----------------------|
USERID IPIP_EX IPIP_AG IPIP_CO IPIP_NE IPIP_OP OVRL_MN

   781      37      38      40      23      36   34.87
   914      38      38      40      34      46   39.11
  1238      36      46      36      34      43   39.09
  1254      34      46      33      43      39   39.11

Number of cases read:  4    Number of cases listed:  4
Nothing much here; a small correction to the previous note:
From: Richard Ristow <wrristow@mindspring.com>
Bug fix: I left out a right parenthesis in the IF statement:
COMPUTE IPIP_EX = 10*IPIP_EX.
COMPUTE IPIP_AG = 10*IPIP_AG.
COMPUTE IPIP_CO = 10*IPIP_CO.
COMPUTE IPIP_NE = 10*IPIP_NE.
COMPUTE IPIP_OP = 10*IPIP_OP.
DO REPEAT SCALE = IPIP_EX TO IPIP_OP.
. IF (ABS(SCALE - RND(SCALE)) < .001)
     SCALE = RND(SCALE).
END REPEAT.
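To see what the guard does at a class boundary, here is a small sketch with made-up values. RAWSUM, SNAPPED, B_RAW and B_SNAP are hypothetical names, the boundary of 10 is the one used in the warning above, and the code has not been run.

```spss
* Sketch: made-up sums near the class boundary at 10.
DATA LIST FREE / RAWSUM.
BEGIN DATA
9.99999999 10 10.4
END DATA.
COMPUTE SNAPPED = RAWSUM.
IF (ABS(SNAPPED - RND(SNAPPED)) < .001) SNAPPED = RND(SNAPPED).
* 1 means the case falls in class B, that is, at or above 10.
COMPUTE B_RAW  = (RAWSUM >= 10).
COMPUTE B_SNAP = (SNAPPED >= 10).
FORMATS RAWSUM SNAPPED (F12.8) B_RAW B_SNAP (F1).
LIST.
```

The first value should fall below the boundary before snapping and land in class B afterwards, while 10.4 is more than .001 away from an integer and should be left alone. The tolerance of .001 is a judgment call; it only needs to be much larger than rounding error and much smaller than any real difference between scores.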
And FINALLY we seem to be getting it. I'm glad she at least recognized that I had sent her the solutions she'd asked for. She's very polite with her thanks, and I appreciate it, but I was (am) still bothered that I'd sent her solutions that started with small effort (but were correct) and progressed to significant effort, and she didn't read them enough to know what she had. Grrrr. (End of venting.)
To: wrristow@mindspring.com
Hi Richard,
Right...I think we've finally sorted this! You said "...it is possible, not very difficult, to compute sum scale scores with the person means inserted for missing values, without inserting the person means for those values. I got impatient because I'd sent you code to do that, and you kept saying you had to insert person means to do it...". I have to admit I mustn't have read your notes properly when I read your code. I thought the code you were sending me computed person mean scale scores, as opposed to summed scale scores. Please accept my apologies. I have re-read one of your e-mails and found the part where you said this. Again, sorry. I do actually already have code which does compute person mean scale scores without actually inserting the person means for the missing values, and it looks pretty similar to the code you were sending me, hence the reason I thought you were sending me code that I already have.
Because I didn't realise that you had sent me code to compute "summed" scale scores based on inserted person means, this is one reason why I still wanted code to actually insert the person means for the missing values into SPSS, so I could then compute the summed scale scores. However, as I have already said, I actually anticipate running some analyses on the individual inserted person means, and so it is in fact highly likely that I will need all of the code you have sent me so far.
I have tested both the code that inserts the individual person means, and the code that computes the summed scale scores without actually inserting the person mean scores, and both work well.
I can't thank you enough for your time and patience Richard. As I have already said, I am new to using syntax, so am grateful for assistance. You have been a tremendous help.
Thanks again. Best wishes.