
SMART Survey Analysis: GAM prevalence calculated with SD of 1, how to do it in SPSS?

This question was posted in the Assessment and Surveillance forum area and has 7 replies.


Tomas Z

Normal user

4 Feb 2020, 20:56

Dear All,

I am analyzing a SMART survey dataset for district A where the standard deviation is 1.23 (it got the maximum penalty points). I am considering reporting the calculated GAM prevalence with an SD of 1. Is there a possibility of getting the prevalence breakdown by sex, with CIs? I know that this analysis (calculating the prevalence with an SD of 1) uses the probit function. I have the raw dataset; can someone shed light on how I can proceed in SPSS so that I get the calculated prevalence broken down by sex and with CIs?

I know the plausibility report provides the calculated prevalence with an SD of 1, but it provides neither the breakdown by sex nor the CI.

Thank you.

Mark Myatt

Epidemiologist at Brixton Health

Frequent user

5 Feb 2020, 10:13

I am unsure why you would want to assume SD = 1 when you observe SD = 1.23. Doing so is likely to underestimate true prevalence.

What you need to use a PROBIT estimator is the mean and the SD that you observe in your data. For your problem you need the mean and SD for boys and girls separately. You then use the Normal cumulative distribution function (CDF) to estimate prevalence as the probability of finding a child with (e.g.) a WHZ below your case-finding threshold. You would do this for boys and girls separately. You can find how to do this in SPSS here.

It has been a long time since I used SPSS but I recall that you can access the Normal cumulative distribution function using 'compute'. Something like:

    Transform -> Compute variable -> CDF -> CDF.Normal(threshold, mean, SD)

This is described here.

If e.g. you have mean WHZ = -0.48, SD WHZ = 1.08, and a case-defining threshold of WHZ = -2 then the probability of a child having WHZ < -2 is 0.07965331, which corresponds to a prevalence of about 7.97%. You want to estimate p using the lower (left-hand) tail of the normal CDF. If you wanted to force SD = 1 this is where you would do it, using something like:

    CDF.Normal(-2, -0.48, 1)
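As a cross-check outside SPSS, the same lower-tail calculation can be sketched in Python using only the standard library. This is a minimal sketch, not SPSS syntax: the function name `probit_prevalence` is made up for illustration, and the inputs are the example figures quoted above.

```python
from statistics import NormalDist

def probit_prevalence(threshold, mean, sd):
    """Lower-tail probability P(X < threshold) for X ~ Normal(mean, sd)."""
    return NormalDist(mu=mean, sigma=sd).cdf(threshold)

# Example figures from the post: mean WHZ = -0.48, threshold WHZ = -2
observed = probit_prevalence(-2, -0.48, 1.08)  # using the observed SD
forced = probit_prevalence(-2, -0.48, 1.0)     # forcing SD = 1

print(f"observed SD: {observed:.4f}")     # about 0.0797
print(f"SD forced to 1: {forced:.4f}")    # about 0.0643
```

The argument order mirrors CDF.Normal(threshold, mean, SD), so the two calls correspond to the observed-SD and forced-SD cases.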

If you want confidence intervals then you would need to calculate 95% confidence limits for the mean and use these, in place of the mean, with CDF.Normal. You can get 95% confidence limits for the mean using:

    Analyse -> Descriptive Statistics -> Explore

in SPSS.
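The confidence-limit step can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats the data as a simple random sample (no design effect), uses a Normal rather than a t multiplier, and the sample size `n = 500` is a hypothetical value, not one from the survey. The function name `probit_with_ci` is also made up.

```python
from math import sqrt
from statistics import NormalDist

def probit_with_ci(threshold, mean, sd, n, level=0.95):
    """PROBIT estimate with limits obtained from the confidence limits of the mean.

    Assumes simple random sampling and a Normal multiplier (about 1.96 for 95%).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sd / sqrt(n)  # standard error of the mean
    mean_lo, mean_hi = mean - z * se, mean + z * se

    def prevalence(m):
        return NormalDist(m, sd).cdf(threshold)

    # A lower mean puts more children below the threshold, so the lower
    # limit of the mean gives the upper limit of the prevalence.
    return prevalence(mean), prevalence(mean_hi), prevalence(mean_lo)

# mean and SD from the example above; n = 500 is hypothetical
estimate, lower, upper = probit_with_ci(-2, -0.48, 1.08, n=500)
print(f"p = {estimate:.4f} (95% CI {lower:.4f} to {upper:.4f})")
```

For prevalence by sex, the same call would be made once with the boys' mean, SD and n, and once with the girls'.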

I hope this is of some help.

Tomas Z

Normal user

5 Feb 2020, 12:43

Dear Mark, 

Thanks a lot for this detailed feedback. I would like to respond to your first paragraph by sharing the SMART plausibility report, which you can find here. There you will see that the SD is problematic and receives the maximum penalty score. That is why I want to use the calculated prevalence with an SD of 1.

Thanks for this useful discussion, 

Regards. Tomás

Anonymous 38585


Normal user

5 Feb 2020, 14:02

Dear All,

If I am not wrong, the SMART Methodology recommends that if the plausibility check of a survey is problematic, like this survey which has a penalty of 29 (over 25), you use the calculated prevalence with an SD of 1, with exclusion from the reference mean (WHO flags). But you have to mention this in your report.


Mark Myatt

Epidemiologist at Brixton Health

Frequent user

5 Feb 2020, 16:36

I see. Your SD is outside of the range used for the SMART data-quality report. This could be due to your survey sampling from a number of different populations (e.g. different livelihood zones) resulting in your WHZ being a "mixture of Gaussians".

The SD looks problematic but that does not mean that it is legitimate for you to assume SD = 1. Your observed SD is the best information you have for the value of the SD and you should use that. With the example data (above) we have:

    mean WHZ = -0.48, SD WHZ = 1.08

The PROBIT estimator gives p = 0.07965331 (7.97%). If we assumed SD WHZ = 1 we would see the estimated prevalence drop to 0.06425549 (6.43%). Narrowing the SD reduces the prevalence estimate. A wider SD increases the prevalence estimate. I would use the wider SD as it is the best information I have about the SD and because any error will lead to a false-positive increase in the IPC classification, which is likely more benign than a false-negative error. Note that a wider SD will lead to a wider confidence interval about the mean and a wider confidence interval about the prevalence estimate.

Note also that the wider SD may be due to outliers and, if this is the case, the mean may be biased and the PROBIT estimate may be way out. Your best option might be to censor outliers using a rule that censors observations more than 3 interquartile ranges above the upper quartile or below the lower quartile ... other ways of identifying outliers are available directly in SPSS. Once you have removed outliers you should calculate means and SDs again and apply the PROBIT estimator.

I hope this is of some use.
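The outlier rule described above (drop observations more than 3 interquartile ranges beyond the quartiles, then recompute the mean and SD and apply the PROBIT estimator) can be sketched in Python. This is a rough sketch only: the WHZ values are made-up illustrative numbers, the function name `censor_outliers` is hypothetical, and quartile definitions differ slightly between packages, so SPSS may place the fences a little differently.

```python
from statistics import NormalDist, mean, quantiles, stdev

def censor_outliers(values, k=3.0):
    """Keep values within k interquartile ranges of the lower and upper quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower_fence <= v <= upper_fence]

# Illustrative WHZ values only (not survey data); 7.5 is an implausible value
whz = [-1.2, -0.5, 0.3, -0.8, -2.1, 0.1, -0.4, -1.0, 0.6, -0.3, -1.6, 7.5]

kept = censor_outliers(whz)
m, s = mean(kept), stdev(kept)           # recompute after censoring
prevalence = NormalDist(m, s).cdf(-2)    # PROBIT estimate for WHZ < -2

print(len(whz) - len(kept), "outlier(s) removed")
print(f"mean = {m:.2f}, SD = {s:.2f}, PROBIT prevalence = {prevalence:.4f}")
```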

Tomas Z

Normal user

5 Feb 2020, 20:48

Dear Mark,

It is always a great pleasure to hear from you.

I have worked through the steps you described and got the following:

  • For the overall GAM prevalence, I used the following details from my dataset:
    • GAM by WHZ defined as anything < -2 z-scores
    • Mean: -0.30
    • SD: 1.23

Please note my test for outliers was excellent (as per the plausibility report I shared before).

With this, CDF.Normal(-2, -0.30, 1.23) = 0.083468 = 8.35%

For prevalence by sex, I calculated the mean z-score for boys and for girls separately, the SD for each and using the same case-definition of GAM by WHZ. I got the following:

  • Boys
    • Mean: -0.30
    • SD: 1.31

CDF.Normal (-2, -0.30, 1.31) = 0.09719 = 9.72%

  • Girls
    • Mean: -0.42
    • SD: 1.29

CDF.Normal (-2, -0.42, 1.29) = 0.1132 = 11.3%

Overall comments:

I must confess that this was indeed of some use, but at the same time I need to confess that the method has the limitation of not taking bilateral edema into consideration. Also, acknowledging the fact that the use of a calculated prevalence with an SD of 1 underestimates prevalence, perhaps the SMART methodology team should consider adopting this (CDF) approach and, consequently, the IPC protocols too, as IPC recommends use of the calculated prevalence with an SD of 1 when the SD is beyond the acceptable range.

A million thanks.

Bradley A. Woodruff


Technical expert

5 Feb 2020, 21:28

Dear Tomas:

In your latest post, it appears that you are calculating the prevalence of GAM by calculating the area under a normal curve defined by the mean and standard deviation of WHZ. I do not understand why you would do this. Let's go back to basics: the prevalence of a given condition is the proportion of a defined population which has that condition. Prevalence is measured by assessing individuals, defining each individual as having or not having that condition, then dividing the number of individuals with the condition by the total number of individuals assessed. GAM is defined in individuals as having a WHZ less than -2 and/or bilateral pedal edema. Just apply this definition to each individual to calculate prevalence. Using statistical manipulation to calculate prevalence from an idealized distribution of WHZ may produce inaccurate estimates of prevalence because such manipulation forces normality on a curve which may not be normal. 
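The direct calculation described above (classify each child, then divide cases by children assessed) can be sketched in Python. The records below are made-up illustrative values, and the function name `gam_prevalence` is hypothetical.

```python
def gam_prevalence(children, threshold=-2.0):
    """Proportion of children with WHZ below the threshold and/or bilateral edema."""
    cases = sum(1 for whz, edema in children if edema or whz < threshold)
    return cases / len(children)

# Illustrative records only: (WHZ, bilateral pedal edema present?)
children = [
    (-2.4, False), (-1.1, False), (0.2, False), (-0.7, True), (-3.1, False),
    (-1.9, False), (0.5, False), (-0.2, False), (-1.5, False), (0.9, False),
]

prevalence = gam_prevalence(children)
print(f"GAM prevalence: {prevalence:.1%}")  # 3 cases out of 10 children
```

Unlike the PROBIT approach, this count includes the child with edema whose WHZ is above the threshold.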

Tomas Z

Normal user

6 Feb 2020, 07:19

Dear Bradley,

Thanks for your attention.

Allow me to take you back to my initial post, where I stated the problem, and to my second post, where I shared what the problem looks like based on the SMART plausibility report. This problem makes the basic calculation of a ‘prevalence’ unrealistic, as it is likely to overestimate the GAM prevalence measured by weight-for-height because the SD is well outside the acceptable range as per the SMART methodology (0.8 – 1.2). The SMART recommendation when this happens (when SD is > 1.2) is that a calculated prevalence with an SD of 1 should be used, but doing it this way underestimates the prevalence, as it goes all the way down.

I fully agree with you that “Using statistical manipulation to calculate prevalence from an idealized distribution of WHZ may produce inaccurate estimates of prevalence because such manipulation forces normality on a curve which may not be normal.” That’s why Mark was showing here a way to calculate the prevalence (when the SD is beyond the upper limit) using techniques that do not underestimate the prevalence, by using the observed mean and SD from my survey sample.

Thanks again for your feedback about this post.
