# SMART Survey Analysis: GAM prevalence calculated with SD of 1, how to do it in SPSS?

This question was posted in the Assessment and Surveillance forum area and has 7 replies.

### Mark Myatt

Consultant Epidemiologist

Frequent user

5 Feb 2020, 10:13

I am unsure why you would want to assume SD = 1 when you observe SD = 1.23. Doing so is likely to underestimate true prevalence.

To use a PROBIT estimator you need the mean and the SD that you observe in your data. For your problem you need the mean and SD for boys and girls separately. You then use the Normal cumulative distribution function (CDF) to estimate prevalence as the probability of finding a child with (e.g.) a WHZ below your case-finding threshold. You would do this for boys and girls separately. You can find how to do this in SPSS here.

It has been a long time since I used SPSS but I recall that you can access the Normal CDF using 'compute'. Something like:

Transform -> Compute variable -> CDF -> CDF.Normal(threshold, mean, SD)

This is described here.

If e.g. you have mean WHZ = -0.48, SD WHZ = 1.08, and a case-defining threshold of WHZ = -2 then the probability of a child having WHZ < -2 is 0.07965331, which corresponds to a prevalence of about 7.97%. You want to estimate p using the lower (left-hand) tail of the Normal CDF. If you wanted to force SD = 1 this is where you would do it using something like:

CDF.Normal(-2, -0.48, 1)
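For readers without SPSS to hand, the same lower-tail calculation can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name `probit_prevalence` is made up for illustration, and the inputs are the example values given above.

```python
from statistics import NormalDist

def probit_prevalence(mean, sd, threshold=-2.0):
    """Lower-tail probability P(WHZ < threshold) under a Normal(mean, sd) model."""
    return NormalDist(mu=mean, sigma=sd).cdf(threshold)

# Example values from the post: mean WHZ = -0.48, SD WHZ = 1.08, threshold = -2
observed = probit_prevalence(-0.48, 1.08)  # uses the observed SD
forced = probit_prevalence(-0.48, 1.0)     # forces SD = 1

print(f"observed SD:    {observed:.4f}")
print(f"SD forced to 1: {forced:.4f}")
```

The first figure should agree with the 7.97% quoted above; the second is what forcing SD = 1 would give for the same mean.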

If you want confidence intervals then you would need to calculate 95% confidence limits for the mean and use these, in place of the mean, with CDF.Normal. You can get 95% confidence limits for the mean using:

Analyse -> Descriptive Statistics -> Explore

in SPSS.
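As a rough illustration of that approach, the sketch below plugs the confidence limits of the mean into the Normal CDF. It rests on assumptions that are not in the thread: a hypothetical sample size of n = 500, a simple random sample, and a large-sample Normal interval for the mean (SPSS Explore reports a t-based interval, and a cluster survey would need a design-adjusted standard error). The function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def probit_with_limits(mean, sd, n, threshold=-2.0, level=0.95):
    """Return (lower, point, upper) PROBIT prevalence estimates.

    Assumes a simple random sample and a large-sample Normal interval
    for the mean; a cluster survey would need a design-adjusted SE.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = sd / sqrt(n)
    mean_lo, mean_hi = mean - z * se, mean + z * se

    def tail(m):
        return NormalDist(mu=m, sigma=sd).cdf(threshold)

    # A higher mean gives a lower prevalence, so the limits swap over.
    return tail(mean_hi), tail(mean), tail(mean_lo)

# n = 500 is a hypothetical sample size, not taken from the thread.
lower, point, upper = probit_with_limits(-0.48, 1.08, n=500)
print(f"{point:.4f} ({lower:.4f} to {upper:.4f})")
```

Note that the upper confidence limit of the mean produces the lower limit of the prevalence, and vice versa.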

I hope this is of some help.

### Anonymous 38585

FAO

Normal user

5 Feb 2020, 14:02

Dear All,

If I am not wrong, the SMART Methodology recommends that, if the plausibility check of a survey is problematic (like this survey, which has a penalty of 29, over 25), you use the prevalence calculated with an SD of 1, with exclusion from the reference mean (WHO flags). But you have to mention this in your report.

Thanks

### Mark Myatt

Consultant Epidemiologist

Frequent user

5 Feb 2020, 16:36

I see. Your SD is outside of the range used for the SMART data-quality report. This could be due to your survey sampling from a number of different populations (e.g. different livelihood zones) resulting in your WHZ being a "mixture of Gaussians".
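To see how a mixture can widen the SD, here is a small sketch with made-up numbers: two equally sized populations, each with SD = 1, but with mean WHZ of -1.0 and 0.4. These values are purely hypothetical and are not estimates from the survey under discussion.

```python
from math import sqrt

def mixture_mean_sd(components):
    """Mean and SD of a mixture; components are (weight, mean, sd), weights sum to 1."""
    mean = sum(w * m for w, m, _ in components)
    # Total variance = within-component variance + between-component variance
    var = sum(w * (s ** 2 + (m - mean) ** 2) for w, m, s in components)
    return mean, sqrt(var)

# Hypothetical livelihood zones: equal weights, SD = 1 in each
zones = [(0.5, -1.0, 1.0), (0.5, 0.4, 1.0)]
mix_mean, mix_sd = mixture_mean_sd(zones)

print(f"mixture mean = {mix_mean:.2f}, mixture SD = {mix_sd:.2f}")
```

Even though each zone has SD = 1, the pooled SD comes out at about 1.22 because the zone means differ.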

The SD looks problematic but that does not mean that it is legitimate for you to assume SD = 1. Your observed SD is the best information you have for the value of the SD and you should use that. With the example data (above) we have:

mean WHZ = -0.48, SD WHZ = 1.08

The PROBIT estimator gives p = 0.07965331 (7.97%). If we assumed SD WHZ = 1 we would see the estimated prevalence drop to 0.06425549 (6.43%). Narrowing the SD reduces the prevalence estimate. A wider SD increases the prevalence estimate. I would use the wider SD as it is the best information I have about the SD and because any error will lead to a false-positive increase in the IPC classification, which is likely more benign than a false-negative error.

Note that a wider SD will lead to a wider confidence interval about the mean and a wider confidence interval about the prevalence estimate. Note also that the wider SD may be due to outliers and, if this is the case, the mean may be biased and the PROBIT estimate may be way out.

Your best option might be to censor outliers using a rule that censors observations more than 3 interquartile ranges above the upper quartile or below the lower quartile ... other ways of identifying outliers are available directly in SPSS. Once you have removed outliers you should calculate means and SDs again and apply the PROBIT estimator.

I hope this is of some use.
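The censoring rule described above might be sketched as follows. The WHZ values are invented for illustration (with one obvious outlier), the function name is hypothetical, and the quartiles use Python's default method, which may differ slightly from the quartile definitions SPSS uses.

```python
from statistics import NormalDist, mean, quantiles, stdev

def censor_outliers(values, k=3.0):
    """Keep values within k interquartile ranges of the lower and upper quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical WHZ values; 9.5 stands in for a data-entry or measurement error
whz = [-1.2, -0.5, 0.1, -0.9, -2.3, 0.4, -0.7, -1.6, 0.0, -0.3, 9.5, -1.1]

kept = censor_outliers(whz)

# Recalculate the mean and SD after censoring, then apply the PROBIT estimator
m, s = mean(kept), stdev(kept)
prevalence = NormalDist(mu=m, sigma=s).cdf(-2.0)

print(f"kept {len(kept)} of {len(whz)} observations")
print(f"mean = {m:.2f}, SD = {s:.2f}, PROBIT prevalence = {prevalence:.4f}")
```

With real survey data the same steps would be applied to the full WHZ variable, and separately for boys and girls if that is how the estimates are being reported.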

Self-employed

Technical expert

5 Feb 2020, 21:28

Dear Tomas:

In your latest post, it appears that you are calculating the prevalence of GAM by calculating the area under a normal curve defined by the mean and standard deviation of WHZ. I do not understand why you would do this. Let's go back to basics: the prevalence of a given condition is the proportion of a defined population which has that condition. Prevalence is measured by assessing individuals, defining each individual as having or not having that condition, then dividing the number of individuals with the condition by the total number of individuals assessed. GAM is defined in individuals as having a WHZ less than -2 and/or bilateral pedal edema. Just apply this definition to each individual to calculate prevalence. Using statistical manipulation to calculate prevalence from an idealized distribution of WHZ may produce inaccurate estimates of prevalence because such manipulation forces normality on a curve which may not be normal.
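The direct calculation described here amounts to classifying each child and then dividing the number of cases by the number of children assessed. A minimal sketch, using invented records of (WHZ, oedema) pairs rather than any real survey data:

```python
def is_gam(whz, oedema):
    """GAM case definition: WHZ below -2 and/or bilateral pedal oedema."""
    return whz < -2.0 or oedema

# Hypothetical records: (WHZ, bilateral pedal oedema present?)
children = [
    (-2.4, False), (-1.1, False), (0.3, True), (-0.7, False),
    (-2.1, False), (-1.9, False), (0.5, False), (-3.2, True),
]

cases = sum(1 for whz, oedema in children if is_gam(whz, oedema))
gam_prevalence = cases / len(children)

print(f"{cases} of {len(children)} children with GAM: {gam_prevalence:.1%}")
```

This makes no assumption about the shape of the WHZ distribution, which is the point being made above.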