Dear All

I have some results but not the full data set from two rounds of data collection.
I would like to be able to conduct a significance test to understand and interpret the apparent reduction in stunting.

Proportion stunted in Round 1, Intervention and Control.
Proportion stunted in Round 2, Intervention and Control.
I have the n figures for all proportions.

Can I do this without the full data set?
There are several online calculators which I have also seen recommended on this forum. Is it reasonable to use one such calculator to conduct a significance test and estimate p values?
And if so can anybody recommend the most appropriate one?

Dear Anonymous 394:

I am a bit unclear on your question, but let me see if I can provide something useful in response. The statistical significance of an apparent difference seen between two samples or between one sample and a constant depends on the variance of the indicator. The variance, in turn, depends on a) the number of units of analysis included in the analysis, b) if the outcome is dichotomous, such as prevalence, the proportion having the outcome, c) if the outcome is continuous, such as height-for-age z-score, the dispersion of the values, and d) the sampling scheme. Because the variance depends heavily on the number of units of analysis, the p value calculated from an incomplete dataset will not reflect the p value calculated from the full dataset, because the incomplete dataset has a smaller number of units of analysis. However, perhaps more importantly, what determines which data are included in the incomplete dataset and which data are not? For example, if the survey is half finished because you have completed the urban areas but not the rural areas, your incomplete dataset provides a biased estimate of the outcome. It would be a serious mistake to base any conclusions on such biased results.

Regarding which program to use to calculate p values, any will do as long as it can account for the characteristics of your sampling scheme. For example, if you did a survey using cluster sampling, the computer program must be able to take this into account in order to generate an accurate p value. Some programs cannot account for cluster sampling and would produce an inappropriately small p value.
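To illustrate why ignoring cluster sampling makes the p value too small, here is a minimal Python sketch of a two-proportion z-test computed from summary figures only (proportion and n per group). The function name `two_proportion_z`, the `deff` (design effect) parameter, and the example figures are assumptions made for illustration; they do not come from any particular calculator or from the survey in question. The design effect simply multiplies the simple-random-sampling variance, which is one common approximation and not a substitute for a proper cluster-aware analysis.

```python
import math

def two_proportion_z(p1, n1, p2, n2, deff=1.0):
    """Two-sided z-test for a difference between two proportions.

    deff is an assumed design effect: 1.0 means simple random sampling,
    values above 1.0 inflate the variance to mimic cluster sampling.
    """
    # Variance of each proportion, inflated by the assumed design effect
    var1 = deff * p1 * (1 - p1) / n1
    var2 = deff * p2 * (1 - p2) / n2
    se_diff = math.sqrt(var1 + var2)
    z = abs(p1 - p2) / se_diff
    # Two-sided p value from the standard normal distribution
    p_value = math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical figures: 40% stunted of 500 children vs 33% of 500 children
z_srs, p_srs = two_proportion_z(0.40, 500, 0.33, 500)            # ignores clustering
z_clu, p_clu = two_proportion_z(0.40, 500, 0.33, 500, deff=2.0)  # assumed design effect of 2
print(f"Ignoring clustering: z = {z_srs:.2f}, p = {p_srs:.4f}")
print(f"With deff = 2.0:     z = {z_clu:.2f}, p = {p_clu:.4f}")
```

With the same proportions and sample sizes, the p value is larger once the design effect is applied, which is the sense in which a program that cannot account for cluster sampling overstates significance.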

Technical Expert

8 years ago

It is possible to assess the difference between prevalence estimates returned by two surveys. A simple approach is to estimate the standard error in each survey:

SE = (UCL - LCL) / (2 * 1.96)

then pool the two SEs:

PooledSE = sqrt(SE1^2 + SE2^2)

and calculate a z-test:

z = abs(prevalence1 - prevalence2) / PooledSE

The p-value is taken from the normal distribution (if abs(z) > 1.96 then p < 0.05). You can also use an estimation approach:

difference = abs(prevalence1 - prevalence2)

95% CI = difference +/- 1.96 * PooledSE

You could use a similar approach with mean HAZ.
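The steps above can be sketched in Python as follows. This is a minimal illustration, assuming each survey reports a prevalence with a 95% confidence interval; the function names and the example figures are hypothetical and are not taken from any real survey.

```python
import math

def se_from_ci(lcl, ucl, z_crit=1.96):
    """Standard error recovered from a 95% CI: (UCL - LCL) / (2 * 1.96)."""
    return (ucl - lcl) / (2 * z_crit)

def compare_prevalences(p1, ci1, p2, ci2, z_crit=1.96):
    """z-test and 95% CI for the difference between two survey prevalences.

    ci1 and ci2 are (LCL, UCL) tuples as reported by each survey.
    """
    se1 = se_from_ci(ci1[0], ci1[1], z_crit)
    se2 = se_from_ci(ci2[0], ci2[1], z_crit)
    pooled_se = math.sqrt(se1 ** 2 + se2 ** 2)
    difference = abs(p1 - p2)
    z = difference / pooled_se
    # Two-sided p value from the standard normal distribution
    p_value = math.erfc(z / math.sqrt(2))
    ci_diff = (difference - z_crit * pooled_se,
               difference + z_crit * pooled_se)
    return z, p_value, difference, ci_diff

# Hypothetical example: 40% (95% CI 35% to 45%) vs 30% (95% CI 25% to 35%)
z, p_value, difference, ci_diff = compare_prevalences(
    0.40, (0.35, 0.45), 0.30, (0.25, 0.35)
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print(f"difference = {difference:.3f}, "
      f"95% CI = {ci_diff[0]:.3f} to {ci_diff[1]:.3f}")
```

Because the standard errors are recovered from the reported confidence intervals, any design effect already built into those intervals carries through to the test. For mean HAZ, the same functions could be applied to the reported means and their confidence intervals.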

I hope this is of some use.

Mark Myatt
Technical Expert

8 years ago

As a rule of thumb, a significance test helps you judge whether the difference between your two surveys is likely to be real rather than due to chance.

Most significance tests assume that the sample is truly random. In your case (incomplete surveys/data), as Brad has already pointed out, the sample you are about to test may exaggerate the accuracy of your results.

SMART nutrition surveys can utilize CDC calculators in conducting significance tests. These can be downloaded from the SMART Website along with detailed instructions.

Kennedy Musumba