I tend to use the WHO "biological plausibility" flagging criteria with anthropometric data. These are very similar to the older CDC / NCHS flagging criteria.
I have been asked (for two separate projects) to apply both the WHO and SMART flagging criteria to some very large datasets and look at the consequences of using each of these. I am anxious to get this right.
I have a couple of questions about the flagging criteria used in SMART surveys.
The SMART manual has:
"In the plausibility report, the program will list and query any value that is ± three standard deviations of the survey mean. After one or two clusters have been entered, or if there has been a previous survey, it is useful to enter in the variable view sheet the limits as the mean ± three standard deviations (or 3 z-scores) during data entry. This enables potential errors to be picked up as early as possible during data entry." (p 83)
This suggests that a flag for (e.g.) WHZ would be raised if:
WHZ < mean(WHZ) - 3 * sd(WHZ)
or:
WHZ > mean(WHZ) + 3 * sd(WHZ)
This suggests that the sample SD is the primary flagging criterion. This approach makes a great deal of sense if we assume that the indicator is normally distributed (that is a separate issue).
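To make this reading of the rule concrete, here is a minimal sketch in Python. The function name `flag_sample_sd` and the WHZ values are made up for illustration; this is my interpretation of the manual's wording, not the SMART software's actual implementation.

```python
from statistics import mean, stdev

def flag_sample_sd(whz, k=3.0):
    """Return True for each value outside mean +/- k * sample SD."""
    m = mean(whz)
    s = stdev(whz)  # sample SD (n - 1 denominator)
    lower, upper = m - k * s, m + k * s
    return [z < lower or z > upper for z in whz]

# Hypothetical data: 20 plausible WHZ values plus one gross error (6.0)
plausible = [-2.0, -1.8, -1.6, -1.4, -1.2, -1.0, -0.8, -0.6, -0.4, -0.2]
whz = plausible + plausible + [6.0]

flags = flag_sample_sd(whz)
flagged = [z for z, f in zip(whz, flags) if f]
print(flagged)  # [6.0]
```

Note that the gross error itself inflates the sample SD (to about 1.65 here), which widens the limits it is tested against.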
Later the SMART manual has:
"Most children with wrongly measured data give values that are within the plausible range. Inclusion of such errors can be surmised from examination of the standard deviation, and other statistical checks on the data. The standard deviation should be between 0.8 and 1.2 z-score units for WFH in all well-conducted surveys (in 80% of surveys the standard deviation is between 0.9 and 1.1 z-score units). The standard deviation increases as the proportion of erroneous results in the dataset increases; this has a very dramatic effect upon the computed prevalence of wasting. For this reason, if a value is more likely to be an error than a real measurement, it should be removed from the analysis. We do this by taking the mean of the WFH data as the fixed point for describing the status of the population we are surveying. Statistically about 2.5 children out of 900 will lie outside the limits of ±3 z-score units of the mean. Less than 0.5 out of 1,000 will lie outside ±3.5 z-score units from the mean. This forms the basis for deciding if a value is more likely to be an error than a real measurement. The software will list children with these extreme values in the plausibility check list." (p 86)
I find this confusing. The case numbers quoted in this paragraph hold only if the sample SD (or 'z') = 1. For SD (or 'z') = 1 and mean z-score = -1 we would expect about 2.43 cases in 900 to lie more than 3 z-score units from the mean (that is, below -4 or above +2). If the sample SD is 1.2 then we'd expect about 11.18 cases in 900 to lie outside those same limits. I would not characterise 11.18 as "about 2.5" (it is about 4.6 times larger). If we set the limits at mean +/- 3 * sample SD then we would expect 2.43 cases whatever the sample SD.
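These expected counts can be checked with a few lines of Python. This is a sketch that assumes a normal distribution; the function name `expected_outside` is my own. It uses the identity P(|Z| > x) = erfc(x / sqrt(2)) for a standard normal Z.

```python
from math import erfc, sqrt

def expected_outside(n, limit, sd):
    """Expected number of n normally distributed values lying more than
    `limit` z-score units from the mean when the true SD is `sd`."""
    p_two_tailed = erfc((limit / sd) / sqrt(2))
    return n * p_two_tailed

fixed_sd1 = expected_outside(900, 3.0, 1.0)         # limits mean +/- 3, SD = 1
fixed_sd12 = expected_outside(900, 3.0, 1.2)        # limits mean +/- 3, SD = 1.2
sample_sd12 = expected_outside(900, 3.0 * 1.2, 1.2) # limits mean +/- 3 * SD, SD = 1.2
beyond_35 = expected_outside(1000, 3.5, 1.0)        # limits mean +/- 3.5, SD = 1

print(round(fixed_sd1, 2), round(fixed_sd12, 2), round(sample_sd12, 2))
print(round(beyond_35, 2))
```

The first line gives roughly 2.43, 11.18 and 2.43. The last figure comes out a little under 0.5, which matches the manual's "less than 0.5 out of 1,000", but again only for SD = 1.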
It seems to me that the SMART manual is confused (it confuses me!) about the use of "standard deviation" and "z". Following the numbers and working back ... the rationale of there being "about 2.5 children out of 900" requires the method to use the sample SD.
Later in the SMART manual we have:
"As explained in the section on extreme values, this tells you if there is substantial random error in the measurements. If the standard deviation is high (over 1.2), it is likely that there are a lot of extreme values and values more than ±3 z-scores of the mean. " (p 87)
This can only be the case if the sample SD is not used and the flagging criterion uses SD = 1, so that a flag for (e.g.) WHZ would be raised if:
WHZ < mean(WHZ) - 3
or:
WHZ > mean(WHZ) + 3
I'm not sure that this approach to flagging makes much sense.
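To show how the two readings differ in practice, here is a small Python sketch that applies both to the same data. The data are simulated (normal, mean -1, SD 1.2), not from any survey, and the function name `flag` and its `sd` parameter are my own.

```python
import random
from statistics import mean, stdev

def flag(values, k=3.0, sd=None):
    """Flag values outside mean +/- k * sd; sd=None means use the sample SD."""
    m = mean(values)
    s = stdev(values) if sd is None else sd
    return [v < m - k * s or v > m + k * s for v in values]

random.seed(1)  # arbitrary seed, for repeatability only
whz = [random.gauss(-1.0, 1.2) for _ in range(900)]  # simulated WHZ values

sample_flags = flag(whz)         # limits: mean +/- 3 * sd(WHZ)
fixed_flags = flag(whz, sd=1.0)  # limits: mean +/- 3

print("sample SD:", round(stdev(whz), 2))
print("flagged using sample SD:", sum(sample_flags))
print("flagged using SD = 1:", sum(fixed_flags))
```

Whenever the sample SD is above 1, the fixed limits are the narrower of the two, so every value flagged using the sample SD is also flagged using SD = 1, and the fixed rule usually flags more.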
The SMART manual is self-contradictory about this matter and I find myself confused.
My questions are:
(1) Do we use:
WHZ = mean(WHZ) +/- 3 * sd(WHZ)
or:
WHZ = mean(WHZ) +/- 3
(2) If we are to use:
WHZ = mean(WHZ) +/- 3
What is the rationale for this when we can easily get the sample SD? Why assume a known variance?
The first question should be easy enough to answer. The second might be more difficult.
All help gratefully received.