# SMART surveys and interpretation of quality

This question was posted in the Assessment forum area and has 20 replies.

### Josek

Epidem. Afgan

Normal user

25 Jun 2013, 13:17

Few questions

1. To what extent should the results of ENA plausibility tests be used to reject surveys that collect more than anthropometric data? In the first place, was the plausibility test meant to be used to reject or accept surveys? I feel this is not clear to national cluster-level survey reviewers, who are more concerned with the % at the bottom than with variables like SD, etc.

2. Does the proportion of under-fives matter when calculating sample size for nutrition surveys?

### James lual

Consultant,surveys

Normal user

25 Jun 2013, 15:26

Please visit COD-02 - Conduct a FRAT study in the Democratic Republic of Congo for ToR and application information.

### James lual

Consultant,surveys

Normal user

25 Jun 2013, 19:38

Sample size calculations are crucial in deciding how many households are to be visited per cluster and what proportion of your resources is to be applied to measurement, as well as for the design of the survey. Best, James Lual Garang

### Otieno K Musumba

M & E , IMC Kenya

Normal user

25 Jun 2013, 21:39

1. First, ENA plausibility is important as a data quality check for survey teams during data collection; so as to note any weaknesses in advance and take appropriate measures to avert possible impact on the overall data quality.

Secondly, it determines the overall data quality and defines levels as excellent, good, acceptable or problematic; the higher the score, the more you should be concerned about the data quality. Of course, based on this you can reject an anthropometric survey if the data are of poor quality.

However, I am of the opinion that Plausibility report should not be used to reject any other data collected alongside anthropometrics. Anyone to refute this?

2. The proportion of U5s in a population (%) is vital in the planning phase for determining your sample size (number of children to be covered) and, more importantly, the number of households to be visited during the survey. Without it, the number of households to be included in the survey cannot be generated.

Hope this is of help.

### Hamid Hussien

Nutrition Specialist Concern WW

Normal user

26 Jun 2013, 09:16

The main advantage of ENA for SMART is the daily check of data collection quality. It gives you indicators of survey team performance and lets you review mistakes, so that you can deal with any arising mistakes that would affect the quality of the whole survey.

If you do not check data entry daily and review missing data, errors will accumulate and in the end your survey score will be problematic (more than 14%). You will then never get approval for the survey and you will have lost money and time.

The proportion of under-fives is very important (Stage 2) when the number of households is calculated.

### Anonymous 81

Public Health Nutritionist

Normal user

26 Jun 2013, 10:07

To determine the number of households to be visited it is, of course, good to have information such as the percentage of U5 and household size. But what if there is no such information? I think you should not stop doing your survey for lack of such information. You can still conduct the survey. To calculate the sample size you don't need the percentage of under-fives in the population. Once you have determined the sample size and the allocation of the clusters, you can follow this sampling procedure. Let's say you decide on 20 kids per cluster. Once the team arrives at the particular village/cluster, they select the first household using a feasible sampling method. Then they continue visiting the nearest house until they have 20 kids. Before the arrival of SMART, most agencies used this approach and I think it still works.

### Mark Myatt

Consultant Epidemiologist

Frequent user

26 Jun 2013, 10:21

In response to (1) ...

I am not completely convinced by the type of checking pressed upon us by the good people at SMART.

Some of it is useful ... For example :

**Checks for digit preference** are useful : If there is consistent rounding up or rounding down then this is likely to introduce bias. Checks for digit preference can identify if this may be a problem. It is not a definitive test, as you would still get digit preference if proper rounding were taking place (which would introduce little or no bias).

**Enumerator performance tests** are useful : We can see how good each enumerator is in terms of accuracy and reliability. We can use this to pick survey staff or select survey supervisors and to decide if more training or remedial training is needed.

Some of it is (IMO) just plain daft. The normality testing makes a very grand assumption that will almost always be untrue. We risk condemning a survey because an unreasonable assumption does not hold. The result is that we further marginalise the most marginal and at-risk populations, which will be responsible for the fat left-hand tail of the distribution (which prompts rejection of the survey). What SMART condemns as bad data is very likely useful and accurate data. See this post.

The worst case is that we have survey staff adding random elements to data so as to "avoid" age-heaping and censoring accurate data to make a survey fit with one or other fatuous assumptions about how data should behave.

In response to (2) ...

In many surveys that we do there are a few different sample sizes :

**The number of clusters (m) :** We like this to be as **large** as is practicable.

**The number of children per cluster (n) :** We like this to be as **small** as is practicable.

It is generally true that a bigger 'm' and a smaller 'n' reduce the design effect associated with a survey.
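One common way to see this (an assumption brought in here, not something stated elsewhere in this thread) is Kish's approximation, in which the design effect is 1 + (n - 1) * rho and rho is the intra-cluster correlation. A minimal R sketch, with a purely illustrative rho of 0.05 and a hypothetical helper `designEffect()`:

```r
# Kish's approximation for the design effect of a cluster sample.
# n   : number of children sampled per cluster
# rho : intra-cluster correlation (0.05 below is a made-up value)
designEffect <- function(n, rho) {
  1 + (n - 1) * rho
}

clusterSizes <- c(30, 20, 15, 10)
data.frame(n = clusterSizes,
           deff = designEffect(clusterSizes, rho = 0.05))
```

With this assumed rho the design effect falls from 2.45 at n = 30 to 1.45 at n = 10. The real value of rho depends on the indicator and the setting, so the numbers are only indicative.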

We usually compromise on the size of 'm' and try to have as few clusters as possible. This keeps costs down. We usually keep this to about m = 30 if we are using proximity sampling to select households with eligible children. Work on this (looking at the "30-by-30" surveys that preceded SMART) suggests that a useful minimum of m = 25 clusters is probably OK.

We also have an **overall sample size** which is the product of 'm' and 'n.' This is what we usually calculate when doing a sample size calculation and then work back to 'm' and 'n'.

The proportion of children aged 6-59 months can affect both 'm' and 'n'. If this is small and villages are small then there may only be (e.g.) N = 15 eligible children in most villages. This is often the case in pastoralist settings. In these settings it is not reasonable to have n = 25 because we cannot sample n = 25 children from N = 15 children. In this case you might have n = 10 (or n = 11, 12, 13, 14, or 15). Since 'n' is now small we will need to increase 'm' to meet our overall sample size requirement. In this case both 'm' and 'n' are influenced by population size.

Note that with a large 'm' and a small 'n' (and particularly when n is close to taking a census from clusters) we will have a small design effect. This can lead to some savings. Here is an illustrative example :

```
START WITH:

    guess at prevalence                           = 10%
    required precision                            = 3%
    overall sample size (simple random sample)    = 384
    guess at design effect                        = 2.0
    overall sample size (cluster sample)          = 384 * 2.0 = 768
    number of clusters                            = 25
    within-cluster sample size                    = 31

BUT:

    average village population                    = 90
    proportion aged 6 - 59 months                 = 17%
    average village population aged 6 - 59 months = 90 * 0.17 = 15

SO:

    New design effect                             = 1.25 (we have more small clusters)
    New sample size                               = 384 * 1.25 = 480
    within-cluster sample size                    = 15
    number of clusters                            = 480 / 15 = 32
```
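The arithmetic in this example can be reproduced with a few lines of R. This is only a sketch: the function name `srsSampleSize()` is made up for illustration, and the figure of 384 is assumed to come from the usual formula for a proportion with z = 1.96.

```r
# Simple random sample size for estimating a proportion.
# p : guess at prevalence, d : required precision (half-width of the 95% CI)
srsSampleSize <- function(p, d, z = 1.96) {
  (z^2 * p * (1 - p)) / d^2
}

n.srs <- round(srsSampleSize(p = 0.10, d = 0.03))    # 384

# First plan: design effect of 2.0 spread over 25 clusters
n.plan1     <- n.srs * 2.0                           # 768
perCluster1 <- ceiling(n.plan1 / 25)                 # 31

# Small villages: about 90 people, 17% aged 6 - 59 months
eligiblePerVillage <- floor(90 * 0.17)               # 15

# Revised plan: guessed design effect of 1.25, 15 children per cluster
n.plan2   <- n.srs * 1.25                            # 480
clusters2 <- ceiling(n.plan2 / eligiblePerVillage)   # 32
```

The two design effects (2.0 and 1.25) are guesses taken from the example, not values that the code estimates.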

The proportion aged 6 - 59 months can also affect the overall sample size.

Sample size formulae tend to assume that we sample from an infinitely large population. This assumption is reasonable when the sampling fraction (i.e. the ratio of the sample size to the population size) is small. The error (e.g. the width of the 95% CI) is essentially the same regardless of population size as long as the sampling fraction is less than about 5%.

In the example above we have n = 480. If the size of the population aged 6 - 59 months were about:

N = 480 * 20 = 9600

or larger then we need not worry.

If the sampling fraction was greater than about 5% then we would apply a "finite population correction" (FPC) in order to account for the added precision gained by sampling a large proportion of the population. The FPC can be calculated as:

FPC = sqrt((Population - Sample Size) / (Population - 1))

If we assume a population of 4,800 and a sample size of 480 then we have a sampling fraction of 10%. Since this is above about 5% we should calculate an FPC:

FPC = sqrt((4800 - 480) / (4800 - 1)) = 0.95

The required sample size is now:

n = 480 * 0.95 = 456

Continuing with our example ... we might collect this as 30 clusters of 15 (n = 450 is close enough to n = 456 to make no difference) and save ourselves a little work and a little money.

We do not usually apply an FPC in SMART surveys as (1) savings are usually small and (2) the SMART software does not adjust results to account for the sampling fraction.

I hope this is of some use.

### Mark Myatt

Consultant Epidemiologist

Frequent user

26 Jun 2013, 10:36

Oops ... just checking up on myself ... The FPC that I give above is the FPC for adjusting the CI to account for a large sampling fraction.

The FPC for a sample size is :

new.n = (old.n * population) / (old.n + (population - 1))

Continuing with the example we get :

new.n = (480 * 4800) / (480 + (4800 - 1)) = 436

which we might collect as 29 clusters of 15.
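For anyone who wants to check the two corrections side by side, here is a small R sketch. The function names `fpcCI()` and `fpcSampleSize()` are made up for illustration. The formulae are the two given in these posts.

```r
# FPC for a confidence interval (multiplies the standard error)
fpcCI <- function(n, N) {
  sqrt((N - n) / (N - 1))
}

# FPC for a sample size (shrinks the required sample size)
fpcSampleSize <- function(n, N) {
  (n * N) / (n + (N - 1))
}

n <- 480    # sample size before correction
N <- 4800   # population aged 6 - 59 months

n / N                        # sampling fraction = 0.1
round(fpcCI(n, N), 2)        # 0.95
round(fpcSampleSize(n, N))   # 436
```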

Sorry for any confusion.

The moral here (for me) is to check what I write before posting.

I fear my mind is going.

### Bradley A. Woodruff

Self-employed

Technical expert

26 Jun 2013, 16:12

This is in response to the suggestion to select the first household at random, then proceed to the next closest household until data are collected on the desired number of children. This is the method first recommended by the Expanded Programme on Immunization (EPI) many years ago. This type of proximity quota sampling has certain built-in biases.

Bias #1: Most small villages and relatively rural settlements have greater housing density in the center. Thus, selecting the next closest household invariably moves the survey team toward the center of the village. People who live in the center of a village are usually different from people who live in the periphery. As a result, the sample in each village is biased toward these people. (see Luman ET et al. Comparison of two survey methodologies to assess vaccination coverage. Int J Epidemiol 2007;36:633-641 for a field comparison).

Bias #2: If one starts selecting households from one side of the village (which is commonly done to make data collection more efficient) and stops when achieving the desired number of children, the team may never reach the other side of the village. As a result, households on that side of the village have zero chance of being included in the survey sample. If for some reason households at one end of the village differ from households at the other side of the village, a sampling bias is introduced.

For these reasons, I highly recommend selecting households by systematic or simple random sampling from a list of all eligible households in that primary sampling unit. Calculate the sample size needed, accounting for the average number of target individuals per household and the predicted household and individual non-response. Then visit each and every selected household regardless of the number of children from whom data are collected. This will minimize the sampling biases inherent in the EPI method.
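As a rough illustration of systematic sampling from a household list, here is a minimal R sketch. The function `systematicSample()`, the list of 87 households and the choice of 12 households are all hypothetical. It assumes the list is complete and that no more households are requested than are on the list.

```r
# Systematic sampling of households from a complete list for one cluster.
# householdList and nToSelect are hypothetical inputs for illustration.
systematicSample <- function(householdList, nToSelect) {
  N <- length(householdList)
  interval <- N / nToSelect                    # sampling interval (may be fractional)
  start <- runif(1, min = 0, max = interval)   # random start within the first interval
  positions <- floor(start + interval * (0:(nToSelect - 1))) + 1
  householdList[positions]
}

set.seed(1)                        # only so that the example is repeatable
households <- paste0("HH", 1:87)   # made-up household list
selected <- systematicSample(households, nToSelect = 12)
selected
```

Every household on the list has the same chance of selection, and the team visits all selected households rather than stopping at a quota of children.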

### Anonymous 81

Public Health Nutritionist

Normal user

26 Jun 2013, 16:22

Dear Brad, thanks for the detailed clarification. When I said select households based "on feasible sampling method", I was referring to Modified EPI, Random or Systematic.

### Victoria Sauveplane

Senior Program Manager, Action Against Hunger CA

Normal user

26 Jun 2013, 17:18

To complement the previous post regarding the finite population correction for proportions, please see the equations 3 & 4 on the following link: http://edis.ifas.ufl.edu/pd006

The proportion of under-fives does indeed matter when calculating sample size for nutrition surveys. If you overestimate the proportion of under-fives, you will underestimate the number of households you must visit to reach the desired number of children. (For a discussion of why it is recommended to calculate sample size in terms of number of children, please consult the SMART Manual). Please note that in SMART surveys, an FPC is not usually applied as (1) savings are usually small and (2) the SMART software does not adjust results to account for the sampling fraction. The ENA software does adjust with a finite population correction (see button for correction of small population on the Planning Tab of ENA).
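The effect of overestimating the proportion of under-fives can be sketched in R. This is a simplified conversion and not necessarily the exact formula in the SMART Manual. The function `householdsNeeded()` and all of the planning figures (480 children, 5 people per household, 20% versus 15% under-fives) are made up for illustration.

```r
# Convert a required number of children into a number of households to visit.
# All inputs are hypothetical planning figures.
householdsNeeded <- function(nChildren, householdSize, propUnder5, nonResponse = 0) {
  childrenPerHousehold <- householdSize * propUnder5
  households <- nChildren / childrenPerHousehold / (1 - nonResponse)
  ceiling(households - 1e-9)   # small tolerance guards against floating-point noise
}

# The planner assumed 20% under-fives but the true figure is 15%
planned <- householdsNeeded(480, householdSize = 5, propUnder5 = 0.20)
actual  <- householdsNeeded(480, householdSize = 5, propUnder5 = 0.15)
c(planned = planned, actual = actual)   # 480 and 640
```

In this made-up case the plan falls 160 households short of what is needed to reach 480 children.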

With regards to the first part of the initial question of this thread, the results of the ENA Plausibility Tests assess the overall quality of the data collected from your anthropometric survey. The Plausibility Test is a standard tool that ensures comparability between surveys and gives managers an easy tool for evaluating the data. It is important to run this tool when analyzing your survey data.

That said, the overall score is not meant to be used as a definitive tool for validation, but rather as a prompt to highlight key issues for concern. A qualified survey manager should read each test and use it to assess the quality of the survey data. The tests are a tool to identify both selection and measurement bias. The tests may also highlight field realities. For example, if your survey identified more younger children and is therefore penalized for age distribution (and has a highly significant p value), the survey manager should evaluate whether there truly are more younger children in the target population or whether there is an issue of selection bias where younger children were more likely to be surveyed than older children in the population. To determine this, survey managers should first discuss the field realities with survey teams and cross-check with previous surveys in that region. If there is reason to believe the sample may not be representative of the population in terms of age, consider the penalty points. If not, these penalty points can be 'ignored', but the age distribution should be discussed in the discussion section of your survey report.

For further questions relating to the SMART methodology, please refer them to the SMART website: www.smartmethodology.org

### James lual

Consultant,surveys

Normal user

26 Jun 2013, 19:08

Dear Brad, I do agree with the comments above. EPI is statistically invalid as it does not give the sampling units an equal chance of selection. It is now making sense!

### Juergen Erhardt

Normal user

26 Jun 2013, 20:11

Dear Mark, just two comments on your last post. In ENA for SMART there is an option to make a correction for small sample sizes, and a long time ago we replaced the word "rejection" of a survey with "problematic" to avoid pushing users of the software to manipulate data. It is just an indication that there might be a problem, but if a higher than normal SD can be explained by the kind of subjects measured, the quality of the data should not be criticized.

### Mark Myatt

Consultant Epidemiologist

Frequent user

26 Jun 2013, 20:54

Juergen, Thanks for the clarifications. I still hear of rejections and of survey reports being held up or embargoed by ministries because of minor deviations from normality. One country even boasts that they do not allow the use of surveys that "fail" the SMART tests. I have had quite extensive correspondence with a SMART pusher from CDC who is most emphatic that any deviation from normality can only be the result of error and believes any other point of view to be perverse. This particular person does a lot of SMART training. Perhaps someone should tell the good people at SMART to pipe down on this or, better still, issue a correction because the problem still exists.

### Tariq Khan

Normal user

27 Jun 2013, 20:52

Mark has presented good justifications in this regard.

1. I personally believe that the plausibility tests should not be used to reject surveys. There are preliminary monitoring checks, made before the data are analysed and the plausibility report is prepared, on the basis of which you can decide whether the survey should be rejected: (1) wrong age calculation, (2) eligible under-five children with no anthropometric measurement, (3) repeated implausible height records, and (4) implausible MUAC records.

The plausibility report, on the other hand, is better used for monitoring. The ENA plausibility report gives you an overall score for the survey, but this does not depict the real scenario. It sums the individual scores for each parameter (e.g. height, weight and age). Now consider what happens when you measure weight with a calibrated digital scale: there is no human error at all, yet the report may still show a high weight digit preference score, and this is counted in the total score of the survey.

I suggest it is better to use the plausibility tests for monitoring purposes, particularly by team and by cluster.

2. The proportion of children under five should be taken into account in the sample size calculation because this is your sampling universe, and without a sampling universe no sample can be drawn.

### Mark Myatt

Consultant Epidemiologist

Frequent user

28 Jun 2013, 10:57

I was wrong about the ENA software. It does implement a finite population correction.

I think Tariq is right to point to "monitoring". It is good practice to check data as soon as they are collected (team supervisor and practices such as the recorder echoing values), when they arrive for data-entry (data manager, survey supervisor), at data-entry (using legal values, range-checks, double-entry or read-back), and at batch-checking (legal values, range-checks including adding indices and checking flagged records). The point is to identify problems quickly and fix them quickly rather than wait until the end of the survey. If ENA plausibility checks help here then we should definitely use them.

WRT a whole survey report. There are issues WRT significance testing. If we reject the null at p = 0.05 we risk rejecting 1 in every 20 good surveys just because of sampling variation. This R code:

```r
failureCount <- 0
for (test in 1:100000) {
  # 500 heights (mm) drawn uniformly, so there is no digit preference
  randomHeights <- round(runif(n = 500, min = 650, max = 1100), 0)
  # Keep only the final digit of each height
  finalDigits <- substr(randomHeights, nchar(randomHeights), nchar(randomHeights))
  # Chi-square test of the null of no digit preference
  p <- chisq.test(table(finalDigits))$p.value
  if (p < 0.05) {
    failureCount <- failureCount + 1
  }
}
failedProportion <- failureCount / 100000
failedProportion
```

Simulates 100,000 surveys with no digit preference (just random variation). The result is that the null is rejected (as expected) in 5% (actually 4.968% in the simulation that I ran) of the surveys. This means that 1 in 20 good surveys are rejected.

As the sample size increases even small deviations from expected distributions can be statistically significant. This will usually only be a problem with very large sample sizes. The opposite is also true. With small sample sizes we might not detect clear digit preference.

Here are some examples ... clear digit preference for last digit = 0 or 5:

```
last digit   count
----------   -----
         0      10
         1       5
         2       5
         3       5
         4       5
         5      10
         6       5
         7       5
         8       5
         9       5
----------   -----
                60
             -----

chi-square = 6.6667, df = 9, p-value = 0.6718
```

but we fail to reject the null of no digit preference.

Here is the same pattern but with 10 times the sample size:

```
last digit   count
----------   -----
         0     100
         1      50
         2      50
         3      50
         4      50
         5     100
         6      50
         7      50
         8      50
         9      50
----------   -----
               600
             -----

chi-square = 66.6667, df = 9, p-value = 0.0000
```

we reject the null of no digit preference.
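The two tests above can be reproduced with a short piece of R. This is a sketch using `chisq.test()` on the counts from the two tables; the variable names are made up for illustration.

```r
# Same digit-preference pattern (0 and 5 twice as common) at two sample sizes
smallSurvey <- c(10, 5, 5, 5, 5, 10, 5, 5, 5, 5)
names(smallSurvey) <- 0:9
largeSurvey <- smallSurvey * 10

# chisq.test() on a vector of counts tests against equal expected proportions
small <- chisq.test(smallSurvey)
large <- chisq.test(largeSurvey)

round(c(small$statistic, p = small$p.value), 4)   # 6.6667 and 0.6718
round(c(large$statistic, p = large$p.value), 4)   # 66.6667 and 0.0000
```

The pattern of digit preference is identical in both cases. Only the sample size differs.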

This makes it very difficult to use simple significance tests for monitoring when (e.g.) a team might bring in data from two clusters per day (i.e. n = 60 or less).

Since plausibility tests can "detect" problems that are not there (about 5% of the time) and can (with small samples) fail to detect real problems, we should be careful when we use them. It is probably better to "eyeball" (visually inspect) the data to see if we have (e.g.) too many ".0" or ".5" final digits for height.
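A quick way to do this sort of eyeballing is to tabulate the final digits. Here is a minimal R sketch, assuming heights are recorded in cm to one decimal place. The twelve heights are made-up values, not survey data.

```r
# Heights in cm recorded to one decimal place (made-up values for illustration)
heights <- c(87.0, 92.5, 101.3, 76.0, 88.5, 95.0,
             110.2, 83.5, 79.0, 99.5, 104.7, 90.0)

# Final digit = the tenths digit. Work in whole millimetres to avoid
# floating-point surprises.
finalDigits <- round(heights * 10) %% 10

# Tabulate all ten digits, including those that never occur
digitCounts <- table(factor(finalDigits, levels = 0:9))
digitCounts

# A bar chart makes any heaping on ".0" and ".5" easy to see
barplot(digitCounts, xlab = "Final digit", ylab = "Count")
```

In this toy data set nine of the twelve heights end in ".0" or ".5", which is the kind of heaping that should prompt a conversation with the team.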

I hope this is of some use.

### Andrew Seal

UCL and NIE Regional Training Initiative

Frequent user

28 Jun 2013, 14:52

Just wanted to agree strongly with the contributors who argue in favour of not using the plausibility score for rejecting surveys.

As Mark said, some of the report is extremely useful while some of it is less so. The fact that the SD is a strong contributor to the score, and therefore to the chance of surveys being described as problematic, is, in my opinion, problematic. Yes, the amount of variance in a sample may be an indicator of random error in the measurements, but it may also describe the true nature of a heterogeneous distribution.

Just because many/most survey samples are normally distributed does not mean that they inherently *should* be or that those that are not normal or have a larger SD are wrong! This thinking is a classic illustration of the problem, or limitation, of inductive reasoning and why black swan theory is so important. Put another way, statistical theory should help describe biology, not try and define what it is.

So, my plea is for survey coordinators to use the very useful features of the ENA plausibility to help monitor and improve data quality during survey implementation, but never use the plausibility score by itself to reject, or accept, a survey report.

### Bradley A. Woodruff

Self-employed

Technical expert

28 Jun 2013, 14:54

Thank you, Mark, for the excellent demonstration of the limitations of making hard and fast decisions based only on p values. As an old supervisor once advised, "Don't be a p value doctor!" Assessment of the quality of survey data should almost always involve judgment. Using arbitrarily defined quantitative cut-off points to accept or reject anything, whether it be an entire survey or a specific conclusion, is too simplistic and generally a bad idea, which then results in bad decisions.

### Bradley A. Woodruff

Self-employed

Technical expert

28 Jun 2013, 15:02

Sorry for the repeat post. I forgot to mention that quantitative data quality checks in general, and specifically those calculated by ENA, measure only a few of the many potential sources of bias which may interfere with the validity of survey data. In fact, digit preference for length/height measurements, larger standard deviations for z-scores, etc. may or may not be associated with bias and certainly do not measure the strongest potential sources of bias. Only correct survey design, careful training and supervision, and correct data analysis can minimize the likelihood of the injection of bias into survey results.

### Andrew Seal

UCL and NIE Regional Training Initiative

Frequent user

28 Jun 2013, 15:11

Hi... there are not many studies on adolescent nutrition in my country, and no studies carried out in the location in which I am intending to carry out my research.

Is it advisable to do a cross-sectional study before starting the intervention study to be able to know the gap and topics that will be relevant during the intervention and also to compare the outcome at the end? thank you