# WHO EPI two-stage cluster survey method vis-à-vis DHS or MICS approach

This question was posted in the Assessment and Surveillance forum area and has 7 replies.

### Anonymous 81

Public Health Nutritionist

Normal user

29 Nov 2009, 06:04

### Mark Myatt

Epidemiologist at Brixton Health

Frequent user

7 Dec 2009, 12:07

**Multiple indicators:** It seems to me that the bulk of the indicators mentioned are simple proportions or percentages. What you need to do is make a list of the indicators and, for each indicator, write down the expected level (if you do not have any good idea then use 50%) and the desired level of precision (i.e. the half-width of the 95% confidence interval; for example, ± 3%). When you have done this you should use a sample size calculator such as:

- **SampleXS**: http://www.brixtonhealth.com/samplexs.html
- **GNU sampsize**, which is available online at: http://sampsize.sourceforge.net/iface/index.html

and calculate the required sample size for each indicator. The largest sample size that you calculate here is the smallest sample size that will yield the desired precision for *all* of your indicators.

It is, unfortunately, not as simple as that. The indicators apply to different units. For example, the nutritional anthropometry indicator applies to individual children but a sanitation indicator may apply to a household. You have to account for this when you calculate sample sizes. I will give an example with three indicators:

- GAM: expected prevalence = 12%, desired precision = ± 3%
- EPI: expected coverage proportion = 70%, desired precision = ± 10%
- Safe disposal of faeces (SDF): expected proportion safe = 50%, desired precision = ± 10%

When I use GNU sampsize for these I get 451, 81, and 97 respectively. The problem here is that the indicators apply to different units:

- GAM: children aged 6 - 59 months
- EPI: children aged 6 - 24 months
- SDF: households

You have to find some way of "standardising". The easiest way is to work with households and express the sample size in terms of the number of households required. If we assume that we will find 1.25 children aged between 6 and 59 months in a sampled household then we would need to sample 451 / 1.25 = 361 households to find 451 children aged between 6 and 59 months. If we expect to find 0.25 children aged between 6 and 24 months in a household then we would need to sample 81 / 0.25 = 324 households to find 81 children aged between 6 and 24 months. And, of course, you need to sample 97 households to find 97 households. So the sample sizes expressed in numbers of households are:

- GAM: 361
- EPI: 324
- SDF: 97

Again, it is not as easy as that. For a cluster sampled survey we have a design effect (DEFF) to consider. This will be different for different indicators. It will be particularly high for anything that tends to cluster spatially (either within or between villages) such as infectious diseases or program coverage. You need to make a guess at these and multiply the calculated sample size by the expected DEFF. If we assume that GAM is not very clustered (DEFF = 1.5) and that EPI and SDF are likely to be more clustered (e.g. DEFF = 3 and DEFF = 2 respectively) then our sample sizes are now:

- GAM: 361 × 1.5 = 541.5, rounded up to 542
- EPI: 324 × 3 = 972
- SDF: 97 × 2 = 194

If you use a cluster sampled approach then you will need about 30 clusters (do not go much below this). To calculate the within-cluster sample size you should divide the largest sample size by the number of clusters. In our example this will be 972 / 30 = 32.4, or 33 households per cluster. It is common practice with sample size calculations to round up the results of calculations.

This may seem to be a complicated procedure but should not present problems if you take it step-by-step.
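The step-by-step arithmetic for the three example indicators can be scripted so that it is easy to redo when the guesses change. This is a minimal sketch, assuming the usual normal-approximation formula n = z² × p × (1 − p) / d² with z = 1.96; it reproduces the 451, 81 and 97 quoted for the three example indicators, but SampleXS and GNU sampsize may offer other options (such as a finite population correction) that this sketch ignores. The names `srs_sample_size`, `plan` and `INDICATORS` are hypothetical, and every step is rounded up.

```python
import math

Z_95 = 1.96  # standard normal quantile for a 95% confidence interval


def srs_sample_size(p: float, d: float, z: float = Z_95) -> int:
    """Simple random sample size to estimate a proportion p to within +/- d."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# (name, expected proportion, precision, sampling units per household, guessed DEFF)
INDICATORS = [
    ("GAM", 0.12, 0.03, 1.25, 1.5),  # children aged 6 - 59 months
    ("EPI", 0.70, 0.10, 0.25, 3.0),  # children aged 6 - 24 months
    ("SDF", 0.50, 0.10, 1.00, 2.0),  # the household itself is the unit
]


def plan(indicators, n_clusters: int = 30):
    """Return households needed per indicator, the largest of these, and households per cluster."""
    households_needed = {}
    for name, p, d, per_household, deff in indicators:
        n = srs_sample_size(p, d)                               # children or households
        households = math.ceil(n / per_household)               # standardise to households
        households_needed[name] = math.ceil(households * deff)  # inflate for clustering
    largest = max(households_needed.values())
    per_cluster = math.ceil(largest / n_clusters)
    return households_needed, largest, per_cluster


if __name__ == "__main__":
    needed, largest, per_cluster = plan(INDICATORS)
    for name, households in needed.items():
        print(f"{name}: {households} households")
    print(f"Largest: {largest}; households per cluster (30 clusters): {per_cluster}")
```

With these inputs the EPI indicator drives the design: 81 / 0.25 = 324 households, 324 × 3 = 972 households, and 972 / 30 rounds up to 33 households per cluster.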

**The "homogeneity assumption":** The main problem with the design you propose is that it provides a single estimate for an indicator. This is OK as long as it makes sense to have a single estimate. As a survey area gets larger the chances of a single estimate being meaningful decrease. For example, imagine that you cover two districts in a single survey. One district has an active EPI program and the other does not. If the true EPI coverage in the first district is 80% and the true EPI coverage in the second district is 30% then (if the two districts have similar populations) your survey might tell you that EPI coverage is about 55%. You would conclude that EPI coverage was poor everywhere but the truth is that in one district it is pretty good while in the other district it is very bad. Also, neither district has an EPI coverage even close to 55%. IMO, such an estimate applies nowhere.
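The two-district example can be made concrete, including the large design effect that such heterogeneity produces. This is a minimal sketch under stated assumptions: clusters are spread across the districts in proportion to population without stratification, all clusters are the same size, and coverage is uniform within each district (so the only clustering comes from the district difference). The cluster size of 7 children is borrowed from the classic WHO EPI 30 × 7 design purely for illustration; the function names are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a 95% confidence interval


def heterogeneity_deff(coverages, weights, cluster_size):
    """Pooled coverage, ICC and design effect caused only by between-district differences.

    Assumes clusters are drawn across districts in proportion to `weights`, every
    cluster has `cluster_size` children, and coverage is uniform within each district.
    """
    total = sum(weights)
    shares = [w / total for w in weights]
    pooled = sum(s * p for s, p in zip(shares, coverages))
    # Variance of the true cluster-level coverage across clusters
    between_var = sum(s * (p - pooled) ** 2 for s, p in zip(shares, coverages))
    icc = between_var / (pooled * (1 - pooled))  # intra-cluster correlation coefficient
    deff = 1 + (cluster_size - 1) * icc          # Kish design effect, equal-sized clusters
    return pooled, icc, deff


def ci_half_width(p, n, deff=1.0, z=Z_95):
    """Approximate half-width of the 95% CI for a proportion, inflated by the design effect."""
    return z * math.sqrt(deff * p * (1 - p) / n)


if __name__ == "__main__":
    pooled, icc, deff = heterogeneity_deff([0.80, 0.30], [1, 1], cluster_size=7)
    n = 30 * 7
    print(f"pooled coverage = {pooled:.0%}, ICC = {icc:.3f}, DEFF = {deff:.2f}")
    print(f"95% CI: +/- {ci_half_width(pooled, n, deff):.1%} "
          f"(vs +/- {ci_half_width(pooled, n):.1%} for a simple random sample)")
```

With these inputs the sketch gives a pooled coverage of 55% (a figure that describes neither district), a design effect of about 2.5, and a 95% confidence interval of roughly ± 11 percentage points instead of about ± 7 for a simple random sample of the same size.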

**A survey that produces misleading results is worse than having no survey at all.** What you will see in this context is a large design effect and a very wide 95% confidence interval. Not very useful. In the situation that you describe I would be very wary of doing a wide-area survey that yielded a single estimate for the wide area.

Stratification (as the term is used in MICS) is one approach but to produce useful results (e.g. per-district results) you would really need to do a full sample size survey in each area. Another form of stratification that might be more useful is spatial stratification. With this the area is divided into a set of small areas and a sample is taken from each. The trick is to make the small-area sample representative and make clever use of the data you collect. Such methods have been used for estimating CMAM program coverage and for the Myanmar Periodic Review (which uses hexagonal / triangular areas and reuses data to effectively triple the small-area sample size for free). With small samples you may need to classify (e.g. EPI < 50%, between 50% and 80%, or > 80%) rather than estimate indicator proportions.

In short ... you could do a MICS-type survey but you should be aware that it may yield misleading results. You should also be aware that not everyone shares my poor opinion of survey designs like the MICS. I suggest that the forum administrator make a direct request to a responsible person in UNICEF for their opinions on this. I hope this helps.
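Classifying a small-area sample as low (< 50%), moderate (50% to 80%) or high (> 80%) EPI coverage needs a decision rule, which the post does not specify. One option, assumed here, is a simplified LQAS-style rule similar in spirit to those used in SLEAC-type CMAM coverage surveys: compute thresholds d = ⌊n × p / 100⌋ for each class boundary p and compare them with the number of covered children found. The function `classify` and the area counts in the usage block are hypothetical. The sketch does not compute misclassification risks, which should be checked for the chosen sample size before relying on it.

```python
def classify(covered: int, n: int, lower: int = 50, upper: int = 80) -> str:
    """Classify coverage in one small area as 'low', 'moderate' or 'high'.

    covered:      number of sampled children who are covered (e.g. vaccinated)
    n:            number of children sampled in the area
    lower, upper: class boundaries in percent (50% and 80% in the post's example)
    """
    if n <= 0 or not 0 <= covered <= n:
        raise ValueError("need n > 0 and 0 <= covered <= n")
    # Decision thresholds; integer arithmetic avoids floating-point rounding surprises
    d_lower = n * lower // 100
    d_upper = n * upper // 100
    if covered > d_upper:
        return "high"      # coverage classified as above `upper` percent
    if covered > d_lower:
        return "moderate"  # coverage classified as between `lower` and `upper` percent
    return "low"           # coverage classified as below `lower` percent


if __name__ == "__main__":
    # Hypothetical small-area results: (covered, sampled)
    areas = {"area A": (18, 20), "area B": (13, 20), "area C": (7, 20)}
    for name, (covered, sampled) in areas.items():
        print(name, classify(covered, sampled))
```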

### Mark Myatt

Epidemiologist at Brixton Health

Frequent user

7 Dec 2009, 16:17

### Michael Golden

Normal user

7 Dec 2009, 17:17

### Mark Myatt

Epidemiologist at Brixton Health

Frequent user

10 Dec 2009, 11:58

**not** 20 indicators. Think of an indicator such as GAM. You need to collect cluster number, age, sex, weight, height, oedema, and MUAC. That is seven variables for one indicator. Another rule of thumb (which contradicts the previous one somewhat; that is the nature of rules of thumb) is "one question per survey". In this rule "question" refers to a topic. So a nutritional anthropometry survey addresses both GAM and SAM and an EPI survey addresses vaccination status for each vaccine in the MoH "basket". I have been involved with some MICS-style surveys with large questionnaires and the results have not been good. Mike's "20 minute" rule is a good one. I agree with Mike ... I would be very wary of attempting such a survey.

You might want to do a triage exercise and split your indicators into (1) "must have", (2) "nice to have", and (3) "not useful at this time". Then take all in (1) if there is room ... if not then rank them and take the highest ranked. If you have room then rank (2) by importance and take the most highly ranked ones provided that you can obey the "20 variables" or "20 minutes" rules.
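The triage can be sketched as a greedy selection under a variable budget. This assumes each indicator has been given a category (1, 2 or 3) and a rank within its category, and that a variable shared by several indicators (such as cluster number or age) counts only once against the 20-variable budget. The `Indicator` class, the `select` function, and every variable name other than the seven GAM variables listed above are hypothetical.

```python
from dataclasses import dataclass

MUST_HAVE, NICE_TO_HAVE, NOT_USEFUL_NOW = 1, 2, 3


@dataclass(frozen=True)
class Indicator:
    name: str
    category: int         # 1 = must have, 2 = nice to have, 3 = not useful at this time
    rank: int             # 1 = most important within its category
    variables: frozenset  # variables that must be collected to compute the indicator


def select(indicators, max_variables=20):
    """Greedy triage: category 1 before category 2, best-ranked first, within a variable budget."""
    chosen, collected = [], set()
    candidates = sorted(
        (ind for ind in indicators if ind.category in (MUST_HAVE, NICE_TO_HAVE)),
        key=lambda ind: (ind.category, ind.rank),
    )
    for ind in candidates:
        needed = collected | ind.variables  # shared variables are only counted once
        if len(needed) <= max_variables:
            chosen.append(ind.name)
            collected = needed
        # otherwise skip this indicator and try the next-ranked one
    return chosen, collected


if __name__ == "__main__":
    gam = Indicator("GAM", MUST_HAVE, 1, frozenset(
        {"cluster", "age", "sex", "weight", "height", "oedema", "muac"}))
    # Hypothetical variable list, for illustration only
    epi = Indicator("EPI", MUST_HAVE, 2, frozenset(
        {"cluster", "age", "card_seen", "bcg", "dpt3", "measles"}))
    names, variables = select([epi, gam])
    print(names, len(variables))
```

This version skips any indicator that would break the budget and moves on to the next-ranked one; a stricter reading of the rule would stop at the first indicator that does not fit. Applying the "20 minutes" rule instead would need interview timings rather than a variable count.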

### Mark Myatt

Epidemiologist at Brixton Health

Frequent user

10 Dec 2009, 17:52
