
Stratified cluster survey with 5 strata vs. 5 independent surveys

This question was posted in the Assessment and Surveillance forum area and has 4 replies.



Normal user

8 Aug 2012, 19:07

I need to get GAM estimates for a state and 5 counties within the state in South Sudan. I am planning to conduct 5 independent surveys (1 per county) and combine the results (i.e. with weighting) to get an overall GAM estimate for the state.

I am a bit confused about how this approach differs from stratified sampling, i.e. conducting one survey using the 5 counties as 5 strata vs. doing 5 independent surveys and combining the estimates to get an overall estimate. Would both approaches mean the same thing in practice?

I would be grateful for any advice.

Thank you in advance.

Mark Myatt

Frequent user

9 Aug 2012, 12:57

The first thing to say is that wide-area averages are usually only appropriate if prevalence is similar in all of the sub-areas. If this is not the case then you may end up with an estimate that has a very wide confidence interval and doesn't represent any part of the wider area. If prevalence is not similar in all of the sub-areas then it is usually more meaningful to report sub-area results separately. If you expect prevalence to be similar in all sub-areas then it will be quicker, cheaper, and simpler to do a single survey.

Anyway ...

I presume that you will be using a survey design similar to SMART in each county. County-specific estimates are easy enough to do with the SMART / ENA software. The difficulty will likely come when combining the data. A simple average of the five estimates will, unless each county is the same size, be biased towards the smaller counties, because each county counts equally regardless of its population. The same would happen if you just put the five datasets together (end-to-end) using appropriate cluster identifiers (e.g. 101 for county = 1 and cluster = 1 through 530 for county = 5 and cluster = 30).
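As a rough illustration of the bias described above, here is a small Python sketch comparing a simple average with a population-weighted average. The county names, populations, and prevalence figures are entirely made up for the example.

```python
# Hypothetical counties: (population, GAM prevalence estimate).
# All figures are invented for illustration only.
counties = {
    "A": (200_000, 0.08),
    "B": (150_000, 0.09),
    "C": (100_000, 0.10),
    "D": (30_000, 0.18),
    "E": (20_000, 0.20),
}

def simple_mean(data):
    """Unweighted average: every county counts equally."""
    return sum(p for _, p in data.values()) / len(data)

def weighted_mean(data):
    """Population-weighted average: large counties count for more."""
    total = sum(n for n, _ in data.values())
    return sum(n * p for n, p in data.values()) / total

simple = simple_mean(counties)
weighted = weighted_mean(counties)
print(f"simple average:   {simple:.1%}")    # 13.0%
print(f"weighted average: {weighted:.1%}")  # 9.8%
```

In this made-up case the two small counties have high prevalence, so the simple average is pulled well above the population-weighted figure.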

You need some way of weighting results. A weighted average of results would work but calculating the 95% CI might be tricky. There is a wide choice of software to help with this (e.g. CSAMPLE in EpiInfo, SUDAAN, SPSS complex samples, STATA "svy" commands, &c.) but these can be difficult to use correctly.
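For readers who want to see the arithmetic, the following is a minimal sketch of one textbook way to combine stratum estimates, assuming the five county surveys are independent and a normal approximation is acceptable. The function names are hypothetical, and the county standard errors are assumed to come from a proper cluster-level analysis (here backed out of a symmetric 95% CI, which is itself an approximation). It is not a substitute for the survey software listed above.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * z)

def combine_strata(populations, estimates, std_errors, z=1.96):
    """Population-weighted estimate with a normal-approximation CI.

    Assumes independent strata, so the variance of the weighted sum is
    the sum of (weight * SE)^2 over strata.
    """
    total = sum(populations)
    weights = [n / total for n in populations]
    p = sum(w * e for w, e in zip(weights, estimates))
    se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, std_errors)))
    return p, p - z * se, p + z * se

# Two hypothetical equal-sized counties, each with SE = 0.02.
p, lo, hi = combine_strata([100, 100], [0.10, 0.20], [0.02, 0.02])
print(f"{p:.3f} ({lo:.3f} to {hi:.3f})")
```

The design effect of each county survey is already contained in its standard error, so it does not appear separately here.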

I think that the simplest approach would be a sort of "super-SMART" approach. Plan to take a large-sample SMART survey using probability proportional to size (PPS) sampling to identify cluster locations from the state as a whole (not county by county). Select the number of clusters so that each county contributes 25 or more clusters to the overall sample (25 clusters is a safe minimum for SMART-type surveys). Use a cluster size that gives a reasonable county-level sample (e.g. 25 or 30 children per cluster). More small clusters are better than fewer large clusters. When you enter data, have a variable that identifies the county and use unique cluster identifiers for all clusters. You can then analyse the overall dataset with SMART / ENA software. You can also split the dataset into county-specific datasets and analyse these separately with the SMART / ENA software.
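The cluster selection step can be sketched in Python as below. This is the usual systematic PPS procedure (cumulative population list, fixed sampling interval, random start), not code from the SMART / ENA software; the function name and the village list are hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

def pps_systematic(sizes, n_clusters, rng=random):
    """Return indices of selected units using systematic PPS sampling.

    A unit larger than the sampling interval can be selected more than
    once; in practice such a unit would be split into segments.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_clusters
    start = rng.uniform(0, interval)
    picks = []
    for k in range(n_clusters):
        point = start + k * interval
        # Find the unit whose cumulative population range contains the point.
        index = min(bisect_right(cumulative, point), len(sizes) - 1)
        picks.append(index)
    return picks

# Hypothetical village list for the whole state: (county, population).
villages = [("A", 900), ("A", 1200), ("B", 400), ("B", 700), ("C", 1500),
            ("C", 300), ("D", 800), ("D", 600), ("E", 500), ("E", 1100)]
chosen = pps_systematic([pop for _, pop in villages], 5, random.Random(1))
print(Counter(villages[i][0] for i in chosen))  # clusters per county
```

Counting the selected clusters by county, as in the last line, is how you would check whether each county reaches the minimum of 25.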

I hope this is of some help.


Normal user

9 Aug 2012, 17:14

Dear Dr. Mark Myatt,

Thank you very much for the explanation, as always.

They are all SMART surveys. I was planning to do 5 separate surveys (1 per county) and combine the datasets to calculate the overall estimate for the whole state using the 'weighted analysis of survey' option in ENA (in this way I can also take care of the weighting).

I am a bit unclear about the 'super-SMART' approach. Does it mean that I list all the villages (i.e. clusters) in the entire state and select (e.g.) 125 clusters for the survey for the super-SMART survey? My concern is that I might not necessarily get 25 clusters in one county when using PPS to select clusters. I am also not sure how it would affect the sample size - i.e. I have calculated sample sizes separately for each survey based on parameters specific to each county.

I am always grateful for all your advice on EN-NET, especially in the assessment section. Are there any reference materials or guidelines on stratified cluster sampling you would recommend? All national survey guidelines only talk about cluster sampling. Although some of them mention stratified sampling as an option, they do not provide any information on how to do it.

I have been specifically looking for information on (or rather the difference between) doing separate surveys and combining them using weighting vs. conducting one survey with different strata where stratum-specific estimates can also be made.

Mark Myatt

Frequent user

10 Aug 2012, 10:21

You could use the 'weighted analysis of survey' option in ENA. Be sure that you calculate the weighting factor correctly since different software packages use different weighting algorithms. I suggest that you post a new message in these forums asking for advice from ENA users (a message title such as 'Weighted analysis in SMART / ENA' should get their attention). You could also ask in the SMART forum. If you do this then please copy the answer here to help others with similar problems.

The 'super-SMART' approach would go as you suggest. It would probably take a bit of trial and error (or some preliminary arithmetic) to be sure to get at least 25 clusters from each county. You will probably end up doing a little more than 125 clusters since you would want more clusters from the more populous counties to avoid a bias towards small counties.
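The "preliminary arithmetic" might look something like the sketch below. Under PPS the expected number of clusters falling in a county is proportional to its share of the state population, so the smallest county sets the total. The county populations are invented for the example, and the actual draw will vary around these expected values, which is where the trial and error comes in.

```python
import math

def min_total_clusters(populations, min_per_county=25):
    """Smallest total so the smallest county expects >= min_per_county clusters."""
    total = sum(populations.values())
    smallest = min(populations.values())
    return math.ceil(min_per_county * total / smallest)

def expected_clusters(populations, n_clusters):
    """Expected clusters per county under PPS selection from the whole state."""
    total = sum(populations.values())
    return {c: n_clusters * n / total for c, n in populations.items()}

# Hypothetical county populations (illustration only).
pops = {"A": 120_000, "B": 110_000, "C": 100_000, "D": 90_000, "E": 80_000}
m = min_total_clusters(pops)
print(m)  # 157
for county, e in expected_clusters(pops, m).items():
    print(county, round(e, 1))
```

If the counties differ a lot in size, the total needed can grow quickly, which is worth checking before committing to this design.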

Unless you are sampling from small populations, the sample size requirements for nutritional anthropometry surveys are straightforward. Before SMART we used to do surveys of the same design and just take 30 clusters of 30 children on the grounds that this would almost always be more than enough. I usually go for a precision of about 3% on a 10% prevalence with a design effect of about 1.5. This gives:

    n = DEFF * ((1.96 / precision)^2 * (p * (1 - p)))
    n = 1.5 * ((1.96 / 0.03)^2 * (0.1 * (1 - 0.1)))
    n = 576

which I'd collect as 30 clusters of 20. You could go for 25 clusters of 24. Having more small clusters tends to keep design effects down which increases precision (there are other ways of reducing design effect but this is the simplest). A sample size calculated as above assumes a large population. With a small population you can use a smaller sample size.
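The calculation above can be wrapped in a small Python function for trying other values. The optional small-population adjustment uses one common form of the finite population correction, applied before the design effect; that choice, the function name, and the population of 2,000 are assumptions made for illustration.

```python
def sample_size(p, precision, deff=1.5, z=1.96, population=None):
    """Sample size for estimating a prevalence p to within +/- precision.

    If population is given, a finite population correction is applied to
    the simple-random-sample size before multiplying by the design effect.
    """
    n_srs = (z / precision) ** 2 * p * (1 - p)
    if population is not None:
        n_srs = n_srs / (1 + (n_srs - 1) / population)
    return deff * n_srs

n_large = sample_size(p=0.10, precision=0.03)
print(round(n_large))  # 576

# Hypothetical small population of 2,000 children.
n_small = sample_size(p=0.10, precision=0.03, population=2000)
print(n_small < n_large)  # True
```

Either 30 clusters of 20 or 25 clusters of 24 gives 600 children, which covers the 576 required.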

The issue here is to meet this sample size requirement in all counties. If you have at least 25 clusters of 24 in each county then you will have a useful sample size in each county and a large sample size for the state.

The typical surveys (in our field) that use stratified cluster designs are the DHS and MICS. These have become very similar designs. MICS documentation is available here.

A classic text on survey design is 'Survey Methods in Social Investigation' by Moser and Kalton. This is an old book but very sound. I find myself coming back to it from time to time. It is also not too heavy on the mathematics: it's all there, but presented well and without complication.

Thank you for your kind comments.

I hope this is of some use.


Normal user

10 Aug 2012, 19:24

Dear Dr. Mark Myatt,

Thank you very much as always for your simple and clear explanation.

