
Minimum Number of Clusters in a SMART Survey

This question was posted in the Assessment forum area and has 6 replies.


Anonymous 2352

Normal user

31 May 2014, 14:01

I know the SMART Methodology recommends that the minimum number of clusters in a survey should be 25. However, I have a scenario where 32 clusters were sampled and, after data had been collected in 24 clusters, the district became volatile and the government had to suspend the survey in order to deal with the situation. Would it be right to make decisions on the nutrition situation using the data from the 24 clusters?

Anonymous 730

Nutrition and Food Security Officer

Normal user

1 Jun 2014, 20:13

It's really unfortunate. You may not be able to call your results valid, as more than 10% of clusters were not reached. Just out of interest, what percentage of your required sample size for children did you achieve from the 24 clusters compared to the expected number from 32?

Anonymous 2352

Normal user

2 Jun 2014, 14:25

Thanks Blessing. The children sampled from the 24 clusters represent 86% of all children in the initial sample, i.e. 344 children have been sampled so far, while the target sample was 399 children.

Mark Myatt

Consultant Epidemiologist

Frequent user

2 Jun 2014, 14:44

The issue is not about a minimum number of clusters. The m = 25 limit is (AFAIK) based on work done when surveys used a proximity sample (i.e. a bunch of neighbouring houses) within each cluster. If you use a within-cluster sampling method that captures more within-cluster variability than a proximity sample, then you can use a smaller number of clusters. It will, however, always be better to take many small clusters.

Your issue is that the overall sample may no longer be a proper PPS sample. Faced with the need to "save" this survey, I would use posterior weighting to get estimates. You will probably have poor precision (i.e. wide confidence intervals) due to the smaller sample size. When reporting results you must report what happened.

I hope this helps.

Anonymous 2352

Normal user

3 Jun 2014, 04:02

Thanks Mark. Would you kindly explain the process of getting the posterior estimates? I would be glad to save the survey.

Mark Myatt

Consultant Epidemiologist

Frequent user

3 Jun 2014, 04:24

I would do this using a blocked weighted bootstrap, but that is not for everyone. You can use EpiInfo (any version), SAS, SUDAAN, STATA, or SPSS to analyse your data. All the packages work in different ways ... you will usually need to specify the cluster identifier variable and provide sampling weights based on cluster populations. I often find these packages quite tricky to steer, so make sure you read the manual. I think there are videos and case studies on the CDC site and on YouTube showing how this is done in EpiInfo.
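As a rough illustration of the bootstrap idea, here is a minimal Python sketch. It is a simplified cluster-level bootstrap, not necessarily the exact blocked weighted procedure mentioned above: whole clusters are resampled with replacement and a population-weighted prevalence is recomputed each time. The data, the function names, and the choice of weight (cluster population divided by children measured) are all assumptions made for illustration; the right weights depend on how the clusters were actually selected.

```python
import random

# Hypothetical per-cluster data: (cluster population, children measured, cases found).
clusters = [
    (1200, 14, 2),
    (800, 15, 1),
    (2500, 13, 3),
    (600, 14, 0),
    (1500, 15, 2),
    (950, 14, 1),
]

def weighted_prevalence(sample):
    """Prevalence with each child weighted by population / children measured (assumed weight)."""
    numerator = sum(pop * cases / n for pop, n, cases in sample)
    denominator = sum(pop for pop, n, cases in sample)
    return numerator / denominator

def bootstrap_ci(sample, replicates=2000, seed=1):
    """Percentile 95% interval from resampling whole clusters with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = []
    for _ in range(replicates):
        resample = [rng.choice(sample) for _ in sample]
        estimates.append(weighted_prevalence(resample))
    estimates.sort()
    lower = estimates[int(0.025 * replicates)]
    upper = estimates[int(0.975 * replicates) - 1]
    return lower, upper

point = weighted_prevalence(clusters)
low, high = bootstrap_ci(clusters)
print(f"Estimate {point:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

With only a handful of clusters the interval will be wide, which is the loss of precision described earlier in the thread.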

If you find the software confusing then you could do the analysis by hand. See this discussion on this forum. An extended example is given in the SLEAC section of the SQUEAC manual.
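A by-hand calculation of this kind might look like the Python sketch below. It is not taken from the SQUEAC manual; it is a generic weighted ratio estimate with a cluster-level (linearised) standard error, using made-up data and an assumed weight of cluster population divided by children measured.

```python
import math

# Hypothetical per-cluster data: (cluster population, children measured, cases found).
clusters = [
    (1200, 14, 2),
    (800, 15, 1),
    (2500, 13, 3),
    (600, 14, 0),
    (1500, 15, 2),
    (950, 14, 1),
]

def weighted_estimate(sample):
    """Return (prevalence, standard error) treating clusters as the sampling units."""
    m = len(sample)
    total_pop = sum(pop for pop, n, cases in sample)
    # Weighted prevalence: each cluster's proportion weighted by its population.
    p = sum(pop * cases / n for pop, n, cases in sample) / total_pop
    # Linearised residual per cluster: weight * (cases - p * children measured).
    residuals = [pop / n * (cases - p * n) for pop, n, cases in sample]
    variance = m / (m - 1) * sum(z * z for z in residuals) / total_pop ** 2
    return p, math.sqrt(variance)

p, se = weighted_estimate(clusters)
# Normal-approximation interval; with few clusters a t-based interval would be wider.
lower, upper = max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se)
print(f"Prevalence {p:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The survey packages listed above do essentially this once the cluster identifier and weights are specified, so a hand calculation is a useful cross-check on their output.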

I hope this helps.

Scott Logue

Normal user

3 Jun 2014, 12:45

Assuming that the data collected were of high quality, they can be considered valid. There are two potential issues at hand: precision and design effect.

Precision: as Mark stated, the confidence intervals will be wider as the sample size (children) was not fully obtained. However, you mentioned in a follow-up response that 86% of the sample size was collected. SMART recommends that if at least 80% of the sample size is obtained the data collected can still be used.
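The figures quoted in this thread can be checked with a few lines of Python. The 10% and 80% thresholds below are simply the ones stated by posters above, not values verified against the SMART manual, and the variable names are illustrative.

```python
planned_clusters, surveyed_clusters = 32, 24
planned_children, measured_children = 399, 344

# Share of planned clusters that could not be reached.
cluster_shortfall = 1 - surveyed_clusters / planned_clusters
# Share of the target sample of children actually measured.
child_coverage = measured_children / planned_children

print(f"Clusters not reached: {cluster_shortfall:.0%}")  # 25%
print(f"Children measured:    {child_coverage:.0%}")     # 86%

# Thresholds as quoted in the thread (assumptions, see above).
print("More than 10% of clusters missed:", cluster_shortfall > 0.10)
print("At least 80% of children reached:", child_coverage >= 0.80)
```

The two rules of thumb point in different directions here, which is why the replies disagree on whether the results can be called valid.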

Design Effect (DEFF): The design effect of the survey will be affected by the number of households per cluster. If there was a low number of HH per cluster the DEFF will increase; if there was a high number of HH per cluster then the DEFF will decrease.

Since there were supposed to be 32 clusters included in the survey there would have been 4 reserve clusters (RC) randomly selected in ENA (at the same time the other clusters were selected). As a mitigation strategy, it is recommended to now include all of these 4 reserve clusters in the survey if these clusters are located in a safe area. If only 1 or 2 RC are accessible it is highly recommended to include them in your survey as this will still allow better precision overall for the interpretation of your results.

In the report it is very important to outline which specific clusters from the sampling frame were not accessed during data collection due to unforeseen volatile outbreaks, as well as to identify the RC added (the ones that can be accessed).
