# Deciding on sampling design: cluster sampling or stratified random sampling

This question was posted in the Assessment and Surveillance forum area and has 10 replies.

### Jordana Leitão

National specialist, CHW programme in Angola

Normal user

3 Mar 2016, 21:53

Hello, we are doing a baseline study for the evaluation of a programme. The programme is going to be implemented in 6 provinces and 18 counties. We want to have information for each county. How does this influence our sampling design?

I was thinking…

1. Randomly select x project communities/microareas from each of the 18 counties (all are about the same size, around 250 people, and there are 30 in each county).

2. Randomly select x households from each microarea.

But would this two-stage sampling design give us a sample size large enough to have accurate indicators at county level? Or would I need to treat the counties as domains/sub-samples and use a stratified random sampling design?

Many thanks in advance for any help you can give us.

Jordana

### Bradley A. Woodruff

Self-employed

Technical expert

5 Mar 2016, 00:10

Dear Jordana:

A complete answer to your seemingly simple question would involve an explanation of statistics and sampling methodology. I would recommend consulting a textbook. However, I will try to address some of the major points below.

First of all, you need to determine what type of random sampling you will do. It seems from your description that the basic sampling unit is the household. If there is a list of all the households in your 6 provinces, you may be able to do one-stage simple or systematic random sampling. However, the distance between selected households may be prohibitively large. If this is true, you may need to do 2-stage cluster sampling for purely logistical reasons.

You would then calculate a sample size for each stratum, which in your case is the county. The sample size calculation requires that you make some assumptions about the prevalence of your outcomes, how much precision you need, and what design effect you expect. Then the sample size would be multiplied by 18 to determine the sample size for the entire study. So you can easily see that increasing the number of strata can greatly increase your sample size.
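The per-stratum calculation described above can be sketched in Python. This is a minimal illustration, assuming the usual normal-approximation formula for estimating a proportion, inflated by a design effect; the prevalence (20%), precision (+/- 5 percentage points) and design effect (1.5) are hypothetical placeholders, not values from the thread.

```python
import math


def srs_sample_size(p: float, d: float, z: float = 1.96) -> float:
    """Sample size to estimate a proportion p within +/- d under simple random sampling."""
    return z ** 2 * p * (1 - p) / d ** 2


def stratified_total(p: float, d: float, deff: float, n_strata: int,
                     z: float = 1.96) -> tuple[int, int]:
    """Per-stratum sample size (inflated by the design effect) and the total over all strata."""
    per_stratum = math.ceil(srs_sample_size(p, d, z) * deff)
    return per_stratum, per_stratum * n_strata


# Hypothetical assumptions: prevalence 20%, precision +/- 5 points, DEFF 1.5, 18 counties.
per_county, total = stratified_total(p=0.20, d=0.05, deff=1.5, n_strata=18)
print(per_county, total)  # 369 6642
```

Because the same per-county figure is repeated in every stratum, the total grows in direct proportion to the number of strata.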

If you do cluster sampling, you want to select as many clusters in each county as is feasible to minimize the size of each cluster. The larger the cluster, the higher the design effect and the poorer your precision.
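One common way to see why smaller clusters help is the Kish approximation DEFF = 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the intra-cluster correlation. The sketch below assumes that approximation, a hypothetical ICC of 0.05, 300 households per county and a prevalence of 20%; the cap of 30 clusters reflects the 30 microareas per county mentioned in the question.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish approximation: DEFF = 1 + (m - 1) * ICC for clusters of average size m."""
    return 1 + (cluster_size - 1) * icc


def margin_of_error(p: float, n: int, deff: float, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(deff * p * (1 - p) / n)


# Hypothetical inputs: 300 households per county, prevalence 20%, ICC 0.05.
n, p, icc = 300, 0.20, 0.05
for n_clusters in (10, 15, 30):  # at most 30 microareas exist per county
    m = n / n_clusters
    deff = design_effect(m, icc)
    print(f"{n_clusters} clusters of {m:.0f}: DEFF={deff:.2f}, "
          f"margin of error +/-{margin_of_error(p, n, deff):.3f}")
```

With the total sample fixed, spreading it over more clusters lowers the design effect and narrows the confidence interval.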

Regarding your question, the sample size has nothing to do with accuracy. Accuracy is whether the point estimate from your survey is near the actual value in the population because of the absence of bias. This is determined by whether the sampling and measurements were done correctly. In contrast, the sample size influences precision, which reflects the degree of sampling error. So whether or not your results are useful depends on both a lack of bias (and therefore accurate results) and an acceptable degree of precision (and therefore being relatively certain that the difference between the survey estimate and the true population value is not large because of random sampling error). For a better explanation of these concepts, see http://conflict.lshtm.ac.uk/page_39.htm, http://www.unscn.org/en/resource_portal/index.php?&themes=201&resource=602, or any basic text on surveys and sampling.
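The distinction between accuracy and precision can be illustrated with a small simulation. This is only a sketch under hypothetical assumptions: a true prevalence of 11% and a systematic error of +5 percentage points (standing in for, say, a flawed sampling frame or measurement), applied to repeated surveys of 100 and 1,000 respondents.

```python
import random
import statistics


def simulate(n: int, true_p: float, bias: float, reps: int,
             rng: random.Random) -> tuple[float, float]:
    """Mean and standard deviation of the survey estimate across repeated surveys.

    Each simulated respondent is counted as positive with probability true_p + bias,
    a crude stand-in for a systematic (non-sampling) error.
    """
    estimates = []
    for _ in range(reps):
        positives = sum(rng.random() < true_p + bias for _ in range(n))
        estimates.append(positives / n)
    return statistics.mean(estimates), statistics.stdev(estimates)


rng = random.Random(2016)
TRUE_P, BIAS = 0.11, 0.05  # hypothetical values
small = simulate(100, TRUE_P, BIAS, reps=1000, rng=rng)
large = simulate(1000, TRUE_P, BIAS, reps=1000, rng=rng)
for label, (mean, sd) in (("n=100", small), ("n=1000", large)):
    print(f"{label}: mean estimate {mean:.3f} (true value {TRUE_P}), SD {sd:.3f}")
```

The larger sample shrinks the spread of the estimates (better precision), but both sample sizes centre on about 0.16 rather than the true 0.11, because more data does not remove bias.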

Regardless, you should account for the stratification by county during data analysis because, if the outcome differs between counties, you will get better precision than if you did not account for the stratification. Most major statistical software packages, such as SAS, SPSS, Stata, and R, allow you to account for cluster and stratified sampling during analysis.
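Why stratification can improve precision is visible in the standard stratified estimator, which weights each county's estimate by its population share. The sketch below is a simplification: it assumes simple random sampling within each county and ignores clustering and finite-population corrections, which the survey procedures in the packages named above handle properly. The three counties and their prevalences are hypothetical.

```python
import math


def stratified_estimate(strata):
    """Combine stratum estimates using population shares as weights.

    strata: list of (population_share, sample_size, sample_proportion) tuples.
    The variance assumes simple random sampling within each stratum.
    """
    p_st = sum(w * p for w, _, p in strata)
    var_st = sum(w ** 2 * p * (1 - p) / n for w, n, p in strata)
    return p_st, var_st


# Hypothetical counties: equal population shares, 100 children sampled in each,
# and clearly different prevalences.
counties = [(1 / 3, 100, 0.05), (1 / 3, 100, 0.10), (1 / 3, 100, 0.30)]
p_st, var_st = stratified_estimate(counties)

# The same 300 children analysed as one unstratified simple random sample.
n_total = sum(n for _, n, _ in counties)
var_srs = p_st * (1 - p_st) / n_total

print(f"estimate {p_st:.3f}")
print(f"SE ignoring strata {math.sqrt(var_srs):.4f}, "
      f"SE accounting for strata {math.sqrt(var_st):.4f}")
```

The gain comes from removing the between-county differences from the variance; if all counties had the same prevalence, the two standard errors would be equal.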

### Anonymous 81

Public Health Nutritionist

Normal user

5 Mar 2016, 18:10

Dear Jordana,

It seems that your interest is county-based information. If so, you need 18 independent surveys, one per county, assuming each county is homogeneous. For each county, you need to calculate the sample size. The sample size could differ between counties, as it depends on the expected prevalence and other parameters.

Regarding the sampling methodology, given the context of your area, I am also assuming two-stage cluster sampling. Stage I: select clusters from the list of villages (in your case, communities/microareas). Stage II: select households randomly from the list of households in each selected village.
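The two stages described above can be sketched as follows for one county. This assumes a hypothetical frame structure (a dictionary mapping microarea IDs to household lists) and hypothetical counts: 30 microareas per county, as stated in the question, and 29 households per microarea (roughly 250 people divided by an average household size of about 8.6). Simple random selection of microareas at stage I is reasonable here only because the question says all microareas are about the same size; with unequal sizes, selection with probability proportional to size is usual.

```python
import random


def two_stage_sample(frame: dict[str, list[str]], n_clusters: int,
                     n_households: int, rng: random.Random) -> dict[str, list[str]]:
    """Stage I: simple random sample of microareas.
    Stage II: simple random sample of households within each selected microarea."""
    clusters = rng.sample(sorted(frame), n_clusters)
    return {c: rng.sample(frame[c], n_households) for c in clusters}


# Hypothetical frame for one county: 30 microareas with 29 households each.
frame = {
    f"MA-{i:02d}": [f"MA-{i:02d}-HH{j:02d}" for j in range(1, 30)]
    for i in range(1, 31)
}
rng = random.Random(42)
sample = two_stage_sample(frame, n_clusters=15, n_households=10, rng=rng)
for microarea, households in sorted(sample.items()):
    print(microarea, households[:3], "...")
```

Taking the same number of households from equal-sized microareas gives every household roughly the same selection probability, so the sample is approximately self-weighting within the county.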

If your stratification is based on other factors, such as livelihood, agro-ecology, or province, you cannot analyse or disaggregate the results by county. For example, if you do six SMART surveys (one per province), you cannot analyse the results by county, but you can draw conclusions about each province. Such province-level analysis might underestimate or overestimate the findings, especially if the counties are very heterogeneous.

### Kennedy Musumba

SMART Program Manager

Normal user

8 Mar 2016, 06:22

Dear Jordana,

I agree with Kiross on the above approach; however, it is quite laborious and resource-intensive. The approach will be guided by the objectives of your study. In some cases you could take into account the livelihood zones of the counties and do your survey over an area slightly larger than a county, e.g. livelihood zones such as pastoral, mixed farming, etc., and proceed with the 2-stage cluster approach.

### Bradley A. Woodruff

Self-employed

Technical expert

8 Mar 2016, 18:14

Regarding this discussion, the sampling stratification scheme is determined by a) what results are needed to make program decisions, and b) the resources available. If county-specific results are needed, then you need to apply the calculated sample size to each of the 18 counties if resources are available to do this. Such fine stratification is often expensive because it greatly increases the overall required sample size, but if you need to do it, you must do it.

Regarding the suggested stratification by livelihood zone or other criteria, it must be kept in mind that to do stratified sampling, you must be able to define the value of the stratification variable for each and every sampling unit. This means that you would have to be able to determine the livelihood zone for each primary sampling unit, in this case each "microarea". Usually, it is impossible to determine which livelihood zone each primary sampling unit belongs to. Livelihood zone maps supply only general geographic locations; they cannot define the livelihood zone for each primary sampling unit. Moreover, populations move, and in few primary sampling units does every household derive its livelihood from the same activity or category of activities. In fact, there are very few criteria which can be used to stratify the first stage of cluster sampling. If the primary sampling unit is the census enumeration area, census data may have socio-economic data, racial or ethnic distribution, linguistic distribution, or other demographic information for each such unit; however, for other primary sampling units, such as village or subdistrict, similar information is not available. So although it's easy to say "stratify by livelihood zone", it's usually impossible to do it correctly.
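The requirement that every sampling unit carry a value for the stratification variable can be expressed as a simple check on the sampling frame. The frame layout, microarea IDs, county names and livelihood labels below are hypothetical; `None` marks a microarea whose livelihood zone cannot be determined.

```python
def group_frame_by_stratum(frame: list[dict]) -> dict[str, list[str]]:
    """Group primary sampling units (PSUs) by stratum.

    Raises ValueError if any PSU has no stratum value, because stratified
    selection is impossible for units that cannot be assigned to a stratum.
    """
    missing = [row["psu"] for row in frame if row.get("stratum") is None]
    if missing:
        raise ValueError(f"cannot stratify: no stratum value for PSUs {missing}")
    strata: dict[str, list[str]] = {}
    for row in frame:
        strata.setdefault(row["stratum"], []).append(row["psu"])
    return strata


# County is known for every microarea, so stratifying by county works.
by_county = [
    {"psu": "MA-01", "stratum": "County A"},
    {"psu": "MA-02", "stratum": "County A"},
    {"psu": "MA-03", "stratum": "County B"},
]
# Livelihood zone is unknown for MA-02, so this frame cannot be stratified by it.
by_livelihood = [
    {"psu": "MA-01", "stratum": "pastoral"},
    {"psu": "MA-02", "stratum": None},
    {"psu": "MA-03", "stratum": "mixed farming"},
]

print(group_frame_by_stratum(by_county))
try:
    group_frame_by_stratum(by_livelihood)
except ValueError as err:
    print(err)
```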

### Elisa Dominguez

Nutrition officer at WHO

Normal user

9 Mar 2016, 15:07

Dear Jordana,

I worked in Angola on community health and nutrition programmes with CHWs. I used the LQAS methodology for coverage assessments. This methodology allows comparisons between different counties with small sample sizes. I have some slides in Portuguese that I can share with you. Please contact me by email: elisadmuriel@yahoo.es
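As a rough illustration of how an LQAS decision rule is usually derived, the sketch below searches for the smallest number of "successes" d in a sample of n such that both classification errors, computed from the binomial distribution, stay at or below 10%. The sample size of 19 is a figure often used in LQAS health assessments; the 80% target and 50% lower threshold are hypothetical choices, not values from this thread.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def lqas_errors(n: int, d: int, p_upper: float, p_lower: float) -> tuple[float, float]:
    """Misclassification risks when an area 'passes' if at least d of n are covered."""
    alpha = binom_cdf(d - 1, n, p_upper)     # fail an area truly at the target
    beta = 1 - binom_cdf(d - 1, n, p_lower)  # pass an area truly at the lower threshold
    return alpha, beta


def find_decision_rule(n: int, p_upper: float, p_lower: float,
                       max_error: float = 0.10):
    """Smallest decision rule d keeping both errors <= max_error, or None."""
    for d in range(n + 1):
        alpha, beta = lqas_errors(n, d, p_upper, p_lower)
        if alpha <= max_error and beta <= max_error:
            return d
    return None


d = find_decision_rule(19, p_upper=0.80, p_lower=0.50)
alpha, beta = lqas_errors(19, d, 0.80, 0.50)
print(f"decision rule {d}: alpha={alpha:.3f}, beta={beta:.3f}")
```

LQAS classifies each area as reaching or not reaching a target rather than producing a precise coverage estimate, which is why it can work with such small samples.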

### Bradley A. Woodruff

Self-employed

Technical expert

9 Mar 2016, 19:05

Dear Jordana:

First of all, it appears that you are planning a quite complex survey with complex sampling and multiple outcomes. For this reason, I would highly recommend that you find someone local or within your organization with experience in statistics, epidemiology, and survey methodology with whom you can discuss all the essential details as they come up during survey planning and implementation, before you spend a lot of time and money collecting data which may not be useful because of methodological problems. There are many pitfalls which must be avoided during planning and implementation, any one of which may threaten the validity of the survey results.

The most important thing when calculating sample size is formulating the appropriate assumptions and deciding on the level of precision you need in your final results. If you calculate a sample size to achieve a certain level of precision which you need to make program decisions, then this sample size must be applied to each stratum for which you need that level of precision. You have calculated a sample size of 315 children based on a desired precision of +/- 5 percentage points around an estimate of 11%. If you want this precision in each county, then this sample size must be selected in each county.
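The arithmetic behind applying the calculated sample size to every county can be checked quickly. The sketch assumes the normal-approximation formula for a proportion; the thread does not say which design effect or non-response allowance produced the figure of 315, so the ratio below only shows how much larger 315 is than the simple-random-sampling figure for 11% +/- 5 percentage points.

```python
def srs_n(p: float, d: float, z: float = 1.96) -> float:
    """Simple random sampling size to estimate proportion p within +/- d."""
    return z ** 2 * p * (1 - p) / d ** 2


base = srs_n(0.11, 0.05)  # about 150 children under simple random sampling
per_county = 315          # sample size quoted in the reply
n_counties = 18

print(f"SRS base: {base:.1f}; 315 is {per_county / base:.2f} times this")
print(f"Total children if 315 are selected in each county: {per_county * n_counties}")
```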

Regarding sampling, it seems that your target group is children less than 6 months of age. According to the Angola 2011 DHS, children less than 5 years of age represent 21.3% of the population. Because of infant and under-5 mortality, children less than 6 months of age probably make up somewhat more than 1/10th of children less than 5 years of age, so let's say they make up 2.5% of the population. The World Bank estimates an average household size of 8.63, so on average there are only 0.22 children less than 6 months of age per household, meaning that if teams selected households without regard to household members, they would have to visit about 5 households to find one child less than 6 months of age. This would mean selecting 1432 households in each county, for a total of 25,773 in all 18 counties. However, if you are interested only in children less than 6 months of age and do not wish to collect data from households without a child of this age, then at 4 out of 5 households you would only need to ask whether there are any eligible children in the household. This would be relatively quick.
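The chain of estimates in the paragraph above can be reproduced step by step. All inputs are the figures quoted in the reply (DHS share of under-5s, the assumed 2.5% share of infants under 6 months, the World Bank household size, and 315 children per county); following the reply, the rounded value of 0.22 children per household is used for the household counts.

```python
under5_share = 0.213      # Angola 2011 DHS, as quoted in the reply
under6m_share = 0.025     # the reply's working assumption ("let's say 2.5%")
household_size = 8.63     # World Bank estimate quoted in the reply
children_needed = 315     # per county
n_counties = 18

print(f"under-6-month share of under-5s: {under6m_share / under5_share:.3f}")
print(f"children <6 months per household: {household_size * under6m_share:.3f}")

children_per_household = 0.22  # rounded value used in the reply
households_per_child = 1 / children_per_household
households_per_county = children_needed / children_per_household
total_households = households_per_county * n_counties

print(f"households visited per eligible child: {households_per_child:.1f}")
print(f"households per county: {round(households_per_county)}")
print(f"households in all {n_counties} counties: {round(total_households)}")
```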

Finally, the size of the design effect, and therefore the effect of cluster sampling on your precision, is determined in part by the average size of each cluster. You will achieve a lower design effect and higher precision if you split the sample size of 315 children into more clusters and thereby decrease the size of each cluster.
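To see how splitting the 315 children into more clusters affects precision, the sketch below computes the design effect and the corresponding effective sample size (the number of simple-random-sample children giving equivalent precision). It assumes the Kish approximation DEFF = 1 + (m - 1) * ICC and a hypothetical ICC of 0.02; the upper limit of 30 clusters reflects the 30 microareas per county mentioned in the question.

```python
def effective_sample_size(total_n: int, n_clusters: int, icc: float) -> tuple[float, float]:
    """Design effect and effective sample size when total_n is split evenly over n_clusters."""
    m = total_n / n_clusters        # average cluster size
    deff = 1 + (m - 1) * icc        # Kish approximation
    return deff, total_n / deff


for k in (10, 15, 21, 30):          # at most 30 microareas per county
    deff, n_eff = effective_sample_size(315, k, icc=0.02)  # hypothetical ICC
    print(f"{k} clusters of {315 / k:.1f}: DEFF={deff:.2f}, effective n={n_eff:.0f}")
```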

I hope this helps. Let me reiterate the recommendation to find a knowledgeable person locally or within your organization with whom to discuss details and questions as they come up.

### Bradley A. Woodruff

Self-employed

Technical expert

9 Mar 2016, 21:45

Dear Jordana:

Please contact me at bradleyawoodruff@gmail.com. I would like to provide whatever assistance I can, but the technical details may be of less interest to other en-net participants.
