# Sample size for a multi-stage clustered sampling design

This question was posted in the Assessment and Surveillance forum area and has 3 replies.

### Mwaisaka

Research Lead

Normal user

8 Aug 2023, 07:31

I'm designing a study on menstrual health hygiene among adolescent girls in Kenya. The sampling approach is a multi-stage clustered sampling design. The primary sampling units will be administrative wards selected from 16 counties of interest. Wards will be selected systematically, with urban/rural stratification, in the 16 proposed counties. For each sampled ward, a complete list of public primary and secondary schools will be compiled, and schools will be randomly selected from it to participate in the study. After school selection, adolescent girls will be randomly sampled from a sampling frame generated from the class registers. Only girls in grades 4-8 who meet the eligibility criteria, and girls in secondary schools, will be included.

A situational analysis commissioned by the Ministry of Health in Kenya showed that 54% (58% in rural and 53% in urban areas) of Kenyan girls faced challenges accessing menstrual health hygiene products. From these estimates, how do I calculate my sample size, given that this is a complex survey?

Thanks.

### Bradley A. Woodruff

Self-employed

Technical expert

12 Aug 2023, 01:22

Dear Mwaisaka:

It appears that your major outcome is dichotomous (faces challenges accessing menstrual health hygiene products vs does not face challenges accessing menstrual health hygiene products). The formula for calculating the minimum sample size with a dichotomous outcome to achieve sufficiently narrow confidence intervals is available in any elementary statistics textbook and on many websites. I would include the formula here, but I cannot seem to paste a Microsoft Word formula or an image into this text box, nor can I type the formula with the necessary superscripts and Greek letters. Sorry.

Regarding the values to insert into the formula, 95% confidence intervals (vs 90% confidence intervals or 99% confidence intervals) are nearly universal, so the value for Z would be 1.96. Since the Ministry of Health data show a prevalence of facing challenges to be 54%, I would use 50% as the assumed p. This assumption is safest because p(1 - p) is largest at p = 50%, so it will give you the largest minimum sample size.
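For readers who want to see the arithmetic, here is a minimal sketch of the textbook formula for a proportion under simple random sampling, n = Z² × p × (1 − p) / d². The function name `srs_sample_size` and the desired precision `d` of ±5 percentage points are illustrative assumptions; the thread does not state a target precision, so substitute your own.

```python
import math


def srs_sample_size(p: float, d: float, z: float = 1.96) -> int:
    """Minimum sample size for estimating a proportion under simple random sampling.

    p: assumed prevalence (0 < p < 1)
    d: desired half-width of the confidence interval (e.g. 0.05 for +/-5 points)
    z: standard normal quantile (1.96 for 95% confidence intervals)
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    # Round up: a fractional respondent cannot be sampled.
    return math.ceil(z**2 * p * (1 - p) / d**2)


# ASSUMPTION: d = 0.05 is chosen only for illustration.
n_conservative = srs_sample_size(p=0.50, d=0.05)  # "safest" assumed prevalence
n_reported = srs_sample_size(p=0.54, d=0.05)      # prevalence from the MoH analysis

print("p = 0.50:", n_conservative)
print("p = 0.54:", n_reported)
```

Using p = 0.50 instead of the reported 0.54 changes the result only slightly, which is why the conservative choice costs little here. This figure does not yet account for the complex sample design.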

Accounting for complex sampling is relatively easy. You just need to multiply the sample size obtained from the formula by an estimate of the design effect. I have no idea what the design effect for your major outcome would be in the area of Kenya where you are doing the survey, so you will have to do a bit of research to formulate such an estimate. I would recommend not routinely using 2.0 as a design effect because many nutrition and health outcomes have a substantially lower design effect, so using 2.0 will often result in an unnecessarily large sample size.
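The adjustment described above can be sketched as follows. The design effect values (1.2, 1.5, 2.0) and the ±5 percentage point precision are placeholders for illustration only, not estimates for this survey; the function name `clustered_sample_size` is likewise hypothetical.

```python
import math


def clustered_sample_size(p: float, d: float, deff: float, z: float = 1.96) -> int:
    """Simple-random-sampling size for a proportion, inflated by a design effect."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    n_srs = z**2 * p * (1 - p) / d**2  # size under simple random sampling
    return math.ceil(n_srs * deff)     # inflate for clustering, then round up


# ASSUMPTION: these design effects are illustrative, not researched values.
for deff in (1.2, 1.5, 2.0):
    n = clustered_sample_size(p=0.50, d=0.05, deff=deff)
    print(f"design effect {deff}: n = {n}")
```

Comparing the rows shows how much a default of 2.0 can add to the sample size relative to a lower, evidence-based design effect.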

Regarding your sampling scheme, it appears that you have three-stage sampling: stage I selects wards, stage II selects schools, and stage III selects individual students. You can either select stages I and II with probability proportional to size and select the same number of girls in each school so that the cluster size is constant, or you can select all stages with equal probability and keep the sampling fraction the same in each school so that the number of selected girls differs between schools. Either scheme will result in an equal probability sample, thus avoiding the necessity of using sampling weights.
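To illustrate why the first scheme gives every girl the same overall chance of selection, here is a small sketch on a made-up sampling frame. The ward and school names, their sizes, and the numbers selected at each stage are all hypothetical. It also assumes the usual PPS inclusion probability (number of units selected × unit size / total size) with no unit large enough to be selected with certainty.

```python
from fractions import Fraction

# HYPOTHETICAL frame: ward -> {school: number of eligible girls}
frame = {
    "ward_A": {"school_1": 300, "school_2": 500, "school_3": 200},
    "ward_B": {"school_4": 400, "school_5": 600},
    "ward_C": {"school_6": 250, "school_7": 350, "school_8": 400, "school_9": 500},
}


def overall_probabilities(frame, n_wards, n_schools, girls_per_school):
    """Overall selection probability of a girl in each school under
    PPS selection of wards and schools with a fixed number of girls per school."""
    total = sum(sum(schools.values()) for schools in frame.values())
    probs = {}
    for ward, schools in frame.items():
        ward_size = sum(schools.values())
        p_ward = Fraction(n_wards * ward_size, total)        # stage I, PPS
        for school, size in schools.items():
            p_school = Fraction(n_schools * size, ward_size)  # stage II, PPS
            p_girl = Fraction(girls_per_school, size)         # stage III, fixed take
            if max(p_ward, p_school, p_girl) > 1:
                raise ValueError(f"certainty selection needed for {ward}/{school}")
            probs[school] = p_ward * p_school * p_girl
    return probs


probs = overall_probabilities(frame, n_wards=2, n_schools=1, girls_per_school=20)
for school, p in probs.items():
    print(school, p)
```

The ward and school sizes cancel in the product, so every school reports the same overall probability (number of wards × schools per ward × girls per school, divided by the total number of eligible girls). The sample is therefore self-weighting.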

I hope this helps.

Bradley Woodruff

### Anonymous 22505

Normal user

12 Aug 2023, 11:09

Dear Mwaisaka:

My question is regarding the study subjects. As per your information, your objective is to know what percentage of girls (grades 4-8) have access to menstrual health hygiene products. I have no idea about the school system in Kenya, but do you think grade 4 students are an ideal target group for a study of menstrual health hygiene? Including this group might lead you to underestimate the prevalence of access to the products.

### Mark Myatt

Frequent user

28 Aug 2023, 13:35

Sorry for the delay in replying.

It is a complex sample! The problem here is, I think, making a reasonable guess about how much sampling variation may be lost in such a sample. This would allow you to calculate an expected "design effect" that could be used in sample size calculations. Do you have any idea of the design effect you'd expect for your outcomes? What did the MoH survey report? The stratified sample of schools within wards might be expected to reduce the design effect a little compared to a PPS cluster sample, as would the random selection of students from student registers. You could take this into account when using the design effect reported by the MoH survey.
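If no published design effect is available, one common rough guide (not mentioned earlier in this thread) is Kish's approximation, DEFF ≈ 1 + (m − 1) × ICC, where m is the average number of respondents per cluster and ICC is the intra-cluster correlation of the outcome. The sketch below uses made-up values for both. The approximation assumes a single level of clustering with roughly equal cluster sizes, so for a three-stage design it is only an indication.

```python
def approx_design_effect(cluster_size: float, icc: float) -> float:
    """Kish's approximation: DEFF = 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0 <= icc <= 1:
        raise ValueError("icc must be between 0 and 1")
    return 1 + (cluster_size - 1) * icc


# ASSUMPTION: cluster sizes and ICC values below are purely illustrative.
for m in (10, 20, 30):
    for icc in (0.02, 0.05):
        deff = approx_design_effect(m, icc)
        print(f"m = {m}, ICC = {icc}: DEFF ~ {deff:.2f}")
```

Smaller clusters and outcomes that vary little between clusters both push the design effect towards 1.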

Woody points out above that you may need to adjust your sample design to get close to an equal probability sample. If you do not do this, you will have to work with sampling weights to make your estimates. That can be complicated.

I agree with Woody that using a design effect of 2.0, as is common, might not be appropriate. It might not be very wrong, but it may inflate the sample size and the cost of the survey above what is necessary.
