
Sample size for IYCF assessment

This question was posted in the Assessment and Surveillance forum area and has 9 replies.



Public Health Nutritionist

Normal user

5 May 2009, 18:20

We are preparing to conduct a baseline nutrition survey (two-stage cluster sampling, 30 by 30). Within this survey we want to include an assessment of IYCF for children 0 to 23 months. However, we are not sure how many infants under six months should be included in the survey, on top of the 900 children aged 6 to 59 months. The other question is whether the number of children aged 6 to 23 months screened in the nutrition survey is enough to assess the prevalence of the other IYCF indicators (2007 new modified indicators), such as timely initiation of BF 0-23 months, timely complementary feeding 6 to 9.9 months, introduction of solid/semi-solid or soft foods 6-8.9 months, continued breastfeeding 12-15.9 months, and dietary diversity 6-23 months?

Mark Myatt

Frequent user

6 May 2009, 10:29

You will have a sample size in the anthropometry survey of slightly more than 900. Let us assume that it will be 900 children aged between 6 and 59 months. We can also assume that the distribution of ages of the children will be pretty uniform. This means that you will have about:

((23 - 6) / (59 - 6)) * 900 = 289

children aged between 6 and 23 months in the sample.

Now you have to work out the required sample size for the IYCF indicators. You need (1) a guess of the indicator value, (2) an idea of the required precision, and (3) a guess of the population aged between 6 and 23 months.

If you have no idea for (1) then assume 50%. For (2) you specify the required half-width of the 95% CI. 50% +/- 10% is OK but (e.g.) 5% +/- 10% is meaningless. For (3) take the population of the survey area and multiply by:

((23 - 6) / (59 - 6)) * 0.2 = 0.064

This assumes that 20% (0.2) of the population are aged between 6 and 59 months. This is usually a pretty safe assumption. So, in a population of (e.g.) 87,000 there will be about:

87000 * 0.064 = 5568

children aged between 6 and 23 months.
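The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the uniform-age assumption used in this post; the function name `age_band_fraction` is made up for the example, and the 20% share of children aged 6 to 59 months is the rule of thumb quoted above, not a measured value.

```python
def age_band_fraction(low, high, ref_low=6, ref_high=59):
    """Fraction of the 6-59 month age range covered by [low, high],
    assuming ages are uniformly distributed (the assumption made above)."""
    return (high - low) / (ref_high - ref_low)

frac = age_band_fraction(6, 23)                 # 17 / 53, about 0.321

# Expected number of 6-23 month olds among 900 children aged 6-59 months.
children_in_sample = round(frac * 900)

# Share of the whole population aged 6-23 months, assuming 20% are 6-59 months.
share_of_population = round(frac * 0.2, 3)      # rounded to 0.064 as in the post

# Expected number of 6-23 month olds in a population of 87,000.
children_in_population = round(87000 * share_of_population)

print(children_in_sample, share_of_population, children_in_population)
```

Using the unrounded share (about 0.0642) instead of 0.064 gives a slightly larger population figure, which makes no practical difference to the sample size.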

Use a sample size calculator that allows for a "finite population correction". You can download SampleXS from:

or use an on-line calculator such as GNU sampsize from:

Just enter your guesses for (1), (2), and (3). Using GNU sampsize with:

Precision := 7.5%
Prevalence := 50% (this is the indicator value)
Population := 5568
Level := 95%

I get a required sample size of 166. This assumes a simple random sample. You are using a cluster sample so you need to make a guess at the design effect. You can only know this from previous surveys. If you have no idea of the design effect then use 2. I'd guess that 1.5 should be OK for IYCF indicators. SampleXS does the calculation for you. If you use GNU sampsize then multiply the calculated sample size by the design effect. With the example (above) the required sample size is:

n = 166 * 1.5 = 249

Since 289 >= 249 then (in this example) you will be able to estimate IYCF indicators with the required precision from the 30-by-30 sample.
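For anyone who wants to check these figures without a calculator program, here is a minimal Python sketch. It assumes the usual normal-approximation formula for a proportion with a finite population correction, rounded up; the thread does not show how GNU sampsize or SampleXS compute their results, so the formula and the function names `srs_sample_size` and `cluster_sample_size` are assumptions of this sketch. It does reproduce the numbers quoted in this post.

```python
import math

def srs_sample_size(p, d, population=None, z=1.96):
    """Sample size for estimating proportion p to within +/- d (simple random sample).

    Assumed formula: n0 = z^2 * p * (1 - p) / d^2, followed by the finite
    population correction n = n0 / (1 + (n0 - 1) / N) when N is given.
    """
    n0 = z ** 2 * p * (1 - p) / d ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

def cluster_sample_size(n_srs, design_effect):
    """Inflate a simple-random-sample size by a guessed design effect."""
    return math.ceil(n_srs * design_effect)

n_srs = srs_sample_size(p=0.50, d=0.075, population=5568)
n_cluster = cluster_sample_size(n_srs, design_effect=1.5)
available = 289  # expected children aged 6-23 months in the 30-by-30 sample

print(n_srs, n_cluster, available >= n_cluster)
```

The design effect of 1.5 is the guess made above; with the more cautious value of 2 the same code gives a requirement larger than the 289 children expected in the sample.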

What if you need a bigger sample size? My preference would be to increase the main survey sample size using extra clusters. It is better to take more clusters than increase cluster size.

Ali Maclaine

Normal user

7 May 2009, 07:34

Hi, Thanks for the answer but I am a bit confused:
(a) You say that 87000 x 0.064 is 5568 children aged between 6 and 23 months, but above you have the 20% of the population between 6 and 59 months. Isn't the 5568 between 6 and 59m so you would have to recalculate for 6 to 23 months?

(b) Also, does the population figure make much difference anyway to the final answer?

(c) You don't cover what we should do for 0-5.9 months as we want to include them as well.

(d) Sorry about the following question but I am confused. We want to get useful data for each IYCF indicator but we have very little data to base our estimates on. There are also 10 IYCF indicators now each looking at slightly different groups e.g. some take children 6-9months, some 6-23 months etc. Is this ok or do you have to deal with each indicator in a different way?
What I basically want is to know is if we do a normal cluster survey (900 children 6-29 months and thinking add 100 aged 0-5.9m as is proportional to the amount in the population and we figure will be ok) will the results that we get mean anything statistically [so we can say X% (CI X - X) receive semi-solid foods] and therefore is it worth doing?
Sorry for my ignorance.

Nina Berry

IFE Consultant

Normal user

7 May 2009, 10:23

Hi Ali
I am going to have a stab at answering your last question because I'd also like to 'practice' my statistical reasoning - and feeling reasonably thick-skinned at the moment.
A cluster survey is a cross-sectional design. Assuming you are looking for a precision of p<0.05, you will need to have some baseline data from which to estimate the proportion of infants <6months old who display the outcome of interest in order to calculate the minimum sample size required to reach statistical significance.
Let's say we want to know what proportion of the population of infants <6months old was exclusively breastfed in the past 24 hours. We have SOWC data estimating that 15% of infants are exclusively breastfed at 4 months. Assuming we require a precision p<0.05, then (according to the whizz bang table in my whizz bang textbook), we need to sample 139 infants.
However, sample size has to be calculated on the basis of the unit of randomisation. So if you're using a cluster design, you have to sample 139 clusters of infants rather than 139 infants, I think. That would significantly increase the number of infants required to reach significance.
If you don't have any basis on which to estimate the proportion of the characteristic of interest then you must assume that it occurs at a rate of 50% because that estimate maximises your sample size. (To give you an idea, for the above scenario, an estimated proportion of 50% would require a sample size of 384!)
Now stats is not my strength and I would be looking for a stat consult before committing to any quantitative design - hence I am more than happy to be corrected.
Nina Berry

Mark Myatt

Frequent user

7 May 2009, 10:42

I'll try to deal with each question in turn:

(a) The proportion of children aged between 6 and 59 months is usually taken to be 20% (this is usually a slight overestimate). In the example of a total population of 87000 this would give about:

87000 * 0.2 = 17400

children aged between 6 and 59 months. Assuming uniformity of ages amongst these children we can expect about:

((23 - 6) / (59 - 6)) * 100 = 32.1%

to be aged between 6 and 23 months. So the proportion of the population aged between 6 and 23 months will be about:

((23 - 6) / (59 - 6)) * 0.2 = 0.064 = 6.4% (this is the same as 32.1% * 20%)

In the example the total population is 87,000 so the population aged between 6 and 23 months is about:

87000 * 0.064 = 5568

(b) Using GNU sampsize with the parameters from the example but changing the population size:

Population = 1000, n = 146
Population = 2000, n = 158
Population = 3000, n = 162
Population = 4000, n = 164
Population = 5000, n = 166
Population = 7500, n = 167
Population = 10000, n = 168
Population = 15000, n = 169
Population = 20000, n = 170
Population = 50000, n = 171

A population of 50,000 can be considered "infinite". So the answer to your question is that with large populations there is little difference.

It might make a difference for narrower age-bands. For example the 0 - 6 month old kids might make up only 2% of the population. In our example of a population of 87,000 this would be about:

87000 * 0.02 = 1740

And the required sample size would be 156 (that's about a 10% saving).

It also makes a difference with small proportions and high precision. I used to do a lot of work in ophthalmic epidemiology. In low-vision and blindness surveys we have a small population (those aged over 50) and rare conditions (say 1%). It makes no sense to estimate 1% with a precision of +/- 10%. Better to estimate 1% +/- 0.25%. In a population of 100,000 you might have about 5,000 aged over 50 years. Without the finite population correction we would need to sample about 6085 people (5425 even when the population is entered as 50,000). Oops! That is a sample larger than the population of 5,000! Applying the finite population correction gives n = 2745. This may seem an extreme example but looking at levels more commonly encountered might be instructive. Let's look at 10% +/- 3%:

Population = 1000, n = 278
Population = 2000, n = 323
Population = 3000, n = 341
Population = 4000, n = 351
... and so on.
Population = 50000, n = 382
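The effect of population size can be explored with a short Python sketch. It assumes the standard normal-approximation sample size formula with a finite population correction, rounded up; this is an assumption about what the calculators do, although it reproduces the figures listed in this post. The function name `srs_sample_size` is made up for the example.

```python
import math

def srs_sample_size(p, d, population=None, z=1.96):
    """n for estimating proportion p to within +/- d; population=None means
    no finite population correction (an effectively infinite population)."""
    n0 = z ** 2 * p * (1 - p) / d ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

populations = [1000, 2000, 3000, 4000, 5000, 7500, 10000, 15000, 20000, 50000]

# 50% +/- 7.5%: the IYCF example
table_50 = [srs_sample_size(0.50, 0.075, N) for N in populations]

# 0-6 month olds taken as 2% of 87,000
n_infants = srs_sample_size(0.50, 0.075, 1740)

# 10% +/- 3%
table_10 = [srs_sample_size(0.10, 0.03, N) for N in [1000, 2000, 3000, 4000, 50000]]

# 1% +/- 0.25%: no correction, a nominal population of 50,000, and the
# 5,000 people aged over 50 from the low-vision example
rare = [srs_sample_size(0.01, 0.0025, N) for N in [None, 50000, 5000]]

for label, values in [("50% +/- 7.5%", table_50), ("10% +/- 3%", table_10),
                      ("1% +/- 0.25%", rare)]:
    print(label, values)
print("0-6 months:", n_infants)
```

The correction matters most when the uncorrected sample size is a large fraction of the population, which is why the rare-condition example changes so much while the 50% +/- 7.5% example barely moves above a population of a few thousand.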

(c) Yes. You do want to include them. I'd suggest that rather than have a special sample we can use the same sampling procedure and (1) sample children 0 - 6 months in households without children 6 - 59 months, collecting no anthropometry, and (2) sample children 0 - 6 months in households with children 6 - 59 months, collecting no anthropometry from the children 0 - 6 months.

(d) This is a problem with multiple indicator surveys. Even indicators that apply to the entire sample will have different sample size requirements. The common practice is to pick a subset of the most important indicators and select a sample size that will estimate them with useful precision. The problem is worse when you have indicators that apply only to subsets of a survey sample as you can end up with a truly massive sample just to get a sufficient sample size in a single sub-group. There are four solutions that come to mind:

(1) Keep expanding the sample size so that each sub-group in the sample has a sufficiently large sample size. This is not very efficient.

(2) Take top-up or quota samples. I think that this may prove difficult to do in the field if there are many indicators requiring different sub-samples.

(3) Perform a series of smaller but separate surveys.

(4) Use an indicator such as the IYCF index from the 2000 Ethiopian DHS (Arimond and Ruel 2000, Arimond and Ruel 2002) which provides a weighted score (weights depending on age-group) of breastfeeding, dietary diversity, and meal frequency. Such an index applies to the whole sample (or a large part of the overall sample of a nutritional anthropometry survey). An advantage of this approach is that the index is a score that can be estimated with good precision with a small sample size.

Shifting the approach from estimation to classification might also prove useful. If you are able to set standards (i.e. good situation vs. bad situation) then you can classify with good accuracy with small sample sizes. The textbook example is measles vaccination where we know that < 50% is very bad (very unlikely that herd immunity will operate even over small areas) and > 80% is good (herd immunity operates well). Classifying vaccine coverage as good or bad using these standards with very low levels of error can be done with a sample size below 30 using LQAS techniques. Small sample classifiers for multiple levels have been developed and, whilst more complicated than LQAS, usually work well with sample sizes <= 50.
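To give a feel for why classification needs so few observations, here is a rough Python sketch of a two-threshold LQAS-style decision rule using exact binomial probabilities and a simple random sample. The 50% and 80% thresholds come from the measles example above; the 10% ceiling on each misclassification error and the names `lqas_errors` and `smallest_design` are assumptions made for illustration, not part of any particular LQAS manual.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def lqas_errors(n, d, p_low, p_high):
    """Rule: classify coverage as 'good' when at least d of n children are covered.

    Returns (probability of calling a p_low area good,
             probability of calling a p_high area bad).
    """
    false_good = 1 - binom_cdf(d - 1, n, p_low)
    false_bad = binom_cdf(d - 1, n, p_high)
    return false_good, false_bad

def smallest_design(p_low, p_high, max_error, max_n=100):
    """Smallest n (with its decision threshold d) keeping both errors <= max_error."""
    for n in range(1, max_n + 1):
        for d in range(1, n + 1):
            false_good, false_bad = lqas_errors(n, d, p_low, p_high)
            if false_good <= max_error and false_bad <= max_error:
                return n, d, false_good, false_bad
    return None

design = smallest_design(p_low=0.50, p_high=0.80, max_error=0.10)
print(design)  # (n, d, false_good, false_bad)
```

Tightening `max_error` or moving the two thresholds closer together increases the sample size the search returns, which is the main design trade-off with this kind of classifier.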

This is a long answer and one that I am not sure answers the question. It is a complicated situation. My preference would be to sample the children aged 0 - 6 months using the same sampling procedure (as described above) and to go with option (4).

More specifically you ask

"What I basically want is to know is if we do a normal cluster survey (900 children 6-29 months and thinking add 100 aged 0-5.9m as is proportional to the amount in the population and we figure will be ok) will the results that we get mean anything statistically [so we can say X% (CI X - X) receive semi-solid foods] and therefore is it worth doing?"

This is a simple sample size issue. A simple random sample of 96 will estimate a proportion with a 95% CI of <= +/- 10% (e.g. 50% +/- 10%). If you can live with this level of precision then it will do. You could get away with a smaller sample size if you used a classification technique.
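The question can also be turned around: given a sample size, what precision should you expect? The Python sketch below uses the normal-approximation (Wald) half-width of a 95% CI, optionally inflated by a design effect. It ignores the finite population correction, and the function name `ci_half_width` and the design effect of 1.5 applied to the 100 infants are assumptions for illustration.

```python
import math

def ci_half_width(n, p=0.5, design_effect=1.0, z=1.96):
    """Approximate half-width of the 95% CI for a proportion p estimated
    from n observations, inflated by a guessed design effect."""
    return z * math.sqrt(design_effect * p * (1 - p) / n)

srs_96 = ci_half_width(96)                            # simple random sample of 96
infants_srs = ci_half_width(100)                      # 100 infants, no clustering
infants_cluster = ci_half_width(100, design_effect=1.5)

print(f"n = 96, SRS:        +/- {srs_96:.1%}")
print(f"n = 100, SRS:       +/- {infants_srs:.1%}")
print(f"n = 100, DEFF 1.5:  +/- {infants_cluster:.1%}")
```

Using p = 0.5 gives the widest interval for a given n, so these are worst-case figures; indicators far from 50% will be estimated a little more precisely in absolute terms.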

I hope this helps.

Mark Myatt

Frequent user

7 May 2009, 10:51

Nina's calculation of n = 139 looks a bit small to me. GNU sampsize gives:


Precision = 5.00 %
Prevalence = 15.00 %
Population size = 50000

95% Confidence Interval specified limits [ 10% -- 20% ]
(these limits equal prevalence plus or minus precision)

Estimated sample size:
n = 196

This is for a simple random sample. With a cluster sample you have a design effect which (gross simplification) can be seen as a number by which you need to multiply your sample size to get the equivalent sample size from a cluster sample. With a design effect of (e.g.) 1.5 that 196 turns into 294.

Nina's use of p < 0.05 is somewhat confusing since this applies to one type of error in a hypothesis test (AKA significance test) but is related to precision.

Nina's n = 384 for 50% +/- 5% is correct.

Nina Berry

IFE Consultant

Normal user

7 May 2009, 11:17

Thanks Mark - you're right. I misread my whizz-bang table. 196 it is. Now to get my head around the design factor.

Mark Myatt

Frequent user

7 May 2009, 11:39

Nina, Take a look at SampleXS (see link in post above). The help file contains some information on sampling, design effects, sample size, cluster selection &c. This was written for the epidemiology unit for the MSc Community Eye Health at UCL (now at LSHTM).

Anonymous 360


Normal user

8 Mar 2010, 09:27

Thank you so much for the survey information. I came across it while researching for a KAP survey we plan to do soon covering Essential Nutrition Actions and WATSAN issues. Could I kindly request sample questionnaires for KAP surveys covering mothers of children aged 0-24 months? Thank you in advance.

Anonymous 3443

Health and nutrition manager

Normal user

27 Jan 2016, 18:11

Guys, I have learnt a lot about what I am planning to implement. I plan to conduct a KAP survey next month and was looking around for someone who could guide me through it. I have managed to gather a lot already, and I hope you will not get tired if I request further assistance.
Could I please have a sample questionnaire?
