# Calculating variance and DEFF for stratified sampling

This question was posted in the Assessment and Surveillance forum area and has 4 replies.

### Anonymous 730

Nutrition and Food Security Officer

Normal user

28 Mar 2014, 00:40

I would like to request assistance in calculating variance for stratified PPS sampling.

Let's say we do a survey with 3 strata with the following populations and prevalence of malnutrition in a 30 X 7 survey:

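| Strata | Population | Sample | Prevalence | Weight |
| ------ | ---------- | ------ | ---------- | ------ |
| 1 | 9,870 | 225 | 81.3% | 0.4997 |
| 2 | 33,599 | 219 | 54.8% | 1.7475 |
| 3 | 14,130 | 212 | 31.1% | 0.7590 |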

I calculated the weights from the population and sample for each of the strata, and also calculated the weighted prevalence which is 53.22%.

I would like assistance in calculation of the variance and design effect in this case.

Thanks in advance!

### Mark Myatt

Consultant Epidemiologist

Frequent user

28 Mar 2014, 12:08

I think you want a confidence interval around a point estimate.

Here is a simple approach ...

Your data:

| Strata | N | n | P |
| ------ | ----- | --- | ----- |
| 1 | 9870 | 225 | 0.813 |
| 2 | 33599 | 219 | 0.548 |
| 3 | 14130 | 212 | 0.311 |

Add weights as:

w = N / sum(N)

giving:

| Strata | N | n | P | w |
| ------ | ----- | --- | ----- | ----- |
| 1 | 9870 | 225 | 0.813 | 0.171 |
| 2 | 33599 | 219 | 0.548 | 0.583 |
| 3 | 14130 | 212 | 0.311 | 0.245 |

The point estimate is:

sum(P * w) = 0.813 * 0.171 + 0.548 * 0.583 + 0.311 * 0.245 = 0.535
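If it helps to check the weights and point estimate by machine, here is a minimal Python sketch of the two steps above. The variable names are illustrative only, and it keeps the weights at full precision, so the intermediate values will not match the three-decimal figures in the table exactly.

```python
# Stratum populations and prevalences as quoted in the question.
N = [9870, 33599, 14130]
P = [0.813, 0.548, 0.311]

# Population-proportion weights: w = N / sum(N)
total_N = sum(N)
w = [N_h / total_N for N_h in N]

# Weighted point estimate: sum(P * w)
estimate = sum(P_h * w_h for P_h, w_h in zip(P, w))

print("weights:", [round(w_h, 3) for w_h in w])
print("point estimate:", round(estimate, 3))
```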

The variance is:

sum((w^2 * p * (1 - p)) / n)

From your data:

| Strata | N | n | p | w | w^2 * p * (1 - p) / n |
| ------ | ----- | --- | ----- | ----- | --------------------- |
| 1 | 9870 | 225 | 0.813 | 0.171 | 1.975795e-05 |
| 2 | 33599 | 219 | 0.548 | 0.583 | 3.844253e-04 |
| 3 | 14130 | 212 | 0.311 | 0.245 | 6.067027e-05 |

SUM = 0.00046485

SQRT(SUM) = 0.02156046

The 95% CI is then:

Lower 95% CL = 0.535 - 1.96 * 0.02156046 = 0.4927 (49.27%)

Upper 95% CL = 0.535 + 1.96 * 0.02156046 = 0.5773 (57.73%)
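The variance and confidence limits can be scripted in the same way. This is only a sketch of the formula given above, with illustrative variable names and full-precision weights, so the last digits may differ a little from the hand calculation.

```python
import math

# Stratum populations, sample sizes and prevalences from the question.
N = [9870, 33599, 14130]
n = [225, 219, 212]
p = [0.813, 0.548, 0.311]

total_N = sum(N)
w = [N_h / total_N for N_h in N]

# Point estimate: sum(p * w)
estimate = sum(p_h * w_h for p_h, w_h in zip(p, w))

# Variance: sum(w^2 * p * (1 - p) / n)
variance = sum(w_h ** 2 * p_h * (1 - p_h) / n_h
               for w_h, p_h, n_h in zip(w, p, n))
se = math.sqrt(variance)

# 95% confidence limits using the normal approximation (z = 1.96).
lower = estimate - 1.96 * se
upper = estimate + 1.96 * se

print(f"variance = {variance:.8f}, SE = {se:.5f}")
print(f"95% CI = {lower:.4f} to {upper:.4f}")
```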

Is this what you need?

BTW: The design effect is the ratio of the variance (calculated above) to the variance that would be calculated if the same data had come from a simple random sample. With your data:

c = round(225 * 0.813 + 219 * 0.548 + 212 * 0.311) = 369

n = 225 + 219 + 212 = 656

p = 369 / 656 = 0.5625

var = (p * (1 - p)) / (n - 1) = (0.5625 * (1 - 0.5625)) / (656 - 1) = 0.000375716

DEFF = 0.00046485 / 0.000375716 = 1.24
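For anyone who wants to script the design effect, a rough Python sketch follows. It assumes the number of cases can be reconstructed as the sample size times the prevalence in each stratum (the actual case counts are not given in the thread), and the variable names are illustrative. Because it keeps full precision, the result may differ slightly from a hand calculation with rounded intermediates.

```python
# Stratum populations, sample sizes and prevalences from the question.
N = [9870, 33599, 14130]
n = [225, 219, 212]
p = [0.813, 0.548, 0.311]

total_N = sum(N)
w = [N_h / total_N for N_h in N]

# Variance of the weighted estimate: sum(w^2 * p * (1 - p) / n)
stratified_var = sum(w_h ** 2 * p_h * (1 - p_h) / n_h
                     for w_h, p_h, n_h in zip(w, p, n))

# Pooled (unweighted) proportion, treating the data as one simple random sample.
# Case counts are an assumption: reconstructed as n * p for each stratum.
cases = round(sum(n_h * p_h for n_h, p_h in zip(n, p)))
n_total = sum(n)
p_pooled = cases / n_total
srs_var = p_pooled * (1 - p_pooled) / (n_total - 1)

deff = stratified_var / srs_var
print(f"cases = {cases}, n = {n_total}, pooled p = {p_pooled:.4f}")
print(f"DEFF = {deff:.2f}")
```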

Avoid rounding early (as I have done above).

I hope this helps.

You should check my arithmetic.

### Mark Myatt

Consultant Epidemiologist

Frequent user

28 Mar 2014, 14:31

Just to be clear ... rounding errors can accumulate and end up becoming quite large over a series of calculations ... I have been lazy above ... keep all numbers to full precision throughout and round only the final results. If you do calculations in a spreadsheet with raw numbers then you should be OK as the full precision is often retained "behind the scenes".
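To see how much early rounding moves things with these data, here is a small Python sketch that computes the variance twice: once with the weights rounded to three decimal places and once at full precision. The helper name `stratified_variance` is made up for this illustration.

```python
N = [9870, 33599, 14130]
n = [225, 219, 212]
p = [0.813, 0.548, 0.311]


def stratified_variance(w, p, n):
    """Return sum(w^2 * p * (1 - p) / n) over the strata."""
    return sum(w_h ** 2 * p_h * (1 - p_h) / n_h
               for w_h, p_h, n_h in zip(w, p, n))


total_N = sum(N)
w_full = [N_h / total_N for N_h in N]          # full precision
w_rounded = [round(w_h, 3) for w_h in w_full]  # rounded early

var_full = stratified_variance(w_full, p, n)
var_rounded = stratified_variance(w_rounded, p, n)

print(f"rounded weights: {var_rounded:.8f}")
print(f"full precision:  {var_full:.8f}")
print(f"relative difference: {(var_full - var_rounded) / var_full:.2%}")
```

The gap is small in this example, but it grows as more rounded intermediates are chained together.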