
Calculating variance and DEFF for stratified sampling

This question was posted in the Assessment forum area and has 4 replies.


Anonymous 730

Nutrition and Food Security Officer

Normal user

28 Mar 2014, 00:40

I would like to request assistance in calculating variance for stratified PPS sampling.
Let's say we do a survey with 3 strata with the following populations and prevalence of malnutrition in a 30 X 7 survey:

  Strata Population Sample Prevalence Weight
  ------ ---------- ------ ---------- ------
       1      9,870    225      81.3% 0.4997
       2     33,599    219      54.8% 1.7475
       3     14,130    212      31.1% 0.7590
  ------ ---------- ------ ---------- ------

I calculated the weights from the population and sample for each of the strata, and also calculated the weighted prevalence which is 53.22%.
I would like assistance in calculation of the variance and design effect in this case.
Thanks in advance!

Mark Myatt

Consultant Epidemiologist

Frequent user

28 Mar 2014, 12:08

I think you want a confidence interval around a point estimate.

Here is a simple approach ...

Your data:

  Strata     N   n     P
  ------ ----- --- -----
       1  9870 225 0.813
       2 33599 219 0.548
       3 14130 212 0.311
  ------ ----- --- -----

Add weights as:
  w = N / sum(N)

giving:
  Strata     N   n     P     w
  ------ ----- --- ----- -----
       1  9870 225 0.813 0.171
       2 33599 219 0.548 0.583
       3 14130 212 0.311 0.245
  ------ ----- --- ----- -----

The point estimate is:
  sum(P * w) = 0.813 * 0.171 + 0.548 * 0.583 + 0.311 * 0.245 = 0.535
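
For anyone who prefers to script this step, here is a minimal Python sketch of the weights and the point estimate. The variable names are my own choice for illustration, and the weights are kept at full precision rather than rounded to three decimals.

```python
# Stratum populations (N) and prevalences (P) from the table above.
N = [9870, 33599, 14130]
P = [0.813, 0.548, 0.311]

# Population-share weights: w = N / sum(N)
total_N = sum(N)
w = [N_h / total_N for N_h in N]

# Weighted point estimate: sum(P * w)
estimate = sum(P_h * w_h for P_h, w_h in zip(P, w))

print([round(w_h, 3) for w_h in w])  # [0.171, 0.583, 0.245]
print(round(estimate, 3))            # 0.535
```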

The variance is:
  sum((w^2 * p * (1 - p)) / n)

From your data:
  Strata     N   n     p     w (w^2 * p * (1 - p)) / n
  ------ ----- --- ----- ----- -----------------------
       1  9870 225 0.813 0.171            1.975795e-05
       2 33599 219 0.548 0.583            3.844253e-04
       3 14130 212 0.311 0.245            6.067027e-05
  ------ ----- --- ----- ----- -----------------------
                                      SUM = 0.00046485
                               -----------------------
                                SQRT(SUM) = 0.02156046
                               -----------------------
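
The same table can be reproduced with a short Python sketch. It assumes the three-decimal weights shown in the table (so the numbers match), and the variable names are illustrative only.

```python
import math

# Sample sizes, prevalences and (rounded) weights from the table above.
n = [225, 219, 212]
p = [0.813, 0.548, 0.311]
w = [0.171, 0.583, 0.245]

# Per-stratum contribution: (w^2 * p * (1 - p)) / n
terms = [w_h**2 * p_h * (1 - p_h) / n_h for w_h, p_h, n_h in zip(w, p, n)]

variance = sum(terms)      # variance of the weighted estimate
se = math.sqrt(variance)   # standard error

print(f"{variance:.8f}")  # 0.00046485
print(f"{se:.5f}")        # 0.02156
```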

The 95% CI is then:

Lower 95% CL = 0.535 - 1.96 * 0.02156046 = 0.4927 (49.27%)
Upper 95% CL = 0.535 + 1.96 * 0.02156046 = 0.5773 (57.73%)
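
As a quick check of the two limits, a minimal Python sketch follows. It takes the point estimate and standard error reported above as given inputs and uses the normal-approximation multiplier of 1.96.

```python
# Point estimate and standard error as reported above (assumed inputs).
estimate = 0.535
se = 0.02156046

# 1.96 is the usual normal-approximation multiplier for a 95% interval.
z = 1.96

lower = estimate - z * se
upper = estimate + z * se

print(f"95% CI: {lower:.4f} to {upper:.4f}")  # 95% CI: 0.4927 to 0.5773
```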

Is this what you need?

BTW: The design effect is the ratio of the variance calculated above to the variance that a simple random sample of the same size would give. With your data:

c = round(225 * 0.813 + 219 * 0.548 + 212 * 0.311)
  = 369

n = 225 + 219 + 212
  = 656

p = 369 / 656
  = 0.5625

var = (p * (1 - p)) / (n - 1)
    = (0.5625 * (1 - 0.5625)) / (656 - 1)
    = 0.000375716

DEFF = 0.00046485 / 0.000375716
     = 1.24
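
The ratio can also be scripted. This is only a sketch under the same assumptions as the hand calculation: the number of cases is rebuilt from each stratum's sample size and prevalence, the simple random sample variance uses the pooled (unweighted) proportion, and the stratified variance uses the three-decimal weights. All names are illustrative.

```python
# Sample sizes, prevalences and (rounded) weights per stratum.
n = [225, 219, 212]
p = [0.813, 0.548, 0.311]
w = [0.171, 0.583, 0.245]

# Variance of the weighted (stratified) estimate.
var_design = sum(w_h**2 * p_h * (1 - p_h) / n_h
                 for w_h, p_h, n_h in zip(w, p, n))

# Pooled sample treated as one simple random sample.
cases = round(sum(n_h * p_h for n_h, p_h in zip(n, p)))
n_total = sum(n)
p_pooled = cases / n_total
var_srs = p_pooled * (1 - p_pooled) / (n_total - 1)

# Design effect: ratio of the two variances.
deff = var_design / var_srs
print(cases, n_total, round(deff, 2))
```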

Avoid rounding early (as I have done above).

I hope this helps.

You should check my arithmetic.

Anonymous 730

Nutrition and Food Security Officer

Normal user

28 Mar 2014, 12:44

Many thanks Mark, very very helpful!

Mark Myatt

Consultant Epidemiologist

Frequent user

28 Mar 2014, 14:31

Just to be clear ... rounding errors can accumulate and end up becoming quite large over a series of calculations ... I have been lazy above ... keep all numbers to full precision throughout and round only the final results. If you do calculations in a spreadsheet with raw numbers then you should be OK as the full precision is often retained "behind the scenes".
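
To illustrate the point, here is a small Python sketch that computes the stratified variance twice, once with the weights rounded to three decimals and once at full precision. The helper name `stratified_variance` is made up for this example.

```python
# Stratum populations, sample sizes and prevalences from the thread.
N = [9870, 33599, 14130]
n = [225, 219, 212]
P = [0.813, 0.548, 0.311]

def stratified_variance(weights):
    """Return sum(w^2 * p * (1 - p) / n) over the strata."""
    return sum(w_h**2 * p_h * (1 - p_h) / n_h
               for w_h, p_h, n_h in zip(weights, P, n))

# Full-precision weights, and the same weights rounded early.
w_full = [N_h / sum(N) for N_h in N]
w_rounded = [round(w_h, 3) for w_h in w_full]

var_full = stratified_variance(w_full)
var_rounded = stratified_variance(w_rounded)

print(f"rounded weights: {var_rounded:.8f}")
print(f"full precision:  {var_full:.8f}")
print(f"difference:      {var_full - var_rounded:.2e}")
```

The difference is small here, but it carries through to the standard error, the confidence limits and the design effect.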

Anonymous 730

Nutrition and Food Security Officer

Normal user

28 Mar 2014, 14:52

Thank you very much
