
# Calculating variance and DEFF for stratified sampling

This question was posted in the Assessment and Surveillance forum area and has 4 replies.

### Mark Myatt

Consultant Epidemiologist

Frequent user

28 Mar 2014, 12:08

I think you want a confidence interval around a point estimate. Here is a simple approach.

Your data:

```
Strata      N    n      p
------  -----  ---  -----
     1   9870  225  0.813
     2  33599  219  0.548
     3  14130  212  0.311
------  -----  ---  -----
```

Add weights as:

```
w = N / sum(N)
```

giving:

```
Strata      N    n      p      w
------  -----  ---  -----  -----
     1   9870  225  0.813  0.171
     2  33599  219  0.548  0.583
     3  14130  212  0.311  0.245
------  -----  ---  -----  -----
```

The point estimate is:

```
sum(p * w) = 0.813 * 0.171 + 0.548 * 0.583 + 0.311 * 0.245 = 0.535
```

The variance is:

```
sum((w^2 * p * (1 - p)) / n)
```

From your data:

```
Strata      N    n      p      w  (w^2 * p * (1 - p)) / n
------  -----  ---  -----  -----  -----------------------
     1   9870  225  0.813  0.171             1.975795e-05
     2  33599  219  0.548  0.583             3.844253e-04
     3  14130  212  0.311  0.245             6.067027e-05
------  -----  ---  -----  -----  -----------------------
                                   SUM       = 0.00046485
                                   SQRT(SUM) = 0.02156046
                                  -----------------------
```

SQRT(SUM) is the standard error of the point estimate. The 95% CI is then:

```
Lower 95% CL = 0.535 - 1.96 * 0.02156046 = 0.4927 (49.27%)
Upper 95% CL = 0.535 + 1.96 * 0.02156046 = 0.5773 (57.73%)
```

Is this what you need?

BTW: The design effect is the ratio of the variance calculated above to the variance calculated as if the data came from a simple random sample. With your data:

```
c    = round(225 * 0.813 + 219 * 0.548 + 212 * 0.311) = 369
n    = 225 + 219 + 212 = 656
p    = 369 / 656 = 0.5625
var  = (p * (1 - p)) / (n - 1) = (0.5625 * (1 - 0.5625)) / (656 - 1) = 0.000375716
DEFF = 0.00046485 / 0.000375716 = 1.24
```

Avoid rounding early (as I have done above). I hope this helps. You should check my arithmetic.
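The steps above can also be run at full precision in a few lines of Python. This is only a minimal sketch, not code from the original post: the function name `stratified_estimate` and the tuple layout are assumptions made for illustration. It follows the same simplifications as the hand calculation (no finite population correction, a normal-approximation interval with z = 1.96). Because the weights are not rounded, the results will differ slightly from figures worked by hand with three-decimal weights.

```python
from math import sqrt

# Strata from the post: (population size N, sample size n, proportion p)
strata = [
    (9870, 225, 0.813),
    (33599, 219, 0.548),
    (14130, 212, 0.311),
]

def stratified_estimate(strata, z=1.96):
    """Weighted proportion, its variance, a normal-approximation CI and DEFF.

    Hypothetical helper for illustration; it follows the formulas in the post.
    """
    total_N = sum(N for N, _, _ in strata)
    total_n = sum(n for _, n, _ in strata)

    # Population weights w = N / sum(N), kept at full precision
    weights = [N / total_N for N, _, _ in strata]

    # Point estimate: sum(p * w)
    estimate = sum(w * p for w, (_, _, p) in zip(weights, strata))

    # Variance: sum(w^2 * p * (1 - p) / n)
    variance = sum(w**2 * p * (1 - p) / n
                   for w, (_, n, p) in zip(weights, strata))
    se = sqrt(variance)

    # Simple-random-sample variance from the pooled, unweighted sample.
    # Cases are pooled using each stratum's proportion p (not its weight).
    cases = round(sum(n * p for _, n, p in strata))
    p_srs = cases / total_n
    var_srs = p_srs * (1 - p_srs) / (total_n - 1)

    return {
        "estimate": estimate,
        "variance": variance,
        "lower": estimate - z * se,
        "upper": estimate + z * se,
        "deff": variance / var_srs,
    }

result = stratified_estimate(strata)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

Keeping the weights unrounded inside the function means only the printed values are rounded, which is the safer order of operations.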

### Mark Myatt

Consultant Epidemiologist

Frequent user

28 Mar 2014, 14:31

Just to be clear: rounding errors can accumulate and become quite large over a series of calculations. I have been lazy above. Keep all numbers at full precision throughout and round only the final results. If you do the calculations in a spreadsheet with the raw numbers then you should be OK, as full precision is usually retained "behind the scenes".
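As a rough illustration of this point, the sketch below computes the variance of the stratified estimate twice from the strata figures in the earlier reply: once with weights rounded to three decimal places and once at full precision. The variable and function names are assumptions chosen for the example.

```python
# Strata from the earlier reply: (population size N, sample size n, proportion p)
strata = [(9870, 225, 0.813), (33599, 219, 0.548), (14130, 212, 0.311)]
total_N = sum(N for N, _, _ in strata)

def variance(weights):
    # sum(w^2 * p * (1 - p) / n) over the strata
    return sum(w**2 * p * (1 - p) / n
               for w, (_, n, p) in zip(weights, strata))

full_weights = [N / total_N for N, _, _ in strata]       # unrounded
rounded_weights = [round(w, 3) for w in full_weights]    # as in a hand calculation

var_full = variance(full_weights)
var_rounded = variance(rounded_weights)
rel_diff = abs(var_full - var_rounded) / var_full

print(f"full precision : {var_full:.8f}")
print(f"rounded weights: {var_rounded:.8f}")
print(f"relative diff  : {rel_diff:.4%}")
```

With a single rounded input the discrepancy is small, but every further rounded intermediate (proportions, standard errors, pooled counts) adds its own error, so the gap tends to grow along a chain of calculations.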