# Combining multiple independent surveys through weighting

This question was posted in the Assessment and Surveillance forum area and has 2 replies.

### Anonymous 81

Public Health Nutritionist

Normal user

21 Dec 2014, 13:43

Is there a simple guideline or tool for combining multiple independent surveys into one survey that represents a wider geographic area? In a given province, five surveys (each 30 by 30) were conducted. During the analysis, at provincial level, we want to combine them by weighting to account for differences in district population size.

### Mark Myatt

Consultant Epidemiologist

Frequent user

22 Dec 2014, 17:09

You have to be careful doing this, as you may end up hiding variation behind a rather meaningless average. It is almost always more useful to present per-district results if you can (a map is best, as there may be a clear spatial pattern that is less obvious in a table) rather than a single wider-area average. I think you could do both, i.e. present per-district estimates and then a per-province summary estimate.

The first thing to do is to check that it makes sense to combine all the surveys to give a single result. This is only the case when the estimates from **each survey** are similar to each other. This can be as simple as a visual check using a "forest plot" of estimates and 95% CIs.

Here is an example of **similarity**:

    Survey 1          |-----*-----------|
    Survey 2         |-----*----------|
    Survey 3          |------*------------|
    ...
    Survey N          |-----*-----------|
              +--+--+--+--+--+--+--+--+--+--+--+--+--+
              8  9  10 11 12 13 14 15 16 17 18 19 20 21
                           Prevalence (%)

Note that the point estimates (marked by the "*") are close to each other and there is a lot of overlap of the 95% CIs. In this case an average will have meaning.

Here is an example of **dissimilarity**:

    Survey 1  |-----*--------|
    Survey 2           |-----*----------|
    Survey 3                     |------*---------|
    ...
    Survey N                              |---*------|
              +--+--+--+--+--+--+--+--+--+--+--+--+--+
              8  9  10 11 12 13 14 15 16 17 18 19 20 21
                           Prevalence (%)

Note that the point estimates are widely spread and some of the CIs do not overlap much or at all. In this case an average will hide variation and would best be avoided.
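For a quick visual check without any plotting software, a text forest plot can be sketched in a few lines of Python. This is only an illustration: the function name `forest_plot`, the axis limits, and the example values are all hypothetical and not taken from any real survey.

```python
def forest_plot(surveys, lo=8.0, hi=21.0, width=40):
    """Return a text forest plot.

    surveys: list of (label, estimate, lcl, ucl) tuples, all in percent.
    lo, hi:  prevalence values at the left and right ends of the axis.
    """
    def col(x):
        # Map a prevalence value to a character column, clamped to the axis.
        frac = (x - lo) / (hi - lo)
        return max(0, min(width - 1, round(frac * (width - 1))))

    lines = []
    for label, est, lcl, ucl in surveys:
        row = [" "] * width
        for c in range(col(lcl), col(ucl) + 1):
            row[c] = "-"
        row[col(lcl)] = row[col(ucl)] = "|"
        row[col(est)] = "*"  # point estimate drawn last so it stays visible
        lines.append(f"{label:<10}" + "".join(row).rstrip())
    left, right = f"{lo:g}%", f"{hi:g}%"
    lines.append(" " * 10 + left + " " * (width - len(left) - len(right)) + right)
    return "\n".join(lines)


# Hypothetical estimates and 95% CIs (percent), for illustration only.
example = [
    ("Survey 1", 12.0, 9.5, 15.0),
    ("Survey 2", 11.5, 9.0, 14.5),
    ("Survey 3", 12.5, 9.5, 16.0),
]
print(forest_plot(example))
```

If the bars line up roughly under each other, pooling is probably reasonable. If they are spread along the axis, per-district reporting is likely the better choice.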

The pooled proportion is a population-weighted average of the proportions found by each survey:

$$\text{Pooled proportion} = \frac{p_1 w_1 + p_2 w_2 + \dots + p_n w_n}{w_1 + w_2 + \dots + w_n}$$

where:

- $p_1$ = proportion from survey 1
- $p_2$ = proportion from survey 2
- and so on, up to $p_n$
- $w_1$ = population in the area for survey 1
- $w_2$ = population in the area for survey 2
- and so on, up to $w_n$
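The weighted average above can be sketched in Python as follows. The function name `pooled_proportion` and the two-district numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def pooled_proportion(proportions, weights):
    """Population-weighted average of survey proportions.

    proportions: proportions (0 to 1), one per survey.
    weights:     population of the area covered by each survey.
    """
    if len(proportions) != len(weights) or not weights:
        raise ValueError("need one weight per proportion, and at least one survey")
    total = sum(weights)
    return sum(p * w for p, w in zip(proportions, weights)) / total


# Hypothetical example: 10% in a district of 1,000 and 20% in a district of 3,000.
# (0.10 * 1000 + 0.20 * 3000) / 4000 = 700 / 4000
print(f"{pooled_proportion([0.10, 0.20], [1000, 3000]):.3f}")  # prints 0.175
```

The larger district pulls the pooled figure towards its own estimate, which is the point of weighting by population.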

Complications arise when trying to pool variances. This is because the survey samples are complex and the variance is influenced by the proportion, the sample size, and the survey design effect.

One way to approach this problem is to calculate the standard error (SE) from the estimates and 95% CIs reported from each survey:

$$SE = \frac{\text{Upper confidence limit} - \text{Lower confidence limit}}{2 \times 1.96}$$
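As a small sketch, the same back-calculation in Python might look like this. The helper name `se_from_ci` and the example interval are hypothetical. Note that the formula treats the reported interval as a symmetric normal-approximation interval, so it is only approximate when the survey software reported an asymmetric CI.

```python
def se_from_ci(lcl, ucl, z=1.96):
    """Approximate standard error from a reported 95% confidence interval.

    lcl, ucl: lower and upper confidence limits as proportions (0 to 1).
    z:        normal quantile for the interval (1.96 for 95%).
    """
    if ucl < lcl:
        raise ValueError("upper limit must not be below lower limit")
    return (ucl - lcl) / (2 * z)


# Hypothetical interval of 10% to 14%: (0.14 - 0.10) / 3.92
print(f"{se_from_ci(0.10, 0.14):.4f}")  # prints 0.0102
```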

The pooled SE is:

$$\text{Pooled SE} = \sqrt{\frac{SE_1^2 w_1 + SE_2^2 w_2 + \dots + SE_n^2 w_n}{w_1 + w_2 + \dots + w_n}}$$

where:

- $SE_1$ = SE for survey 1
- $SE_2$ = SE for survey 2
- and so on, up to $SE_n$
- $w_1$ = population in the area for survey 1
- $w_2$ = population in the area for survey 2
- and so on, up to $w_n$

The pooled estimate is:

$$\text{Pooled estimate} = \text{Pooled proportion} \pm 1.96 \times \text{Pooled SE}$$
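A minimal Python sketch of the pooled SE and the resulting interval, following the formula exactly as given in this post. The names `pooled_se` and `pooled_ci` and the input numbers are hypothetical.

```python
import math


def pooled_se(ses, weights):
    """Pooled SE as the square root of the population-weighted mean of SE^2."""
    if len(ses) != len(weights) or not weights:
        raise ValueError("need one weight per SE, and at least one survey")
    weighted_var = sum(se ** 2 * w for se, w in zip(ses, weights))
    return math.sqrt(weighted_var / sum(weights))


def pooled_ci(p, se, z=1.96):
    """Return (lower, upper) limits of the pooled estimate."""
    return p - z * se, p + z * se


# Hypothetical inputs: two surveys with SEs of 0.01 and 0.02,
# covering populations of 1,000 and 3,000.
se = pooled_se([0.01, 0.02], [1000, 3000])
lower, upper = pooled_ci(0.175, se)
print(f"SE = {se:.4f}, 95% CI = {lower:.3f} to {upper:.3f}")
```

One point worth knowing when checking the method: the usual textbook variance for a stratified estimate squares the weight fractions, i.e. it sums $(w_i / \sum w)^2 \, SE_i^2$. The formula used here averages the variances instead, so it gives a wider, more conservative interval.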

Here is an example with only three surveys. The survey results are:

| Survey   | Population | p     | LCL  | UCL   |
|----------|------------|-------|------|-------|
| Survey 1 | 23,670     | 12.7% | 9.7% | 16.1% |
| Survey 2 | 16,546     | 9.3%  | 6.3% | 13.2% |
| Survey 3 | 19,201     | 13.5% | 9.8% | 18.0% |

The pooled proportion is:

| Survey   | w      | p     | p × w |
|----------|--------|-------|-------|
| Survey 1 | 23,670 | 0.127 | 3,006 |
| Survey 2 | 16,546 | 0.093 | 1,539 |
| Survey 3 | 19,201 | 0.135 | 2,592 |
| Sum      | 59,417 |       | 7,137 |

Pooled proportion = 7137 / 59417 = 0.120

The pooled SE is:

| Survey   | w      | LCL   | UCL   | SE    | SE^2     | SE^2 × w  |
|----------|--------|-------|-------|-------|----------|-----------|
| Survey 1 | 23,670 | 0.097 | 0.161 | 0.016 | 0.000256 | 6.059520  |
| Survey 2 | 16,546 | 0.063 | 0.132 | 0.018 | 0.000324 | 5.360904  |
| Survey 3 | 19,201 | 0.098 | 0.180 | 0.021 | 0.000441 | 8.467641  |
| Sum      | 59,417 |       |       |       |          | 19.888065 |

Pooled SE = sqrt(19.888065 / 59417) = 0.0183

The pooled estimate is:

- Point estimate = 0.120
- 95% LCL = 0.120 - 1.96 × 0.0183 = 0.084
- 95% UCL = 0.120 + 1.96 × 0.0183 = 0.156

or 12.0% (95% CI = 8.4% - 15.6%).

Important ...

(1) Someone should check my thinking and my arithmetic.

(2) When you do these sorts of calculations you should carry full precision throughout and only round at the end. I did not do this above, so there will be some accumulated rounding error in the final result.
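To illustrate point (2), here is a sketch that runs the whole calculation in Python, once at full precision and once with each survey's SE rounded to three decimal places as in the hand calculation. The function name `pool` and its `se_digits` parameter are hypothetical. The inputs are copied from the survey results table, so survey 2 uses p = 9.3%.

```python
import math

# (population, p, LCL, UCL) copied from the survey results table.
surveys = [
    (23670, 0.127, 0.097, 0.161),
    (16546, 0.093, 0.063, 0.132),
    (19201, 0.135, 0.098, 0.180),
]


def pool(surveys, z=1.96, se_digits=None):
    """Return (pooled p, pooled SE, LCL, UCL) using the formulas in this post.

    se_digits: if given, round each survey's SE to this many decimals first,
               to mimic rounding intermediate values by hand.
    """
    total_w = sum(w for w, _, _, _ in surveys)
    p = sum(w * prop for w, prop, _, _ in surveys) / total_w
    weighted_var = 0.0
    for w, _, lcl, ucl in surveys:
        se = (ucl - lcl) / (2 * z)
        if se_digits is not None:
            se = round(se, se_digits)
        weighted_var += se ** 2 * w
    se_pooled = math.sqrt(weighted_var / total_w)
    return p, se_pooled, p - z * se_pooled, p + z * se_pooled


for label, result in [("full precision", pool(surveys)),
                      ("SEs rounded   ", pool(surveys, se_digits=3))]:
    p, se, lcl, ucl = result
    print(f"{label}: p = {p:.4f}, SE = {se:.5f}, 95% CI = {lcl:.4f} to {ucl:.4f}")
```

With these inputs the two pooled SEs differ by only about 0.00002, so the rounding error is small here, but it is cheap to avoid by rounding only the final figures.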

I hope this is of some use.