# Allocation of sample by proportion

This question was posted in the Assessment and Surveillance forum area and has 4 replies.

### Anonymous 81

Public Health Nutritionist

Normal user

25 Feb 2013, 07:27

### Mark Myatt

Epidemiologist at Brixton Health

Frequent user

25 Feb 2013, 11:36

### Anonymous 2443

Normal user

25 Jan 2014, 23:09

### Mark Myatt

Epidemiologist at Brixton Health

Frequent user

27 Jan 2014, 16:05

```
w = roofs (or doorways) * (1 - proportion empty)
```

or:
```
w = roofs (or doorways) * (1 - proportion empty) * mean HH size
```

These populations (labelled "w" for "weights" above) are then used when analysing the data. The PPS method (in effect) applies population weights before sampling (prior weighting). This method applies these weights after sampling (posterior weighting). Given reasonably accurate population data the two methods will give comparable results. The analysis procedure required is not, I think, available in ENA for SMART but can be done in SUDAAN, M-Plus, SAS, R, S-Plus, STATA, SPSS, in a spreadsheet, or by hand. If you need more information about this then post a follow-up request to this thread.

Access may also be an issue. If you are not pressed for time then you should take "contingency clusters". (You may be pressed for time if, e.g., an expected "state of emergency" declaration may shut down access.) This means that you select a few more clusters than you need and use these to replace clusters that you cannot get to. Be sure to document what happened so users of the data know what is and isn't represented. If time is an issue then you can use a method such as RAM which uses fewer clusters (e.g. m = 16) than SMART, a more labour-intensive within-cluster sampling method, and a different estimation procedure (PROBIT) for estimating GAM / SAM prevalence. Sample sizes are relatively small (e.g. we have been using n = 192 in Sierra Leone and n = 200 in Sudan). If you need more information about RAM then post a follow-up request to this thread.

If your total population is small (i.e. less than about 5000) then you may be able to use a smaller sample size. The conventional approach is to calculate your sample size as you usually do and then apply a "finite population correction" (FPC) to it. Data analysis can be done with the usual tools but confidence intervals on estimates are adjusted by a second FPC. If you need more information about this then post a follow-up request to this thread.

You will also need to be careful with some indicators if there has (e.g.) been communal violence (a common reason for displacement) as this can lead to considerable clustering (e.g. of death and destruction) which can result in large survey design effects.

I hope this helps. Post a follow-up message to this thread if you need more information or if I have missed the point.

### Mark Myatt

Epidemiologist at Brixton Health

Frequent user

28 Jan 2014, 16:09

```
n = DEFF * [(p * (1 - p)) / (precision / 1.96)^2]
```

where:
```
DEFF = Expected design effect (usually 2.0 unless we know better)
p = Expected proportion (choose 50% unless we know better)
precision = Desired half-width of the confidence interval (the +/- value)
1.96 = Constant for a 95% confidence interval
```

For example, if we want to estimate a proportion of 10% with a 95% CI of +/- 3% with an expected design effect of 1.5 we would need a sample size of:
```
n = 1.5 * [(0.1 * (1 - 0.1)) / (0.03 / 1.96)^2] = 576
```
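
For anyone who prefers code to a calculator, here is a minimal Python sketch of the same formula. The function name `sample_size` and its default arguments are illustrative choices, not part of any survey package. Like the worked example, it rounds to the nearest whole number; many people round up instead.

```python
def sample_size(p, precision, deff=2.0, z=1.96):
    """Sample size for estimating a proportion in a large population.

    p         : expected proportion (e.g. 0.10 for 10%)
    precision : desired half-width of the confidence interval (e.g. 0.03)
    deff      : expected design effect
    z         : 1.96 for a 95% confidence interval
    """
    return deff * (p * (1 - p)) / (precision / z) ** 2


# The worked example: p = 10%, +/- 3%, design effect 1.5
n = sample_size(p=0.10, precision=0.03, deff=1.5)
print(round(n))
```

Changing `p` to 0.5 shows why 50% is the conservative choice: it gives the largest sample size for a given precision.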

This calculation assumes a large population (e.g. N > c. 5000). If you have a smaller population then you can apply a finite population correction. This is:
```
new.n = (old.n * population) / (old.n + (population - 1))
```

where "old.n" is the sample size calculated using the first formula given above.
Continuing the example ... if (e.g.) we were sampling from a population of 1000 we would need a sample size of:
```
new.n = (576 * 1000) / (576 + (1000 - 1)) = 366
```

quite a saving!
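
The correction is easy to script. This is a small sketch only; `fpc_sample_size` is a made-up name for illustration. It applies the formula above to the worked example and also to a very large population, where the correction should make almost no difference.

```python
def fpc_sample_size(old_n, population):
    """Apply the finite population correction to a sample size."""
    return (old_n * population) / (old_n + (population - 1))


# The worked example: old.n = 576 from a population of 1000
new_n = fpc_sample_size(old_n=576, population=1000)
print(round(new_n))

# With a very large population the correction is negligible
large_n = fpc_sample_size(old_n=576, population=1_000_000)
print(round(large_n))
```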

DATA ANALYSIS WITH A FINITE POPULATION

Most statistical packages and estimation formulae assume a large population and will present confidence intervals that are **not** corrected for the size of the population. The FPC in this case is:

```
FPC = sqrt((population - n) / (population - 1))
```

Continuing the example we have an FPC of:
```
FPC = sqrt((1000 - 366) / (1000 - 1)) = 0.7966
```

We correct the uncorrected confidence limits by scaling their distances from the point estimate by this factor. If (e.g.) we had:
```
point estimate = 10.67%
Lower confidence limit = 7.71%
Upper confidence limit = 14.21%
```

we would scale the confidence limits as:
```
Corrected LCL = 10.67 - (10.67 - 7.71) * 0.7966 = 8.31%
Corrected UCL = 10.67 + (14.21 - 10.67) * 0.7966 = 13.49%
```
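
The same correction as a short Python sketch. The function names `fpc_factor` and `correct_limits` are hypothetical, chosen for this example. It shrinks each confidence limit towards the point estimate by the FPC, so asymmetric intervals stay asymmetric.

```python
import math


def fpc_factor(population, n):
    """Finite population correction for confidence intervals."""
    return math.sqrt((population - n) / (population - 1))


def correct_limits(estimate, lcl, ucl, fpc):
    """Scale each limit's distance from the point estimate by the FPC."""
    corrected_lcl = estimate - (estimate - lcl) * fpc
    corrected_ucl = estimate + (ucl - estimate) * fpc
    return corrected_lcl, corrected_ucl


# The worked example: n = 366 sampled from a population of 1000
fpc = fpc_factor(population=1000, n=366)
lcl, ucl = correct_limits(10.67, 7.71, 14.21, fpc)
print(round(fpc, 4), round(lcl, 2), round(ucl, 2))
```

This should reproduce the figures worked by hand above (an FPC of about 0.7966 and limits of about 8.31% and 13.49%).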

POSTERIOR WEIGHTING

The first step is to calculate weights for each sampled community. The simplest approach is:
```
w = N / sum(N)
```

where:
```
N = population (e.g. number of roofs) in a sampled community
sum(N) = total population in ALL sampled communities
```

We can then calculate a point estimate as:
```
p = sum(w * (c / n))
```

where:
```
p = point estimate
w = weight (calculated as above)
c = number of cases
n = sample size in a sampled community
```

Here are some example coverage data with eight villages sampled (you'd have more):
```
Village  Pop  w               c    n  c / n       w * c / n
-------  ---  ------------  ---  ---  ----------  ----------------
1        115  115/900=0.13   29   38  29/38=0.76  0.13*0.76=0.0988
2         91  91/900=0.10    18   32  18/32=0.56  0.10*0.56=0.0560
3        121  121/900=0.13   36   43  36/43=0.84  0.13*0.84=0.1092
4        114  114/900=0.13   15   35  15/35=0.43  0.13*0.43=0.0559
5         98  98/900=0.11    14   42  14/42=0.33  0.11*0.33=0.0363
6        104  104/900=0.12   10   37  10/37=0.27  0.12*0.27=0.0324
7        132  132/900=0.15    5   39  5/39=0.13   0.15*0.13=0.0195
8        125  125/900=0.14   23   42  23/42=0.55  0.14*0.55=0.0770
-------  ---  ------------  ---  ---  ----------  ----------------
SUMS     900  1.00          150  308  NA          0.4851
```

The point estimate is 0.4851 (48.51%).
This sort of calculation can be done in a spreadsheet.
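
It can also be done in a few lines of code. The sketch below is in Python and uses the eight example villages; the variable names are just for illustration. It keeps full precision throughout, so it gives about 0.4836 rather than 0.4851. The small difference comes from the table rounding each weight and proportion to two decimal places before multiplying.

```python
# (population, cases, sample size) for the eight example villages
villages = [
    (115, 29, 38),
    (91, 18, 32),
    (121, 36, 43),
    (114, 15, 35),
    (98, 14, 42),
    (104, 10, 37),
    (132, 5, 39),
    (125, 23, 42),
]

# Total population in ALL sampled communities: sum(N)
total_pop = sum(pop for pop, _, _ in villages)

# Weight for each community: w = N / sum(N)
weights = [pop / total_pop for pop, _, _ in villages]

# Weighted point estimate: p = sum(w * (c / n))
p = sum(w * (c / n) for w, (_, c, n) in zip(weights, villages))

print(f"sum(N) = {total_pop}, p = {p:.4f}")
```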
The calculation of the 95% confidence interval is also a little involved:
```
p +/- 1.96 * sqrt(sum((w^2 * (c / n) * (1 - c / n)) / n))
```

Continuing using the data above:
```
Village  w^2     c / n  1-(c/n)  ((w^2*(c/n)*(1-c/n))/n)
-------  ------  -----  -------  -----------------------
1        0.0169  0.76   0.24     0.00008112
2        0.0100  0.56   0.44     0.00007700
3        0.0169  0.84   0.16     0.00005282
4        0.0169  0.43   0.57     0.00011835
5        0.0121  0.33   0.67     0.00006370
6        0.0144  0.27   0.73     0.00007671
7        0.0225  0.13   0.87     0.00006525
8        0.0196  0.55   0.45     0.00011550
-------  ------  -----  -------  -----------------------
SUM       = 0.00065045
SQRT(SUM) = 0.02550392
-------  ------  -----  -------  -----------------------
```

The 95% CI is then:
```
Lower 95% CL = 0.4851 - 1.96 * 0.02550392 = 0.4351 (43.51%)
Upper 95% CL = 0.4851 + 1.96 * 0.02550392 = 0.5351 (53.51%)
```

This sort of calculation can also be done in a spreadsheet.
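
For those working in code, here is a self-contained Python sketch of the whole posterior weighting calculation (point estimate and 95% CI) using the same eight villages. The variable names are illustrative. Because nothing is rounded along the way, the results differ slightly from the hand-worked figures: the estimate comes out near 48.4% with a 95% CI of roughly 43.4% to 53.3%.

```python
import math

# (population, cases, sample size) for the eight example villages
villages = [
    (115, 29, 38),
    (91, 18, 32),
    (121, 36, 43),
    (114, 15, 35),
    (98, 14, 42),
    (104, 10, 37),
    (132, 5, 39),
    (125, 23, 42),
]

total_pop = sum(pop for pop, _, _ in villages)

p = 0.0         # weighted point estimate: sum(w * (c / n))
variance = 0.0  # sum((w^2 * (c / n) * (1 - c / n)) / n)
for pop, c, n in villages:
    w = pop / total_pop
    prop = c / n
    p += w * prop
    variance += (w ** 2) * prop * (1 - prop) / n

se = math.sqrt(variance)
lower = p - 1.96 * se
upper = p + 1.96 * se

print(f"estimate = {p:.4f}, 95% CI = {lower:.4f} to {upper:.4f}")
```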
I hope this is useful.
Someone should check my work.
Did I miss anything?