# Representative sample

This question was posted in the Assessment and Surveillance forum area and has 2 replies.

### Anonymous 490

SMART Survey Consultant

Normal user

1 Apr 2015, 21:02

### Mark Myatt

Consultant Epidemiologist

Frequent user

2 Apr 2015, 09:32

```
Prevalence (p) = 12%
Precision (e) = 3%
Design effect (DEFF) = 1.5
n = DEFF * (p * (1 - p)) / (e / 1.96)^2
n = 1.5 * (0.12 * (1 - 0.12)) / (0.03 / 1.96)^2
n = 676
```

You'd want to take this using about 30 clusters of:
```
cluster size = 676/30 = 23 (rounded up)
```
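
For anyone who wants to rerun the calculation with their own figures, here is a minimal Python sketch of the same arithmetic. The function name `sample_size` and its parameter names are only illustrative; the formula and the inputs (p = 12%, e = 3%, DEFF = 1.5, 30 clusters) are the ones used above.

```python
import math


def sample_size(p, e, deff=1.0, z=1.96):
    """Sample size to estimate a proportion p to within +/- e.

    deff is the assumed design effect; z = 1.96 gives a 95% confidence interval.
    """
    return deff * (p * (1 - p)) / (e / z) ** 2


n = round(sample_size(p=0.12, e=0.03, deff=1.5))  # 676, as in the worked example
clusters = 30
cluster_size = math.ceil(n / clusters)            # rounded up, as in the post

print(n, cluster_size)
```

Changing `p`, `e`, or `deff` shows quickly how sensitive the sample size is to the precision you ask for.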

More small clusters are better than few large clusters.
If you want better precision you will need a larger sample size (either as more smaller clusters or a bigger overall sample size or both). You will almost never need a sample size greater than n = 900 (i.e. the old 30-by-30 design).
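
To see what a larger sample buys you, the sample size formula can be rearranged to give the precision expected from a given n. This is only a sketch: the function name `expected_precision` is made up here, and the 12% prevalence and DEFF of 1.5 are carried over from the example above as assumptions.

```python
import math


def expected_precision(n, p, deff=1.0, z=1.96):
    """Half-width of the 95% CI for a proportion p from a sample of size n."""
    return z * math.sqrt(deff * p * (1 - p) / n)


# Same assumed prevalence and design effect as the worked example
for n in (676, 900):
    e = expected_precision(n, p=0.12, deff=1.5)
    print(f"n = {n}: precision = +/- {100 * e:.1f} percentage points")
```

Under these assumptions, going from n = 676 to n = 900 tightens the precision from about 3.0 to about 2.6 percentage points.
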
The sample here is for individuals. This can be converted to HHs. If HHs are your unit of interest (e.g. you might want to estimate proportions of HHs with an improved latrine) then you need make no change. If (e.g.) you want to know how many HHs to sample to get a given sample of children then you will need to divide the calculated sample by some estimated mean number of eligible children in each HH. If we expect 1.5 eligible children per HH then (with the above sample size):
```
n = 676
kids per HH = 1.5
number of HHs = 676 / 1.5 = 450
cluster size = 450 / 30 = 15 HHs
```
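
The household conversion can be sketched the same way. The variable names are illustrative and the 1.5 eligible children per HH is the assumed figure from the example. The code prints unrounded values so that you can pick your own rounding rule (the figures above are rounded to 450 HHs and 15 HHs per cluster).

```python
n_children = 676        # sample size in children, from the calculation above
children_per_hh = 1.5   # assumed mean number of eligible children per HH
clusters = 30

households = n_children / children_per_hh
hh_per_cluster = households / clusters

print(f"households = {households:.1f}, per cluster = {hh_per_cluster:.1f}")
```

If the true number of eligible children per HH turns out lower than assumed, the survey will fall short of the target sample, so a cautious estimate is safer.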

SMART documentation covers sampling issues in useful detail.
REPRESENTATIVE : This is a really BIG question. The answer depends on how you define "representative". There are a number of definitions. I think we can avoid philosophy here. Assuming that you are using a two-stage cluster sample with PPS sampling to select clusters (most SMART surveys use this) then you will need:
(1) A **complete** list of communities from which to sample clusters. If the list is not complete then there will be a selection bias against the communities not listed. Even a good list will usually exclude (e.g.) transhumant groups such as nomadic pastoralists.

(2) A reasonably accurate population estimate for **all** communities. The accuracy does not need to be absolute as weighting only requires that relative sizes are accurate.

(3) The skills needed to take the PPS sample (see SMART documentation).

(4) The ability to reach the communities selected in (3).

(5) Proper application of the within-community sampling procedures.

Failures at any of these points will risk your taking a sample that is not representative. SMART documentation covers sampling issues in useful detail.

I hope this is of some use.
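
To illustrate points (1) to (3), here is a rough Python sketch of one common way to take a PPS sample: systematic selection along the cumulative population list. It is not taken from the SMART documentation, which remains the reference for the procedure to follow. The function name `pps_systematic`, the community names, and the population figures are all made up for the example.

```python
import bisect
import itertools
import random


def pps_systematic(populations, n_clusters, seed=None):
    """Pick n_clusters communities with probability proportional to size.

    populations maps community name -> estimated population.
    A community larger than the sampling interval can be picked more than once.
    """
    names = list(populations)
    cumulative = list(itertools.accumulate(populations[name] for name in names))
    interval = cumulative[-1] / n_clusters
    start = random.Random(seed).random() * interval  # random start in [0, interval)

    picks = []
    for i in range(n_clusters):
        target = start + i * interval
        # First community whose cumulative population exceeds the target
        idx = bisect.bisect_right(cumulative, target)
        picks.append(names[min(idx, len(names) - 1)])
    return picks


# Hypothetical community list with estimated populations
communities = {"A": 1200, "B": 300, "C": 4500, "D": 800, "E": 2200}
print(pps_systematic(communities, n_clusters=3, seed=1))

# Doubling every estimate leaves the selection unchanged: only relative sizes matter
doubled = {name: 2 * size for name, size in communities.items()}
print(pps_systematic(doubled, n_clusters=3, seed=1))
```

A community that is missing from the list can never be picked, which is the selection bias described in (1).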

### Scott Logue

Normal user

2 Apr 2015, 19:39
