
# Urgent Help!! Please!! Sampling and study design!!!

This question was posted in the Assessment and Surveillance forum area and has 6 replies.

### Mark Myatt

Consultant Epidemiologist

Frequent user

23 May 2014, 14:36

A lot of questions ... let us start by addressing a few of them and see where that takes us ...

Sample size calculations usually assume a simple random sample. Cluster samples usually have a lower effective sample size than a simple random sample. This means that you should take a larger sample than calculated. It is usual to take a sample size twice that calculated for a simple random sample (i.e. to assume a design effect of 2). Using GNU sampsize I get:

```
Estimated sample size for two-sample comparison of percentages

Test H:    p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2

Assumptions:
alpha =          5% (two-sided)
power =         80%
p1 =         90%
p2 =         80%

Estimated sample size:
n1 =        199
n2 =        199
```

You might consider using a single-tailed hypothesis test as you expect, and are interested in, a difference in one direction. Using GNU sampsize I get:

```
Estimated sample size for two-sample comparison of percentages

Test H:    p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2

Assumptions:
alpha =          5% (one-sided)
power =         80%
p1 =         90%
p2 =         80%

Estimated sample size:
n1 =        157
n2 =        157
```

Doubling this for the design effect gives a sample size of about 314 in each group.
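If it helps to sanity-check these numbers, the usual normal-approximation formula for comparing two proportions can be written out directly. A minimal sketch in plain Python (standard library only; `two_sample_n` is my own name for the helper, not part of sampsize):

```python
from math import ceil, sqrt
from statistics import NormalDist

def two_sample_n(p1, p2, alpha=0.05, power=0.80, sided=2):
    """Per-group n for detecting p1 vs p2 (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / sided)  # critical value
    z_b = NormalDist().inv_cdf(power)              # power quantile
    pbar = (p1 + p2) / 2                           # pooled proportion under H0
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

print(two_sample_n(0.90, 0.80, sided=2))      # 199, as above
print(two_sample_n(0.90, 0.80, sided=1))      # 157, as above
print(2 * two_sample_n(0.90, 0.80, sided=1))  # 314 after doubling for DEFF = 2
```

This reproduces the sampsize figures exactly; other packages use slightly different approximations (e.g. arcsine or continuity-corrected) and may give numbers a few units away.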

If you have baseline data you may want to use a one-sample test (i.e. testing for 80% rather than 90%). This would need a sample size of about 140 in one group only (i.e. the intervention group).

I would probably go for a one-sample test in a single sample from the intervention district with 90% or 95% power. Sample sizes would be 204 and 266 respectively. This would be safe if you think that a secular trend in reduction of stigma is not operating, and so you will not need a "control" district.
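The one-sample numbers follow from the analogous one-sample formula with the result doubled for the assumed design effect of 2. Again a sketch only; the function name is mine:

```python
from math import ceil, sqrt
from statistics import NormalDist

def one_sample_n(p0, p1, alpha=0.05, power=0.80, deff=2.0):
    """n to test H0: p = p0 against a one-sided alternative p1,
    inflated by an assumed design effect (DEFF)."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_b = NormalDist().inv_cdf(power)
    n_srs = ((z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))) ** 2
             / (p1 - p0) ** 2)
    return ceil(ceil(n_srs) * deff)

print(one_sample_n(0.90, 0.80, power=0.80))  # 138 ... "about 140"
print(one_sample_n(0.90, 0.80, power=0.90))  # 204
print(one_sample_n(0.90, 0.80, power=0.95))  # 266
```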

A general rule is to prefer many small clusters over a few large clusters. Using m = 30 clusters is usually a safe choice but you could go for fewer. This would be cheaper. With n = 266 you might (e.g.) go for 24 clusters of 12 (that comes to 264 ... close enough).

Picking clusters can be done using the PPS sampling approach used in SMART surveys or the spatially stratified approach used in RAM-type surveys. Households within each cluster could be selected as described in the SMART survey manual. You do not need to sample all 200 villages. See above ... 24 would probably do you well; 30 might be better.
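For the PPS step, the standard systematic PPS selection (as used in SMART-type surveys) can be sketched as below. The village names and populations are made up and `pps_systematic` is my own helper name:

```python
import random

def pps_systematic(villages, m):
    """Systematic PPS selection of m clusters.
    villages: list of (name, population) pairs. A large village can be
    selected more than once (it then contributes more than one cluster)."""
    total = sum(pop for _, pop in villages)
    interval = total / m
    start = random.uniform(0, interval)              # random start in 1st interval
    points = [start + i * interval for i in range(m)]
    chosen, cum, i = [], 0.0, 0
    for name, pop in villages:
        cum += pop                                   # running cumulative population
        while i < m and points[i] <= cum:            # all points landing in village
            chosen.append(name)
            i += 1
    return chosen

random.seed(1)
print(pps_systematic([("A", 500), ("B", 1500), ("C", 1000), ("D", 3000)], 6))
```

In practice the village sampling frame would be the full list of the 200 villages with their estimated populations.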

Data analysis would require specifying the sample design. This can be done in packages such as Stata, SPSS, Epi Info, SUDAAN, SAS, &c.
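These packages all do essentially the same thing: they estimate the variance from the between-cluster variation rather than assuming a simple random sample. The idea can be sketched with a ratio estimator treating clusters as primary sampling units (`cluster_ci` is an illustrative name; the cluster counts are made up):

```python
from math import sqrt
from statistics import NormalDist

def cluster_ci(clusters, alpha=0.05):
    """CI for a proportion from a cluster sample.
    clusters: list of (numerator, denominator) pairs, one per cluster."""
    m = len(clusters)
    p = sum(y for y, _ in clusters) / sum(x for _, x in clusters)
    xbar = sum(x for _, x in clusters) / m
    # variance estimated from between-cluster variation (ratio estimator)
    s2 = sum((y - p * x) ** 2 for y, x in clusters) / (m - 1)
    se = sqrt(s2 / m) / xbar
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return p - z * se, p + z * se

lo, hi = cluster_ci([(9, 12), (10, 12), (8, 12), (11, 12), (9, 12), (10, 12)])
print(round(lo, 3), round(hi, 3))  # roughly (0.722, 0.862)
```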

WRT the effect size: if you knew this in advance then you would not need to do a survey. Select the level of effect that you deem to be usefully or substantively significant. If you would count a drop from 90% to 80% as a success then use that.

I would avoid a before-after paired study as this often proves to be a lot of work (best to reserve these for interventions in (e.g.) schools where follow-up is simple).

In summary ... I think you can do a more powerful and cheaper one-sample study with a single-tailed test (or a 95% CI approach).

I hope this helps.

### Mark Myatt

Consultant Epidemiologist

Frequent user

25 May 2014, 11:24

Design effect : This can be confusing as we tend to use the term "design effect" (DEFF) for simple surveys and "variance inflation factor" (VIF) for studies and trials (particularly cluster-randomised trials). DEFF and VIF are similar things and account for the loss of sampling variation caused by a cluster-sampled design. Some statistical procedures are quite robust to loss of sampling variation but most are not. This means that you should almost always increase sample sizes to account for variance loss whenever you use a clustered sample. It also means that you should use appropriate statistical techniques with such data, because many statistical tests and all procedures for calculating confidence intervals rely on estimating variance (and this is usually underestimated when you have a cluster sample).

Single-tailed test : This is appropriate if you expect, or are interested in, an effect in a particular direction. You only expect a decrease (an improvement) associated with your intervention, so a single-tailed test would be appropriate.

Secular trend : If you expect your indicator to also go down in the control area then a two-sample test is required. The risk is that the prevalence of stigma drops to 80% in both areas and you see this in the intervention area and ascribe the change to the intervention when it would have happened anyway.

Bleed : One bias that you need to look out for is "bleed". This is where the effect of an intervention (particularly an information-based intervention) bleeds over into the non-intervention area. When this happens you will see a smaller (or no) effect as the intervention is present in both areas. This is a particular problem with mass-media campaigns.

Numbers of clusters : There are (AFAIK) no formal ways of getting at the exact number of clusters required, since variance loss is influenced by the underlying clumpiness of the variable, the number of clusters, the sample design used within clusters, and other things. The general rule is to take as many small clusters as is feasible. I tend to use a map-segment-sample method for sampling within clusters. The methods in the latest SMART material are useful. The main idea is to try not to lose sampling variation, so try to sample from all over the cluster.

Arithmetic : Yes. 24 * 12 is 288!

Statistical tests vs. CIs : Any approach that (e.g.) has a null value (90%), estimates a value, and states that if the 95% CI does not include the null value then we have made a difference is doing a statistical test. Sample size and power issues are still there: I can have a small sample size with a wide CI that will seldom exclude the null value (lots of false negatives) or a large sample size that gives lots of false positives even when there is very little effect. The estimation approach is preferred because it gives an effect size (how big a difference) rather than a p-value, which reflects both sample size and effect size (in very large samples almost any difference will have p < 0.05).
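To make the equivalence concrete, here is a toy example with made-up numbers (213 of 266 respondents reporting stigma, so the estimate is about 80%); the 95% CI excluding the null value of 90% is exactly a two-sided test at the 5% level. The sketch ignores the design effect for simplicity:

```python
from math import sqrt
from statistics import NormalDist

n, k, null = 266, 213, 0.90          # made-up survey result: p-hat ~ 0.80
p = k / n
z = NormalDist().inv_cdf(0.975)      # 1.96 for a 95% CI
se = sqrt(p * (1 - p) / n)           # simple-random-sample standard error
lo, hi = p - z * se, p + z * se
print(round(lo, 3), round(hi, 3))    # roughly (0.753, 0.849)
print(not (lo <= null <= hi))        # True: CI excludes 0.90 -> "significant"
```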

Are we getting there?

### Mark Myatt

Consultant Epidemiologist

Frequent user

27 May 2014, 07:50

The point I was trying to make is that the two approaches are equivalent to each other. The only difference between them is the mechanics of the testing. With two surveys you will still want to ask whether the two prevalences differ from each other and you will then fall back on a significance test and that is where we started from.

Best, IMO, to decide what is the smallest effect worth detecting and then calculate the sample size sufficient to detect that with acceptable levels of error. The sample size calculation (and the sample size required) for either approach will be the same.