# Urgent Help!! Please!! Sampling and study design!!!

This question was posted in the Assessment and Surveillance forum area and has 6 replies.

### Anonymous 2525

Normal user

22 May 2014, 15:29

There is a project that aims to determine the change in knowledge, attitude and practice (KAP) among the general public after a certain intervention, over a 12-month period. The design compares an intervention district with a non-intervention district (two districts in total).

In a real example, a previous survey showed that 90% of the general public associated shame with a disease. Our target is that education would reduce this by 11% in relative terms (NOT 11 percentage points), i.e. from 90% to the 80% level. So I need to calculate the sample size (this takes into account two proportions, 90% and 80%):

Assuming 80% power and a 5% alpha risk, the required sample size is 219 per district. I take 30 clusters as an assumption (the intervention district has 200 villages, i.e. clusters), as recommended by the WHO. So, 219/30 = 7.3 households per cluster (i.e. village). The primary sampling units are households, and the proposal is to take three members of each household to respond individually to the questionnaire. So, in total, 219 households and 219*3 = 657 individuals.
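The 219 figure is consistent with the usual normal-approximation formula for comparing two proportions once a continuity correction is applied. That this is how it was obtained is an assumption, as are the function names below. A minimal Python sketch of that calculation and the per-cluster arithmetic:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of p1 vs p2 (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return (num / (p1 - p2)) ** 2

def with_continuity_correction(n, p1, p2):
    """Continuity correction (assumed here to explain the 219 figure)."""
    return n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2

n_raw = n_two_proportions(0.90, 0.80)
n_cc = ceil(with_continuity_correction(n_raw, 0.90, 0.80))
clusters = 30

print(n_cc)                        # households per district: 219
print(round(n_cc / clusters, 1))   # households per cluster: 7.3
print(n_cc * 3)                    # respondents at three per household: 657
```

Note that this figure still assumes a simple random sample; it makes no allowance for the clustered design.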

Questions:

1. Is it OK to use this assumed number of 30 clusters?

2. How could the required number of clusters be estimated, i.e. how many clusters (xx?) should be taken from the 200 clusters (villages) of this intervention district, as we don’t want to reach all 200 villages?

SECOND

The above approach has a number of disadvantages. First, the change (i.e. effect size) is an assumption; there is no source that determines that the change in KAP due to the intervention should be 11% and not 5%, 20% or 30%. Secondly, after the study is done, the “actual” effect size obtained could be <11%, meaning that the calculated sample size was in fact lower than it should have been for the “real” effect of the intervention (the required sample size rises as the effect size falls). This may make the results biased and less reliable.

An approach that may counter the above problems is:

To conduct two separate KAP prevalence surveys, before and after the intervention, i.e. without assuming in advance that our intervention would bring a given amount (XX%) of improvement in KAP (e.g. the 11% improvement above). After the two surveys have been conducted, we estimate how much change has actually occurred. The sample size is determined using “one proportion” and the formula below.

The primary sampling unit will be the household, not the individual. Within each household, three individuals will be invited to respond to the questionnaire: one person aged <18 years, one adult male and one adult female.

The formula shown below gives a maximum possible sample size of about 211 households:

$N = 2 \times 1.96^2 \times \dfrac{p(1-p)}{d^2}$

With p = 0.5, 1 - p = 0.5 and d = 0.1, this totals 192; adding 10% gives 211.
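As a check on the arithmetic, here is a short Python sketch of this precision-based formula. It reads the leading 2 as a design effect and the 10% as a non-response allowance; both readings are assumptions, as is the function name.

```python
def n_one_proportion(p, d, z=1.96, design_effect=2):
    """Sample size to estimate a proportion p to within +/- d at 95% confidence."""
    return design_effect * z ** 2 * p * (1 - p) / d ** 2

base = n_one_proportion(p=0.5, d=0.1)
print(round(base))          # 192
print(round(base * 1.10))   # 211 after adding 10%

# p = 0.5 maximises p * (1 - p), so it gives the largest n for a given d
for p in (0.5, 0.7, 0.9):
    print(p, round(n_one_proportion(p, d=0.1)))
```

The loop shows why p = 0.5 is the "maximum" case: any other assumed prevalence, including the 90% from the earlier survey, gives a smaller n for the same precision.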

Thus, dividing 211 by the standard number of clusters (villages), i.e. 30, gives 7 households per village. From each household (n=211), three respondents (one aged <18 years, one adult male, one adult female), as mentioned above, will be invited (no random selection) to respond to the KAP questionnaire (7 households * 3 per household = 21 per village). This means a total of 630 respondents (21*30 = 630) in each district.

This approach has the following advantage:

No particular effect size of the intervention is assumed before doing the survey, while the maximum sample size that the formula above can give is taken.

Possible disadvantages:

The formula used in the second approach doesn’t take “statistical power” into account, but it does take the confidence interval into account. Power is a safeguard against false negatives and the confidence level is a safeguard against false positives. Power is an important aspect: it indicates the probability of detecting an effect that exists in reality.

Questions:

1. Is this 2nd approach correct, even though “power” is not taken into account, given our objective (set before doing the 1st baseline survey), i.e. to determine the change in a parameter due to an intervention? Can this objective be studied by doing two “independent” surveys before and after the intervention (i.e. using “one proportion”, 90%, in the sample size calculation, with the formula from the second approach)?

$N = 2 \times 1.96^2 \times \dfrac{p(1-p)}{d^2}$

p=0.9 (because of 90%); d=0.1

*This formula doesn’t take into account two proportions i.e. before intervention 90% and after intervention 80%

2. Are there any ways by which the lack of “power” in the sample size calculation of the 2nd approach can be compensated for? Is the absence of “power” (which is usually 80%) from the sample size formula a problem?

Power 80%

Statistical Power: Statistical power is inversely related to beta, the probability of making a Type II error. In short, power = 1 – β.

In plain English, statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.
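To make this concrete, here is a rough Python sketch of the approximate power of a two-sided comparison of two proportions for a fixed sample size per group. The function name and the n = 199 used in the example are illustrative assumptions, and the formula is the standard normal approximation (it ignores the negligible chance of rejecting in the wrong direction).

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided test with n per group."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar))          # spread under "no difference"
    se_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # spread under the true values
    z = (abs(p1 - p2) * sqrt(n) - z_a * se_null) / se_alt
    return NormalDist().cdf(z)

# Same n per group, but the true post-intervention level varies
for p2 in (0.80, 0.85):
    print(p2, round(power_two_proportions(0.90, p2, n=199), 2))
```

With the planned drop from 90% to 80% the power is close to 80%; if the true drop is only to 85%, the same sample detects it far less often.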

### Mark Myatt

Frequent user

23 May 2014, 14:36

A lot of questions ... let us start by addressing a few of them and see where that takes us ...

Sample size calculations usually assume a simple random sample. Cluster samples usually have a lower effective sample size than a simple random sample. This means that you should take a larger sample size than calculated. It is usual to take a sample size twice that calculated for a simple random sample. Using GNU sampsize I get:

Estimated sample size for two-sample comparison of percentages

Test H: p1 = p2, where p1 is the percentage in population 1 and p2 is the percentage in population 2

Assumptions: alpha = 5% (two-sided), power = 80%, p1 = 90%, p2 = 80%

Estimated sample size: n1 = 199, n2 = 199

Doubling this to allow for the cluster design, your sample size in each district should be about 398.

You might consider using a single-tailed hypothesis test as you expect, and are interested in, a difference in one direction. Using GNU sampsize I get:

Estimated sample size for two-sample comparison of percentages

Test H: p1 = p2, where p1 is the percentage in population 1 and p2 is the percentage in population 2

Assumptions: alpha = 5% (one-sided), power = 80%, p1 = 90%, p2 = 80%

Estimated sample size: n1 = 157, n2 = 157

Giving a sample size of about 314 in each group.
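The sampsize figures above can be reproduced with the standard normal-approximation formula without continuity correction. The Python sketch below is an independent reimplementation under that assumption, not the sampsize code itself, and the names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for comparing two proportions (simple random sample)."""
    z_a = NormalDist().inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((num / (p1 - p2)) ** 2)

DESIGN_EFFECT = 2  # the doubling suggested above for cluster samples

for two_sided in (True, False):
    n = n_per_group(0.90, 0.80, two_sided=two_sided)
    print("two-sided" if two_sided else "one-sided", n, n * DESIGN_EFFECT)
```

This prints 199 and 398 for the two-sided test and 157 and 314 for the one-sided test, matching the numbers quoted above.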

If you have baseline data you may want to use a one-sample test (i.e. testing for 80% against the known 90%). This would need a sample size of about 140 in one (i.e. the intervention) group only.

I would probably go for a one-sample test in one sample from the intervention district with 90% or 95% power. Sample sizes would be 204 and 266 respectively. This would be safe if you think that a secular trend in reduction of stigma is not operating and so will not need a "control" district.
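The one-sample figures are consistent with a one-sided, one-sample test of a proportion against the 90% baseline, doubled for the cluster design. That reading is an assumption (the exact routine used is not stated), as are the names in this Python sketch.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_one_sample(p0, p1, alpha=0.05, power=0.80):
    """n for a one-sided, one-sample test of p1 against a known baseline p0."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    num = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))
    return ceil((num / (p0 - p1)) ** 2)

for power in (0.80, 0.90, 0.95):
    n = n_one_sample(0.90, 0.80, power=power)
    print(power, n, 2 * n)  # 2 * n applies the cluster-design doubling
```

The doubled values come out as 138 (about 140), 204 and 266 for 80%, 90% and 95% power.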

A general rule is to prefer many small clusters over a few large clusters. Using m = 30 clusters is usually a safe choice but you could go for fewer. This would be cheaper. With n = 266 you might (e.g.) go for 24 clusters of 12 (that comes to 264 ... close enough).

Picking clusters can be done using the PPS sampling approach as used in SMART surveys or the spatially stratified approach as used in RAM type surveys. Households within each cluster could be selected as described in the SMART survey manual. You do not need to sample all 200 villages. See above ... 24 would probably do you well, 30 might be better.
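For readers unfamiliar with PPS (probability proportional to size) selection, here is a minimal Python sketch of the systematic version: list the villages with cumulative populations, compute a sampling interval, take a random start, and step through the list. The village populations are made up for illustration, and the function name is hypothetical; the SMART manual remains the reference for the actual procedure.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(populations, n_clusters, rng):
    """Pick cluster indices with probability proportional to population size."""
    cumulative = list(accumulate(populations))
    interval = cumulative[-1] / n_clusters     # sampling interval
    start = rng.random() * interval            # random start within the first interval
    picks = []
    for k in range(n_clusters):
        point = start + k * interval
        # index of the village whose cumulative population range holds the point
        idx = bisect_right(cumulative, point)
        picks.append(min(idx, len(populations) - 1))
    return picks

rng = random.Random(2014)
# Hypothetical village populations; real figures would come from a census list
populations = [rng.randint(200, 1500) for _ in range(200)]
selected = pps_systematic(populations, n_clusters=30, rng=rng)
print(selected)
```

A village larger than the sampling interval can be picked more than once; in that case it simply contributes more than one cluster.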

Data analysis would require specifying the sample design. This can be done in packages such as STATA, SPSS, EpiInfo, SUDAAN, SAS, &c.

WRT the effect size. If you knew this in advance then you would not need to do a survey. Select the level of effect that you deem to be usefully or substantively significant. If you think a drop from 90% to 80% to be a success then use that.

I would avoid a before-after paired study as these often prove to be a lot of work (best to reserve these for interventions in (e.g.) schools where follow-up is simple).

In summary ... I think you can do a more powerful and cheaper one-sample study with a single-tailed test (or a 95% CI approach).

I hope this helps.

Please do not hesitate to ask follow-up questions.

### Mark Myatt

Frequent user

25 May 2014, 11:24

Thank you for your kind comments.

**Design effect :** This can be confusing as we tend to use the term "design effect" (DEFF) for simple surveys and "variance inflation factor" (VIF) for studies and trials (particularly cluster randomised trials). DEFF and VIF are similar things and account for the loss of sampling variation caused by a cluster sampled design. Some statistical procedures are quite robust to loss of sampling variation but most are not. This means that you should almost always increase sample sizes to account for variance loss whenever you use a clustered sample. It also means that you should use appropriate statistical techniques with such data. This is because many statistical tests and all procedures for calculating confidence intervals rely on estimating variance (and this is usually underestimated when you have a cluster sample).
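A common approximation, not stated in this thread and so offered only as an assumption, links the design effect to the cluster size and the intracluster correlation (ICC): DEFF = 1 + (m - 1) × ICC. The Python sketch below uses a made-up ICC of 0.1 to show how the design effect, and hence the effective sample size, changes with cluster size.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of the simple random sample that carries the same information."""
    return n / design_effect(cluster_size, icc)

ICC = 0.1  # hypothetical intracluster correlation
for cluster_size in (5, 11, 21):
    deff = design_effect(cluster_size, ICC)
    print(cluster_size, round(deff, 2), round(effective_sample_size(266, cluster_size, ICC)))
```

Under these assumed values a design effect of 2 corresponds to clusters of 11, and smaller clusters lose less, which is in line with the advice to prefer many small clusters.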

**Single-tailed test :** This is appropriate if you expect or are interested in an effect in a particular direction. You only expect a decrease (improvement) associated with your intervention so a single-tailed test would be appropriate.

**Secular trend :** If you expect your indicator to also go down in the control area then a two-sample test is required. The risk is that the prevalence of stigma drops to 80% in both areas and you see this in the intervention area and ascribe the change to the intervention when it would have happened anyway.

**Bleed :** One bias that you need to look out for is "bleed". This is where the effect of an intervention (particularly an information-based intervention) bleeds over into the non-intervention area. When this happens you will see a smaller (or no) effect as the intervention is present in both areas. This is a particular problem with mass-media campaigns.

**Numbers of clusters :** There are no (AFAIK) formal ways of getting at the exact number of clusters required since variance loss is influenced by the underlying clumpiness of a variable, the number of clusters, the sample design used in clusters, and other things. The general rule is to take as many small clusters as is feasible. I tend to use a map-segment-sample method for sampling within clusters. The methods in the latest SMART material are useful. The main idea is to try not to lose sampling variation so try to sample from all over the cluster.

**Arithmetic :** Yes. 24 * 12 is 288!

**Statistical tests vs. CIs :** Any approach that (e.g.) has a null value (90%) and then estimates a value and states that if the 95% CI does not include the null value then we have made a difference is doing a statistical test. Sample size and power issues are still there as I can have a small sample size with a wide CI that will seldom exclude the null value (lots of false negatives) or a large sample size that gives lots of false positives even when there is very little effect. The estimation approach is preferred because it gives an effect size (how big a difference) rather than a p-value, which reflects both sample size and effect size (in very large samples almost any difference will have p < 0.05).
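A small Python sketch of the point about CIs acting as tests, using a simple Wald interval widened by an assumed design effect of 2. The observed 80% and the two sample sizes are illustrative, and a real analysis would use survey software that accounts for the sample design.

```python
from math import sqrt

def wald_ci(p_hat, n, design_effect=2.0, z=1.96):
    """Wald 95% CI for a proportion, widened by an assumed design effect."""
    se = sqrt(design_effect * p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

NULL_VALUE = 0.90  # baseline prevalence of stigma
for n in (40, 266):
    low, high = wald_ci(0.80, n)
    excludes_null = not (low <= NULL_VALUE <= high)
    print(n, round(low, 3), round(high, 3), excludes_null)
```

With n = 40 the interval is wide enough to include 90%, so the same observed 80% would not count as a change; with n = 266 the interval excludes 90%.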

Are we getting there?

### Mark Myatt

Frequent user

27 May 2014, 07:50

The point I was trying to make is that the two approaches are equivalent to each other. The only difference between them is the mechanics of the testing. With two surveys you will still want to ask whether the two prevalences differ from each other and you will then fall back on a significance test and that is where we started from.

Best, IMO, to decide what is the smallest effect worth detecting and then calculate the sample size sufficient to detect that with acceptable levels of error. The sample size calculation (and the sample size required) for either approach will be the same.