
Urgent Help!! Please!! Sampling and study design!!!

This question was posted in the Assessment forum area and has 6 replies.


Anonymous 2525

Normal user

22 May 2014, 15:29

There is a project that aims to determine the change in knowledge, attitude and practice (KAP) among the general public after a certain intervention, over a 12-month period. The design compares an intervention district with a non-intervention district.

As a real example, a previous survey showed that 90% of the general public felt shame associated with a disease. Our target is that education would reduce this by 11% (NOT 11 percentage points), i.e. from the 90% level to 80%. So I need to calculate the sample size, taking into account the two proportions, 90% and 80%:
Assuming 80% power and 5% alpha risk, 219 is the required sample per district. I take 30 clusters by assumption (the intervention district has 200 villages, i.e. clusters), as recommended by the WHO. So 219/30 = 7.3 households per cluster (i.e. village). The primary sampling unit is the household, and the proposal is to take three members from each household to respond individually to the questionnaire. So, in total, 219 households and 219 × 3 = 657 individuals.
Questions:
1. Is it OK to use this assumed number of 30 clusters?
2. How could the required number of clusters be estimated, i.e. to take xx (?) clusters out of the 200 clusters (villages) of this intervention district, since we don't want to reach all 200 villages?
SECOND
The above approach has a number of disadvantages. First, the change (i.e. effect size) is an assumption; there is no source establishing that the change in KAP due to the intervention should be 11% rather than 5%, 20% or 30%. Secondly, after the study is done, the "actual" effect size obtained could be <11%, meaning that the calculated sample size was in fact lower than it should have been (sample size is inversely proportional to effect size) for the "real" effect of the intervention. This may make the results biased and less reliable.
An approach that may counter the above problems is:
To conduct two KAP prevalence surveys independently, before and after the intervention, i.e. without assuming in advance that our intervention will bring a particular amount (XX%) of improvement in KAP (the 11% improvement above). After the two surveys have been conducted, we then estimate how much change has actually occurred. The sample size is determined using "one proportion" and the formula below:
The primary sampling unit will be the household, not individuals. Within each household, three individuals will be invited to respond to the questionnaire: one person aged <18 years, one adult male, one adult female.
The formula shown below gives a maximum possible sample size of 211 households:
N = 2 × 1.96² × p(1−p) / d²
p = 0.5; 1 − p = 0.5; d = 0.1, totaling 192, plus 10%, i.e. 211

Thus, dividing 211 by the required number of clusters (villages) according to the standard, i.e. 30, gives 7 households per village. From each household (n = 211), three respondents (one aged <18 years, one adult male, one adult female), as mentioned above, will be invited (no random selection) to respond to the KAP questionnaire (7 houses × 3 per house = 21 per village). This means a total of 630 respondents (21 × 30 = 630) in each district.
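The arithmetic above can be checked with a short sketch; this simply evaluates the formula as written in the post (the factor of 2 acting as a rough design-effect allowance), then applies the 10% buffer:

```python
import math

def one_prop_n(p, d, z=1.96, inflation=2.0):
    """Sample size for estimating a single proportion p with
    absolute precision d at 95% confidence (z = 1.96), with the
    factor-of-2 inflation used in the post as a rough allowance
    for cluster sampling."""
    return inflation * z**2 * p * (1 - p) / d**2

n = one_prop_n(p=0.5, d=0.1)   # worst-case p = 0.5 maximises p(1-p)
print(round(n))                # 192
print(round(n * 1.1))          # 211 after a 10% non-response buffer
```

Note that p = 0.5 is the conservative choice: p(1−p) is largest there, so this is the "maximum possible" sample size mentioned above.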

This approach has the following advantages:

No particular effect size of the intervention is pre-assumed before doing the survey, while the maximum sample size that can be obtained is taken (refer to the formula above).

Possible disadvantages:
The formula used in the second approach doesn't take "statistical power" into account, but does take the confidence interval into account. Power is a safeguard against false negatives, and the confidence interval is a safeguard against false positives. Power is important because it indicates the probability of detecting an effect that exists in reality.
Questions:
1. Is this second approach correct even if the "power" aspect is not taken into account, given our objective (set before doing the first, baseline survey) of determining the change in a parameter due to an intervention? Can this objective be studied by doing two "independent" surveys (i.e. using "one proportion", 90%, in the sample size calculation, with the formula below) before and after the intervention?
N = 2 × 1.96² × p(1−p) / d²
p = 0.9 (because of 90%); d = 0.1
*This formula doesn't take into account two proportions, i.e. 90% before the intervention and 80% after.

2. Are there any ways in which the lack of "power" in the sample size calculation of the second approach can be compensated for? Is the absence of "power" (usually 80%) from the sample size formula a problem?

Statistical Power: Statistical power is inversely related to beta or the probability of making a Type II error. In short, power = 1 – ß.
In plain English, statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.
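This definition can be put into numbers with the usual normal approximation for a two-sample comparison of proportions; the sample size of 157 per group and the z-values here are purely illustrative:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_sample_power(p1, p2, n, z_alpha):
    """Approximate power (1 - beta) of a two-sample test of
    proportions with n per group, normal approximation."""
    p_bar = (p1 + p2) / 2
    se0 = math.sqrt(2 * p_bar * (1 - p_bar))    # SE under H0 (pooled)
    se1 = math.sqrt(p1*(1-p1) + p2*(1-p2))      # SE under H1
    z = (abs(p1 - p2) * math.sqrt(n) - z_alpha * se0) / se1
    return norm_cdf(z)

# 157 per group, one-sided alpha = 5% (z = 1.6449)
print(round(two_sample_power(0.9, 0.8, 157, 1.6449), 2))  # 0.8
```

So with 157 respondents per group, a true drop from 90% to 80% would be detected about 80% of the time; a false negative (Type II error) would occur in the remaining 20%.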

Mark Myatt

Consultant Epidemiologist

Frequent user

23 May 2014, 14:36

A lot of questions ... let us start by addressing a few of them and see where that takes us ...

Sample size calculations usually assume a simple random sample. Cluster samples usually have a lower effective sample size than a simple random sample. This means that you should take a larger sample size than calculated. It is usual to take a sample size twice that calculated for a simple random sample. Using GNU sampsize I get:

  Estimated sample size for two-sample comparison of percentages

  Test H:    p1 = p2, where p1 is the percentage in population 1
  and p2 is the percentage in population 2

  Assumptions:
          alpha =          5% (two-sided)
          power =         80%
             p1 =         90%
             p2 =         80%

  Estimated sample size:
             n1 =        199
             n2 =        199

So your sample size in each district should be about 398.

You might consider using a single-tailed hypothesis test as you expect, and are interested in, a difference in one direction. Using GNU sampsize I get:

  Estimated sample size for two-sample comparison of percentages

  Test H:    p1 = p2, where p1 is the percentage in population 1
  and p2 is the percentage in population 2

  Assumptions:
          alpha =          5% (one-sided)
          power =         80%
             p1 =         90%
             p2 =         80%

  Estimated sample size:
             n1 =        157
             n2 =        157

Giving a sample size of about 314 in each group.
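The two sampsize runs above can be reproduced with the textbook normal-approximation formula for comparing two proportions; a minimal sketch (z-values are the standard quantiles: 1.96 for 5% two-sided, 1.6449 for 5% one-sided, 0.8416 for 80% power):

```python
import math

def two_prop_n(p1, p2, z_alpha, z_beta=0.8416):
    """Per-group sample size for comparing two proportions
    (normal approximation, as used by tools like GNU sampsize)."""
    p_bar = (p1 + p2) / 2
    a = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))    # under H0
    b = z_beta * math.sqrt(p1*(1-p1) + p2*(1-p2))       # under H1
    return math.ceil((a + b)**2 / (p1 - p2)**2)

print(two_prop_n(0.9, 0.8, z_alpha=1.96))    # 199 per group (two-sided)
print(two_prop_n(0.9, 0.8, z_alpha=1.6449))  # 157 per group (one-sided)
```

Doubling these for the design effect gives the 398 and 314 figures quoted.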

If you have baseline data you may want to use a one-sample test (i.e. for 80% rather than 90%). This would need a sample size of about 140 in one group (i.e. the intervention group) only.

I would probably go for a one-sample test in one sample from the intervention district with 90% or 95% power. Sample sizes would be 204 and 266 respectively. This would be safe if you think that a secular trend in reduction of stigma is not operating and so will not need a "control" district.

A general rule is to prefer many small clusters over a few large clusters. Using m = 30 clusters is usually a safe choice but you could go for fewer. This would be cheaper. With n = 266 you might (e.g.) go for 24 clusters of 12 (that comes to 264 ... close enough).

Picking clusters can be done using the PPS sampling approach as used in SMART surveys or the spatially stratified approach as used in RAM type surveys. Households within each cluster could be selected as described in the SMART survey manual. You do not need to sample all 200 villages. See above ... 24 would probably do you well, 30 might be better.
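The PPS step can be sketched as a systematic selection along the cumulative population list, so larger villages are more likely to be picked (the village sizes below are simulated, purely for illustration; real surveys should follow the SMART manual's procedure):

```python
import random

def pps_systematic(village_sizes, m):
    """PPS (probability proportional to size) systematic selection
    of m clusters. A very large village can be selected more than
    once, which is standard behaviour for this method."""
    cum, total = [], 0
    for size in village_sizes:
        total += size
        cum.append(total)                  # cumulative population
    interval = total / m                   # sampling interval
    start = random.uniform(0, interval)    # random start
    picks, i = [], 0
    for k in range(m):
        target = start + k * interval
        while cum[i] < target:             # walk to the village
            i += 1                         # containing this target
        picks.append(i)
    return picks

# e.g. 200 villages with simulated populations, select 30 clusters
random.seed(1)
sizes = [random.randint(100, 1000) for _ in range(200)]
clusters = pps_systematic(sizes, 30)
print(len(clusters))  # 30
```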

Data analysis would require specifying the sample design. This can be done in packages such as STATA, SPSS, EpiInfo, SUDAAN, SAS, &c.

WRT the effect size: if you knew this in advance then you would not need to do a survey. Select the level of effect that you deem to be usefully or substantively significant. If you consider a drop from 90% to 80% a success, then use that.

I would avoid a before-after paired study as this often proves to be a lot of work (best to reserve these for interventions in (e.g.) schools where follow-up is simple).

In summary ... I think you can do a more powerful and cheaper one-sample study with a single-tailed test (or a 95% CI approach).

I hope this helps.

Please do not hesitate to ask follow-up questions.

Anonymous 2525

Normal user

24 May 2014, 06:28

Dear Friend

Thank you so much first of all; I must say that you are doing an excellent job to put sincere efforts and share your knowledge with those who are unknown to you! Please accept my sincere thanks for this!

I answer within your replies (paragraphs marked ### are the answers):

A lot of questions ... let us start by addressing a few of them and see where that takes us ...
Sample size calculations usually assume a simple random sample. Cluster samples usually have a lower effective sample size than a simple random sample. This means that you should take a larger sample size than calculated. It is usual to take a sample size twice that calculated for a simple random sample. Using GNU sampsize I get:
Estimated sample size for two-sample comparison of percentages

Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2

Assumptions:
alpha = 5% (two-sided)
power = 80%
p1 = 90%
p2 = 80%

Estimated sample size:
n1 = 199
n2 = 199

So your sample size in each district should be about 398.

###I agree with the design effect. But do we apply a design effect even in a two-sample comparison? Or is it applied only in one-sample designs, e.g. prevalence surveys where you don't aim to detect a change from time 1 to time 2?

You might consider using a single-tailed hypothesis test as you expect, and are interested in, a difference in one direction. Using GNU sampsize I get:
Estimated sample size for two-sample comparison of percentages

Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2

Assumptions:
alpha = 5% (one-sided)
power = 80%
p1 = 90%
p2 = 80%

Estimated sample size:
n1 = 157
n2 = 157

Giving a sample size of about 314 in each group.

###This is also a strong argument. But is it still OK to take a single-tailed hypothesis even when it is possible that the mass media may increase the knowledge of the population, or may reduce it (strange!), or may have no effect at all from the baseline level?

If you have baseline data you may want to use a one-sample test (i.e. for 80% rather than 90%) This would need a sample size of about 140 in one (i.e. the intervention) group only.
I would probably go for a one-sample test in one sample from the intervention district with 90% or 95% power. Sample sizes would be 204 and 266 respectively. This would be safe if you think that a secular trend in reduction of stigma is not operating and so will not need a "control" district.

###Thank you again for this good argument as well. But we are testing not just knowledge; we are also testing a strategy for treatment coverage, so I guess two districts would be needed. We also need to know what level of knowledge the control district has.

A general rule is to prefer many small clusters over a few large clusters. using m = 30 clusters is usually a safe choice but you could go for fewer. This would be cheaper. With n = 266 you might (e.g.) go for 24 clusters of 12 (that come to 264 ... close enough).

###As far as I understood, according to you there is no specific mathematical formula for calculating the number of clusters; it just depends on convenience and the desirable cluster size? I have seen surveys that have taken >40 or >50 clusters; do they also decide on the number of clusters by convenience and desirable cluster size rather than any particular formula? Also, 30 clusters is generally for immunization programmes, and our programme is on neurology; would that be an issue?

24 × 12 = 288, and not 264

Picking clusters can be done using the PPS sampling approach as used in SMART surveys or the spatially stratified approach as used in RAM type surveys. Households within each cluster could be selected as described in the SMART survey manual. You do not need to sample all 200 villages. See above ... 24 would probably do you well, 30 might be better.

###Yes, I would use PPS, and plan to sample households by simple random sampling; what do you think? I would read the SMART survey manual as well. Thank you for this link.

Data analysis would require specifying the sample design. This can be done in packages such as STATA, SPSS, EpiInfo, SUDAAN, SAS, &c.
WRT the effect size. If you knew this in advance then you would not need to do a survey. Select the level of effect that you deem to be usefully or substantively significant. If you think a drop from 90% to 80% to be a success then use that.

###Sorry, but I didn't understand "If you knew this in advance then you would not need to do a survey."

I would avoid a before-after paired study as this often proved to be a lot of work (best to reserve these to interventions in (e.g.) schools where follow-up is simple).

###Thank you. Since we employ radio for the mass-media intervention, don't you think we wouldn't need to go back to the same participants from the baseline survey for the final-line survey? Meaning, a random sample of the public at baseline and then another random sample at final-line; participants may or may not be the same?

In summary ... I think you can do a more powerful and cheaper one-sample study with a single-tailed test (or a 95% CI approach).
I hope this helps.
Please do not hesitate to ask follow-up questions.

###Thank you so much for your time and patience. Also, I would be glad if you could share your insights on the second approach I mentioned in my original message:

Since a change from 90% to 80% is only an assumption, and may not be obtained in reality, could these limitations of the two-sample comparison of proportions be overcome by doing two independent surveys (using the formula below, which gives the maximum sample size, without assuming that our mass-media intervention would bring a particular amount of improvement)? This takes into account the 95% CI but does not take statistical power (80% or 90/95%) into account. Could this also be an alternative method in your view? Project cost is not a limiting factor.

Power is an important component in showing that a difference existed in reality; the formulas below and the second approach above don't take this power aspect into account. Power could be estimated after the two surveys (baseline and final-line) have been conducted, but not pre-assumed.

N = 2 × 1.96² × p(1−p) / d² (for the baseline survey)
p = 0.9 (because of 90%); d = 0.1

N = 2 × 1.96² × p(1−p) / d² (for the final-line survey)
p = 0.8 (because of 80%, or whatever is obtained in the baseline survey); d = 0.1

Best regards!

Mark Myatt

Consultant Epidemiologist

Frequent user

25 May 2014, 11:24

Thank you for your kind comments.

Design effect: This can be confusing, as we tend to use the term "design effect" (DEFF) for simple surveys and "variance inflation factor" (VIF) for studies and trials (particularly cluster-randomised trials). DEFF and VIF are similar things and account for the loss of sampling variance caused by a cluster-sampled design. Some statistical procedures are quite robust to loss of sampling variation, but most are not. This means that you should almost always increase sample sizes to account for variance loss whenever you use a clustered sample. It also means that you should use appropriate statistical techniques with such data, because many statistical tests, and all procedures for calculating confidence intervals, rely on estimating variance (which is usually underestimated when you have a cluster sample).
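A minimal sketch of how DEFF enters the sample size calculation; the common formulation DEFF = 1 + (m − 1) × ICC is used here, and the ICC value of 0.09 is purely illustrative:

```python
import math

def deff_from_icc(cluster_size, icc):
    """Design effect from the average cluster take (m) and the
    intracluster correlation coefficient (ICC)."""
    return 1 + (cluster_size - 1) * icc

def adjust_for_deff(n_srs, deff):
    """Inflate a simple-random-sample size by the design effect to
    get the sample size needed under cluster sampling."""
    return math.ceil(n_srs * deff)

# With clusters of 12 and an (illustrative) ICC of 0.09, DEFF is
# about 2 -- the rule-of-thumb doubling used above.
print(round(deff_from_icc(12, 0.09), 2))  # 1.99
print(adjust_for_deff(199, 2.0))          # 398 per district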

Single-tailed test: This is appropriate if you expect, or are interested in, an effect in a particular direction. You only expect a decrease (improvement) associated with your intervention, so a single-tailed test would be appropriate.

Secular trend: If you expect your indicator to also go down in the control area then a two-sample test is required. The risk is that the prevalence of stigma drops to 80% in both areas, you see this in the intervention area, and you ascribe the change to the intervention when it would have happened anyway.

Bleed: One bias that you need to look out for is "bleed". This is where the effect of an intervention (particularly an information-based intervention) bleeds over into the non-intervention area. When this happens you will see a smaller (or no) effect, as the intervention is present in both areas. This is a particular problem with mass-media campaigns.

Numbers of clusters: There are (AFAIK) no formal ways of getting at the exact number of clusters required, since variance loss is influenced by the underlying clumpiness of the variable, the number of clusters, the sample design used within clusters, and other things. The general rule is to take as many small clusters as is feasible. I tend to use a map-segment-sample method for sampling within clusters. The methods in the latest SMART material are useful. The main idea is to avoid losing sampling variation, so try to sample from all over the cluster.

Arithmetic : Yes. 24 * 12 is 288!

Statistical tests vs. CIs: Any approach that (e.g.) has a null value (90%), then estimates a value and states that if the 95% CI does not include the null value we have made a difference, is doing a statistical test. Sample size and power issues are still there: I can have a small sample size with a wide CI that will seldom exclude the null value (lots of false negatives), or a large sample size that gives lots of false positives even when there is very little effect. The estimation approach is preferred because it gives an effect size (how big a difference) rather than a p-value, which reflects both sample size and effect size (in very large samples almost any difference will have p < 0.05).
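The equivalence can be illustrated with a simple Wald interval; p̂ = 0.80 and n = 266 are example values, and in practice an exact or Wilson interval would be preferable for extreme proportions or small samples:

```python
import math

def wald_ci_95(p_hat, n):
    """95% Wald confidence interval for a proportion (illustrative
    sketch; Wilson or exact intervals are better near 0 or 1)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 1.96 * se, p_hat + 1.96 * se

# Estimate 80% from n = 266; does the 95% CI exclude the null of 90%?
lo, hi = wald_ci_95(0.80, 266)
print(round(lo, 3), round(hi, 3))  # 0.752 0.848
print(hi < 0.90)                   # True -> same conclusion as a test
```

Checking whether the interval excludes 90% is exactly a significance test at the 5% level, but the interval also reports how big the difference is.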

Are we getting there?

Anonymous 2525

Normal user

26 May 2014, 14:51

Thank you so much again for all your help and timely answers! I have no pending queries now; I have got the answers I was looking for. Thank you so much!

Just to be sure on the last paragraph, "Statistical tests vs. CIs": as far as I have understood, you do not support the second approach I mentioned (two prevalence surveys without assuming that a particular effect size would be obtained, 10% in our example), but prefer to pre-assume that an intervention would bring x% of change (10% in our example) and then calculate the sample size according to this assumption. Have I understood correctly?

Nonetheless, thank you so much for all your help! Best regards.

Mark Myatt

Consultant Epidemiologist

Frequent user

27 May 2014, 07:50

The point I was trying to make is that the two approaches are equivalent to each other. The only difference between them is the mechanics of the testing. With two surveys you will still want to ask whether the two prevalences differ from each other and you will then fall back on a significance test and that is where we started from.

Best, IMO, to decide what is the smallest effect worth detecting and then calculate the sample size sufficient to detect that with acceptable levels of error. The sample size calculation (and the sample size required) for either approach will be the same.

Anonymous 2525

Normal user

4 Jun 2014, 05:34

Thank you so much, sir. Sorry, I initially misunderstood your point. Best regards!
