
# Urgent Help!! Please!! Sampling and study design!!!

This question was posted in the Assessment and Surveillance forum area and has 6 replies.

### Mark Myatt

Frequent user

23 May 2014, 14:36

A lot of questions ... let us start by addressing a few of them and see where that takes us ...

Sample size calculations usually assume a simple random sample. Cluster samples usually have a lower effective sample size than a simple random sample of the same size. This means that you should take a larger sample than the one calculated. It is usual to take a sample size twice that calculated for a simple random sample.

Using GNU sampsize I get:

```
Estimated sample size for two-sample comparison of percentages

Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2

Assumptions:
    alpha = 5% (two-sided)
    power = 80%
    p1 = 90%
    p2 = 80%

Estimated sample size:
    n1 = 199
    n2 = 199
```

Doubling this to allow for the cluster design, your sample size in each district should be about 398.

You might consider using a single-tailed hypothesis test as you expect, and are interested in, a difference in one direction. Using GNU sampsize I get:

```
Estimated sample size for two-sample comparison of percentages

Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2

Assumptions:
    alpha = 5% (one-sided)
    power = 80%
    p1 = 90%
    p2 = 80%

Estimated sample size:
    n1 = 157
    n2 = 157
```

Doubling again gives a sample size of about 314 in each group.

If you have baseline data you may want to use a one-sample test (i.e. for 80% rather than 90%). This would need a sample size of about 140 in one (i.e. the intervention) group only. I would probably go for a one-sample test in one sample from the intervention district with 90% or 95% power. Sample sizes would be 204 and 266 respectively. This would be safe if you think that a secular trend in reduction of stigma is not operating and so you will not need a "control" district.

A general rule is to prefer many small clusters over a few large clusters. Using m = 30 clusters is usually a safe choice but you could go for fewer. This would be cheaper. With n = 266 you might (e.g.) go for 24 clusters of 11 (that comes to 264 ... close enough).
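For anyone who wants to check the arithmetic, here is a minimal Python sketch of the usual normal-approximation formulas. It is not taken from GNU sampsize and may not be exactly what that program does internally, but under these assumptions it gives the same figures as quoted above. The function names and the `DEFF = 2` constant (the "twice the simple random sample" rule of thumb) are my own.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z(prob):
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(prob)


def n_two_sample(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for comparing two proportions (pooled variance under H0)."""
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


def n_one_sample(p0, p1, alpha=0.05, power=0.80):
    """n for a one-sided test of H0: p = p0 against the alternative p = p1."""
    numerator = (z(1 - alpha) * sqrt(p0 * (1 - p0))
                 + z(power) * sqrt(p1 * (1 - p1))) ** 2
    return ceil(numerator / (p0 - p1) ** 2)


DEFF = 2  # assumed design effect for the cluster sample (rule of thumb)

two_sided = n_two_sample(0.90, 0.80)                   # simple random sample, per group
one_sided = n_two_sample(0.90, 0.80, two_sided=False)  # simple random sample, per group

# One-sample test (baseline 90%, target 80%), already inflated by DEFF.
one_sample = {power: DEFF * n_one_sample(0.90, 0.80, power=power)
              for power in (0.80, 0.90, 0.95)}

print(two_sided, DEFF * two_sided)
print(one_sided, DEFF * one_sided)
print(one_sample)
```

The one-sample figure at 80% power comes out at 138, which is the "about 140" mentioned above.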
Picking clusters can be done using the PPS sampling approach as used in SMART surveys or the spatially stratified approach as used in RAM type surveys. Households within each cluster could be selected as described in the SMART survey manual. You do not need to sample all 200 villages. See above ... 24 would probably do you well, 30 might be better.

Data analysis would require specifying the sample design. This can be done in packages such as STATA, SPSS, EpiInfo, SUDAAN, SAS, &c.

With regard to the effect size: if you knew this in advance then you would not need to do a survey. Select the level of effect that you deem to be usefully or substantively significant. If you think a drop from 90% to 80% to be a success then use that.

I would avoid a before-after paired study as this often proves to be a lot of work (best to reserve these for interventions in (e.g.) schools where follow-up is simple).

In summary ... I think you can do a more powerful and cheaper one-sample study with a single-tailed test (or a 95% CI approach).

I hope this helps. Please do not hesitate to ask follow-up questions.
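As an illustration of the PPS idea mentioned above, here is a minimal sketch of the common cumulative-population (systematic) method. Whether it matches the SMART procedure in every detail is an assumption on my part, and the village populations are randomly generated placeholders, not real data.

```python
import random
from itertools import accumulate


def pps_systematic(populations, m, rng):
    """Pick m clusters with probability proportional to size.

    Returns a list of indices into `populations`. A large village can be
    picked more than once, in which case it contributes more than one cluster.
    """
    total = sum(populations)
    interval = total / m                 # sampling interval
    start = rng.random() * interval      # random start within the first interval
    cumulative = list(accumulate(populations))
    picks, idx = [], 0
    for k in range(m):
        target = start + k * interval
        # Move forward to the village whose cumulative range contains target.
        while idx < len(cumulative) - 1 and cumulative[idx] <= target:
            idx += 1
        picks.append(idx)
    return picks


rng = random.Random(2014)  # fixed seed so the selection can be reproduced
# Hypothetical population figures for 200 villages.
village_pops = [rng.randint(150, 2500) for _ in range(200)]
selected = pps_systematic(village_pops, m=24, rng=rng)
print(selected)
```

Each index in `selected` is a village that contributes one cluster; households within it would then be chosen as described in the survey manual you are following.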

### Mark Myatt

Frequent user

25 May 2014, 11:24


### Mark Myatt

Frequent user

27 May 2014, 07:50

The point I was trying to make is that the two approaches are equivalent to each other. The only difference between them is the mechanics of the testing. With two surveys you will still want to ask whether the two prevalences differ from each other and you will then fall back on a significance test and that is where we started from. Best, IMO, to decide what is the smallest effect worth detecting and then calculate the sample size sufficient to detect that with acceptable levels of error. The sample size calculation (and the sample size required) for either approach will be the same.
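To illustrate the equivalence, here is a small sketch that compares two survey prevalences both ways, using the same (unpooled, Wald) standard error for the confidence interval and for the test. The counts are hypothetical, and the `deff` argument is an assumed design effect applied as a simple variance inflation; a proper analysis would specify the sample design in survey software.

```python
from math import sqrt
from statistics import NormalDist


def compare_two_surveys(x1, n1, x2, n2, alpha=0.05, deff=1.0):
    """Difference in prevalence with a CI and a two-sided z-test p-value."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Standard error of the difference, inflated by the assumed design effect.
    se = sqrt(deff * (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return diff, ci, p_value


# Hypothetical results: 358/398 in one district, 318/398 in the other.
diff, ci, p_value = compare_two_surveys(358, 398, 318, 398, deff=2.0)
ci_excludes_zero = ci[0] > 0 or ci[1] < 0
print(diff, ci, p_value, ci_excludes_zero)
```

With the same standard error in both places, the 95% CI for the difference excludes zero exactly when the two-sided test gives p < 0.05, so the two approaches lead to the same conclusion.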