# Confidence Intervals for Complex Sampling

This question was posted in the Assessment and Surveillance forum area and has 5 replies.

### Ranjith

Normal user

17 May 2012, 13:29

### Kevin Sullivan

Normal user

17 May 2012, 16:08

### Mark Myatt

Frequent user

18 May 2012, 08:38

### Mark Myatt

Frequent user

23 May 2012, 09:45

### Mark Myatt

Frequent user

16 Jul 2012, 10:03

**Cluster / PPS:** Point estimates (odds ratios, risk ratios, means, &c.) derived from PPS cluster samples will be the same as those calculated as if the data came from a simple random sample. The confidence interval around the estimate will not be the same (it will usually be wider). This is due to loss of sampling variation: people in the same cluster tend to resemble one another, so each additional person sampled from a cluster adds less information than a person drawn at random from the whole population. It is possible to reduce this loss by careful sample design (i.e. increasing the number of clusters and / or using a within-cluster sampling scheme that helps to maintain sampling variation), although there will be a point at which the cost-savings (the main reason for cluster sampling) are lost.
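To make the "same point estimate, wider interval" point concrete, here is a minimal sketch in Python. The data are made up (10 clusters of 20 people each), the function names are hypothetical, and the cluster-aware standard error uses the ratio-estimator formula that is commonly applied to cluster surveys; it is one reasonable choice, not necessarily what any particular package does. Both intervals use a normal approximation (z = 1.96); with few clusters a t-value would be more cautious.

```python
import math

def srs_ci(cases, n, z=1.96):
    """Proportion and CI treating the data as a simple random sample."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

def cluster_ci(cluster_cases, cluster_sizes, z=1.96):
    """Proportion and CI using the between-cluster (ratio estimator) variance."""
    k = len(cluster_cases)          # number of clusters
    n = sum(cluster_sizes)          # total sample size
    p = sum(cluster_cases) / n      # same point estimate as the SRS formula
    # Squared deviations of observed cluster counts from their expected counts
    resid = sum((y - p * m) ** 2 for y, m in zip(cluster_cases, cluster_sizes))
    se = math.sqrt(k / (k - 1) * resid) / n
    return p, p - z * se, p + z * se

# Hypothetical survey: cases found in each of 10 clusters of 20 people
cases_by_cluster = [2, 12, 5, 9, 1, 14, 3, 8, 4, 10]
sizes_by_cluster = [20] * 10

p_srs, lo_srs, hi_srs = srs_ci(sum(cases_by_cluster), sum(sizes_by_cluster))
p_clu, lo_clu, hi_clu = cluster_ci(cases_by_cluster, sizes_by_cluster)

# Design effect: ratio of the cluster variance to the SRS variance
deff = ((hi_clu - lo_clu) / (hi_srs - lo_srs)) ** 2

print(f"SRS:     {p_srs:.3f} ({lo_srs:.3f}, {hi_srs:.3f})")
print(f"Cluster: {p_clu:.3f} ({lo_clu:.3f}, {hi_clu:.3f})")
print(f"Design effect: {deff:.2f}")
```

The two point estimates are identical; only the width of the interval changes, and the design effect summarises by how much.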

**Stratified sample:** Point estimates will usually be different from those calculated as if the data came from a simple random sample. This is because stratum-specific results must be weighted by some function of stratum population before being combined to form an overall estimate. The confidence interval around the estimate will not be the same (it will usually be narrower).

In the case of hypothesis testing (e.g. chi-square tests), most testing procedures are not optimal when data are autocorrelated. This is often the case with complex samples. Errors associated with a test may be different from those specified (i.e. p < 0.05 may not be p < 0.05). There are special cases, such as a chi-square test for twinned observations (e.g. one person has two eyes); you may be lucky and find a special case that applies.

There are a number of approaches to dealing with this problem. The most common is, probably, to ignore it and treat the data as if they came from a simple random sample. One approach (modelling) uses procedures to correct for correlation. These procedures vary in complexity and with the test being used. Another approach is to use resampling (e.g. the bootstrap). The resampling approach is consistent and simple and works well in most cases. Both modelling and resampling require familiarity to do properly.

It is probably easier to recast a hypothesis testing problem as an estimation problem. Most problems are amenable to this approach. For example, the difference between two proportions (for which a chi-square test is commonly used) may be recast as a risk ratio (or odds ratio) with a 95% CI (90% for a single-sided test) that does not include one. You can do this sort of analysis with (e.g.) CSAMPLE.

I hope this is of some use.
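As a rough illustration of recasting a test as an estimation problem with a resampling approach, here is a minimal Python sketch of a cluster bootstrap for a risk ratio. Everything in it is an assumption for the sake of example: the data are invented, the function names are hypothetical, and the percentile interval is only one of several bootstrap intervals. The key idea is that whole clusters are resampled with replacement, so the within-cluster correlation is carried into each replicate. This is not how CSAMPLE works internally.

```python
import random

# Hypothetical data, one tuple per cluster:
# (exposed cases, exposed people, unexposed cases, unexposed people)
clusters = [
    (6, 10, 3, 10), (5, 10, 2, 10), (7, 10, 4, 10), (4, 10, 2, 10),
    (8, 10, 3, 10), (5, 10, 1, 10), (6, 10, 4, 10), (7, 10, 2, 10),
]

def risk_ratio(sample):
    """Risk in the exposed divided by risk in the unexposed, pooled over clusters."""
    exposed_cases = sum(c[0] for c in sample)
    exposed_n = sum(c[1] for c in sample)
    unexposed_cases = sum(c[2] for c in sample)
    unexposed_n = sum(c[3] for c in sample)
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

def cluster_bootstrap_ci(sample, stat, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap CI, resampling whole clusters with replacement."""
    rng = random.Random(seed)       # fixed seed so the sketch is repeatable
    k = len(sample)
    estimates = sorted(
        stat([rng.choice(sample) for _ in range(k)]) for _ in range(reps)
    )
    alpha = 1 - level
    lower = estimates[int(alpha / 2 * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return lower, upper

rr = risk_ratio(clusters)
rr_lo, rr_hi = cluster_bootstrap_ci(clusters, risk_ratio)

print(f"Risk ratio: {rr:.2f} (95% CI {rr_lo:.2f}, {rr_hi:.2f})")
# An interval that excludes 1 (the no-effect value for a ratio) plays the
# role of a "significant" test result.
print("Interval excludes 1:", rr_lo > 1 or rr_hi < 1)
```

For a single-sided question, calling `cluster_bootstrap_ci(clusters, risk_ratio, level=0.90)` and looking at the relevant limit gives the 90% interval mentioned above. With real survey data you would also need to guard against replicates in which the unexposed group has no cases.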