
# Nutrition survey analysis - Associations

This question was posted in the Assessment and Surveillance forum area and has 13 replies.

### Kevin Sullivan

Professor

Normal user

12 Apr 2011, 15:10

Here is some info: Epi Info, using the "Complex Sample ..." commands, can account for survey stratification, clusters, and sample weights. For a 2x2 table, it can provide the prevalence odds ratio, prevalence ratio, and prevalence difference with confidence limits and the DEFF (design effect). It does not provide a p-value. I do have a spreadsheet that calculates the Wald statistics that can be used to derive a p-value; send me an e-mail and I will send it to you. SAS can do these analyses with its survey procedures, PROC SURVEYFREQ and PROC SURVEYLOGISTIC. SPSS can analyze survey data with the optional Complex Samples module. Stata can handle complex survey data, as can R.
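The spreadsheet itself is not reproduced here. As a rough illustration of the general idea, the sketch below recovers a two-sided Wald p-value from a ratio (or difference) and its reported 95% confidence limits. It assumes that ratio CIs were built on the log scale as exp(log(estimate) ± z·SE) and that difference CIs are symmetric (estimate ± z·SE). Because reported limits are rounded, the recovered p-value is only approximate. The function names are illustrative.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_p_from_ratio_ci(estimate, lower, upper, z_crit=Z_975):
    """Two-sided Wald p-value for H0: ratio = 1 (e.g. a prevalence ratio or
    prevalence odds ratio), recovered from the estimate and its 95% CI.

    Assumes the CI was built on the log scale as exp(log(est) +/- z * SE).
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se_log
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return z, math.erfc(abs(z) / math.sqrt(2))


def wald_p_from_difference_ci(estimate, lower, upper, z_crit=Z_975):
    """Two-sided Wald p-value for H0: difference = 0 (e.g. a prevalence
    difference), assuming a symmetric CI of the form est +/- z * SE."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers, not taken from any real survey:
z, p = wald_p_from_ratio_ci(1.8, 1.1, 2.9)
print(f"PR: z = {z:.2f}, p = {p:.4f}")
z, p = wald_p_from_difference_ci(0.06, 0.01, 0.11)
print(f"PD: z = {z:.2f}, p = {p:.4f}")
```

A CI whose lower limit sits exactly at 1 (for a ratio) or 0 (for a difference) returns p ≈ 0.05, which is a quick sanity check on the arithmetic.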

### Anonymous 81

Public Health Nutritionist

Normal user

12 Apr 2011, 15:36

Dear Kevin Sullivan, I was wondering if you could share the spreadsheet with me.

### Kevin Sullivan

Professor

Normal user

12 Apr 2011, 20:09

Hi Kiross - I just e-mailed it to you. Kevin

### Mark Myatt

Frequent user

13 Apr 2011, 12:14

Be aware that this sort of analysis is likely to be very low-powered compared to (e.g.) a case-control study because (1) you often end up with small numbers in table cells, since the outcome is relatively rare and the exposure may also be rare, and (2) the sample design reduces the effective sample size. This analysis will be limited to finding the largest effects. This is a common problem with analysing data from cross-sectional surveys as retrospective cohorts.
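To make the power point concrete, here is a minimal sketch, assuming the usual definition of effective sample size (n divided by the design effect) and a standard normal-approximation power formula for comparing two proportions. The survey size, design effect, exposure share, and prevalences below are hypothetical, chosen only to show how the design effect eats into power.

```python
import math
from statistics import NormalDist


def effective_sample_size(n, deff):
    """Number of simple-random-sample observations carrying the same
    information as n observations from a design with design effect deff."""
    return n / deff


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test of proportions
    (normal approximation, ignoring the negligible opposite tail)."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se_alt = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return std.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# Hypothetical survey: 900 children, design effect 1.5, 10% exposed,
# outcome prevalence 8% in the unexposed and 16% in the exposed (RR = 2).
n, deff, exposed_share = 900, 1.5, 0.10
for label, size in [("ignoring design", n), ("effective", effective_sample_size(n, deff))]:
    n_exp, n_unexp = size * exposed_share, size * (1 - exposed_share)
    power = power_two_proportions(0.16, 0.08, n_exp, n_unexp)
    print(f"{label}: n = {size:.0f}, power = {power:.2f}")
```

Even a doubling of risk is far from certain to be detected here, and the shortfall grows as the exposure becomes rarer or the design effect larger.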

### Mark Myatt

Frequent user

13 Apr 2011, 13:43

The "expected cell counts >= 5" rule is a technical issue regarding the assumptions behind using chi-square tests. With expected counts < 5 you can use something like the Fisher-Irwin test, which is based on exact hypergeometric probabilities rather than normal approximations to the binomial. The bigger issue is sample size and power. Low frequencies of outcomes and exposures mean that you will have varying and often low statistical power. You can do the analysis, but it may not find weak or moderate-strength associations.
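A minimal sketch of the Fisher-Irwin (Fisher's exact) test for a 2x2 table follows, using only the hypergeometric distribution of the top-left cell given fixed margins. The two-sided p-value here uses the common "sum of all tables no more probable than the observed one" convention; other two-sided conventions exist. Like the ordinary "Tables" chi-square, this assumes simple random sampling and does not account for clustering.

```python
from math import comb


def fisher_irwin_two_sided(a, b, c, d):
    """Two-sided Fisher-Irwin exact test for the 2x2 table

                 outcome+  outcome-
        exposed     a         b
        unexposed   c         d

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small relative tolerance so tables tied with the observed one are kept
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Hypothetical small table with sparse cells:
print(fisher_irwin_two_sided(3, 1, 1, 3))
```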

### Anonymous 402

Normal user

16 Aug 2012, 19:46

Is it correct to say that it is OK to use the 'Tables' command under 'Statistics' in EPI INFO to run a chi-squared test and get results for a cluster survey, as the survey design does not affect the test results here?

Self-employed

Technical expert

16 Aug 2012, 20:51

No, it is not correct to use the chi-square from the normal "Tables" command in EpiInfo to judge the statistical significance of a difference in some outcome between two or more subgroups of the survey sample if the sampling included cluster sampling. The chi-square test tells you how likely a difference between subgroups at least as large as the one observed in the survey sample would be if it arose solely from sampling error, that is, if there were no real difference in the population surveyed. Any measure of sampling error, or any measure involving sampling error such as confidence intervals or p-values, will be affected by the increase in variance induced by cluster sampling compared to simple random sampling. The "Tables" command assumes simple random sampling. As a result, the chi-square from the "Tables" command will underestimate the variance and the p-value, and therefore overstate your confidence that there is a real difference in the population rather than just a difference due to sampling error. This could lead to incorrect conclusions. In EpiInfo, you must use the "Complex Sample Tables" command and specify the variable containing the codes for the primary sampling units (PSUs) and, if applicable, the variables containing the statistical weights and the codes identifying the strata.
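To see why ignoring the design overstates significance, the sketch below computes the uncorrected Pearson chi-square for a 2x2 table (what an SRS-based command reports) and then divides it by a design effect supplied by the user. This division is only a crude approximation in the spirit of the first-order Rao-Scott correction; it is not what "Complex Sample Tables" or other survey software does, since those estimate the design-based variance from the PSU, stratum, and weight variables. The counts and the DEFF value are hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]],
    i.e. what an SRS-based 'Tables'-style command computes."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_1df_p(x2):
    """Upper-tail p-value for chi-square with 1 df: P(|Z| > sqrt(x2))."""
    return math.erfc(math.sqrt(x2 / 2))


def deff_adjusted_chi2(a, b, c, d, deff):
    """Crude design adjustment: divide the Pearson statistic by a supplied
    design effect (a simplified, first-order Rao-Scott-style correction)."""
    x2 = pearson_chi2_2x2(a, b, c, d)
    return x2, chi2_1df_p(x2), x2 / deff, chi2_1df_p(x2 / deff)


# Hypothetical counts and design effect, for illustration only.
x2, p, x2_adj, p_adj = deff_adjusted_chi2(30, 120, 40, 360, deff=1.8)
print(f"SRS chi-square = {x2:.2f} (p = {p:.4f}); adjusted = {x2_adj:.2f} (p = {p_adj:.4f})")
```

With any design effect above one, the adjusted statistic shrinks and the p-value grows, which is the direction of the bias described above.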

### Mark Myatt

Frequent user

16 Aug 2012, 21:40

You can use chi-square tests with cluster-sampled data, but these must be corrected for the sample design. An uncorrected test would, with design effects above one, be likely to make a type I error (i.e. incorrectly rejecting the null hypothesis) more frequently than expected. Common packages (e.g. EpiInfo, STATA) provide methods to correct chi-square tests. You must specify complex sampling procedures ... if you do not do this, then the software reports the standard (uncorrected) test. Be sure to specify the sample design correctly. For a SMART-type survey, this will simply be a matter of telling the software which variable identifies the cluster. It can be quite complicated to specify the design for more complex designs. I hope this is of some use.
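For a SMART-type (one-stage, self-weighting) cluster sample, the cluster variable is what lets software estimate the design effect. Here is a minimal sketch of that calculation for a single proportion, assuming the standard linearized variance of a ratio estimator with clusters treated as PSUs drawn with replacement, no strata, no weights, and no finite population correction. The record format `(cluster_id, outcome)` is a hypothetical choice for illustration.

```python
import math
from collections import defaultdict


def cluster_proportion_deff(records):
    """Design effect for a proportion from a one-stage, self-weighting cluster
    sample (e.g. a SMART-type survey) where the cluster ID is the only design
    variable.

    records: iterable of (cluster_id, outcome) with outcome coded 0/1.
    Returns (p, design-based SE, SRS SE, DEFF).
    Needs at least two clusters and 0 < p < 1.
    """
    cases = defaultdict(int)
    sizes = defaultdict(int)
    for cluster_id, outcome in records:
        cases[cluster_id] += outcome
        sizes[cluster_id] += 1
    k = len(sizes)
    n = sum(sizes.values())
    p = sum(cases.values()) / n
    # Linearized variance of the ratio estimator sum(cases) / sum(sizes),
    # treating clusters as PSUs drawn with replacement (no strata, no weights)
    ss = sum((cases[c] - p * sizes[c]) ** 2 for c in sizes)
    var_design = k / (k - 1) * ss / n ** 2
    var_srs = p * (1 - p) / n  # variance a simple random sample would have
    return p, math.sqrt(var_design), math.sqrt(var_srs), var_design / var_srs


# Hypothetical data: 4 clusters of 5 children; cases bunch in clusters 1 and 2.
clustered = [(c, 1) for c in (1, 2) for _ in range(5)] + [(c, 0) for c in (3, 4) for _ in range(5)]
p, se_design, se_srs, deff = cluster_proportion_deff(clustered)
print(f"p = {p:.2f}, SE (design) = {se_design:.3f}, SE (SRS) = {se_srs:.3f}, DEFF = {deff:.2f}")
```

This extreme toy data set gives a DEFF of about 6.7; real surveys usually show much smaller values, but any DEFF above one means an uncorrected test is working with standard errors that are too small.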

### Anonymous 402

Normal user

16 Aug 2012, 22:20

Thank you very much for the clarifications. The problem is that Complex Sample Tables in EPI INFO does not provide chi-square test results (it only provides the risk ratio, odds ratio, and risk difference). All the other common software packages are quite expensive! Thank you again.

Self-employed

Technical expert

16 Aug 2012, 22:42

### Anonymous 402

Normal user

16 Aug 2012, 23:38

Thank you. I didn't know that you could run a chi-square test in EPI 6. I will give it a try and will also send an email to the EPINFO helpdesk. (If a chi-square test is available in EPI 6, why was it taken out of the Windows version? I wonder...) Perhaps other users, especially those who are using EPI-ENA to analyse non-anthropometric data, could also send similar requests to urge CDC to add this feature?

### Mark Myatt

Frequent user

17 Aug 2012, 10:38

A few things ...

There is very little difference between using the CI on the risk ratio or the risk difference as a hypothesis test and using a chi-square test with a fixed p-value for rejection of the null (i.e. p < 0.05). The test is that the CI on the risk ratio does not include one or that the CI on the risk difference does not include zero.

I am not a Windows user but I do sometimes run EpiInfo v6.04d on Mac OS-X and BSD UNIX using a utility called DOSBOX. You can get DOSBOX for Windows for free from this site.

R provides complex survey sample analysis through the "survey" package. This works very much like the "svy" commands in STATA. The package has been tested against SAS SUDAAN, STATA, SPSS, &c. and gives the same results with benchmark datasets. R is an extremely powerful programming language (based on S and S-Plus) with a rather steep learning curve. I have a (slightly dated) introduction here. This tutorial does not cover the use of the "survey" package, but the first 20 or so pages should be enough to get you working with R.

These packages all take a model-based approach to the problem. It is also possible to use resampling techniques (e.g. the bootstrap). With a PPS cluster sample you would have to use a "blocking" approach to creating replicates, with the blocks being individual clusters sampled with replacement. The "p-value" would, for a positive association, be something like:

p = (number of replicates with RR <= 1) / (number of replicates + 1)

R can also be used to implement these methods. We use a block bootstrap in the PSM survey method (it gives the same results as CSAMPLE in EpiInfo v6.04d). We use a weighted block bootstrap in the S3M survey method and in the RAM method. The advantage of the bootstrap is that it can be used with statistics, such as the difference between two medians, for which classical approaches do not readily give a CI or a test.
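A minimal sketch of the unweighted block bootstrap described above, assuming a self-weighting PPS cluster sample so that whole clusters can be resampled with equal probability. It is not the PSM or S3M code, and it does not implement the weighted variant. The percentile CI is one of several bootstrap CI methods, replicates where the RR is undefined are simply skipped (an assumption), and the p-value follows the "(replicates with RR <= 1) / (replicates + 1)" form given in the post. The record format and data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import quantiles


def risk_ratio(rows):
    """Risk ratio from (exposed, outcome) pairs coded 0/1; None if undefined."""
    a = sum(1 for e, o in rows if e == 1 and o == 1)
    n1 = sum(1 for e, o in rows if e == 1)
    c = sum(1 for e, o in rows if e == 0 and o == 1)
    n0 = sum(1 for e, o in rows if e == 0)
    if n1 == 0 or n0 == 0 or c == 0:
        return None
    return (a / n1) / (c / n0)


def block_bootstrap_rr(records, n_replicates=999, seed=2012):
    """Block bootstrap for a self-weighting cluster sample.

    records: iterable of (cluster_id, exposed, outcome).
    Whole clusters are resampled with replacement to build each replicate.
    Returns the observed RR, a percentile 95% CI, and the one-sided p-value
    p = (replicates with RR <= 1) / (usable replicates + 1).
    """
    clusters = defaultdict(list)
    for cluster_id, exposed, outcome in records:
        clusters[cluster_id].append((exposed, outcome))
    ids = list(clusters)
    rng = random.Random(seed)

    observed = risk_ratio([row for cid in ids for row in clusters[cid]])
    replicates = []
    for _ in range(n_replicates):
        drawn = rng.choices(ids, k=len(ids))  # clusters drawn with replacement
        rr = risk_ratio([row for cid in drawn for row in clusters[cid]])
        if rr is not None:  # skip replicates where the RR is undefined
            replicates.append(rr)

    cuts = quantiles(replicates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    ci = (cuts[0], cuts[-1])
    p = sum(rr <= 1 for rr in replicates) / (len(replicates) + 1)
    return observed, ci, p


# Hypothetical data: 20 clusters of 10 children (5 exposed, 5 unexposed each).
data = []
for i in range(20):
    exposed_cases, unexposed_cases = 2 + i % 3, 1 + i % 2
    data += [(i, 1, int(j < exposed_cases)) for j in range(5)]
    data += [(i, 0, int(j < unexposed_cases)) for j in range(5)]

rr, (lo, hi), p = block_bootstrap_rr(data)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.4f}")
```

Checking whether the lower limit of the bootstrap CI is above one is the CI-based test described in the first paragraph of the post, applied to the resampled RRs.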