
# Confounding factors

This question was posted in the Assessment and Surveillance forum area and has 4 replies.

### Mark Myatt

Frequent user

26 May 2022, 10:59

One approach to dealing with this is in the sample design. Matching is a key technique. The problem is that matching can complicate sampling and can be difficult to do well. Surveys usually do not involve taking matched samples, so we tend to use analytical techniques to address these questions.

Stratified analysis (e.g. Mantel-Haenszel analysis) used to be very commonly used. This is not a difficult analysis and it works well, but it has some limitations: (1) the method becomes laborious when there is more than one confounder and requires a large sample size to avoid vanishingly small sample sizes in some combinations of strata, and (2) continuous confounders need to be converted into a limited number of categories. This can be problematic when too few categories are used, and using a large number of categories usually needs a large sample size.

The availability of fast and powerful computers allows us to use multivariable regression methods to address these questions. I most often use multiple logistic regression in these cases. I usually start by looking at pairwise associations between risk factors / potential confounders and the outcome of interest and select only statistically significant associations (often with a relaxed criterion such as p < 0.10). I then use a stepwise method to build the regression model: I fit a model including all significant associations, remove the least significant association from the model, and refit the model. This process of stepwise elimination continues until there are no non-significant variables left to remove from the model. This final model can be considered to contain only independent associations. This is what we want, as the removed associations were due to confounding. At each stage we aim to simplify the model.

This is a powerful technique but there are some "gotchas", which can be avoided by keeping the models simple. Missing values can make large models unreliable. Continuous and categorical variables can be used. Yes / No variables should be coded 1/0. A natural outcome of the analysis is the odds ratio, which is interpreted as the multiplicative change in odds associated with a unit change in a variable (for binary (1/0) variables this is the effect of exposure; for continuous variables it is the effect of a unit change).
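
To make the stratified analysis mentioned above a little more concrete, here is a minimal sketch in plain Python (the handbook itself uses R). It compares the crude odds ratio with the Mantel-Haenszel pooled odds ratio across two strata of a single binary confounder. The counts are invented purely for illustration, and the function names are my own, not taken from the handbook or from any library.

```python
# Each stratum is a 2x2 table given as (a, b, c, d):
#   a = exposed cases,    b = exposed non-cases,
#   c = unexposed cases,  d = unexposed non-cases.

def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across a list of 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def collapse(strata):
    """Add the strata together to get the crude (unstratified) 2x2 table."""
    return tuple(sum(cell) for cell in zip(*strata))

# Hypothetical data: the confounder is commoner among the exposed and
# also raises the risk of the outcome, but within each stratum the
# exposure has no association with the outcome.
strata = [
    (10, 90, 40, 360),   # confounder absent
    (160, 240, 40, 60),  # confounder present
]

crude_or = odds_ratio(*collapse(strata))
adjusted_or = mantel_haenszel_or(strata)

for i, table in enumerate(strata, start=1):
    print(f"Stratum {i} OR: {odds_ratio(*table):.2f}")
print(f"Crude OR:       {crude_or:.2f}")
print(f"M-H adjusted:   {adjusted_or:.2f}")
```

With these made-up counts the crude odds ratio is well above 1 while both stratum-specific odds ratios and the pooled estimate are 1, which is the pattern you would expect when an apparent association is entirely due to confounding. A real analysis would also need confidence intervals and a check that the stratum-specific odds ratios are reasonably similar before pooling them.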

I hope this is of some use to you. I have training material showing stratified analysis and logistic regression with R. Let me know (here or by e-mail) if you'd like to get the manual and datasets.

### Mark Myatt

Frequent user

27 May 2022, 13:37

OK. I have put the handbook (prfe.pdf) and the supporting data files in this ZIP archive.

Exercises 2, 3, and 4 cover stratified analysis and logistic regression. If you want to avoid the fuss of writing functions then you can source() the prfe.r file to load the functions developed for the early exercises in the handbook. That might save you some time and work.

I hope this is helpful. Let me know if you have any problems with this.