Greetings. How do you effectively address confounding factors in a survey so that, at the end of the study, we can clearly relate the dependent and independent variables alone?

For example, a study of "Effect of socio-economic factors and child care practices on nutrition status of children 0-23 months of age in population X".

For socio-economic factors I am looking at demographic characteristics, occupation, and the like. For child care practices I am looking at breastfeeding and complementary feeding indicators and WASH practices.

So one of my possible confounding factors might be the health status of the children at least two weeks prior to the survey, which might affect nutrition status too.

How can I address this?

One approach to dealing with this is in the sample design. Matching is a key technique. The problem is that matching can complicate sampling and can be difficult to do well. Surveys do not usually involve taking matched samples, so we tend to use analytical techniques to address these questions.

Stratified analysis (e.g. Mantel-Haenszel analysis) used to be very commonly used. This is not a difficult analysis and it works well, but it has some limitations: (1) the method becomes laborious when there is more than one confounder and requires a large sample size to avoid vanishingly small sample sizes in some combinations of strata, and (2) continuous confounders need to be converted into a limited number of categories. This can be problematic when too few categories are used, and using a large number of categories usually needs a large sample size.

The availability of fast and powerful computers allows us to use multivariable regression methods to address these questions. I most often use multiple logistic regression in these cases. I usually start by looking at pairwise associations between risk factors / potential confounders and the outcome of interest, and select only statistically significant associations (often with a relaxed criterion such as p < 0.10). I then use a stepwise method to build the regression model: I fit a model including all of the significant associations, remove the least significant association from the model, and refit the model. This process of stepwise elimination continues until there are no non-significant variables left to remove from the model. This final model can be considered to contain only independent associations. This is what we want, as the removed associations were due to confounding. At each stage we aim to simplify the model. This is a powerful technique, but there are some "gotchas" that can be avoided by keeping the models simple. Missing values can make large models unreliable.
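
As a rough illustration of the stratified analysis mentioned above, here is a minimal Python sketch of the Mantel-Haenszel pooled odds ratio. The counts are invented for illustration and do not come from any survey. The labels (recent illness as the confounder, some exposure such as a poor WASH practice, and an outcome such as wasting) and the function names are assumptions chosen to match the question.

```python
# Stratified (Mantel-Haenszel) analysis on made-up counts.
# Each stratum is a 2x2 table (a, b, c, d):
#   a = exposed with outcome,   b = exposed without outcome
#   c = unexposed with outcome, d = unexposed without outcome


def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def collapse(strata):
    """Add the strata together, ignoring the confounder."""
    return tuple(sum(cells) for cells in zip(*strata))


# Hypothetical data, invented for illustration only.
strata = [
    (40, 40, 10, 10),  # children ill in the two weeks before the survey
    (2, 18, 8, 72),    # children not ill in the two weeks before the survey
]

crude = odds_ratio(*collapse(strata))
adjusted = mantel_haenszel_or(strata)

for table in strata:
    print("stratum OR:", round(odds_ratio(*table), 2))
print("crude OR:", round(crude, 2))
print("Mantel-Haenszel OR:", round(adjusted, 2))
```

With these invented counts the odds ratio is 1 within each stratum, while the crude odds ratio from the collapsed table is about 3.3. The pooled estimate is 1, so the whole of the crude association here is produced by the confounder.
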
Continuous and categorical variables can be used. Yes / No variables should be coded 1/0. A natural outcome of the analysis is the odds ratio, which is interpreted as the change in odds associated with a unit change in a variable (for binary (1/0) variables this is the effect of exposure; for continuous variables it is the effect of a unit change).
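
The odds-ratio interpretation above can be sketched as simple arithmetic: a logistic regression coefficient is a change in log-odds, so exponentiating it gives the odds ratio for a unit change. The coefficients below are hypothetical values picked for illustration, not output from any fitted model, and the function names are my own.

```python
import math


def odds_ratio_from_coefficient(beta, units=1.0):
    """Odds ratio for a change of `units` in a variable with coefficient beta."""
    return math.exp(beta * units)


def apply_odds_ratio(baseline_risk, odds_ratio):
    """Risk after multiplying the baseline odds by an odds ratio."""
    odds = baseline_risk / (1.0 - baseline_risk) * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical coefficients, not taken from any fitted model.
beta_exposed = math.log(2.0)  # binary variable coded 1/0
beta_age_months = 0.05        # continuous variable, per month of age

or_exposed = odds_ratio_from_coefficient(beta_exposed)
or_one_month = odds_ratio_from_coefficient(beta_age_months)
or_six_months = odds_ratio_from_coefficient(beta_age_months, units=6)

# Effect of the exposure on a child with a 20% baseline risk.
risk_exposed = apply_odds_ratio(0.20, or_exposed)

print("OR, exposed vs unexposed:", round(or_exposed, 2))
print("OR per month of age:", round(or_one_month, 3))
print("OR per six months of age:", round(or_six_months, 3))
print("risk if exposed:", round(risk_exposed, 3))
```

Note that an odds ratio multiplies the odds, not the risk: in this sketch an odds ratio of 2 moves a 20% baseline risk to about 33%, not 40%.
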

I hope this is of some use to you. I have training material showing stratified analysis and logistic regression with R. Let me know (here or by e-mail) if you'd like to get the manual and datasets.

Mark Myatt
Technical Expert

2 years ago

@Mark Myatt, thanks for the response; it was very informative. I have sent you an email and would be very grateful if you could forward the manuals and datasets to me. Regards.

Anonymous_A_W_40

2 years ago

OK. I have put the handbook (prfe.pdf) and the supporting data files in this ZIP archive.

Exercises 2, 3, and 4 cover stratified analysis and logistic regression. If you want to avoid the fuss of writing functions then you can source() the prfe.r file to load the functions developed for the early exercises in the handbook. That might save you some time and work.

I hope this is helpful. Let me know if you have any problems with this.

Mark Myatt
Technical Expert