# Using nutrition survey data for secondary analysis?

This question was posted in the Assessment and Surveillance forum area and has 10 replies.

### Tariq Khan

Normal user

19 Aug 2014, 09:54

Hi all,

I would like to know how we can make better use of past nutrition surveys for secondary analysis. The datasets contain IYCF, health, anthropometric, dietary, and socio-economic data on pregnant and lactating women and on children under five.

Your suggestions are highly appreciated.

Thanks.

T.

### Mark Myatt

Frequent user

19 Aug 2014, 10:45

It depends on what you have. If you have surveys spread over time then you can analyse the data as a time series. If you have data spread over space then you can map them. If you have data spread over both time and space then you can map several time series. You could also do a causal analysis for MAM and SAM (the large sample size of the combined datasets should provide reasonable power). This will help assign attributable fractions to risk factors and inform intervention priorities. A database like this is often useful. I have (e.g.) used a simpler database for exploring screening thresholds, the effect on case-loads of changing case-definitions, &c.
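As a rough illustration of the attributable-fraction idea, the sketch below applies Levin's formula for the population attributable fraction, using the proportion of children exposed to a factor and the ratio comparing exposed with unexposed children. The function name and example inputs are hypothetical. Levin's formula assumes the ratio is unconfounded; with adjusted ratios, a formula based on the proportion of cases that are exposed (Miettinen's) is usually preferred.

```python
def population_attributable_fraction(p_exposed: float, ratio: float) -> float:
    """Levin's formula: PAF = pe * (R - 1) / (1 + pe * (R - 1)).

    p_exposed: proportion of the population exposed to the factor (0 to 1).
    ratio: risk ratio (or, with caveats, prevalence ratio) of the outcome
           in exposed versus unexposed children. Assumed unconfounded.
    """
    if not 0.0 <= p_exposed <= 1.0:
        raise ValueError("p_exposed must be between 0 and 1")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    excess = p_exposed * (ratio - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: 30% of children exposed, ratio of 2.0
print(round(population_attributable_fraction(0.30, 2.0), 3))
```

Ranking several candidate factors by their attributable fractions is one simple way of turning such an analysis into intervention priorities.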

Any help?

### Mark Myatt

Frequent user

26 Aug 2014, 11:58

I am not sure what you are asking here. Forgive me if I misunderstand.

You calculated a sample size with a precision of (e.g.) +/- 5% (95% CI) and found a precision of +/- 10% when you analysed the data. This can be due to a few reasons:

(1) You did not achieve the planned sample size.

(2) You have a lot of missing values (or erroneous values set to missing). The sample size is the number of non-missing values.

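(3) The prevalence you specified in the sample size calculation was very different from the one you observed. A prevalence of 50% has the most variance (least precision), so precision suffers if, for example, you assumed 10% but observed something closer to 50%.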

(4) You used a complex sample design and the design effect was larger than expected. This is often the case with highly clustered phenomena.
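One way to see which of these reasons applies is to compare the CI you obtained with the CI that a simple random sample of the same (non-missing) size would give. This is a minimal sketch, assuming a normal-approximation CI; the function names and numbers are illustrative only.

```python
import math


def srs_half_width(p_hat: float, n_nonmissing: int, z: float = 1.96) -> float:
    """95% CI half-width that a simple random sample of this size would give."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n_nonmissing)


def implied_design_effect(p_hat: float, n_nonmissing: int,
                          ci_lower: float, ci_upper: float,
                          z: float = 1.96) -> float:
    """Ratio of the observed CI variance to the simple-random-sample variance."""
    observed_half = (ci_upper - ci_lower) / 2.0
    return (observed_half / srs_half_width(p_hat, n_nonmissing, z)) ** 2


# Hypothetical survey: 15% prevalence, 384 non-missing values, CI 8% to 22%
print(round(implied_design_effect(0.15, 384, 0.08, 0.22), 2))
```

If the implied design effect is close to 1, the loss of precision is mostly explained by reasons (1) to (3), i.e. a smaller effective sample or a prevalence nearer 50% than planned. If it is well above the design effect you planned for, reason (4) is the likely cause.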

Is this any help?

### Kevin Sullivan

Professor

Normal user

26 Aug 2014, 13:58

I think it is acceptable to perform additional analyses on cross-sectional nutrition surveys. Frequently only descriptive malnutrition results are provided, broken down by socio-demographic factors such as age and urban/rural status. Further investigation could be performed to identify multiple factors associated with a specific outcome, such as stunting, assessing whether there are factors that modify or confound associations. The one cautionary note is that a survey generally identifies prevalent conditions; therefore one cannot use terms such as "risk factors".
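A simple way to look for confounding is to compare a crude prevalence ratio with one adjusted by stratification, and to look for effect modification by comparing the stratum-specific ratios. The sketch below uses the Mantel-Haenszel estimator; the function names and toy counts are hypothetical, and it ignores survey weights and clustering (for a real survey a design-based regression model would usually be needed).

```python
from typing import List, Sequence, Tuple

# One stratum: (exposed cases, exposed total, unexposed cases, unexposed total)
Stratum = Tuple[int, int, int, int]


def stratum_prs(strata: Sequence[Stratum]) -> List[float]:
    """Prevalence ratio within each stratum (compare these for effect modification)."""
    return [(a / n1) / (c / n0) for a, n1, c, n0 in strata]


def crude_pr(strata: Sequence[Stratum]) -> float:
    """Prevalence ratio after collapsing all strata together."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_pr(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel prevalence ratio, adjusted for the stratifying variable."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


# Toy counts for stunting by a hypothetical exposure, stratified urban/rural
strata = [
    (10, 100, 5, 50),    # urban
    (20, 50, 40, 100),   # rural
]
print(stratum_prs(strata), round(crude_pr(strata), 2), round(mantel_haenszel_pr(strata), 2))
```

With these made-up counts the exposure looks protective in the crude analysis (PR of about 0.67), but the ratio is 1.0 within each stratum and after Mantel-Haenszel adjustment, so the crude association is entirely due to confounding by the stratifying variable.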

### Mark Myatt

Frequent user

26 Aug 2014, 14:25

I am not sure about Kevin's point about risk factors, since case-control studies often investigate risk factors using prevalent cases. I think the main issue with using cross-sectional surveys for this type of analysis is one of power. With prospective studies you get to manipulate exposures (or at least the size of the exposed group). With retrospective studies you get to choose the size of the case and control groups. With a cross-sectional study you get what you are given. If you are looking at (e.g.) SAM you may have very few cases of SAM in your sample. This will limit the power of the study to detect associations and may complicate data analysis in other ways. There are ways around this. You could (e.g.) choose to analyse anthropometric indicators as continuous variables so that you are investigating factors associated with low or high MUAC.
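To illustrate analysing MUAC as a continuous variable, the sketch below estimates the difference in mean MUAC between exposed and unexposed children, so that every measured child contributes rather than only the few SAM cases. It is a minimal sketch, assuming MUAC in mm and using a Welch (unequal-variance) standard error with a normal-approximation CI; it ignores the sample design, and the function name and values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def mean_difference_ci(exposed: Sequence[float], unexposed: Sequence[float],
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Difference in mean MUAC (exposed minus unexposed) with an approximate 95% CI.

    Uses the Welch standard error and a normal approximation; each group
    needs at least two children. Cluster sampling is not accounted for.
    """
    diff = mean(exposed) - mean(unexposed)
    se = math.sqrt(variance(exposed) / len(exposed)
                   + variance(unexposed) / len(unexposed))
    return diff, diff - z * se, diff + z * se


# Hypothetical MUAC values (mm)
exposed = [128, 132, 125, 140, 119, 135]
unexposed = [138, 142, 131, 145, 136, 140]
d, lo, hi = mean_difference_ci(exposed, unexposed)
print(f"difference {d:.1f} mm (95% CI {lo:.1f} to {hi:.1f})")
```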

Another issue is that your risk factors would have been collected as (e.g.) monitoring and evaluation indicators and may not form a comprehensive set of risk factors or include potential confounders. They will, however, have the advantage of being easy to collect and related to program modes.

A second note of caution (to follow Kevin's) is that when you combine survey datasets you will have quite a complex sample design. This may also be the case when you analyse a single cluster-sample survey. Your analysis needs to take the sample design into account.
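For a single cluster-sample survey, the sketch below shows what taking the design into account means for a prevalence: the variance is estimated from between-cluster variation (a linearised ratio estimator) rather than from the simple-random-sample formula, and the ratio of the two is the design effect. It assumes a one-stage, self-weighting sample, ignores the finite population correction, and needs at least two clusters. Combined surveys with different selection probabilities would also need sampling weights and strata, which a survey package handles. The function name and counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cluster_prevalence(clusters: Sequence[Tuple[int, int]], z: float = 1.96):
    """Prevalence, 95% CI and design effect from a one-stage cluster sample.

    clusters: (cases, children measured) for each cluster.
    Returns (prevalence, lower limit, upper limit, design effect).
    """
    k = len(clusters)
    cases = sum(y for y, m in clusters)
    total = sum(m for y, m in clusters)
    p = cases / total
    # Ratio-estimator variance from each cluster's deviation from p * m
    resid_ss = sum((y - p * m) ** 2 for y, m in clusters)
    var_cluster = k / (k - 1) * resid_ss / total ** 2
    # Variance a simple random sample of the same size would have
    var_srs = p * (1 - p) / total
    se = math.sqrt(var_cluster)
    return p, p - z * se, p + z * se, var_cluster / var_srs


# Hypothetical (cases, children measured) for six clusters
clusters = [(2, 30), (0, 28), (5, 31), (1, 30), (7, 29), (0, 30)]
p, lo, hi, deff = cluster_prevalence(clusters)
print(f"prevalence {p:.3f} (95% CI {lo:.3f} to {hi:.3f}), design effect {deff:.2f}")
```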

I hope this helps.

### Mark Myatt

Frequent user

27 Aug 2014, 09:23

WRT precision and sample size: sample sizes are always based on guesses. Typically we need to specify an expected prevalence, a desired precision, and a design effect. If we knew the prevalence we would not need to do the survey. The design effect is even harder to know in advance. Usually we can get pretty close to what we wanted. Rules of thumb are:

(1) If you have no idea of the prevalence then specify 50%

(2) Specify a useful precision on a 50% estimate (something like +/- 5% is common). Assuming a simple random sample, this gives:

n = (0.5 * (1 - 0.5)) / (0.05 / 1.96)^2 = 384

If the prevalence is (e.g.) estimated to be 15% then the approximate precision would be:

+/- 1.96 * sqrt((0.15 * (1 - 0.15))/384) = +/- 0.0357 = 3.57%

(3) Specify a minimum design effect of 2.0. You can go lower if you have good reason to believe that there will not be much spatial clustering ... we tend to do this with some anthropometric indicators. Go higher if you suspect clustering. With communicable diseases such as trachoma I have seen design effects of 7.0 or higher. Coverage tends to cluster spatially and will require larger design effects.
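The arithmetic in these rules of thumb can be scripted, with the design effect simply multiplying the simple-random-sample size. This is a minimal sketch; the function names are hypothetical, and z = 1.96 assumes a 95% CI.

```python
import math


def sample_size(p: float, precision: float, deff: float = 1.0, z: float = 1.96) -> float:
    """Unrounded n = deff * p * (1 - p) / (precision / z)^2."""
    return deff * p * (1.0 - p) / (precision / z) ** 2


def achieved_precision(p: float, n: float, deff: float = 1.0, z: float = 1.96) -> float:
    """Approximate 95% CI half-width for prevalence p with sample size n."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Rule (2): 50% prevalence, +/- 5%, simple random sample
n_srs = sample_size(0.5, 0.05)
# Rule (3): the same with a design effect of 2.0
n_cluster = sample_size(0.5, 0.05, deff=2.0)
print(math.ceil(n_srs), math.ceil(n_cluster))
# Precision on a 15% prevalence with n = 384 (simple random sample)
print(round(achieved_precision(0.15, 384), 4))
```

The unrounded simple-random-sample size is 384.16 (often rounded up to 385 in practice), and a design effect of 2.0 doubles it to 768.32. With that design effect, a 15% prevalence measured on 768 children gives the same +/- 3.57% as 384 children under simple random sampling.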

WRT design effects: most statistical packages provide tools for handling complex samples. They often use the same methods but different terminologies for the same things. It is best to work through some of the examples in the manuals and seek advice. If matters are really complicated then you may want to analyse each survey separately and then do a meta-analysis based on the estimates, CIs, and sample sizes from each survey. A forest plot might be very informative.
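A minimal sketch of the survey-by-survey approach: recover each survey's standard error from its 95% CI, pool the estimates with inverse-variance (fixed-effect) weights, and draw a crude text forest plot. This assumes symmetric, normal-based CIs and a common underlying value across surveys. If the surveys clearly differ, a random-effects model (e.g. DerSimonian-Laird) is the usual alternative, and proportions are often pooled on the logit scale. The function names and inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Estimate = Tuple[float, float, float]  # (point estimate, lower 95% limit, upper 95% limit)


def pool_fixed_effect(estimates: Sequence[Estimate], z: float = 1.96) -> Estimate:
    """Inverse-variance fixed-effect pooled estimate with a 95% CI."""
    weights: List[float] = []
    for est, lo, hi in estimates:
        se = (hi - lo) / (2.0 * z)  # SE recovered from the CI width
        weights.append(1.0 / se ** 2)
    pooled = sum(w * est for w, (est, _, _) in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled


def text_forest(rows: Sequence[Tuple[str, Estimate]], axis_min: float,
                axis_max: float, width: int = 40) -> str:
    """One line per row: '-' spans the CI and 'o' marks the point estimate."""
    def col(x: float) -> int:
        x = min(max(x, axis_min), axis_max)  # clip to the axis
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    lines = []
    for label, (est, lo, hi) in rows:
        cells = [" "] * width
        for i in range(col(lo), col(hi) + 1):
            cells[i] = "-"
        cells[col(est)] = "o"
        lines.append(f"{label:<10}|{''.join(cells)}|")
    return "\n".join(lines)


surveys = [("Survey A", (0.12, 0.09, 0.15)),
           ("Survey B", (0.08, 0.05, 0.11)),
           ("Survey C", (0.10, 0.06, 0.14))]
pooled = pool_fixed_effect([e for _, e in surveys])
print(text_forest(surveys + [("Pooled", pooled)], axis_min=0.0, axis_max=0.2))
```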

I hope this is of use.

### Kevin Sullivan

Professor

Normal user

27 Aug 2014, 14:45

I think it is important to be careful about terminology, and most epidemiologic textbooks draw clear distinctions between the terms risk, rate, and prevalence. In general, randomized controlled trials and cohort studies assess incidence (risk and/or rate) of disease, and cross-sectional studies assess prevalence. Prevalence is a function of two factors - the incidence (risk or rate) of disease AND the average duration of disease. Since cross-sectional studies primarily address prevalence (proportion of a population with a condition, such as stunting), it would be incorrect to analyze the data and use the term "risk" - e.g., the "risk" of stunting was higher in males compared to females. It would be correct to state that the prevalence of stunting was higher in males than in females. I would highly recommend the following article: Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) at www.strobe-statement.org
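The dependence of prevalence on both incidence and duration can be written down. Under a steady-state assumption (constant incidence rate $I$, constant mean duration $\bar{D}$, stable population), the prevalence $P$ satisfies:

$$
\frac{P}{1 - P} = I \times \bar{D}
\qquad\text{so}\qquad
P \approx I \times \bar{D} \quad \text{when } P \text{ is small.}
$$

For example, a hypothetical incidence of 0.1 episodes per child-year with a mean duration of 0.25 years gives $P/(1-P) = 0.025$, i.e. a prevalence of about 2.4%. The same incidence with twice the duration gives roughly twice the prevalence, which is why a difference in prevalence cannot on its own be read as a difference in risk.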

### Mark Myatt

Frequent user

27 Aug 2014, 16:54

I see your point now.

I think we should note ...

(1) The prevalence ratio (PR) and the risk ratio (RR) are the same when the average duration of disease is the same in the exposed and non-exposed groups and the presence of the outcome does not influence the presence of the exposure (i.e. the exposure must precede the outcome). If the first condition is not met then the PR may be biased with respect to the RR and some estimate of the direction and magnitude of the bias should be made.
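The direction and magnitude of that bias can be sketched with the steady-state relationship between prevalence, incidence and duration. Assuming a steady state in both groups, using subscripts 1 and 0 for exposed and non-exposed, and treating the incidence ratio $I_1/I_0$ as the RR (reasonable for a rare outcome):

$$
\frac{P_1/(1-P_1)}{P_0/(1-P_0)} = \frac{I_1 \bar{D}_1}{I_0 \bar{D}_0}
\qquad\text{and, for low prevalence,}\qquad
\text{PR} = \frac{P_1}{P_0} \approx \text{RR} \times \frac{\bar{D}_1}{\bar{D}_0}
$$

So if exposed children remain malnourished for longer than non-exposed children ($\bar{D}_1 > \bar{D}_0$) the PR overstates the RR, and if they recover faster it understates it; the ratio of mean durations gives the size of the bias.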

(2) The estimation procedures (i.e. for the point estimate and its standard error) for the prevalence ratio and the risk ratio are identical. This means that (provided the conditions in (1) above are met) you can use standard statistical packages to estimate the prevalence ratio (e.g. RR and CI from a 2-by-2 table).
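As an illustration of (2), the usual log-scale (Katz) estimate of the ratio and its 95% CI from a 2-by-2 table is the same calculation whether the proportions are risks or prevalences. This is a minimal sketch with hypothetical counts; it assumes no zero cells and a simple random sample, so for a cluster survey the standard error would need to come from a design-based analysis instead.

```python
import math
from typing import Tuple


def ratio_with_ci(a: int, n1: int, c: int, n0: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Risk or prevalence ratio with a 95% CI from a 2-by-2 table.

    a  = cases among exposed,   n1 = total exposed
    c  = cases among unexposed, n0 = total unexposed
    """
    ratio = (a / n1) / (c / n0)
    # Standard error of log(ratio)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = math.exp(math.log(ratio) - z * se_log)
    hi = math.exp(math.log(ratio) + z * se_log)
    return ratio, lo, hi


# Hypothetical counts: 20/100 exposed and 10/100 unexposed children stunted
ratio, lo, hi = ratio_with_ci(20, 100, 10, 100)
print(f"PR = {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```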

BTW ... here is a link to the full text of the article.

I'll be more careful with terminology now.

Thanks Kevin.