# Estimating prevalence using health records

This question was posted in the Assessment forum area and has 2 replies.

### Sylvie

Programme officer

Normal user

15 Sep 2014, 13:53

We want to estimate the prevalence of malnutrition in TB patients using health records from different clinics.

The initial plan was to extract the information from health records of all patients who were diagnosed with TB in the past six months. Now we think this may be too much work and would like to reduce the number of records. Shall we reduce the time period that we are looking at to only one or two months or is there some kind of sample size calculation that we need to apply?

If we want to compare the prevalence of malnutrition in TB+ and TB- patients who came to the clinics in the past six months, do we need to calculate the minimum required sample size even if we will enter information for all patients?

### Pascale Delchevalerie

Nutrition Advisor MSF Belgium

Normal user

16 Sep 2014, 07:33

Hi Sylvie,

As malnutrition prevalence can be influenced by seasonality, if your recall period is less than one year, I would treat your selection of patients as a sample and apply the usual rules for sample representativeness: a sampling method (simple random sampling in your case) and a minimum sample size, which depends on the desired precision.

n = 1.96^2 * (p * q) / d^2

where

1.96 = statistical parameter (t) fixed for a 5% risk of error

p = the expected prevalence of malnutrition (if the prevalence is unknown, let p = 0.5)

q = (1 - p) = the expected prevalence of non-malnutrition

d = desired precision

If you are in a context where prevalence of malnutrition is unstable, I would take part of the sample during the low season and part during the peak season.

I hope this helps.

Pascale

### Mark Myatt

Consultant Epidemiologist

Frequent user

16 Sep 2014, 09:02

A simple approach to working with clinical data is to start on a specific date and then take a systematic sample of admissions (e.g. every admission, every other admission, every third admission). The start date and the sampling interval are chosen with reference to a required sample size and (in some cases) a desire to take a sample over an entire year (e.g. to capture seasonal effects). This type of sample usually works well but you need to take care that the sampling interval does not correspond to a structure in the population so that you end up with (e.g.) an all male or all female sample. If you have worries about this then a simple random sample could be taken. This is only a little more work.
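As a rough illustration of the systematic sample described above, here is a minimal Python sketch. It assumes the records are already listed in admission order from the chosen start date. The function name `systematic_sample` and the random starting offset within the first interval are my own illustrative choices, not a fixed procedure.

```python
import random


def systematic_sample(records, n_required, seed=None):
    """Return every k-th record, with k chosen to give at least n_required records.

    records    : list of records in admission order (hypothetical input)
    n_required : sample size from a separate sample size calculation
    seed       : optional seed so the selection can be reproduced
    """
    if n_required <= 0:
        raise ValueError("n_required must be positive")
    # Sampling interval: 1 = every admission, 2 = every other admission, ...
    interval = max(1, len(records) // n_required)
    # Assumption: start at a random position within the first interval.
    start = random.Random(seed).randrange(interval)
    return records[start::interval]


# Hypothetical example: 600 admissions, 100 records wanted -> every 6th record.
admissions = list(range(1, 601))
sample = systematic_sample(admissions, 100, seed=1)
print(len(sample))
```

If there are fewer records than the required sample size, the interval falls back to 1 and every record is returned.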

Your problem is complicated by having multiple clinics. If you want to pool data across the clinics then you could take a proportional sample (i.e. bigger clinics contribute a bigger part of the overall sample) or take a fixed size sample from each clinic and weight the sample during data analysis. Since you will have a complex design it may be best to increase the sample size over that required for a simple random sample. It is common to double the sample size.
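The two pooling options can be sketched as follows. This is only an illustration: the clinic names, caseloads, and counts are made up, and the function names `proportional_allocation` and `weighted_prevalence` are hypothetical. The weighted estimate here is a point estimate only; confidence intervals need survey analysis methods that account for the design.

```python
import math


def proportional_allocation(clinic_sizes, total_sample):
    """Split total_sample across clinics in proportion to their caseload.

    Each share is rounded up, so the total may slightly exceed total_sample.
    """
    total = sum(clinic_sizes.values())
    return {
        clinic: math.ceil(total_sample * size / total)
        for clinic, size in clinic_sizes.items()
    }


def weighted_prevalence(clinic_sizes, clinic_results):
    """Pool fixed-size clinic samples, weighting each clinic by its caseload.

    clinic_results maps clinic -> (malnourished, sampled).
    """
    total = sum(clinic_sizes.values())
    return sum(
        (clinic_sizes[clinic] / total) * (cases / sampled)
        for clinic, (cases, sampled) in clinic_results.items()
    )


# Hypothetical caseloads (TB patients seen in the period of interest).
sizes = {"Clinic A": 300, "Clinic B": 200, "Clinic C": 100}

# Option 1: proportional sample.
print(proportional_allocation(sizes, 192))

# Option 2: fixed sample of 60 per clinic, weighted at analysis.
results = {"Clinic A": (30, 60), "Clinic B": (12, 60), "Clinic C": (6, 60)}
print(round(weighted_prevalence(sizes, results), 3))
```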

The sample size calculation for a single prevalence estimate and a simple random or systematic sample is:

n = (p * (1 - p)) / (precision / 1.96)^2

So, for a prevalence (p) of 50% and a precision (i.e. half-width of the 95% CI) of 10% you have:

n = (p * (1 - p)) / (precision / 1.96)^2

n = (0.5 * (1 - 0.5)) / (0.1 / 1.96)^2

n = 96

With your complex design you might multiply this by 2.0.
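For anyone who wants to try other values of prevalence and precision, the calculation above can be written as a small Python function. The function name and the `design_effect` parameter are illustrative names of my own. The result is returned unrounded, and rounding up is the cautious choice.

```python
def sample_size_prevalence(p, precision, z=1.96, design_effect=1.0):
    """Sample size for a single prevalence estimate.

    p             : expected prevalence as a proportion (0.5 if unknown)
    precision     : half-width of the confidence interval as a proportion
    z             : 1.96 for a 95% confidence interval
    design_effect : multiplier for a complex design (e.g. 2.0)
    """
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1")
    if precision <= 0:
        raise ValueError("precision must be positive")
    n = (p * (1 - p)) / (precision / z) ** 2
    return n * design_effect


# Simple random or systematic sample: about 96
print(round(sample_size_prevalence(0.5, 0.10), 2))

# Complex design, sample size doubled: about 192
print(round(sample_size_prevalence(0.5, 0.10, design_effect=2.0), 2))
```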

With a complex design you will need to analyse the data using techniques that can account for the sample design. These are available in most statistical packages.

If you plan to enter data for all patients then you do not need to do a sample size calculation because you cannot sample any more than everyone. This may be wasteful.

There are a number of sample size calculations to compare two proportions. Let us know here if you need help or would like someone to review your calculations.
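As one example only, here is a sketch of a commonly used normal-approximation formula for comparing two proportions with equal group sizes. It is not the only option (others add a continuity correction or use a pooled variance). The prevalences in the example (40% in TB+ and 20% in TB-), the 5% two-sided significance level, and the 80% power are assumptions made for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size PER GROUP to detect a difference between p1 and p2.

    Uses n = (z_alpha + z_beta)^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
    for a two-sided test and a simple random sample in each group.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2


# Hypothetical prevalences: 40% in TB+ patients, 20% in TB- patients.
n_per_group = sample_size_two_proportions(0.40, 0.20)
print(round(n_per_group, 1))
```

As with the single prevalence calculation, a complex design would call for a larger sample than this simple random sample figure.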

IMPORTANT (1) : You will need to take care that your TB+ and TB- patients are comparable in all regards except their TB status. If your cohort of TB+ patients is mostly aged > 50 then you should not take your TB- patients from (e.g.) a paediatric clinic or a maternity ward. You should (e.g.) take them from a group of patients of comparable age and sex.

IMPORTANT (2) : You need to pay attention to your definition of malnutrition. BMI can be problematic in chronic diseases and in elderly patients due to difficulties in measuring height.

I hope this helps.