# sample size diagnostic questionnaire - urgent help

This question was posted in the Assessment and Surveillance forum area and has 1 reply.

### Anonymous 2525

Normal user

29 Jan 2015, 06:21

### Mark Myatt

Consultant Epidemiologist

Frequent user

29 Jan 2015, 10:50

```
                  Specialist +   Specialist -
                  ------------   ------------
Questionnaire +        a              b
Questionnaire -        c              d
                  ------------   ------------
                     a + c          b + d
                  ------------   ------------
```

Sensitivity is :
```
a / (a + c)
```

Specificity is :
```
d / (b + d)
```
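
As a quick illustration, the two measures can be computed from the four cell counts along these lines. This is only a minimal Python sketch: the function name `sens_spec` and the example counts are made up for illustration and do not come from any real study. It uses the usual definitions, i.e. sensitivity is the share of specialist-positives that the questionnaire also calls positive (a out of a + c) and specificity is the share of specialist-negatives that the questionnaire also calls negative (d out of b + d).

```python
def sens_spec(a, b, c, d):
    """Return (sensitivity, specificity) for a 2 x 2 table laid out as:

                        Specialist +   Specialist -
    Questionnaire +          a              b
    Questionnaire -          c              d
    """
    sensitivity = a / (a + c)  # questionnaire + among specialist +
    specificity = d / (b + d)  # questionnaire - among specialist -
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sensitivity, specificity = sens_spec(a=80, b=10, c=20, d=90)
print(f"Sensitivity = {sensitivity:.1%}")
print(f"Specificity = {specificity:.1%}")
```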

You want to estimate sensitivity and specificity with reasonable precision.
Both sensitivity and specificity are proportions:
```
              numerator   denominator
              ---------   -----------
Sensitivity       a          a + c
Specificity       d          b + d
              ---------   -----------
```

There are, therefore, two sample sizes to consider (i.e. "a + c" and "b + d").
The required sample sizes will depend on the proportion to be estimated and the precision required. It is usual to assume that both sensitivity and specificity will be considerably above 50% (since we hope the instrument will perform better than just tossing a coin). In this example I use 80% sensitivity and 90% specificity both to be estimated with a precision of +/- 5% :
```
              Target   Precision
              ------   ---------
Sensitivity    80%      +/- 5%
Specificity    90%      +/- 5%
              ------   ---------
```

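A standard sample size formula for estimating a proportion p with precision d and 95% confidence, n = (p * (1 - p)) / (d / 1.96)^2, is used (results are rounded up to the next whole subject):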
```
n(sensitivity) = (0.8 * (1 - 0.8)) / (0.05 / 1.96)^2 = 246
n(specificity) = (0.9 * (1 - 0.9)) / (0.05 / 1.96)^2 = 139
```
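
If you want to redo this calculation with your own guesses, something like the following Python sketch should do. The function name `sample_size_proportion` and its parameter names are my own choices, not from any package. It assumes simple random sampling, a 95% confidence level (z = 1.96), and rounding up to the next whole subject.

```python
import math


def sample_size_proportion(p, precision, z=1.96):
    """Sample size needed to estimate a proportion p to within
    +/- precision, using n = p * (1 - p) / (precision / z)^2,
    rounded up to the next whole subject."""
    return math.ceil(p * (1 - p) / (precision / z) ** 2)


n_sensitivity = sample_size_proportion(p=0.80, precision=0.05)
n_specificity = sample_size_proportion(p=0.90, precision=0.05)

print("n(sensitivity) =", n_sensitivity)  # 246
print("n(specificity) =", n_specificity)  # 139
```

Changing `p` or `precision` shows how quickly the requirement grows as the guess moves towards 50% or the precision is tightened.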

To obtain the overall sample size in a single sample the following conditions must be satisfied:
```
(a + c) >= 246
(b + d) >= 139
```

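Note that (a + c) is the number of subjects who are truly POSITIVE (i.e. have the condition) and that (b + d) is the number who are truly NEGATIVE. These are not the same as the "true positive" (a) and "true negative" (d) cells of the table. In your problem "truth" is decided by the specialist.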
The cheapest approach to achieving the sample is to select the first 246 patients diagnosed as POSITIVE by the specialist(s) and the first 139 diagnosed as NEGATIVE by the specialist(s). Prevalence is often high in specialist clinics so you may have to wait some time for the 139 negatives. If this is the case then an expedient measure is to take a sample of (assumed) negatives from another clinic, which you will probably want to match on potential confounders and known risk factors (e.g. age, sex, socio-economic status, ethnicity).
A simple population-based sampling approach will be expensive with a low-prevalence condition. You quote a prevalence of 5.5 / 1000 which (expressed as a proportion) is 0.0055 (0.55%). You would need a sample of about:
```
(1 / 0.0055) * 246 = 44728
```

to find 246 POSITIVE cases. This is impossible within reasonable resource constraints.
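
The population figure can be checked with a couple of lines of Python. This is a rough sketch only: the function name `population_sample_needed` is hypothetical, and it assumes that the expected number of cases found is simply the sample size multiplied by the prevalence, ignoring sampling variation and any design effect.

```python
import math


def population_sample_needed(n_cases, prevalence):
    """Approximate number of people to screen so that the expected
    number of cases found is at least n_cases."""
    return math.ceil(n_cases / prevalence)


n_screen = population_sample_needed(n_cases=246, prevalence=0.0055)
print("Population sample needed =", n_screen)  # 44728
```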
NOTES :
(1) The sample size calculation is based on guesses of values which are unknown at this time. It is prudent, therefore, to increase the calculated sample sizes slightly.
(2) All study subjects must be screened by the questionnaire and the specialist(s) who should be blind to the results of the other method (i.e. the specialist should not know the questionnaire result and vice-versa).
(3) You will need to consider ethical aspects of the trial. All cases found must be treated.
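
On note (1), one way to see how much the guesses matter is to tabulate the required sample size over a range of plausible values. The sketch below is illustrative only: the range of guesses is arbitrary and the helper `sample_size_proportion` is my own name for the same formula as above (95% confidence, +/- 5% precision, rounded up).

```python
import math


def sample_size_proportion(p, precision=0.05, z=1.96):
    """n = p * (1 - p) / (precision / z)^2, rounded up."""
    return math.ceil(p * (1 - p) / (precision / z) ** 2)


# Arbitrary range of guesses for sensitivity or specificity.
guesses = [0.70, 0.75, 0.80, 0.85, 0.90]
required = {p: sample_size_proportion(p) for p in guesses}

for p, n in required.items():
    print(f"guess = {p:.0%}  n = {n}")
```

If the true value is closer to 50% than you guessed, the sample needed is larger, which is why a modest upward adjustment is sensible.
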
As usual ... check my arithmetic!
I hope this is of help.