# Using stage 2 data in the likelihood

This question was posted in the Coverage assessment forum area and has 6 replies.

### Jose Luis Alvarez Moran

ACF Senior Technical Advisor

Normal user

25 Sep 2012, 17:32

My question is: should children found in the stage 2 small area survey of a SQUEAC evaluation be used in the calculation of the stage 3 likelihood, or should only data from the big area survey be used in stage 3?

I know that different approaches have been taken in the past, but we again have a SQUEAC where we will not reach our sample size (61 SAM cases), and I'd like to have a few opinions about it.

Thanks a lot

### Ernest Guevarra

Valid International

Technical expert

26 Sep 2012, 00:04

I won't directly answer your question on whether data from the stage 2 small area survey should be included in the likelihood. Instead, I will ask you some questions in return to see whether there is even a need to worry about "topping up" your likelihood data.

1) What are the specifications of your prior (i.e. estimate and alpha and beta shape parameters)?

2) How much sample size do you have now or are expecting to find?

In terms of sample size, I think that your target of 61 is quite big for a SQUEAC. With a sample of that size, you can already classify coverage quite confidently, with very small alpha and beta errors. In general, a sample size of 40 is big enough to classify coverage.

Another issue to consider: if you are indeed finding it challenging to reach the target sample of 61 that you have set, it is possible that there are just not that many cases to begin with. In that case, a sample of 61, or a number close to it, would already be a high sampling proportion of the total number of SAM cases, assuming that your case-finding sensitivity is high (c. 90% or more).

Without having seen your prior, I think that if you are going to find about 40 SAM cases for your likelihood you will be fine and should not worry about "topping up" with small area survey data.

### Lio

CMAM advisor

Technical expert

26 Sep 2012, 04:53

The answer to your question is "no": you should not use data from the small area. Villages for the wide area (to build the likelihood) are chosen using a method that produces a representative sample of all villages in the area; in contrast, villages for the small area are chosen purposively to test hypotheses (high coverage/low coverage). However, if the same villages were selected again during the wide area selection (very unlikely unless your area has very few villages), in this case you could use the data. But do you really need to increase the sample size? Ernest has already addressed that question.

### Mark Myatt

Consultant Epidemiologist

Frequent user

26 Sep 2012, 09:55

The Bayesian approach is about updating existing information (i.e. the prior) with NEW information (the likelihood) to create a summary of both sets of information (the posterior).

The data collected in stage 2 (small surveys, small studies, and small area surveys) will have informed the definition of the prior. This means that you have already used this data. It is “wrapped up” in the prior. Stage 2 data is, at stage 3, existing information and should not be included as likelihood survey data.
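A minimal sketch of this updating step, assuming the standard beta-binomial conjugate analysis in which a Beta(alpha, beta) prior is combined with counts of covered and non-covered cases from the stage 3 survey. The function names and all the numbers below are hypothetical and are for illustration only; they are not taken from any real survey.

```python
def update_beta_prior(prior_alpha, prior_beta, covered, not_covered):
    """Conjugate beta-binomial update: return the posterior (alpha, beta).

    Only NEW (stage 3 likelihood survey) counts should be passed in here;
    stage 2 findings are assumed to be reflected in the prior already.
    """
    return prior_alpha + covered, prior_beta + not_covered


def beta_mode(alpha, beta):
    """Mode of a Beta(alpha, beta) distribution (valid when alpha, beta > 1)."""
    return (alpha - 1) / (alpha + beta - 2)


# Hypothetical prior and hypothetical stage 3 survey counts.
prior = (10.0, 20.0)
stage3_covered, stage3_not_covered = 12, 23

posterior = update_beta_prior(*prior, stage3_covered, stage3_not_covered)
print("prior mode:    ", round(beta_mode(*prior), 3))
print("posterior:     ", posterior)
print("posterior mode:", round(beta_mode(*posterior), 3))
```

Adding stage 2 cases to `covered` and `not_covered` would count the same information twice: once through the prior and once through the likelihood.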

It is difficult to advise you on sample size without knowing your prior and how you arrived at it, but n = 61 is a large sample size for a SQUEAC likelihood survey. I seldom use a sample size larger than n = 35 and often use a smaller sample size than this.

I think it would be useful if you would explain the rationale for needing such a large sample size.

### Mark Myatt

Consultant Epidemiologist

Frequent user

26 Sep 2012, 15:57

First - Please point us to discrepancies on-line. They should not be in the FANTA-III draft technical reference for SQUEAC (if they are then we will fix that ASAP).

A Beta(4.0, 8.7) prior is not very strong. It corresponds to a belief that coverage lies between about 10% and 60%, with a best guess (mode) of about 28%. If you have a better idea than this (that is quite a broad range of coverage) then you may want to reconsider the prior and specify a narrower range. This will reduce the required likelihood sample size. NOTE: Do not do this just to reduce the likelihood sample size; the prior should reflect what you know.

BTW: It is very good to see a not overly-strong prior being used. I fear that some people may be tempted (for various reasons) to specify strong priors with optimistically high modes.
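The figures quoted for the Beta(4.0, 8.7) prior can be checked with a short sketch using only the Python standard library. The helper names `beta_pdf` and `beta_mass` are hypothetical, and the probability between 10% and 60% is obtained by simple numerical integration (Simpson's rule), which is an assumption of this sketch and not necessarily how any SQUEAC tool does it.

```python
import math


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm
                    + (alpha - 1) * math.log(x)
                    + (beta - 1) * math.log(1 - x))


def beta_mass(lo, hi, alpha, beta, steps=2000):
    """Probability that coverage lies in [lo, hi], by Simpson's rule.

    `steps` must be even.
    """
    h = (hi - lo) / steps
    total = beta_pdf(lo, alpha, beta) + beta_pdf(hi, alpha, beta)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_pdf(lo + i * h, alpha, beta)
    return total * h / 3


alpha, beta = 4.0, 8.7
mode = (alpha - 1) / (alpha + beta - 2)    # best guess for coverage
mass = beta_mass(0.10, 0.60, alpha, beta)  # prior belief inside 10%-60%

print(f"prior mode = {mode:.3f}")
print(f"P(0.10 <= coverage <= 0.60) = {mass:.3f}")
```

The mode comes out at about 0.28, matching the best guess quoted above, and most of the prior probability falls inside the 10% to 60% range.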

The specified precision is, I think, a little high. It is very unlikely that you will ever need a precision better than this (this is the precision available from EPI coverage surveys). I usually go for less precision (i.e. 12%, up to 15%, sometimes higher). Unless prevalence of SAM is very high, the resulting precision will be better than the specified (calculated) precision because the sampling proportion will be high.

The minimum sample size when using the BayesSQUEAC calculator is:

n = Prior Alpha + Prior Beta - 2

n = 4.0 + 8.7 - 2

n = 10.7, i.e. 11 (rounded up)

This is usually large enough to ensure that the likelihood data will be able to correct a poorly specified prior and (provided there is no prior - likelihood conflict) reduce the uncertainty in the prior. It seems to me that you will have a sample size much bigger than this.
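The minimum sample size rule given above can be written as a small helper. This is a sketch of the stated formula only; the function name is hypothetical and it does not reproduce the BayesSQUEAC calculator itself.

```python
import math


def min_likelihood_sample_size(prior_alpha, prior_beta):
    """Minimum likelihood survey sample size: alpha + beta - 2, rounded up."""
    return math.ceil(prior_alpha + prior_beta - 2)


n_min = min_likelihood_sample_size(4.0, 8.7)
print(n_min)
```

A stronger prior (larger alpha + beta) raises this minimum, since more likelihood data is then needed to correct the prior if it turns out to be poorly specified.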

Let us know how you get on.