Edwin:
As you know, the design effect measures how much cluster sampling reduces the precision of estimates made from a cluster sample, compared with a simple random sample of the same size. It depends on 2 factors:
1) The average size of the clusters in the survey sample (M) and
2) How unevenly the outcome of interest is distributed across clusters (that is, how much more alike households in the same cluster are than households in different clusters), as measured by the intracluster correlation coefficient (ICC, sometimes called rho).
The formula showing this relationship is:
Design effect = 1 + [(M - 1) x ICC]
So you can see that as either M or ICC increases, the design effect increases, which means precision decreases and confidence intervals widen. Not good. So we want the smallest possible average cluster size to maximize precision. That's why, for a fixed total sample size (for example, 500 households), we want a larger number of smaller clusters: 50 clusters of 10 households each is much better than 10 clusters of 50 households each.
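To put numbers on that comparison, here is a minimal Python sketch that applies the formula above to both designs. The ICC of 0.05 is a hypothetical value chosen only for illustration, and the effective sample size (total sample size divided by the design effect) is a standard companion measure added here, not something stated above.

```python
def design_effect(m: float, icc: float) -> float:
    """Design effect = 1 + (M - 1) x ICC, where M is the average cluster size."""
    return 1 + (m - 1) * icc


def effective_sample_size(n: int, deff: float) -> float:
    """Size of a simple random sample that would give the same precision."""
    return n / deff


ICC = 0.05  # hypothetical value, for illustration only
TOTAL_HOUSEHOLDS = 500

# Same 500 households, split two different ways.
for n_clusters, m in [(50, 10), (10, 50)]:
    deff = design_effect(m, ICC)
    n_eff = effective_sample_size(TOTAL_HOUSEHOLDS, deff)
    print(f"{n_clusters} clusters of {m}: design effect = {deff:.2f}, "
          f"effective sample size = {n_eff:.0f}")
```

With this assumed ICC, 50 clusters of 10 give a design effect of 1.45 (effective sample size about 345), while 10 clusters of 50 give 3.45 (about 145), so the same 500 interviews buy less than half the precision.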
You are interested in the case where more than 1 cluster is selected from the same primary sampling unit (in your example, the village of Shawar). And you want to know whether using a single cluster ID number for all the units selected in the same primary sampling unit (that is, all the households selected in Shawar) is better or worse than keeping these clusters separate by using different cluster ID numbers.
The effect of combining all the clusters into one is clear: it would increase the average cluster size, thus increasing the design effect and decreasing precision. But there may be other considerations to this question. Regardless, in large primary sampling units I usually divide the area into segments, randomly select as many segments as there are clusters allocated to that primary sampling unit, and then select the required number of households from each selected segment. That way, the clusters at least come from geographically distinct areas of Shawar village. This may more closely match what the analysis software assumes: that each cluster is separate and distinct.
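A minimal Python sketch of both points, using made-up data. The first part shows how giving Shawar's two clusters a single cluster ID raises the average cluster size M in a hypothetical survey of 50 clusters of 10 households. The second part sketches the segment approach: pick one distinct segment at random for each cluster allocated to the village, then take a simple random sample of households within each chosen segment. The segment names, village size, and helper functions are all hypothetical, and simple random sampling within segments is an assumption; any valid household selection method could be used instead.

```python
import random


def average_cluster_size(cluster_ids: list[str]) -> float:
    """Average number of sampled households per distinct cluster ID."""
    return len(cluster_ids) / len(set(cluster_ids))


def select_segments(segments: list[str], n_clusters: int,
                    rng: random.Random) -> list[str]:
    """Pick one distinct segment at random for each cluster allocated to the PSU."""
    if n_clusters > len(segments):
        raise ValueError("need at least as many segments as clusters")
    return rng.sample(segments, n_clusters)


def select_households(households_by_segment: dict[str, list[str]],
                      chosen: list[str], per_cluster: int,
                      rng: random.Random) -> dict[str, list[str]]:
    """Simple random sample of households within each chosen segment (assumed method)."""
    return {seg: rng.sample(households_by_segment[seg], per_cluster)
            for seg in chosen}


# Part 1 (hypothetical): 50 clusters x 10 households; C01 and C02 are in Shawar.
separate = [f"C{c:02d}" for c in range(1, 51) for _ in range(10)]
merged = ["Shawar" if cid in ("C01", "C02") else cid for cid in separate]
print(average_cluster_size(separate))          # 10.0
print(round(average_cluster_size(merged), 2))  # 10.2

# Part 2 (hypothetical): Shawar split into 6 segments, A-F, of 30 households each.
rng = random.Random(1)  # fixed seed so the draw is reproducible
shawar = {seg: [f"{seg}-{i}" for i in range(1, 31)] for seg in "ABCDEF"}
chosen = select_segments(list(shawar), n_clusters=2, rng=rng)
sample = select_households(shawar, chosen, per_cluster=10, rng=rng)
for seg, households in sample.items():
    print(seg, households)
```

Merging the two Shawar clusters takes M from 10.0 to about 10.2 (500 households over 49 cluster IDs instead of 50). Drawing segments without replacement guarantees that the two Shawar clusters never share a segment.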
Of course, the best approach is to make sure the primary sampling units are small enough that none receives more than one cluster in the first stage of sampling.
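One way to check this before fieldwork, assuming clusters are allocated by systematic probability-proportional-to-size (PPS) selection (a common design, but an assumption here): a primary sampling unit can receive more than one cluster only if its measure of size exceeds the sampling interval, so those units can be flagged and split into segments in the frame beforehand. The village names and sizes below are hypothetical, and equal-sized segments are a simplification.

```python
import math


def sampling_interval(psu_sizes: dict[str, int], n_clusters: int) -> float:
    """Systematic PPS interval: total measure of size / number of clusters."""
    return sum(psu_sizes.values()) / n_clusters


def oversized_psus(psu_sizes: dict[str, int], n_clusters: int) -> dict[str, int]:
    """Map each PSU that could receive more than one cluster to the minimum
    number of equal-sized segments that keeps every piece at or below the
    sampling interval."""
    interval = sampling_interval(psu_sizes, n_clusters)
    return {name: math.ceil(size / interval)
            for name, size in psu_sizes.items() if size > interval}


# Hypothetical frame: measure of size (e.g. household count) per village.
frame = {"Shawar": 900, "Village B": 120, "Village C": 300, "Village D": 180}
print(oversized_psus(frame, n_clusters=5))  # {'Shawar': 3}
```

Here the interval is 1500 / 5 = 300, so only Shawar is flagged; Village C sits exactly at the interval and can receive at most one cluster.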