• By Dartington SRU
  • Posted on Thursday 15th March, 2012

To randomize or not to randomize?

Randomized experiments are the preferred method for assessing the effects of treatment, for both theoretical and practical reasons. But they are not always feasible or ethical to run, in which case non-randomized experiments are likely to be used instead. To what extent, though, do the results of non-randomized designs match those of randomized ones?

William Shadish began studying this question during the 1990s. His early analyses used a meta-analytic approach: he gathered a large number of randomized and non-randomized experiments that had examined the same question, worked out the average effect size for each, and compared them for similarity.

Previous work in this vein had led researchers to assume that the two methods produce essentially the same results. The studies by Shadish and colleagues, however, showed that “ignoring assignment method is a very bad idea”.

In studies of interventions such as drug use prevention, family therapy, Alcoholics Anonymous and coaching for school aptitude tests, the researchers found that the two methods sometimes yielded similar results, but not always. Further, when the results differed they were not consistently different in the same direction: effect sizes were sometimes higher for randomized experiments and sometimes lower.

A problem with these findings is that random assignment may not be the only feature that distinguishes randomized experiments from non-randomized ones. For example, randomized experiments tend to use “passive” control groups, in which participants receive little or no attention, whereas non-randomized studies are more likely to use “active” controls, in which participants do receive attention.
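The meta-analytic comparisons described above depend on expressing each study's result as a standardized effect size, so that randomized and non-randomized studies can be placed on a common scale. A minimal sketch of that calculation (Cohen's d with a pooled standard deviation; all summary numbers below are hypothetical, not taken from Shadish's data):

```python
from math import sqrt

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_sd = sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                     / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical summary statistics for two studies of the same intervention:
d_randomized = cohens_d(10.0, 8.0, 4.0, 4.0, 50, 50)      # 0.5
d_nonrandomized = cohens_d(10.0, 7.0, 5.0, 5.0, 50, 50)   # 0.6
print(d_randomized, d_nonrandomized)
```

In a real meta-analysis each study's d would also be weighted by its precision before the averages for the two design types are compared.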
Non-randomized studies are also disproportionately likely to match participants in the treatment and control groups.

Non-randomized experiments that allow participants to choose whether they receive the intervention in question yield results that differ from those of randomized experiments far more than do non-randomized experiments that prevent such self-selection. Often the bias from self-selection is visible at the start of the experiment and simply carries over to the post-test effect sizes. A complication, again, is that this does not always work in the same direction. For instance, participants who self-select psychotherapy tend to be more distressed, whereas those who self-select Alcoholics Anonymous are more likely to stay sober.

On a positive note, when randomized and non-randomized experiments are conducted identically in all respects except the assignment mechanism, they can yield quite similar results. That said, meta-analysis ultimately cannot provide a firm answer to the question Shadish was studying, because it cannot ensure that the experiments were conducted identically except for assignment method.

In an effort to address this problem, Shadish describes three studies concerning mathematics and language training in which university students were randomly assigned to take part in either a randomized study or a non-randomized alternative. Participants were otherwise treated identically. This research yielded several important lessons.

The first is that good measurement of selection is crucial in non-randomized studies. In other words, evaluators should find out what factors predict whether people will choose one condition over the other. Controlling for the relevant factors can eliminate bias.

Another lesson is that the particular statistical method used to adjust the results is relatively unimportant.
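The selection lesson can be made concrete with a toy sketch (entirely hypothetical numbers, not Shadish's data): a single measured variable drives both who self-selects into treatment and the outcome itself. The naive treatment-control difference is then biased, while conditioning on the measured selection variable recovers the true effect.

```python
# Toy illustration (hypothetical data): a measured "distress" score drives
# both who self-selects treatment and the outcome itself.
TRUE_EFFECT = 2.0

def outcome(distress, treated):
    # Outcome worsens with distress; treatment adds a constant benefit.
    return 10.0 - 3.0 * distress + (TRUE_EFFECT if treated else 0.0)

# High-distress participants disproportionately choose treatment.
sample = ([(1, True)] * 8 + [(1, False)] * 2 +   # high distress
          [(0, True)] * 2 + [(0, False)] * 8)    # low distress
data = [(d, t, outcome(d, t)) for d, t in sample]

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison ignores the selection variable and is badly biased.
naive = (mean([y for d, t, y in data if t])
         - mean([y for d, t, y in data if not t]))

# Stratify on the measured selection variable and average the
# within-stratum differences: this recovers the true effect.
adjusted = mean([mean([y for d, t, y in data if t and d == s])
                 - mean([y for d, t, y in data if not t and d == s])
                 for s in (0, 1)])

print(round(naive, 2), round(adjusted, 2))  # 0.2 2.0
```

Any reasonable adjustment method conditioning on the same well-measured selection variable would recover the effect here; with the variable unmeasured, none could.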
When different methods were used to adjust results from the non-randomized experiments, bias reduction was about the same for all of them, as long as (critically) the adjustment used good measures of selection.

Next, non-randomized experiments produce more accurate estimates of effect when the control group comes from the same location as the treatment group, and when it shares many of the treatment group's characteristics. In short, “use focal local controls”.

Large sample sizes are also needed in non-randomized designs. Controlling for the factors that predict whether people will choose one condition over another works best with large samples. In a computer simulation, for instance, the most accurate results came from studies with at least 1,500 participants, with 500 needed to ensure only a small risk of deviating from the right answer.

Shadish also stresses that the analysis of studies that allocate people to experimental conditions according to whether they fall above or below a cut-off on a given variable is difficult. In particular, care is needed to model correctly the relationship between the assignment variable and the outcome. It should also be noted that randomized experiments and studies using the cut-off method (the regression discontinuity design) typically estimate two different parameters, so they may generate different results.

Readers may conclude that the complications inherent in non-randomized experiments, as identified here, render them less desirable than many seem to think, but viewed positively Shadish's analyses give substantial cause for optimism. He writes: “Conditions do exist under which nonrandomized experiments can yield accurate answers. This is most obvious for the regression discontinuity design, where a number of studies have supported its accuracy when it is properly analyzed.”

Reference

Shadish, W. R.
(2011) Randomized controlled studies and alternative designs in outcome studies: challenges and opportunities. Research on Social Work Practice, 21(6), 636-643.
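As a closing illustration of the cut-off (regression discontinuity) design discussed above, here is a minimal sketch with hypothetical, noise-free data: units are assigned to treatment by a cut-off on a score, a separate line is fitted on each side, and the effect is read off as the jump between the fitted lines at the cut-off. With real data, the modelling pitfall mentioned earlier (getting the score-outcome relationship wrong) is exactly what makes this analysis difficult.

```python
# Minimal regression discontinuity sketch (hypothetical, noise-free data):
# units scoring below the cut-off receive the treatment, and the effect is
# the jump between the two fitted lines at the cut-off.
CUTOFF = 0.0
TRUE_EFFECT = 3.0

def outcome(x):
    treated = x < CUTOFF  # assignment is fully determined by the score
    return 1.0 + 0.5 * x + (TRUE_EFFECT if treated else 0.0)

scores = [i / 10.0 for i in range(-20, 21)]  # the assignment variable

def fit_line(points):
    """Ordinary least squares for a simple line y = a + b*x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    return my - b * mx, b  # intercept, slope

left = fit_line([(x, outcome(x)) for x in scores if x < CUTOFF])
right = fit_line([(x, outcome(x)) for x in scores if x >= CUTOFF])

# Estimated effect: difference between the fitted values at the cut-off.
effect = (left[0] + left[1] * CUTOFF) - (right[0] + right[1] * CUTOFF)
print(round(effect, 2))  # 3.0
```

Note that this estimates the effect only for people near the cut-off, which is one reason, as the text observes, that randomized experiments and regression discontinuity studies estimate different parameters.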
