• By Dartington SRU
  • Posted on Tuesday 08th November, 2011

A matter of replication

“Prevention science will be better positioned to help improve public health if more replications are conducted.” That is the strong message from a team led by Jeffrey Valentine of the University of Louisville, US. “Interventions are more likely to be accurately labelled as effective if they have been thoroughly tested, especially if those investigations have occurred across diverse populations and settings”, the team says.

In particular, replication studies help with efforts to demonstrate first the efficacy and then the effectiveness of interventions, and they inform efforts to disseminate such interventions widely, for example through the Blueprints database.

However, replication raises some important conceptual questions, foremost of which is: “Can one program be considered the replicate of another?” One response is to say “yes” if the intervention is effectively the same – that is, if the critical program elements are largely unchanged. An alternative is to reply in the affirmative only if the effects of the program on dependent variables are the same, or if the programs affect mediating variables in the same way.

The question then is whether the intervention’s effect has been replicated. A judgment about this can be made with as few as two studies, but it will be highly tentative, so its value for decision-making is limited. Given this, the Valentine team argues that there is a need for a “dramatic increase in the number of replications conducted and published […] Policymakers, scholars, and administrators should avoid premature adoption or rejection of interventions that appear to be effective or ineffective within a weak inferential framework.”

The team also contends that public health will benefit if replications are systematic, thoughtful and conducted with full knowledge of the trials that have preceded them. Yet intentional replications are relatively rare. These include testing with a different population to see whether the results hold in another context, testing the effect of varying aspects of implementation, or testing which factors mediate the effect of the intervention on outcomes. In reality, most replications are ad hoc – as Valentine and his colleagues put it, they “vary from one another in multiple known and unknown ways”.

To combine the results of replications, state-of-the-art techniques are needed to summarize the body of evidence on the effects of an intervention. Two assumptions underpin such efforts: that all studies conducted on the intervention are available, and that the quality of study design and execution is comparable. Then, rather than ask whether the results from study 1 are the same as those from study 2, the Valentine team suggests that it is better to ask: “What does the available evidence say about the size of the effect attributable to program A?” This focuses attention on methods for synthesizing results from two or more studies of the same program.

The team presents several approaches to doing this. One is “vote counting” – adding up the number of studies that find a statistically significant effect on the same outcome. This is commonly used to generate lists of effective programs, but there is a reasonable probability that studies will not reach the same statistical conclusion, because underpowered studies are liable to miss real intervention effects. Another approach is to compare the direction of effects, without reference to effect size or statistical significance, but this is very blunt.
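To see why vote counting can mislead, here is a minimal sketch in Python; the numbers are invented for illustration and are not taken from the Valentine paper. It simulates five small replications of a program with a genuine standardised effect of 0.3 and then tallies how many reach statistical significance.

    import math
    import random
    import statistics

    def two_sample_p_value(treatment, control):
        """Approximate two-sided p-value from a two-sample z-test on group
        means (a rough normal approximation, good enough for illustration)."""
        se = math.sqrt(statistics.variance(treatment) / len(treatment)
                       + statistics.variance(control) / len(control))
        z = (statistics.mean(treatment) - statistics.mean(control)) / se
        return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

    def vote_count(p_values, alpha=0.05):
        """Vote counting: tally the studies that reach statistical significance."""
        return sum(p < alpha for p in p_values), len(p_values)

    # Five small replications of the same program, each with a true
    # standardised effect of 0.3 and only 30 participants per arm.
    random.seed(1)
    p_values = []
    for _ in range(5):
        control = [random.gauss(0.0, 1.0) for _ in range(30)]
        treatment = [random.gauss(0.3, 1.0) for _ in range(30)]
        p_values.append(two_sample_p_value(treatment, control))

    wins, total = vote_count(p_values)
    print(f"{wins} of {total} studies significant at p < .05")
    # With this sample size each study has roughly 20 per cent power, so
    # replications of a genuinely effective program will often fail the vote.

Because each study is underpowered, a tally of “significant” studies will frequently understate the evidence for a program that really works, which is exactly the weakness the authors highlight.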
Comparing effect sizes is an alternative. These approaches, though, are essentially about whether the results from two studies “agree” in some sense. A different approach is to ask what the available evidence says about the size of the effect. This is where meta-analysis comes in: effects from larger studies are assigned proportionally more weight, allowing the calculation of a weighted average effect size (a brief numerical sketch appears at the end of this post). According to the research team, “random effects meta-analysis may be [the] most appropriate strategy for dealing with ad hoc replications”.

Replication research is clearly needed, but it does not happen enough, because “current policies and practices do not seem to favor the conduct of replication research”. Four incentives to help redress this situation are suggested.

First, there should be more collaborative funding to increase the continuing evaluation of programs. For example, research agencies might fund the efficacy trial of a program, with implementation agencies funding the subsequent effectiveness trial.

Second, programs whose results have been replicated should receive greater priority for implementation. This is manifested, for example, in the Blueprints model.

Third, the bias against publishing replication research needs to be addressed. One idea advanced in the paper is a journal that publishes only replications.

Last, academics need institutional incentives, in the form of promotion and tenure, that reward the conduct of replication research. It should not be seen as a second-class pursuit.
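Returning to the weighted-average point above: the sketch below shows one common way to pool effects from ad hoc replications, an inverse-variance weighted average with a DerSimonian-Laird random-effects adjustment. The effect sizes and standard errors are hypothetical, and this is an illustrative sketch rather than the exact procedure used by the Valentine team.

    def pooled_effect(effects, std_errors, random_effects=True):
        """Inverse-variance weighted average of study effect sizes.

        With random_effects=True, a DerSimonian-Laird estimate of the
        between-study variance (tau squared) is added to each study's
        variance before weighting."""
        variances = [se ** 2 for se in std_errors]
        weights = [1.0 / v for v in variances]

        # Fixed-effect pooled estimate and Cochran's Q heterogeneity statistic.
        fixed = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
        q = sum(w * (d - fixed) ** 2 for w, d in zip(weights, effects))

        tau_sq = 0.0
        if random_effects:
            k = len(effects)
            c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
            tau_sq = max(0.0, (q - (k - 1)) / c)

        re_weights = [1.0 / (v + tau_sq) for v in variances]
        pooled = sum(w * d for w, d in zip(re_weights, effects)) / sum(re_weights)
        se_pooled = (1.0 / sum(re_weights)) ** 0.5
        return pooled, se_pooled

    # Hypothetical standardised effects and standard errors from three trials
    # of the same program; the larger trials (smaller standard errors) receive
    # proportionally more weight.
    effects = [0.40, 0.15, 0.28]
    std_errors = [0.20, 0.10, 0.08]
    estimate, se = pooled_effect(effects, std_errors)
    print(f"pooled effect = {estimate:.2f} (standard error {se:.2f})")

The random-effects adjustment widens each study's variance by an estimate of the between-study variance, which is why it suits replications that differ from one another in known and unknown ways.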

Valentine, J. C., Biglan, A., Boruch, R. F. et al. (2011) ‘Replication in prevention science’, Prevention Science 12: 103-117.