This guest post was written by Christy Allison and Maria Brindlmayer of JBS International, Inc. JBS is an EnCompass partner on the USAID Evaluation Services IDIQ contract.
In a recent conversation we heard someone ask: “If impact evaluations are so powerful, why don’t we just use them more often?” Impact evaluations are sometimes characterized as the most important evaluations, and among them, randomized controlled trials (RCTs) as the “gold standard.” What makes them so coveted is their ability to establish causality, that is, to measure a change in a development outcome and attribute it to a defined intervention. Donors and implementers want to invest in interventions that show strong positive results, so understanding what has worked and why is very important. However, impact evaluations, and RCTs in particular, are not always the most appropriate evaluation solution.
If measuring change and identifying the factors that caused it are the main purposes, an impact evaluation is likely to be the best approach. In its ADS 200, USAID defines impact evaluations as evaluations that “are based on models of cause and effect and require a credible and rigorously defined counterfactual to control for factors other than the intervention that might account for the observed change.” Impact evaluations can be designed in several different ways. In an experimental research design, participants are randomly assigned either to receive an intervention (treatment group) or not (control group) in order to determine the effect of a program. A sufficiently large sample of participants must be randomly assigned to ensure the validity of the impact findings. While this method introduces the least bias, it can pose the ethical dilemma of withholding a service or benefit from those in need. In addition, the rigor of this design makes it more expensive than other evaluation types.
In a quasi-experimental design, the selection process for participants in the treatment and comparison groups is not randomized, which reduces the confidence with which the impact of a program can be measured accurately, but it may be considerably less costly and more feasible politically.
Impact evaluations are most frequently also summative evaluations, a systematic process of collecting data on the impacts, outputs, products, or outcomes hypothesized in a study. The term is also used for evaluations that happen at the end of a project, also known as “final evaluations” or “end of project evaluations.” The cost and effort involved are generally best spent when the intervention is mature enough to have achieved its full impact. Different tools and resources (e.g., USAID’s impact evaluation decision tree) can help determine when an impact evaluation is the appropriate method.
If the main purposes of an evaluation are to find out whether a program has achieved the goals set out in the program’s plan and to answer questions pertinent to project design, management, and operational decision making, then a performance evaluation is a more appropriate method. USAID defines a performance evaluation as an evaluation that focuses on “what a particular project or program has achieved (either at an intermediate point in execution or at the conclusion of an implementation period); how it was implemented; how it was perceived and valued; whether expected results occurred; and other questions that are pertinent to project design, management and operational decision making.” A variety of qualitative and quantitative designs can be used for performance evaluations. Performance evaluations often cover a broad and diverse set of questions, which can be answered through different methodological approaches. Achieving the goals of an evaluation as cost-effectively as possible is generally a desirable outcome, and a less rigorous but less expensive evaluation may be a smart investment.
Performance evaluations may be summative, conducted at the end of a project, but they may also be formative evaluations, which aim to improve, shape, or “form” a project’s design and/or performance mid-stream. This is generally the rationale for conducting mid-term evaluations.
While USAID’s ADS 203 does not specifically mention formative or summative evaluations, these terms are used in informal communications as well as by other donors and implementers.
Each of these evaluation types requires different resources, types of data and data collection methods, and budgets. The most appropriate evaluation method needs to be selected based on purpose, technical feasibility, and cost. Thus, it is important to ask the following key questions when designing an evaluation:
- What is the purpose of the evaluation?
- What are the key evaluation questions that need to be answered?
- What is the potential of the evaluation results for learning?
- How will the results be used?
- Is the preferred evaluation design feasible?
- Is there sufficient budget for the preferred evaluation design?
Thinking about these questions at the beginning of an evaluation, project, or program is critical in order to select the appropriate method and enable rigorous and reliable data collection and analysis.