Mainstreaming Evaluation or Building Evaluation Capability? Three Key Elements by Dr Paul Duignan

One of the things about being an evaluator sitting on this side of the world, in New Zealand, is that even if you follow the literature you get something of a sense of isolation from the debates happening within the evaluation discipline in the United States. At the same time, somewhat paradoxically, you draw extensively on much of that literature in order to inform your methodologies and approaches.
Evaluation as a discipline here is relatively tiny and has been shaped both by the size of the country (3.6 million people, about the size of a number of cities in the U.S.!) and by the particular demands of our system of government administration. Our size creates various constraints on our work as evaluators. Our evaluation budgets are generally very small by U.S. standards. Programs and policies are often introduced nationally, so there is no sense in which comparison communities (as in different states) can be used to evaluate outcomes. Even in those instances where programs are introduced locally and evaluation designs are set up, they often attract publicity, or word-of-mouth networking, which can contaminate comparison groups.
On the positive side, evaluators, as with everyone else in a small country, are forced to be multiskilled generalists; we have to work closely with other professions; it is relatively easy to network with a large proportion of the stakeholders from any one sector we work in; and access to senior policy makers is not particularly difficult. In addition, New Zealand has a working (there is a variety of opinion on how well it is working) treaty between the European population and the indigenous Maori, the Treaty of Waitangi. This has forced us to start to come to terms with the power relations in evaluation and to try to develop appropriate ways of evaluating Maori programs. While there are still a lot of politics and practicalities to sort out on this issue, there have been some positive developments, in particular the recognition that there needs to be more real autonomy for Maori to evaluate programs in ways that are appropriate for them. In addition, many lessons have been learnt about evaluating community-based programs in the process. This paper does not attempt to summarise the extensive work that has been done by Whariki in undertaking evaluations of Maori programs and developing evaluation methodologies which work with such programs (Moewaka Barnes 2000; Moewaka Barnes In Press).
All this leaves one with some uncertainty as to what may or may not be relevant for North American evaluators from our somewhat unique perspective over here. The author has been working in evaluation with Professor Sally Casswell and other colleagues at the Alcohol & Public Health Research Unit at the University of Auckland over the last decade and a half. During this time we have had the luxury of working with one sector, the public and community health sector, to build evaluation capability in the sector as a whole. We have done this by developing appropriate models for evaluation, producing resources, running training programs at a range of different sector levels, and engaging with policy makers in discussions about approaches to evaluation. The author was involved in much of the early work on this and has a continuing involvement, but much of the teaching, research and innovative work has been undertaken by the colleagues listed in the acknowledgements to this paper.
It was therefore with great interest that I saw that the Presidential Strand of the American Evaluation Association's 2001 conference was on 'mainstreaming evaluation'. In a sense this is exactly what we have been trying to do for the public and community health sector over the decade and a half we have been working with it. On occasion, the author has regarded the small size of the evaluation profession in New Zealand as a disadvantage in promoting evaluation. However, reflecting on the Presidential Strand theme, perhaps this has forced us to attempt to 'mainstream evaluation' all along. We had no other strategy if we were going to get any significant evaluation done in New Zealand. While, as evaluators, we worked on a significant number of projects ourselves, and there were other evaluators working in the field, the bulk of evaluation needed to be done by the sector itself if it was to be done at all.
A perspective from afar
The background paper on the Presidential Strand says that the "role of evaluation in organizations, including community agencies, government agencies, schools, and businesses has been marginalized in most cases". It is always interesting to look back on the roots of a phenomenon. How have we come to this point where evaluation is seen as being marginalized, at least in the U.S.? What has the situation looked like from a distance? Well, going right back to the mid-1970s, when the author was just starting to study evaluation and the evaluation heroes of the 1960s and 1970s were in vogue, it all seemed so straightforward. In those halcyon days evaluation knew where it was going. We were building the "experimental society" of Donald Campbell (Campbell 1975). Valiantly we were applying what had been learnt in the other sciences to large-scale evaluations of social programs. We just had to get the evaluations done and feed them through the appropriate political channels, and then their results could be implemented. We were well on the way towards a modernist utopia – one in which increasingly rational, empirically based policy making meant we had maximized our chances of achieving our social policy goals.
Then along came Carol Weiss (Weiss 1977) and pointed out that policy makers are driven by many different pressures in addition to our hard-won empirical evaluation results. She scaled our expectations back to the concept of enlightenment – the gradual process of results seeping through into decision making through various channels. But let's face it, who wants enlightenment when we thought we were going to get direct traction on decision making by delivering policy makers the facts? So evaluation got its first lesson in marginalization – we were only one voice amongst many in the decision making process. Our initial expectation that we would take center stage in the decision making process – be mainstreamed as it were – was not met, and we had to develop a more sophisticated view of how we as evaluators could influence policy making.
In any event, we still had our empirical methods, and we could continue collecting objective data that in time might flow through to policy makers and influence their decisions. Then we got hit by post-modernism, as it started its systematic deconstruction of traditional quantitative objective social science and, by implication, evaluation. Philosophy of science issues had always loomed large in evaluation, as many of us had staked our reputations on our knowledge claims being somehow more 'objective' than the multitude of subjective stakeholder claims being made for and against programs and policies. Guba and Lincoln led the charge with their view that knowledge was socially 'constructed', in contrast to the view that objective truth could be 'discovered' (Guba and Lincoln 1989). This was a problem in that many of us had staked our profession on getting 'the facts' to managers and policy makers. Here we had evaluators questioning whether we were ever going to be able to get objective hard data which was valid and reliable and, most importantly, uncontestable.
In addition, indigenous people the world over, African Americans in the U.S. and other groups who have been truly 'marginalized' on the fringes of economic, social and political life started to emerge from having been silenced by the previous centuries of colonialism and exploitation in its various guises. They highlighted the mono-cultural origins of the methods, concepts and approaches being used in research and evaluation and pointed to the role of evaluation and research in the politics of oppression (Smith 2000). In addition to being a general challenge to evaluation, this was particularly telling in that many of the participants in the social programs that evaluators were working on were indigenous people or members of other marginalized groups.
The problematization of evaluation as an objective fact-finding exercise in the Western scientific tradition was accompanied by the realization that there were multiple stakeholders in any evaluation, and that it was a value judgment as to which stakeholders' world views should be reflected in an evaluation. Some pushed this right through to the point of arguing that evaluation should be all about a process of 'empowerment' for those involved in programs (Fetterman, Kaftarian et al. 1996). Indigenous people started to develop methods for undertaking evaluations that were consistent with their cultural traditions (Moewaka Barnes 2000; Moewaka Barnes In Press). All of this turmoil and debate resulted in a massive increase in the number of approaches and techniques available to evaluators. All of the qualitative and interpretative techniques from social science became available as legitimate evaluation methods, standing alongside the quantitative techniques that had been honed by evaluation in the 1960s and 1970s.
While the repertoire of techniques expanded, the debate about approaches continued. Some valiantly maintained that there was the possibility of a workable objectivity in evaluations (Scriven 1997). Others adopted the middle ground, arguing in a pragmatic fashion for 'utilization focused' evaluation, which took as its starting point the various audiences for evaluation and had the evaluator simply work out which audiences they were working for and then design their evaluations to meet those audiences' needs (Patton 1986). With this emphasis on diverse stakeholder audiences, formative evaluation (improving program implementation) and process evaluation (describing program processes) took their place alongside outcome evaluation in the evaluator's repertoire. By the end of the last century evaluation had a range of sophisticated quantitative and qualitative tools for evaluating programs which recognized the complexity of social programs and the epistemological and design issues which need to be faced, and in at least some contexts it had started to address the complex issue of evaluation approaches appropriate for indigenous peoples and other cultural groups. We were ready to take on the world; they should have been calling out for us to be mainstreamed.
So what happened in the 1990s? Particularly in the U.S., large sections of the policy community completely ignored the sophistication of our profession and headed off independently into the Performance Management Movement (Blalock 1999). This is the often naïve attempt to measure a range of indicators and to use them (with little regard to attributing causality) for holding social programs to account - a sort of ersatz outcome evaluation with none of the sophistication we had spent decades of hard work developing. Evaluators were left in the role of sideline critics complaining that the efforts being undertaken in performance management were too simplistic (Greene 1999). This was while the policy makers and performance measurement people got on with the party, usually in blissful ignorance of the difficulty of what they were claiming they would be able to achieve. Marginalization for evaluation again, ironically, this time not through having too simplistic a view of the world, but through having too sophisticated a view of what could and could not be measured in regard to programs and policies. So given this history, what should evaluation as a discipline now do in the face of marginalization this time around? The obvious answer, given by the AEA 2001 Presidential Strand theme, is to try once again to 'mainstream evaluation'.

What exactly is meant by mainstreaming evaluation?

It is important to start by thinking about what it is that we are trying to achieve in mainstreaming evaluation. We can start by using the traditional formative evaluator's trick of "looking behind" the strategy we are proposing, in this case mainstreaming evaluation, for the goal that the strategy is attempting to achieve. Presumably, the purpose of mainstreaming evaluation is to get our organizations, policies and programs to be more effective and efficient. We can then work back from this goal using a program logic approach: what is needed to achieve it? Presumably it is that people throughout our organizations and policy making processes are being more evaluative about what they are doing. Note this is not saying that they should all be calling what they are doing 'evaluation'. What is needed to ensure that people become more evaluative? They will have to have appropriate evaluation skills, systems, structures and resources to support them in taking a more evaluative approach to their work.
Looked at this way, the task of mainstreaming evaluation may be better put as one of building evaluative, or evaluation, capability throughout our programs, organizations and policy development. Such evaluative activity may not necessarily be labeled evaluation, but it should, regardless of what it is called, contribute toward the goal of people being more evaluative about what they do and ultimately make our organizations, policies and programs more effective and efficient.
At first sight, this idea of building evaluation capability sounds synonymous with mainstreaming evaluation. Whether or not it is, however, depends on exactly what evaluators mean when they use the term mainstreaming evaluation. For the author, the term carries the potential implication that we are trying once again to have evaluation take center stage. As evaluators we need to think through the extent to which our desire to mainstream evaluation is an attempt to grow the profession, in contrast to an attempt to get people to be more evaluative. Looking at it as the latter, we may need to be prepared to 'give evaluation away' in order to build evaluation capability. Giving evaluation away means sharing skills and approaches without these necessarily being labeled as 'evaluation' by those who use them. This is in contrast to looking for opportunities to grow the size and power of our profession.
In a sense, the growth of evaluation as a profession may be part of the problem in its marginalization. Professionalization of evaluation can reify evaluation and make it a 'separate' activity which may or may not be 'done' to programs. People ask the question 'are we going to do an evaluation of this program?', with the implication that they have to decide whether to call evaluators in to do a separate piece of work. Ideally, people should see evaluation as a central task they own themselves, in which they may or may not have to involve outside evaluators; it should be seen as a core task for the business as a whole.
All of this is not to say that there is not plenty of room for the evaluation profession to continue and, in fact, thrive. There are enormous technical challenges in designing some types of evaluation, and specialist evaluators will always have to be involved in these. All that is being argued here is that the most useful strategy in attempting to mainstream evaluation is probably to try to 'give it away' rather than to expect that evaluation as an entity can itself be mainstreamed.
So how can we go about giving evaluation away, or building evaluation capability, in our organizations? The author has been working on this question over the last decade and a half in New Zealand. In the 1980s, working with Professor Sally Casswell at the University of Auckland in the public and community health field, we became interested in the best way to build evaluation capability for the sector as a whole. We ran a large number of training workshops for people from various levels within the sector, and undertook, and consulted on the methodology for, many evaluations. The author was involved in a number of these, and many others were undertaken by colleagues at the Alcohol & Public Health Research Unit and Whariki, the Maori research unit working in partnership with APHRU. From this experience working with colleagues at the University of Auckland, and from his experience working as a consultant evaluator in a number of other sectors over this time, the author believes that there are three key elements required to build evaluation capability. Each of these is discussed below.

Three Key Aspects of Building Evaluation Capability

The three key aspects of building evaluation capability are:

Using an appropriate evaluation model

Developing evaluation skills appropriate for each level of an organization or sector

Organizational or sector level strategizing to identify priority evaluation questions, rather than just relying on evaluation planning at the individual program level.

Each of these is discussed here. They are put forward as suggestions rather than definitive answers and the author would appreciate the opportunity to discuss all aspects of these and other approaches to mainstreaming evaluation at the AEA 2001 Conference.

1. Using an appropriate evaluation model.

Discussing an appropriate evaluation model may seem a slightly obscure and theoretical place to start in thinking about building evaluation capability. It is important, however, to think about the most useful way of describing evaluation for the particular purpose of building evaluation capability. There are a number of different ways in which evaluation can be described and a number of different typologies in use. Terms used for aspects of evaluation include: quasi-experimental design, formative, developmental, implementation, process, impact, outcome, summative, stakeholder, empowerment, goal-free, utilization focused, fourth generation and naturalistic (Cook and Campbell 1979; McClintock 1986; Patton 1986; Guba and Lincoln 1989; Rossi and Freeman 1989; Scriven 1991; Fetterman, Kaftarian et al. 1996; Chelimsky and Shadish 1997). As in any discipline, these terms sit at various conceptual levels and are used in various ways by various evaluators for various purposes. What type of evaluation model or typology, then, is the most appropriate for building evaluation capability?
It would be useful for a set of criteria to be developed to assist in determining which evaluation models are the most appropriate for building evaluation capability. A provisional list of criteria has been developed by the author as follows. Appropriate evaluation models for capability building should: 
Attempt to demystify evaluation by positioning evaluation as any activity directed at answering a set of easily understood questions
Use a set of evaluation terms which emphasize that evaluation can take place right across a program's lifecycle and is not limited to just outcome evaluation
Allow a role for both internal and external evaluators
Have methods for hard-to-evaluate, real-world programs, not just for ideal-type, large-scale, expensive external evaluation designs
Not privilege any one meta-approach to evaluation (e.g. goal-free, empowerment)
Some evaluation models meet these criteria better than others. Each of the criteria is discussed below:
Evaluation positioned as answering a set of easily understood questions
The first essential aspect of an appropriate evaluation model for evaluation capability building is that it is simple to understand. It needs to be explainable in simple terms to a wide range of different stakeholders with diverse training, backgrounds and experience. Evaluation discussions can very rapidly become highly technical. Michel Foucault, the doyen of postmodern thought, has illustrated how technical language is used by groups of professionals to exercise power over others (Foucault 1973). Evaluation is no exception. We all know that when an evaluator turns up talking about quasi-experimental design, the regression discontinuity approach or, more recently, discourse analysis, there are not a lot of people in the room who are going to feel they are on equal terms in the discussion. When working with people employed in community organizations, as are many of the people involved in the public and community health sector, it is particularly important to find a model which relates evaluation to their day-to-day work experience.
If we are to build evaluation capability into our programs, organizations and policy development we need a simple and easily comprehensible starting point for an evaluation model. One such starting point is to say that evaluation is simply about asking questions. These questions are not something that evaluators alone should attempt to answer; they are questions that should be an important concern of every policy maker, manager, staff member and program participant. The author positions the evaluation model he uses with, firstly, the overall evaluation question for any organization, policy or program:

Is this (organizational activity, policy or program) being done in the best possible way?

There are then three major subsidiary questions that can be unpacked from this: 

How can we improve this organization, program or policy?
Can we describe what is happening in this organization, program or policy?
Has this organization, program or policy achieved its objectives?
Answering these questions is not something that is solely the responsibility of evaluators. They are questions which everyone in any organization should be asking themselves all the time. This question-based introduction to evaluation helps to demystify the process of evaluation. It puts the responsibility for evaluation back where it belongs, on the policy makers, funders, managers, staff and program participants rather than leaving it to evaluators. It points out that managers and staff cannot avoid these questions; they just have to work out ways of answering them through their own efforts and identifying when it is appropriate to call in specialized evaluation help. In an ideal evaluation design, funders, program planners and stakeholders have the opportunity to work together on defining the questions which an evaluation should be asking. This approach to positioning evaluation also sets the scene for promoting the idea of organizational or sector level strategizing of priority evaluation questions which is discussed below.
Describing evaluation in this way in training workshops usually draws feedback from participants that it has demystified evaluation for them, made them realize that they are already doing considerable evaluation themselves, and shown them that there are other techniques they could be using to take a more evaluative approach to their day-to-day work.
Evaluation typology with terms right across program lifecycle 
In New Zealand at least, most stakeholders unfamiliar with evaluation still see it as mainly just about outcome evaluation. An appropriate evaluation model should use terms that emphasize the fact that evaluation can take place right across a program's lifecycle. One dichotomy used by practicing evaluators in describing evaluation types to stakeholders, and which assists with this, is the formative/summative distinction, which highlights the importance of formative evaluation. While this distinction continues to be useful in some contexts, it does not go far enough in emphasizing that evaluation is something spread right across a program lifecycle; the formative/summative split can be interpreted by some to imply activity just at the beginning and at the end of a program. Another dichotomy used by practicing evaluators in discussions with stakeholders is the process/outcome distinction. This again has its uses. However, relying on it alone in evaluation practice sometimes leaves people believing that process evaluation is a full alternative to outcome evaluation.
The Alcohol & Public Health Research Unit, in its work in evaluation capability building, incorporates elements from both of these dichotomies and uses a three-way split for evaluation between formative, process and outcome evaluation (Casswell and Duignan 1989; Duignan 1990; Duignan and Casswell 1990; Duignan, Casswell et al. 1992; Duignan, Dehar et al. 1992; Turner, Dehar et al. 1992; Duignan 1997; Waa, Holibar et al. 1998; Casswell 1999):
Formative evaluation is defined as: evaluation activity directed at optimizing a program. (It can alternatively be described as design, developmental or implementation evaluation.)
Process evaluation is defined as: describing and documenting what happens in the context and course of a program, to assist in understanding the program, interpreting program outcomes and/or allowing others to replicate the program in the future. Note that this narrows the definition of process evaluation by separating out the formative evaluation element.
Outcome evaluation is defined as: assessing the positive and negative outcomes of a program. This includes all sorts of impact/outcome measurement, recognizing that outcomes can be short, intermediate or long term and can also be arranged in structured hierarchies, e.g. individual level, community level, policy level.
None of these terms is opposed to the others; they are seen as the three essential aspects of evaluation. The three terms can be directly related to the three subsidiary evaluation questions identified in the section above. Formative evaluation asks: how can we improve this organization, program or policy? Process evaluation asks: can we describe what is happening in this organization, program or policy? Outcome evaluation asks: has this organization, program or policy achieved its objectives?
The three types of evaluation in this model can easily be related to different stages in program development: the start, middle and end of a program. This encourages thinking about how evaluation can be used right across a program's lifecycle: each type of evaluation – formative, process and outcome – must be individually considered as a possibility for evaluation activity. If outcome evaluation proves too expensive or difficult, there may still be useful questions that can be answered through formative and process evaluation. Where outcome evaluation is difficult, the lack of a model that emphasizes the options of formative and process evaluation can drive people into pseudo-outcome evaluation, such as that which often typifies the Performance Management Movement.
For evaluators, this discussion will hardly seem like rocket science. It has just outlined one of the many ways of splitting up the evaluation process. The reason this particular model is described here is that the characteristics of the evaluation typologies and models we intend to use for building evaluation capability need to be scrutinized. There may be better or alternative models to the one described here. The author is simply interested in generating discussion about the most useful evaluation models and typologies for the particular purposes of building evaluation capability. The approach outlined here does deal with some of the pitfalls which arise when other evaluation models and typologies are used for capability building with non-evaluator audiences.
Internal and external evaluators
An appropriate evaluation model for building evaluation capability must allow for the possibility of both internal and external evaluators. If evaluation is just seen as something that is undertaken by external experts then there is little reason for internal staff to develop their evaluation skills.  Of course, depending on the size of an organization, internal evaluators may have considerable distance from an actual program being evaluated. A good way of looking at this issue is not to think so much in terms of external or internal evaluators, but rather to see evaluators as potentially being on a continuum running from close involvement through to little or no involvement in a program.
An appropriate evaluation model for evaluation capability building needs to be able to describe the pros and cons of closely involved evaluators through to less involved evaluators. It also needs to have developed ways of managing the risks around evaluators' level of involvement based on: the purpose of the evaluation; the type of evaluation work being done; whether the evaluators are working in teams which include roles with various levels of involvement with a program; and the extent to which the data sets being collected and analysed are highly susceptible to bias. If an evaluation model does not deal with these issues then it is of little use in building evaluation capability, as it maintains the fiction that evaluation can only be undertaken by outside experts.

Methods for hard to evaluate real world programs

An appropriate evaluation model for capability building also needs to incorporate methods that can be used to evaluate a wide range of real world programs where there may be very limited resources available for an evaluation. This means that it must include practical evaluation methods that can be undertaken by program staff wanting to evaluate their programs. These are often formative and process evaluation techniques. 
One area where new evaluation models are crucially important is community programs. Building on the work of writers such as Freire (1968) and Alinsky (1971), the use of community-based strategies has swept through public and community health (Labonte 1989). Indigenous people have also been demanding programs that respect the autonomy of their communities and use methods which are consistent with the way in which their communities operate. Community approaches are therefore now being adopted in a wide range of social program areas in addition to public and community health.
Evaluating community-based programs presents interesting sets of challenges for evaluators and raises enormous difficulties for traditional models of evaluation. Community programs have long time frames, and they take place in communities where many other programs are running at the same time, often with the same goals. Even more challenging, they are usually based around a philosophy of community autonomy. This presents significant challenges to evaluating whether a program has met its objectives. If a set of objectives is prescribed by a funder for a program, it is likely that the communities involved in undertaking such programs will want to set their own objectives. Which set of objectives should the program be evaluated against: those set by the funder or those set by the community program itself?
The author and his colleagues have had considerable experience in evaluating these sorts of programs (Duignan and Casswell 1989; Duignan and Casswell 1992; Duignan, Casswell et al. 1993; Casswell 1999; Casswell 2000; Moewaka Barnes 2000; Moewaka Barnes In Press). There are models that can be used, but these require considerable innovation on the part of evaluators. If we are to build evaluation capability we need to expand and refine these models so that they become better at dealing with the realities of real world programs rather than just ideal-type social experiments.

Not privilege any one meta-approach to evaluation

Meta-approaches to evaluation are evaluation styles that endorse a particular solution to the philosophy of science questions that lie behind evaluation. Philosophy of science questions are always close to the surface in discussing evaluation because evaluation is about making judgments about organizations, policies and programs. Goal-free evaluation and empowerment evaluation are good examples of meta-approaches to evaluation that take different philosophy of science positions (Scriven and Kramer 1994). It is fine for evaluators to adopt one or other of these meta-positions in their professional work as evaluators. It is also fine for them to argue that their approach should be the basis for evaluation efforts in particular settings and situations. However, in building evaluation capability it is important that a more catholic approach is taken, one that does not exclude a form of evaluation that some stakeholders may find useful. Of course, the Western evaluation approach can itself be seen as just one meta-approach to evaluation, and we need to be aware that it is not universally accepted by stakeholders. In New Zealand at least, Maori are actively involved in the process of developing evaluation models and approaches which may or may not share the assumptions, methods and techniques of evaluation as it is practiced in the Western tradition (Moewaka Barnes 2000; Moewaka Barnes In Press). Hopefully the fertile debate between different meta-approaches to evaluation will continue to feed thinking and practice in evaluation as it has done in the past. It is important that in attempting to build evaluation capability we encourage different approaches to evaluation.
This section has set out and discussed the criteria for evaluation models that are appropriate for evaluation capability building, and some suggestions have been put forward as to what an appropriate model may look like. The main challenge in building evaluation capability is to think through how our models (and there will need to be multiple models for different stakeholders) need to differ when we are using them for capability building from when we are using them amongst ourselves as evaluators.
The second key element in building evaluation capability – training and skills development – is considered next.

Appropriate evaluation skills training at all levels

The second step in building evaluation capability is to develop skills, systems and structures for evaluation activity at all levels within organizations and sectors. From the author’s experience in the New Zealand context, and this may well apply in the U.S. and other countries, there is a lack of both an adequate conceptualization of, and skills in, evaluation across all organizational levels. Policy makers, funders, service provider management, staff and program participants generally have a relatively limited understanding of evaluation. Where understanding does exist, it is often based on the erroneous view that evaluation is just about outcome evaluation.
The objective of skills development in evaluation for an organization or sector is both to increase sophistication about evaluation, along the lines of the evaluation model discussed above, and to teach specific evaluation skills to those who can use them in their day to day work. This can be done by developing manuals and training resources and by running training workshops.
At the Alcohol & Public Health Research Unit and Whariki, a series of manuals on evaluation have been developed that reflect the evaluation model described above and have been widely distributed throughout the public and community health sector in New Zealand (Casswell and Duignan 1989; Duignan, Dehar et al. 1992; Turner, Dehar et al. 1992; Waa, Holibar et al. 1998). These manuals deliberately copied the visual style of an early key sector document on health promotion (New Zealand Board of Health Committee on Health Promotion 1988) in order to have them seen as continuous with sector documents rather than as “evaluation” documents external to the sector. As time has gone on, the response to these manuals has been evaluated, and subsequent manuals have been amended on the basis of this feedback.
During the period of time that the resources have been available, the Unit and Whariki have carried out a series of training workshops for different audiences within the sector. The different types of training are:
Brief presentations on evaluation at a range of sector workshops on other issues. Typically these run for one to two hours, covering the general evaluation model and principles and raising awareness of evaluation within the sector.

Two-day Level I courses for lower-level service provider managers and staff, where they can discuss the overall evaluation model and learn specific evaluation skills which they can use in their day to day work. Considerable time is spent demystifying evaluation and describing simple formative and process evaluation methods that can be used by service provider staff. Outcome evaluation methods that can be used are discussed, along with indicators as to when other evaluation expertise needs to be drawn in.

More advanced Level II two-day courses for service provider managers and staff wanting to develop their skills. These provide more in-depth training in evaluation skills.

Week-long workshops for policy makers, funders, larger provider specialists, and researchers to develop and practice appropriate evaluation skills. These cover the evaluation model and the skills and techniques discussed in the Level I and II workshops in more depth, with further discussion of outcome evaluation issues.

Workshops specifically run by Maori evaluators for Maori program management and staff. These look at evaluation concepts and methods from a Maori perspective.

One-day overview workshops for service provider management to discuss evaluation concepts and approaches. These discuss the concepts from the evaluation model and how these relate to organizational policies and practices: for example, distinguishing between performance management and evaluation; what is and is not realistic to expect in terms of outcome evaluation; and setting organizational priorities for evaluation.

One-day workshops for staff and management within an organization discussing the model, concepts in evaluation, and the idea of prioritizing evaluation questions across the organization as a whole.

Post-graduate university Masters papers for researchers and practitioners interested in further developing their understanding of evaluation and their ability to undertake evaluations.

All of these courses, apart from the managers’ courses, combine discussion of evaluation models with hands-on work on evaluation projects brought to the workshops by participants. This action learning approach ensures that participants go away with a feeling of mastery of at least some evaluation techniques, which further assists in promoting the idea that there are aspects of evaluation which can be done by people at all levels within a program, organization or sector.
The idea is that the end result of this ongoing activity will be a sector which, first, has a much more sophisticated model of evaluation; second, is in a position to talk about evaluation questions within the sector; and third, knows at all levels how to undertake some evaluation tasks appropriate to each work situation.

Organizational or sector level strategizing to prioritize evaluation questions

The third and final aspect of building evaluation capability discussed in this paper is to encourage organizational or sector strategizing about what the priority evaluation questions are for an organization or a sector. It is not that there is necessarily too little evaluation being done in a sector: funders may routinely be demanding evaluation, and service providers may be having evaluations done. The problem is whether evaluation resources are being used in the most efficient way possible. Current practice in New Zealand, which may well also be the case in the U.S., is for the following to happen.
An evaluator gets called in to advise on evaluation methodology for a program. Often there are enough resources to do ‘an evaluation’. The problem is that very little thought or guidance is given by funders or others in the organization or sector as to what the priority evaluation questions are that should be asked in this particular evaluation.
In theory, of course, the evaluator can use one of the stakeholder evaluation or related approaches and consult with the various stakeholders about what they see as the priority questions for evaluating the program. All evaluators will do this to some extent. The problem is that it does not really make sense to do this repeatedly on a program by program basis, particularly when the programs are relatively small. The program in question may or may not be a priority for evaluation. The evaluation resources may be better spent elsewhere, and perhaps only a small-scale evaluation should be undertaken for the program in question. For instance, it may be a well-established program for which (contrary to most instances) good formative evaluation has already been carried out, and previous work on similar programs has shown some positive outcomes. It may be better to spend the available evaluation resources on an entirely different, novel initiative which is exploring new ways of dealing with the social issue the original program is attempting to address.
The key point is that there should not be an assumption that any particular type, or any particular scale, of evaluation is suitable for all programs. This assumption is unfortunately embedded in the notion that a program ‘needs an evaluation’. The assumption, from funders at least, is usually that this will consist of an outcome evaluation which will be able to accurately determine whether the program has met its objectives. In the author’s experience, funders usually give no thought to how feasible or expensive an outcome evaluation may be; they just pass the problem on to the service provider management and staff and any evaluation specialists they employ.
Of course, in addition to evaluation in the sense being used here, all programs need basic monitoring of whether they are on track, using cheap, routinely measured performance indicators. This is usually best dealt with as a separate process, which can be linked to more complex evaluations where these are undertaken. The essential distinction is that monitoring should be undertaken on a routine basis for accountability, while evaluation, because of its higher cost, should be used on a more selective, strategic basis.
In contrast to the current, program-focused, approach to evaluation planning, there should be an organizational or sector-based approach to evaluation question prioritization. Rather than simply saying “every program needs an evaluation”, it would be much more fruitful to ask “how can we best spend our evaluation resources to answer priority evaluation questions for this organization or this sector?” This second question is particularly useful if it is put in terms of strategic planning for the future rather than a futile attempt to use evaluation as a routine method of achieving accountability, something much of the Performance Management Movement is yet to discover.
The author has been involved in organizational-level evaluation policy setting exercises. These are where an organization develops an explicit evaluation policy as a preliminary step to introducing a more strategic approach to the program-based evaluation work being undertaken by that organization. Typically these policies contain elements such as:
The evaluation model(s) that will be used in the organization
Policies regarding, and opportunities for, staff training in evaluation
Sources of, and procedures for, obtaining technical evaluation assistance
Procedures and stakeholder consultation standards for evaluation planning and   sign-off
Procedures and consultation processes in respect of cultural issues
Guidelines on the typical scope and type of evaluation for different size and type of program
Guidelines on the use of internal and external evaluators
Ethical and other related considerations
Policies about disclosure of evaluation information
In addition, some progress has been made in getting organizations to prioritize the evaluation questions they are asking of the programs they are running. In the instances where the author has been involved, this has tended to be a fairly tentative process.
Of course, in most instances, one organization’s programs are just part of the overall activity in a sector. It is even more useful, then, for prioritization of evaluation to take place at the sector rather than the organizational level. In an ideal world, sectors would have systems which give them the opportunity to reflect on what priority evaluation questions need answering. They would get an indication of the cost of answering these questions and then move to make strategic decisions about which types of evaluation, for which programs, should and should not take place. This views the evaluation spend for a whole sector as one large research and development fund which needs to be spent wisely, rather than trying to make evaluation decisions on a program by program basis. This is not to say that people at the program level should not have some input into what these evaluation questions are, just that it should not all be left to that level.
Of course, it can be argued that a lot of organizational and sector strategic considerations are already factored into the evaluation requirements for an individual program. Funders will indicate which programs they want evaluated and the level of resources, and may indicate which evaluation questions they want answered. In addition, prioritization will be happening in reviews of the academic literature and in priority-setting processes within research funding bodies. In those sectors where there are ongoing research groups involved in teaching, advising and undertaking a large number of evaluations, such groups will play a role, in part, through having a strategic view of a sector and of which evaluation questions are the next priority. However, the author believes that this process is at the current time still too ad hoc, and there is often a disjunction between priority setting, in those instances when it is taking place, and what actually happens on the ground in regard to the evaluation of the many programs that are subject to evaluation. Facilitating such prioritization work could be a major contribution of evaluators to building evaluation capability.
Exactly how to facilitate this cross organizational or sector prioritization is a difficult question. Sectors dealing with social issues tend to be made up of a diverse range of public and private groups funding a diverse range of programs. There are some innovative evaluation priority setting exercises going on in New Zealand at the moment in the labour and employment program area (McKegg, 2000). In addition, the author is also working with other sectors attempting to get this sort of prioritization process to occur. It would be good to share notes with other evaluators as to whether they see this as a priority and if they are having any success in this area.


This paper has looked at the question of mainstreaming evaluation. The author has drawn largely on his experience with the public and community health and related social sectors in New Zealand. In the New Zealand public and community health sector we have made more progress on the first two of the three key elements for building evaluation capability described in the paper (models and training) than on the third (sector prioritization).
The intention of the paper has been to generate some points for discussion around the issue of mainstreaming evaluation. The questions that the author would like considered at AEA 2001 are: 

In mainstreaming, is there a difference between building the evaluation profession and ‘giving evaluation away’?

Are the models of evaluation used in mainstreaming important? If so are the criteria suggested in this paper the right ones? What models are likely to be best for discussing evaluation with non-evaluation audiences?

Can we further develop evaluation approaches that can deal with the realities of on-the-ground, community-based programs rather than just trying to use textbook evaluation designs?

How can we as evaluators support the development of indigenous peoples’ and other groups’ evaluation models and approaches?

What are innovative methods of training for mainstreaming evaluation?

How can organizational and sector level evaluation question prioritization exercises be encouraged? Is there a particular role for evaluators in facilitating these?

Author Note:
Dr Paul Duignan works half-time as a Senior Lecturer at the University of Auckland where he teaches program evaluation at the post-graduate level. He has been involved in evaluation capability building in the public and community health sector in New Zealand for the last decade and a half. His PhD was on evaluation methodology for health promotion. He has taught evaluation methods and concepts to people from all levels of the public and community health sector. He also consults for Parker Duignan Ltd in evaluation methodology and public policy issues.
p.duignan@auckland.ac.nz, paul@parkerduignan.com
Alinsky, S. (1971). Rules for radicals.  New York, Random House.
Blalock, A. (1999). “Evaluation research and the performance management movement.” Evaluation 5(2): 117-149.
Campbell, D. T. (1975). Reforms as experiments. Handbook of evaluation research. E. L. Struening and M. Guttentag. Beverly Hills, Sage. 1: 71-100.
Casswell, S. (1999). Evaluation research. Social Science Research in New Zealand: Many Paths to Understanding. C. Davidson and M. Tolich. Auckland, Longman.
Casswell, S. (2000). “A decade of community action research.” Substance Use & Misuse 35(1&2): 55-74.
Casswell, S. and P. Duignan (1989). Evaluating health promotion: A guide for health promoters and health managers. Auckland, Department of Community Health, School of Medicine, University of Auckland.
Chelimsky, E. and W. R. Shadish, Eds. (1997). Evaluation for the 21st century: a handbook. Thousand Oaks, California, Sage.
Cook, T. and D. T. Campbell (1979). Quasi-experimentation: design and analysis issues for field settings. Boston, Houghton Mifflin Company.
Duignan, P. (1990). Evaluating health promotion: An integrated framework. Health Promotion Research Methods: Expanding the Repertoire Conference, Toronto, Canada.
Duignan, P. (1997). Evaluating health promotion: The Strategic Evaluation Framework. Psychology. Doctoral Dissertation, University of Waikato, New Zealand.
Duignan, P. and S. Casswell (1989). “Evaluating community development programmes for health promotion: Problems illustrated by a New Zealand example.” Community Health Studies 13(1): 74-81.
Duignan, P. and S. Casswell (1990). Appropriate evaluation methodology for health promotion. American Evaluation Association Annual Conference, Washington.
Duignan, P. and S. Casswell (1992). “Community alcohol action programmes evaluation in New Zealand.” Journal of Drug Issues 22: 757-771.
Duignan, P., S. Casswell, et al. (1992). Promoting change in health promotion practice: A framework for the evaluation of health promotion. Psychology and social change. D. Thomas and A. Veno. Palmerston North, The Dunmore Press.
Duignan, P., S. Casswell, et al. (1993). Evaluating community projects: Conceptual and methodological issues illustrated from the Community Action Project and the Liquor Licensing Project in New Zealand. Experiences with Community Action Projects: New Research in the Prevention of Alcohol and Other Drug Problems (CSAP Prevention Monograph 14). T. K. Greenfield and R. Zimmerman. Rockville, MD, U.S. Department of Health and Human Services.
Duignan, P., M. Dehar, et al. (1992). Planning evaluation of health promotion programmes: A framework for decision making. Auckland, Alcohol and Public Health Research Unit, School of Medicine, University of Auckland.
Fetterman, D., S. Kaftarian, et al. (1996). Empowerment evaluation: knowledge and tools for self-assessment and accountability. Thousand Oaks, CA, Sage.
Foucault, M. (1973). Madness and civilization: A history of insanity in the age of reason. New York, Vintage Books.
Freire, P. (1968). Pedagogy of the oppressed. New York, Seabury Press.
Greene, J. (1999). “The inequality of performance measurements.” Evaluation 5(2): 160-172.
Guba, E. G. and Y. S. Lincoln (1989). Fourth generation evaluation. Newbury Park, California, Sage.
Labonte, R. (1989). Community health promotion strategies. Readings for a new public health. C. J. Martin and D. V. McQueen. Edinburgh, Edinburgh University Press: 235-249.
McClintock, C. (1986). “Towards a theory of formative program evaluation.” Evaluation Studies Review Annual 11: 205-223.
McKegg, K. (2000). Personal communication.
Moewaka Barnes, H. (2000). “Collaboration in community action, a successful partnership between indigenous communities and researchers.” Health Promotion International 15: 17-25.
Moewaka Barnes, H. (In Press). “Kaupapa Maori: Explaining the ordinary.” Pacific Health Dialog.
New Zealand Board of Health Committee on Health Promotion (1988). Promoting health in New Zealand. Wellington, New Zealand Board of Health.
Patton, M. Q. (1986). Utilization focused evaluation. Newbury Park, Sage.
Rossi, P. H. and H. E. Freeman (1989). Evaluation: A systematic approach. Beverly Hills, Sage.
Scriven, M. (1991). Evaluation Thesaurus. Newbury Park, Sage.
Scriven, M. (1997). Truth and objectivity in evaluation. Evaluation for the 21st century: a handbook. E. Chelimsky and W. R. Shadish. Thousand Oaks, California, Sage: 477-500.
Scriven, M. and J. Kramer (1994). “Risks, rights and responsibilities in evaluation.” Evaluation Journal of Australasia 9(2): 3-16.
Smith, L. T. (2000). Decolonising Methodology: Research and Indigenous Peoples. London, Zed Books.
Turner, A., M. Dehar, et al. (1992). Doing evaluation: A manual for health promotion workers. Auckland, Alcohol and Public Health Research Unit, University of Auckland.
Waa, A., F. Holibar, et al. (1998). Programme evaluation: an introductory guide for health promotion. Auckland, Alcohol and Public Health Research Unit/Whariki, University of Auckland.
Weiss, C. (1977). “Research for policy's sake: The enlightenment function of social research.” Policy Analysis 3: 531-545.

