Unit 1 Overview of History and Theory of Program Evaluation

Program evaluation is a cross-disciplinary field of study. Its knowledge base is of value and utility to educators in both conventional and distance education settings; to nurses and nurse educators; to those responsible for training, and to trainers, in government, in business and industry, and in the military; and to many involved in post-secondary or higher education. This unit introduces you to the historical development of the field of program evaluation and the theoretical frameworks that helped to create program evaluation as a specialized area of study. Program evaluation is a fascinating field to study, in that it has borrowed from many disciplines in formulating its models and approaches. It is also of considerable practical value, in that it can be used to assess the merit and worth of educational programs in all kinds of settings, and to improve learning outcomes, something that we all, as educators, strive to accomplish.
Without an understanding of program evaluation's historical roots in scientific measurement, and of the development of its various models and approaches over an extended period, it is difficult to understand why so many approaches are available to evaluators today. Your readings in Unit 1 will provide an understanding of the history and theoretical foundations of program evaluation.

Unit 1 - Objectives

After completing this unit, you should be able to do the following:
1. Discuss how the roots of program evaluation lie in research, particularly scientific measurement.
2. Distinguish between program evaluation's purposes, uses, and essential activities.
3. Differentiate between formative and summative evaluation.
4. Distinguish between testing and evaluation.
5. Discuss the emergence of modern program evaluation in the 1960s.
6. Describe five theoretical frameworks for program evaluation.

Unit 1 - Commentary

What is Program Evaluation?

The term evaluation applies to any effort that increases human effectiveness through systematic and data-based investigation. Evaluation can be used for numerous purposes including:
· measuring student achievement via testing (student evaluation);
· determining effectiveness of personnel in their job performance (personnel evaluation);
· determining the effectiveness of the organization and the need for organizational change (organizational evaluation); and,
· assessing the effectiveness of a given program, project, or curriculum (program evaluation).
It is this last type of evaluation, program evaluation, that is the focus of this course.
A simple, general definition of program evaluation is provided by The Joint Committee on Standards for Educational Evaluation (1994, p. 3), "Program evaluation is the systematic investigation of the worth or merit of a program."
A more detailed definition is provided by Worthen & Sanders (1997, p. 5), "Evaluation is the identification, clarification, and application of defensible criteria to determine an evaluation object's value (worth or merit), quality, utility, effectiveness, or significance in relation to these criteria."
Talmadge (1982, p. 594) defines evaluation in terms of the purposes it serves. "Three purposes appear most frequently in definitions of evaluation: (1) to render judgements on the worth of a program; (2) to assist decision-makers responsible for deciding policy; and (3) to serve a political function."

Perhaps the most succinct definition is provided by Patton (1997, p. 23), "Program evaluation is the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgements about the program, to improve program effectiveness, and/or to inform decisions about future programming."

Whichever definition you prefer, you should note that judgement is an inherent part of program evaluation. As Talmadge notes, central to any definition of program evaluation is the concept of judgement. Both Worthen and Sanders and Patton agree, noting that the idea of rendering judgement is what evaluation is all about. Without judgement there is no evaluation.

What are the Origins of Program Evaluation?

But it was not always so. In the early days of evaluation, the data were assumed to speak for themselves, so there was no need for evaluators to make judgements. They simply provided data to administrators, who made the judgements (or at least the decisions), some of the time.
The humble roots of program evaluation lie in the educational measurement movement, which was the focus of most educational research of that era. Evaluation did not emerge fully formed as an independent field of study and profession; it grew out of the field of educational measurement, which in turn grew out of explorations in the natural sciences and scientific research. By the mid-1800s Horace Mann was working for the state of Massachusetts, gathering data on which to base educational decisions. He submitted twelve reports over the course of a decade, all supported by actual (empirical) evidence. The Boston Survey of 1845, in which a sample of Boston students was tested on a variety of school subjects, marked the first use of printed tests for the assessment of student achievement. As Worthen and Sanders (1997) note, "[this] was the first attempt at objectively measuring student achievement to assess the quality of a large school system" (p. 27).
Joseph Rice, at the turn of the twentieth century, conducted a number of studies of school efficiency throughout the United States. His studies were firmly based in measurement and the scientific approach, providing hard data in the form of test scores to support his assessments. In fact, it was Joseph Rice who recommended the use of standardized tests and examinations (Guba and Lincoln, 1981, p. 2).
Edward Thorndike, who became known as the father of the educational testing movement, advanced the use of measurement science and measurement technology during the first two decades of the 20th century. Standardized tests were used to measure human abilities and were also used as the primary means of evaluating schools (Worthen and Sanders, 1997, p. 28).
By the First World War numerous large U.S. school systems had Bureaus of School Research whose function was to assess student achievement. From the beginning the testing movement was associated with the assessment of both individual students and the schools themselves: standardized student test scores were used to measure (or evaluate) the success of teaching and school administration (Madaus, Airasian and Kelleghan, 1980, p. 6).
The standardized testing movement expanded to the military and beyond to private industry in the 1920s and 1930s, where these tools were used to evaluate recruits or applicants for jobs. During this time the terms measurement and evaluation were synonymous, although measurement was more frequently used. Guba and Lincoln (1981) note that, "evaluation and measurement were virtually interchangeable concepts. Indeed the term evaluation was heard infrequently, and when it did occur it was almost always in conjunction with measurement - which usually had top billing - measurement and evaluation" (p. 2).
The measurement and evaluation movement in education of the 1920s and 1930s was not closely aligned with school programs and curricula. Thanks to Thorndike, the focus of testing was the individual performance of students and individual differences, and, as Guba and Lincoln (1981) note, "there was little reason for believing that the curriculum was not exactly as it should be" (p. 3). The testing and evaluation movement was indeed used to judge the quality of schools, but the standardized measures were of student achievement only - there was no attempt to evaluate, or even to look at, school programs of study, courses, and curricula, or other variables that might affect student performance.
This era of measurement and evaluation had little to do with program evaluation as it is understood today - other than to serve as a precursor. As Guba and Lincoln (1985) state:
A serious deficiency of the early measurement and evaluation movement was its targeting of students as the objects of evaluation. For, shortly after the First World War, it became evident that school curricula needed to undergo dramatic revision, and an evaluation approach that could not provide other than student data, could not serve the purposes for evaluation [that were] now being contemplated. (p. 27)
The real birth of program evaluation came with Ralph W. Tyler and his now famous Eight-Year Study. This was the first time that the focus moved to the curriculum or program of studies, and while student test scores were still the primary measures used, they were measured against the attainment of curriculum goals. Tyler's framework became the first evaluation model.

The Growth of Program Evaluation

For three decades Tyler's approach to the evaluation of school curricula and programs reigned supreme. Those practicing the craft of evaluation in education followed Tyler's approach, which became known as an objectives-oriented approach because of its focus on curricular goals and objectives. Evaluators did so because there were no alternatives.
But in 1957 the Russians launched Sputnik. The USA reacted immediately to being relegated to second place in science and technological development, and the blame was placed on the education system. Mathematics and science education, in particular, became the focus of massive state and national expenditures on new curriculum development efforts, and the federal authorities deemed it essential that these new curricula be evaluated.
In the early 1960s the Kennedy administration placed added emphasis on education and schooling, and in 1965 the US Congress passed the Elementary and Secondary Education Act, providing for huge expenditures of federal dollars on educational research, development, and dissemination activity. Under the various Title projects, program evaluation was required for each grant awarded.
These two events, and their requirements for comprehensive evaluation activity, produced incredible growth in evaluation activity. It soon became obvious to evaluators that Tyler's approach was very limited, and that the evaluation of programs should encompass more than simply ensuring that goals and objectives were attained. New approaches were required in order to do comprehensive evaluations. There was a need for good guidelines for doing evaluations, for more trained and knowledgeable evaluators, and for an expansion of evaluation's function from end-of-program, summative evaluation to during-implementation, formative evaluation, in which evaluation would lead to improvements and changes before programs were fully adopted.
In the 1960s and 1970s a new era in program evaluation was underway.

Theoretical Frameworks

What is meant by the word theory? Shadish, Cook, and Leviton (1991) indicate that:
. . . theory connotes a body of knowledge that organizes, categorizes, describes, predicts, explains, or otherwise aids in understanding and controlling a topic. . . . [T]he ideal evaluation theory would describe and justify why certain evaluation practices lead to particular kinds of results, across situations that evaluators confront. (pp. 30-31)
Program evaluation has been practiced for more than seven decades. Over that time evaluation practice has changed, and with these changes in practice there has been a broadening of purposes and, eventually, the development of different theoretical stances. Shadish, Cook, and Leviton (1991, p. 32) outline five theories of evaluation, constructed between approximately 1965 and 1990 from the theoretical stances taken by a number of evaluation writers and practitioners:
· Theory of Social Programming
· Theory of Utility
· Theory of Valuing
· Theory of Knowledge
· Theory of Practice

Theory of Social Programming

Most evaluators hold the following views about social programming theory:
· Social programming ameliorates (improves) social problems incrementally rather than radically.
· Social programs exist in a political and organizational context that makes uniform, planned change difficult to implement.
· Phasing social programs in or out promises more impact on problems than phasing projects or elements in or out, but fundamental shifts in programs are harder to achieve, and less likely, than shifts in availability or mix of projects and elements.
· Evaluation is an omnipresent political activity in social programs, even when no formal evaluation occurs.
· The quality of evaluation depends on other social problem-solving activities, such as deciding on an important intervention to be evaluated, or defining a social problem adequately (Shadish, Cook, and Leviton, 1991, pp. 446-447).

Theory of Utility

Past experience has taught evaluators that active steps must be taken to increase the use of their results. They agree that:
· Many kinds of use occur; evaluators are concerned with both instrumental and conceptual use.
· Short-term use is the most compelling justification for funding evaluations.
· Evaluations rarely determine decisions instrumentally; enlightenment occurs more frequently (Shadish, Cook, and Leviton, 1991, p. 454).

Theory of Valuing

Shadish, Cook, and Leviton (1991, p. 455) cite Scriven: "In its early development, evaluation paid scant attention to values, perhaps because evaluators naively believed their activities could and should be value-free." But experience showed that it is impossible to make choices in the political world of social programming without values having an impact. Scriven's "logic of evaluation" applies here. Its steps are: (1) justifiable criteria of merit are established; (2) justifiable standards of performance are selected for each criterion; (3) performance is measured on each criterion; and (4) where there are multiple criteria, the results are integrated into a single statement about the value of the evaluand.
· Evaluation cannot be value free, since evaluands are products of social programming.
· Evaluation inevitably mimics the first three steps of Scriven's logic of evaluation, but not always the fourth step.
· Considering multiple stakeholder interests increases the chance that all relevant value perspectives are included (Shadish, Cook, and Leviton, 1991, p. 462).
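Scriven's four steps can be illustrated with a minimal sketch. The criteria, weights, standards, and scores below are entirely hypothetical, invented only to show how the four steps might fit together in practice; no real program or data is implied.

```python
# A hypothetical sketch of Scriven's four-step "logic of evaluation".
# All criteria, weights, and scores are invented for illustration.

# Step 1: establish justifiable criteria of merit (here, with weights).
criteria = {"learner_achievement": 0.5, "cost_effectiveness": 0.2, "accessibility": 0.3}

# Step 2: select a justifiable standard of performance for each criterion
# (here, a minimum acceptable score on a 0-100 scale).
standards = {"learner_achievement": 70, "cost_effectiveness": 60, "accessibility": 65}

# Step 3: measure performance on each criterion (illustrative data).
performance = {"learner_achievement": 82, "cost_effectiveness": 55, "accessibility": 74}

# Step 4: integrate the results into a single statement about the evaluand's value.
def integrate(criteria, standards, performance):
    met = {c: performance[c] >= standards[c] for c in criteria}
    weighted_score = sum(criteria[c] * performance[c] for c in criteria)
    if all(met.values()):
        verdict = "meets all standards"
    else:
        unmet = ", ".join(c for c, ok in met.items() if not ok)
        verdict = "falls short on: " + unmet
    return weighted_score, verdict

score, verdict = integrate(criteria, standards, performance)
print(f"Weighted score: {score:.1f}; program {verdict}")
```

Note how step 4 is where judgement enters: the choice of weights and of how to combine criteria is itself a value decision, which is why, as noted above, evaluation cannot be value-free.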

Theory of Knowledge

Evaluation's legacy of ties to measurement and the scientific research model limited evaluators' ability to answer key epistemological questions. Yet consideration of what constitutes valuable knowledge is critical to the practice of evaluation. Evaluators agree that:
· Epistemology and methodology are essential topics in evaluation.
· All theorists postulate a real world, but they differ greatly as to its knowability and complexity.
· Program evaluation is an empirical endeavour in a social sciences tradition.
· Knowledge of many different kinds must be constructed in most evaluations, but the relative emphasis given to each kind differs across studies.
· No social science method can be rejected from the evaluator's repertoire.
· All methods are fallible.
· The quality of knowledge increases with public scrutiny of it.

Theory of Practice

Evaluators may theorize and grapple with the knowledge base of their field, but they are also practitioners, charged with implementing evaluations in defined contexts. They have one foot in the world of evaluation knowledge, and the other in the real-world setting in which the program operates. They agree that:
· Evaluation typically occurs under time and resource constraints that require difficult trade-offs.
· At least initially, evaluators are rarely welcomed by the many participants and stakeholders in and around any program being evaluated.
· A single evaluation of a program is inevitably a flawed evaluation.
· To facilitate use of evaluation results, evaluators must take active steps toward that end.


Program evaluation emerged as a separate and distinct field from its origins in educational research and educational measurement in the 1930s, and since that time it has grown into a field with a distinct purpose, theory base, and focus. The earliest development was still tied closely to the educational measurement movement, but it did focus on a utilitarian model of passing judgement on curricula, using curricular objectives as the standard against which a program's success or failure was judged. Over the next three or four decades, other models and approaches emerged, and the field's growth was encouraged by federal funding programs, which demanded, as part of the funding contract, that programs be comprehensively evaluated. Today there are numerous models, forms, and approaches for evaluating educational and training programs, all soundly established through application and the usability of their results.