Criterion-Referenced Measurement for Educational Evaluation and Selection

Abstract
In recent years, Sweden has adopted a criterion-referenced grading system in which grades serve several purposes, foremost educational evaluation at the student and school levels and selection to higher education. This thesis investigates the consequences of using criterion-referenced measurement for both educational evaluation and selection purposes. The thesis comprises an introduction and four papers that empirically investigate school grades and grading practices in Swedish upper secondary schools.

The first paper investigates the effect of school competition on school grades. The analysis focuses on how students in schools with and without competition are ranked, based on their grades and SweSAT scores. The results show that schools exposed to competition tend to grade their students higher than other schools. This effect is found to be related to the use of grades as quality indicators for the schools: schools that compete for their students tend to be more lenient, and thereby inflate the grades. The second paper investigates grade averages over a six-year period, starting with the first cohort to graduate from upper secondary school with a GPA based on criterion-referenced grades. The results show that grades have increased every year since the new grading system was introduced, an increase that cannot be explained by improved performance, selection effects or strategic course choices. The conclusion is that the increasing pressure for high grading has led to grade inflation over time. The third paper investigates whether grading practices are related to school size. The study is based on a model similar to that of Paper I, but with data from graduates over a six-year period, and with school size as the main focus. The results show small but significant size effects, suggesting that the smallest schools (1000 students) are lower grading than other schools. This is assumed to be an effect of varying assessment practices, in combination with external and internal pressure for high grading. The fourth and final paper investigates whether grading practices differ among upper secondary programmes, and how the course compositions of the programmes affect how students are ranked in the selection to higher education. The results show that students in vocationally oriented programmes are graded higher than other students, and are also favoured by their programmes' course compositions, which have a positive effect on their competitive strength in the selection to higher education.

In the introductory part of the thesis, these results are discussed within a theoretical framework, with special attention to validity issues in a broad sense. The conclusion is that the criterion-referenced grades, whether used for educational evaluation or as an instrument for selection to higher education, are wanting in both reliability and validity. This is related to the conflicting purposes of the instruments, in combination with few control mechanisms, which affects how grades are interpreted and used, and thus has consequences for students, schools and society in general.


Table of Contents
1.  Introduction
1.1. Disposition of the thesis

2.  Instruments in educational assessment
2.1. Norm-referenced measurement
2.2. Criterion-referenced measurement

3.  Quality issues
3.1. Reliability
3.2. Validity

4.  Assessment in Sweden
4.1. The change of system
4.2. Grading
4.3. Selection to higher education
4.4. The selection instruments

5.  Summary of the papers
5.1. Paper I
5.2. Paper II
5.3. Paper III
5.4. Paper IV

6.  Discussion
6.1. The results
6.2. Evidence of, and consequences for, interpretation and use

7.  Conclusions
7.1. Suggestions for future research
References