How they got exam grading so wrong


I was one of the hundreds of thousands of British students who received their A-level grades last Thursday. Today is GCSE results day.

These were not normal results days. The coronavirus pandemic affected the day itself, with social distancing guidelines adhered to at schools where students were coming to collect their results. But the pandemic also affected how the results themselves were determined — not by exams as is usual but by teacher predictions, student rankings, and a government moderation system.

When schools shut in late March and the government decided to cancel the summer exams, teachers were asked to determine what grades they thought individual students were capable of achieving in the exam. It was decided then by Ofqual — the exams regulator in charge of this process — that these grades would likely need to be moderated.

In all fairness, they were proven right. After moderating 40 per cent of grades down, the end result was a statistical distribution that was still a little more generous than usual.

Alongside the very natural human bias towards optimistic outcomes, teachers can only predict what a student is capable of in an exam, not how they will actually perform on the day. A number of factors, including the student’s mental health, personal situation, and how they deal with exam pressure, can affect their actual exam performance.

So moderation was needed, although teachers shouldn’t be blamed for that. But that, I’m afraid, is where the competent decision-making ends.

The centre-assessed grades (those determined by teachers) — CAGs for short — were moderated down using an algorithm that took into account several different factors. Using the school’s prior performance, the individual student’s past performance, and a ranking of all the students within a school by ability, the algorithm moderated some grades down, kept most the same, and even moderated a very small number upwards.

On a macro level, the statistics would suggest that Ofqual got it right. The aforementioned generosity meant that about 2 per cent more students passed and got the top grades of A and A*. The distribution of grades, however, looked like it normally does, just with the bell curve shifted slightly to allow for this slight uptick in the pass rate and the number of students receiving the best grades.

But the macro level, as “robust and dependable” as Boris Johnson might claim it to be, is not the real story. Despite this statistical accuracy, there were many individual inaccuracies, stemming from a number of systemic and fundamental failures.

Apart from classes with five or fewer students, the moderation algorithm worked something like this…

The distribution of grades was determined for the class. This was done by taking the average distribution for the whole country, and then shifting this distribution up or down based on the school’s prior performance and the average prior performance of the students within that class, as measured by the GCSEs they sat in 2018.

If Ofqual determined that the grades the teacher had given were not in line with this distribution, they altered the results. Once the algorithm determined the need to moderate, the CAGs became entirely irrelevant.

Say there is a class of 20 students, each accounting for 5 per cent of the class. And let’s suppose — to use the example employed in this useful Twitter thread — that Ofqual’s algorithm has settled on this distribution:

A*: 7%; A: 15%; B: 22%; C: 52%; D: 3%; E: 0.5%; U: 0.5%

As is brilliantly explained in the linked thread, if only 7 per cent of the class may get an A*, then there is no room for the two or three students who were predicted A*. Only one of these, accounting for 5 per cent of the class, gets the top grade. The next three students in the rank order get A’s, with another 2 per cent left over: not enough to fit in the fifth student, who was also predicted an A.

You might think this is perhaps slightly unfair, but not awful. Let’s fast-forward to the last student, predicted perhaps a C, or (in some cases) a B. They don’t get a C, because the grades from A* down to C cover only 96 per cent of the class. They don’t get a D, though, because that only takes the class to 99 per cent. The algorithm favours rounding down, and since an E only takes the class to 99.5 per cent, this student doesn’t even get an E. They get a U.
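To make the arithmetic concrete, here is a minimal Python sketch of the rounding-down rule as the example above describes it. It is an illustration under stated assumptions, not Ofqual’s actual code: the function name `allocate` and the rule that a student only receives a grade if their whole slice of the class fits inside the cumulative share are my own reading of the example.

```python
from itertools import accumulate

# Grades from best to worst, with the example shares from the text,
# expressed in tenths of a per cent so the arithmetic stays in integers.
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]
SHARES = [70, 150, 220, 520, 30, 5, 5]  # 7%, 15%, 22%, 52%, 3%, 0.5%, 0.5%


def allocate(n_students, shares=SHARES, grades=GRADES):
    """Hypothetical helper: give each ranked student a grade.

    Assumption: the student at a given rank occupies an equal slice of the
    class and gets the best grade whose cumulative share covers that whole
    slice. This "round down" reading reproduces the example in the text.
    """
    total = sum(shares)
    cumulative = list(accumulate(shares))
    result = []
    for rank in range(1, n_students + 1):
        for grade, cum in zip(grades, cumulative):
            # Integer form of: rank / n_students <= cum / total
            if rank * total <= cum * n_students:
                result.append(grade)
                break
    return result


if __name__ == "__main__":
    for rank, grade in enumerate(allocate(20), start=1):
        print(rank, grade)
```

With a class of 20, this sketch hands out one A*, three As, four Bs and eleven Cs, then skips D and E entirely and gives the twentieth student a U, which is the outcome described above.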

In case you’re not familiar with English grading, the C this student was predicted is a pass. The U they were given stands for “unclassified” — the exam script is so bad, perhaps not even a question on the paper answered, that a grade cannot be awarded. How can someone who has not sat an exam, and was predicted a C grade, end up with a ‘U’?

This is how rounding down and treating classes like statistical models destroyed plans for university, jobs, and apprenticeships. Some people had individual grades go down not just by one grade but by two, three or, in the very rarest cases, four. 

Students studying at the poorest-performing schools, with good prospects for their A-level grades but with poor GCSE results, and who fell onto one of the unfortunate downgrading boundaries described above, were the most disadvantaged of all by this process.

What made this all even more difficult to take was the completely non-functioning appeals process. Students who were unfairly downgraded and thus missed their university offers were left in limbo, promised their mock exam grades as a minimum upon appeal but with no knowledge of when the official confirmation would come.

Some universities were only holding places open for a limited amount of time, and other options closed quickly. Since the U-turn, many students have been unable to find university courses with enough capacity left.

The government’s first U-turn (on the Tuesday before results day, just 36 hours before students started receiving their results) allowed students to appeal to get their mock grades, putting unprecedented stress on Ofqual’s limited appeals system. Those in limbo, with no determined future, must wait up to 42 days — by which time many courses will already have begun.

This practical failing is perhaps as great as — if not greater than — the policy failing.

The government should have acted much more quickly to remedy the situation and to rescue the futures of thousands of young people who have not performed badly, but have been told by an algorithm with zero actual knowledge of their ability and characteristics that they would have performed badly.

When the government realised that an appeals process with hundreds of thousands of students appealing their grades was completely unviable, it backed down, eventually. The delay left time for an embarrassing separate U-turn at Ofqual, which released guidance about the appeals process on Sunday before suspending it the very same day.

All students can now take whichever is higher out of their centre-assessed grade and calculated grade.

The cap on the number of students allowed to go to universities, introduced this year to prevent universities seeking profits by admitting sub-par students to replace international students, has also been lifted, albeit very quietly to avoid further embarrassment for the government. That doesn’t mean that all has been resolved, though: many students have had their offers withdrawn, and there is not enough space left on courses.

More widely, though, this situation has revealed a need for systemic change in the curriculum and the means of assessment. Either more standardised testing is needed, on a more regular basis, to ensure that the system is not dependent on a single set of final examinations, or standardised testing must be done away with entirely, with the exception perhaps of basic aptitude testing, in favour of individualised, student-led, project-based learning.

The government has got this wrong on nearly every level. But resolving the remaining issues surrounding university admissions, alongside ensuring that long-term reforms are made to the education system, could restore faith in the government’s — and particularly the Education Secretary’s — competence.


Radix is the radical centre think tank. We welcome all contributions which promote system change, challenge established notions and re-imagine our societies. The views expressed here are those of the individual contributor and not necessarily shared by Radix.
