Aug 15, 2020 10:00 AM

An Algorithm Determined UK Students' Grades. Chaos Ensued

This year's A-Levels, the high-stakes exams taken in high school, were canceled due to the pandemic. The alternative only exacerbated existing inequities.

Results day has a time-worn rhythm, full of annual tropes: local newspaper pictures of envelope-clutching girls jumping in the air in threes and fours, columnists complaining that exams have gotten far too easy, and the same five or six celebrities posting worthy Twitter threads about why exam results don’t matter because everything worked out alright for them.

But this year, it’s very different. The coronavirus pandemic means exams were canceled and replaced with teacher assessments and algorithms. It has created chaos.

In Scotland, the government was forced to completely change tack after tens of thousands of students were downgraded by an algorithm that changed grades based on a school’s previous performance and other factors. Anticipating similar scenes for today’s A-level results, the government in England has introduced what it’s calling a ‘triple lock’—whereby, via stages of appeals, students will effectively get to choose their grade from a teacher assessment, their mock exam results, or a resit to be taken in the autumn.

While that should help reduce some injustices, the results day mess could still have a disproportionate effect on students from disadvantaged backgrounds, with knock-on effects on their university applications and careers. The mess shines a light on huge, long-term flaws in the assessment, exams, and university admissions systems that systematically disadvantage pupils from certain groups.

Forget the triple lock: ethnic minority students from poorer backgrounds could be hit with a triple whammy. First, their teacher assessments may be lower than those of white students because of unconscious bias, argues Pran Patel, a former assistant head teacher and an equity activist at Decolonise the Curriculum. He points, for example, to a 2009 study into predictions and results in Key Stage 2 English, which found that Pakistani pupils were 62.9 percent more likely than white pupils to be predicted a lower score than they actually achieved. There’s also an upwards spike in results for boys from black and Caribbean backgrounds at age 16, which Patel says corresponds to the first time in their school careers that they’re assessed anonymously.

Not everyone agrees on this point. Research led by Kaili Rimfeld at King’s College London, based on data from more than 10,000 pupils, has found that teacher assessments are generally good predictors of future exam performance, although the best predictor of success in exams is previous success in exams.

But because of fears over grade inflation caused by teachers assessing their own students, those marks aren’t being used in isolation. This year, because of coronavirus, those potentially biased teacher assessments were modified—taking into account the school’s historical performance and other factors that may have had little to do with the individual student. In fact, according to TES, 60 percent of this year’s A-Level grades have been determined via statistical modeling, not teacher assessment.

This means that a bright pupil in a poorly performing school may have seen their grade lowered because last year’s cohort of pupils didn’t do well in their exams. “Children from a certain background may find their assessment is downgraded,” says Stephen Curran, a teacher and education expert. This is what happened in Scotland, where children from poorer backgrounds were twice as likely to have their results downgraded as those from richer areas.

There’s injustice in the appeals process too—particularly in England, where the decision over whether or not to appeal is up to the school, not the pupil. “I think it’s really scandalous that the pupils can’t appeal themselves,” says Rimfeld, whose own child was anxiously awaiting their results. “It’s just astonishing the mess we created, and it’s really sad to see.”

There will be huge differences in which schools decide or are able to appeal—inevitably, better resourced private schools will be able to appeal more easily than underfunded state schools in deprived areas. “The parents will pressure them, and they’ll be apoplectic if their child does not achieve the grades they expected,” says Curran. In the state system, meanwhile, “some schools will fight for their kids, and others won’t,” and teachers are on holiday until term starts anyway.

On August 11, Education Secretary Gavin Williamson announced the triple lock, which would allow students who don’t agree with the grade the system initially gives them to pick from their teacher-assessed grade, their mock exam result, or a resit in the autumn. But there are huge problems there too. “Nobody is consulting with anybody about this,” says Rimfeld. “There are schools where there are no mocks, some schools do several mock exams. Is it going to be the average? How is that going to work?”

The government is still figuring out exactly how mock results will be used, but there are vast discrepancies in the conditions under which mocks are taken, and there is no centralized record of mock results. Some schools don’t even collect that data centrally for their own pupils. Sometimes teachers will downgrade results in a mock exam in order to scare certain students into working harder for the remainder of the year, says Patel. He doesn’t think including mocks will do anything to help repair bias. “Not in the slightest,” he says. “Because the teacher who is assessing your grade is the same teacher who marked your mock exam.”

That means it will be difficult for teachers, who Patel stresses may not have much experience marking exam papers, to untangle their conscious or unconscious perceptions from the words on the page in front of them. “Teachers are now being asked to make decisions that are potentially life-changing by completing a task that they're not qualified or suitably trained to do,” he says.

Even if two children end up with the same final grade after this process, the delays and inaccurate assessments could prove decisive, particularly now but also in more normal years. If you’re predicted three As, you’re more likely to apply to and be accepted by prestigious universities, more likely to be taught the relevant material, and more likely to actually make the grade.

If you’re predicted three Cs and get three As, by the time your results come out, it might already be too late for you to apply to the best universities without taking a year out—the die has been cast, not by your performance, but by your teacher’s assessment.

Teachers are aghast at the mess that’s been allowed to unfold. Curran argues that exams should simply have been taken later in the year, with social distancing implemented. Now, he says, we’re in a situation where results have become a political issue—and the GCSE and A-Level students of today are the voters of tomorrow.

Universities are also eyeing the situation nervously. The people we spoke to have been looking at the situation in Scotland and suspect that many pupils—at least those from schools that can afford the appeals—will essentially end up getting whatever grade they want. “In the end we get to a situation where it’s ‘pick a number’ because you’ve got no reliable sources of information there at all,” says Curran.

That will have an impact on university placements, which are generally overallocated to account for people missing their targets. Some universities will have far too many people who have made their grades, while those lower down the rankings may find themselves scrambling for students.

A smarter use of data could help tackle the problem, Patel argues. The Office of Qualifications and Examinations Regulation (Ofqual) has used data about school performance to head off grade inflation, when it should instead be using data about hidden bias to counteract societal injustice.

Suddenly oversubscribed universities could look inside the black box, see which pupils were downgraded and why, and use that information to decide who to give places to. Arguably they should be doing that more often anyway, with contextual offers that take into account how much easier it is for people from certain social or economic backgrounds to get good grades.

“Teacher assessment is prone to bias, but there are lots of other ways of assessing pupils, and if you embrace lots of different techniques, you can ameliorate that impact,” Patel says. “There’s no ideal situation, but the problem here is that exams were never a great metric for learning or success anyway.”

This story originally appeared on WIRED UK.