GDS reports achieving 90% accuracy in GOV.UK Chat pilots


AI assistant, based on technology from LLM provider Anthropic, is due to be rolled out to GOV.UK app users later this year ahead of launch on the government’s main website

The Government Digital Service has reported achieving 90% accuracy for responses generated by its in-development artificial intelligence assistant GOV.UK Chat.

According to an update on the project, GOV.UK Chat has been used by more than 10,000 participants in two public pilots over the course of the last 18 months. During that time, users asked around 26,000 questions about government services, from tax to benefits to visas.

In a blog post published this week, lead project manager Sam Dub and lead user researcher Sharon McDonald said “strong progress” had been made on increasing accuracy and answer quality.

“Thanks to the work of our data scientists, and advances in the underlying AI models, we’ve been able to increase answer accuracy scores from 76%, our earliest benchmark, to our latest figure of 90% accuracy across all topics,” they said.

Dub and McDonald said answers are assessed using a combination of subject-matter experts and evaluation tools.

“Chat exclusively draws on guidance published on GOV.UK, and our assessors only rate an answer as ‘accurate’ if it meets all the standards of published content,” they said. “In cases of ‘inaccuracy’, it’s often that GOV.UK Chat is able to answer some but not all aspects of a user’s question. Overall, these accuracy scores are in line with industry benchmarks, and for government-related questions GOV.UK Chat scores higher than widely used consumer AI assistants.”




According to the blog, in a survey of GOV.UK Chat users, 73% said they found the assistant "useful", while 64% said they were "satisfied" with it.

“We’re pleased with these numbers, and think we can increase them further with improvements in areas like answer speed,” Dub and McDonald said.

The online post added that GOV.UK Chat takes an average of 10.7 seconds to answer a question.

Dub and McDonald said there is a trade-off between speed and accuracy.

“For us, accuracy is the most important thing, and consequently GOV.UK Chat responses are slower than we’d ideally like,” they said.

Nevertheless, they described the average answer time as “within acceptable bounds” for users of GOV.UK Chat.

“In testing, satisfaction increased when we simulated faster answer speeds,” Dub and McDonald said. “As a team, we’ll be looking at ways to increase answer speed in future versions without sacrificing answer quality. We’re considering implementing answer streaming, where the first part of an answer appears to users before the answer is fully written. This tested well with users, but requires redesigning how some of our safety guardrails work, so will be a substantial bit of work for the team to implement.”

Dub and McDonald said feedback from the pilots indicated that some GOV.UK Chat users wanted to be able to speak to an adviser with access to their personal information, something the system does not currently support.

“Longer term, we’re working on being able to hand users over to departmental customer support where needed, connecting users – within the same interface – to the right person to answer their question,” they said.

GDS is now using Amazon’s Bedrock platform and Anthropic’s Claude models to power the latest version of GOV.UK Chat.

Dub and McDonald said the system “coped comfortably” with demand and is built to allow the team to upgrade to new models as they become available.

Jailbreak rates
In their blog, Dub and McDonald reported that the GOV.UK Chat pilots had so far seen 508 attempts to "jailbreak" the AI assistant. Jailbreaking is the deliberate manipulation of a large language model to make it produce inappropriate or harmful content, often for malicious purposes.

“This is an ongoing risk we will need to manage, but the safety guardrails we’ve put in place successfully prevented all these attempts,” Dub and McDonald said.

According to their blog, the first GOV.UK Chat pilot recruited 10,136 users, who put 23,838 questions to a web version of the AI assistant.

The most recent pilot took place in the GOV.UK app: Apple's iOS TestFlight system was used to give 641 people access to a version of the GOV.UK app with GOV.UK Chat integrated into it. These users asked 2,670 questions in four weeks.

Dub and McDonald said they are keen to make GOV.UK Chat widely available as soon as possible. They said it will be released first to users of the GOV.UK app, with testing to make the assistant available on the GOV.UK website due to start later this year.

Jim Dunton
