‘We’re not moving fast and breaking things’ – ChatGPT-powered GOV.UK chatbot trials show promise but uncover ‘issues of accuracy and reliability’

Government Digital Service has published details of a recent testing exercise for GOV.UK Chat, which revealed both the potential and the pitfalls of deploying generative AI to engage with citizens

The Government Digital Service has pledged to take a “measured” approach to its work on a proposed GOV.UK chatbot, after public trials of the tool – powered by the technology behind ChatGPT – showed promise but shed light on “issues of accuracy and reliability”.

Plans for the platform, dubbed GOV.UK Chat, were revealed several months ago, as the technology entered a comparatively small-scale trial with members of the public. GDS, which is leading development of the tool, has published an update detailing the findings of that private pilot exercise, in which 1,000 citizens were invited to participate.

A survey of their experiences found that about 65% of participants were satisfied overall, while 70% said they received useful answers from the technology. The tool uses large language model (LLM) technology from ChatGPT creator OpenAI and is designed to scour content on GOV.UK and use it to provide “a human-like response” to users’ questions.

However, the public testing process found that, ultimately, “answers did not reach the highest level of accuracy demanded for a site like GOV.UK”, according to the GDS update.

The pilot recorded several so-called hallucinations, the term applied to instances in which an AI system presents false or invented information as fact. These tended to occur when users presented “ambiguous or inappropriate queries”.

The pilot also found that, in some cases, the chatbot was unable to come up with an answer because the GOV.UK page where it could be found was “too long”.


As work on the development of the technology continues, GDS believes that “accuracy gains could be achieved by improving how we search for relevant GOV.UK information that we pass to the LLM, and by guiding users to phrase clear questions, as well as by exploring ways to generate answers that are better tailored to users’ circumstances”.

The Cabinet Office-based digital unit also acknowledged that, in response to some questions that cannot be answered using GOV.UK content, “it’s clear we need to redirect people in different ways”.

The update added: “We also found that some users underestimated or dismissed the inaccuracy risks with GOV.UK Chat, because of the credibility and duty of care associated with the GOV.UK brand. There were also some misunderstandings about how generative AI works. This could lead to users having misplaced confidence in a system that could be wrong some of the time. We’re working to make sure that users understand inaccuracy risks, and are able to access the reliable information from GOV.UK that they need.”

GDS stressed that, while testing has shown the potential benefits of a government chatbot, it has also highlighted “the nascent nature of this technology”. A careful and considered approach to ongoing work on the platform is therefore required.

“These findings validate why we’re taking a balanced, measured and data driven approach to this technology — we’re not moving fast and breaking things,” GDS said. “We’re innovating and learning while maintaining GOV.UK’s reputation as a highly trusted – according to internal polling – information source and a ubiquitously recognised symbol in the UK. Based on the positive outcomes and insights from this work, we’re rapidly iterating this experiment to address the issues of accuracy and reliability. In parallel we’re exploring other ways in which AI can help the millions of people who use GOV.UK every day.”

Sam Trendall
