In a newly filed transparency document, the Government Digital Service has provided details of the operation, safety measures, and performance so far of Whitehall’s chatbot, powered by the Claude LLM
The new GOV.UK Chat artificial intelligence tool is “beating industry standards” for the accuracy of its responses to users, according to newly published government records.
Development of the chatbot – which is designed to draw on information from about 100,000 GOV.UK pages to provide human-like conversational answers to users’ questions – began in 2023, and is led by the Government Digital Service.
The technology remains available only to a small number of test users and has yet to be implemented broadly; last year, a senior government tech figure said that a wide-scale rollout would be unfeasible until issues with so-called hallucinations – in which AI tools give inaccurate or inappropriate responses – have been adequately addressed by the creators of large language models (LLMs).
In newly released data, published via the government’s algorithmic transparency recording standard hub, GDS indicated that, while this kind of answer has not been entirely eliminated, notable progress has been made over the past two years.
“With any AI tool there is a risk of inaccurate responses. For GOV.UK Chat, this would mean that Chat has provided inaccurate information in response to a user query,” the document says. “[In] mitigation, GOV.UK Chat have tested across a range of topic areas using automated and manual evaluation processes, assessing accuracy of responses, how grounded they are on GOV.UK content, and how complete the response is. GOV.UK Chat team have iterated to improve for accuracy, and have seen continual improvement with the current version of Chat beating industry standards.”
It adds: “The GOV.UK Chat team have [also] undertaken actions to ensure all users of Chat are aware that the answers may be inaccurate, and that they should check their answers. Links to pages used to generate an answer are always provided to users, alongside a reminder to check their answers.”
To keep tabs on the ongoing performance of the technology, “the GOV.UK Chat team applied a hybrid evaluation approach combining automated and manual methods”, the record states.
Automated tools have been deployed to “quantify answer quality dimensions such as factual precision, factual recall, relevancy, and groundedness”, while “beyond automated evaluation, we conduct structured manual evaluations to capture answer accuracy, answer completeness, and interaction quality, producing performance estimates for internal communication and stakeholder alignment”.
The transparency document adds: “We also perform red teaming, systematically probing the chatbot with adversarial and edge-case inputs to uncover vulnerabilities and safety risks. Together, these methods provide both a realistic view of end-user experience and a risk-aware perspective on system performance.”
The initial version of the chatbot trialled in 2023 was based on technology from OpenAI – the firm that created the generative AI tool ChatGPT. GOV.UK Chat is now underpinned by a different gen AI system: the Claude LLM from Anthropic.
The AI vendor agreed a memorandum of understanding with the government earlier this year, under which the two parties will collaborate on exploring the potential of AI to improve UK public services. Anthropic also recently received approval to appoint former prime minister Rishi Sunak as a senior advisor.
The new record of GOV.UK Chat’s operational details reveals that “Anthropic’s Claude… models are not further distilled or trained in any way by GOV.UK”.
“GOV.UK do not train this model,” the document adds. “It is a foundation model and we constantly monitor for updates or changes that may affect our system performance.”
‘Intuitive and easy to use’
The technology remains in private beta phase. The transparency record states: “The tool is being tested in a limited fashion, up to a maximum of 2,000 users for a four-week test period. After this, access to the tool will be removed.”
These test users are accessing the AI service via the recently launched GOV.UK App – where it is intended that the tool will ultimately be fully embedded.
When citizens first use GOV.UK Chat, they are “guided through an onboarding flow that introduces the tool, explains that it is powered by AI and may occasionally produce inaccurate responses”.
“GOV.UK Chat is designed to be intuitive and easy to use, requiring no specialist skills,” the record adds.
The new document setting out operational details claims that GOV.UK Chat will deliver significant benefits for citizens.
“GOV.UK Chat will reduce barriers to accessing government services by providing 24/7 assistance in natural language,” it says. “By making information more accessible and understandable, it will help more citizens engage with digital government services, supporting the wider digital transformation agenda and reducing reliance on traditional channels. GOV.UK analysis work and user research from the private beta showed that GOV.UK Chat was faster and easier to use than solely browsing GOV.UK to find an answer, and is comparable to using GOV.UK Search. Users also found that because GOV.UK Chat can pull information from multiple pages at once, it was [a] better starting point to further explore a topic area than searching or browsing GOV.UK.”
The algorithmic record reasserts that “no decisions are made or assisted by the tool, [which] provides summaries of GOV.UK guidance only”.

