GDS pledges to learn from coding gaffe that caused GOV.UK search issues
Organisation promises ‘multiple steps’ to ensure mistakes are not repeated
The Government Digital Service has indicated that it is taking “multiple steps” to make sure it avoids repeating the coding mistakes that led to the GOV.UK search engine delivering duplicate and missing results.
The GOV.UK search function runs on open-source search technology Elasticsearch, which GDS has customised with an “API wrapper” called Rummager. On 18 May 2017, GDS deployed “what should have been a simple code change to Rummager, to make it compatible” with the upgraded version of the search engine, Elasticsearch 2.4.
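An API wrapper of this kind sits between an organisation’s applications and the search engine, translating application-level requests into the JSON query bodies the engine expects. A minimal illustrative sketch of the idea (the class, method and field names here are hypothetical, not taken from the actual Rummager code):

```python
# Illustrative sketch of an "API wrapper" over a search engine.
# All names are hypothetical -- this is not GDS's Rummager code.
import json


class SearchWrapper:
    """Builds Elasticsearch-style query bodies from simple search terms."""

    def __init__(self, index_name):
        self.index_name = index_name

    def build_query(self, terms, page=1, per_page=10):
        # Translate plain keyword terms into a query document; a real
        # wrapper would also send this to the engine's _search endpoint.
        return {
            "query": {"match": {"content": " ".join(terms)}},
            "from": (page - 1) * per_page,
            "size": per_page,
        }


query = SearchWrapper("govuk").build_query(["passport", "renewal"], page=2)
print(json.dumps(query))
```

The value of the wrapper is that publishing applications never need to know the engine’s query syntax, which is also why an engine upgrade only required changes in one place.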
But the code change “unexpectedly caused problems for a small percentage of users on GOV.UK”, GDS said. These problems involved some specialist pages failing to appear in the dedicated “finder” pages for that topic or area, and some results appearing in duplicate.
These issues were spotted within 24 hours, GDS said, and the offending code change – which amounted to no more than three lines – was rolled back. However, about 3,000 GOV.UK pages were updated by content teams while the faulty code was live.
“We decided against using our nightly backup to restore GOV.UK to a previous time, because otherwise these content updates from publishers would have been wiped out from search,” GDS said. “It would have been harder to repopulate the missing data within search than to correct the corrupt data, because this would have involved resending data from all of the GOV.UK publishing apps to Rummager.”
It added: “After rolling back the faulty code, our developers fully investigated and fixed the search documents that had been updated while that code was live. During this period of investigation, the quality of search results was affected.”
This was caused, GDS explained, by the disabling of a function that ranks pages that generate a greater volume of traffic higher in search results.
GDS ultimately traced the problems to the “_id” and “_type” values of certain pages. These values had been assigned to pages by Rummager, but Elasticsearch 2.4 was no longer able to read them as identifiers.
“We hadn’t updated some parts of the Rummager code that depended on those redundant fields, and this is what caused the duplication and missing search errors,” GDS said. “We didn’t catch this during code review, and our automated tests didn’t catch it either. Part of the problem was that documents needed to be indexed twice by Rummager before the problem was visible to users.”
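The “indexed twice” behaviour can be illustrated with a small simulation. If a field such as “_id” inside the document body is no longer treated as the record’s identifier, the first index still produces one correct-looking copy; only when the document is re-sent does a second copy appear under a fresh key. This is a hypothetical reconstruction for illustration, not the real Rummager or Elasticsearch code:

```python
# Simulation of why the bug only surfaced on the second indexing.
# Hypothetical field handling -- not the actual GDS code.
import itertools

auto_id = itertools.count(1)


def index_document(store, doc):
    # Under the upgraded engine, an "_id" field inside the document
    # body is just ordinary data: it no longer identifies the stored
    # record, so every write creates a new entry under a fresh
    # auto-generated key instead of updating the existing one.
    key = f"auto-{next(auto_id)}"
    store[key] = doc
    return key


store = {}
doc = {"_id": "renew-passport", "title": "Renew a passport"}

index_document(store, doc)        # first index: one copy, looks fine
index_document(store, dict(doc))  # re-index on update: a duplicate appears

print(len(store))  # -> 2: the same page is now stored twice
```

This also explains why neither code review nor the automated tests caught it: a single indexing pass, which is what most tests exercise, behaves correctly.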
Having replaced the errant code and, where necessary, written new code to ensure unnecessary duplicate documents were deleted, GDS said it has learned from the experience.
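A clean-up of this kind usually means grouping documents by some stable attribute of the page and deleting all but one copy per group. A hedged sketch, assuming the page path is the grouping key (an assumption for illustration, not GDS’s stated criterion):

```python
# Illustrative duplicate clean-up: keep one document per page path,
# delete the surplus copies. Hypothetical keying -- not GDS's code.


def delete_duplicates(store):
    """Remove documents sharing a page path, keeping the first seen."""
    seen = set()
    for key in sorted(store):  # deterministic iteration order
        path = store[key]["link"]
        if path in seen:
            del store[key]
        else:
            seen.add(path)
    return store


store = {
    "auto-1": {"link": "/renew-passport", "title": "Renew a passport"},
    "auto-2": {"link": "/renew-passport", "title": "Renew a passport"},
    "auto-3": {"link": "/passport-fees", "title": "Passport fees"},
}
delete_duplicates(store)
print(sorted(store))  # -> ['auto-1', 'auto-3']
```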
“This incident taught us that you need to understand the core concepts and constraints of your database and always check your assumptions,” it said. “There are multiple steps we are taking to make sure this doesn’t happen again.”