GDS pledges to learn from coding gaffe that caused GOV.UK search issues

Organisation promises ‘multiple steps’ to ensure mistakes are not repeated

The Government Digital Service has indicated that it is taking “multiple steps” to make sure it avoids repeating the coding mistakes that led to the GOV.UK search engine delivering duplicate and missing results.

The GOV.UK search function runs on open-source search technology Elasticsearch, which GDS has customised with an “API wrapper” called Rummager. On 18 May 2017, GDS enacted “what should have been a simple code change to Rummager, to make it compatible” with the upgraded version of the search engine, dubbed Elasticsearch 2.4.

But the code change “unexpectedly caused problems for a small percentage of users on GOV.UK”, GDS said. These problems involved some specialist pages failing to appear in the dedicated “finder” pages for that topic or area, and some results appearing in duplicate.

These issues were spotted within 24 hours, GDS said, and the offending code change – which amounted to no more than three lines – was rolled back. However, about 3,000 GOV.UK pages were updated by content teams while the faulty code was live.


“We decided against using our nightly backup to restore GOV.UK to a previous time, because otherwise these content updates from publishers would have been wiped out from search,” GDS said. “It would have been harder to repopulate the missing data within search than to correct the corrupt data, because this would have involved resending data from all of the GOV.UK publishing apps to Rummager.”

It added: “After rolling back the faulty code, our developers fully investigated and fixed the search documents that had been updated while that code was live. During this period of investigation, the quality of search results was affected.”

This was because, GDS explained, a function that ranks pages with higher traffic further up the search results had been disabled.

GDS ultimately discovered that the problems were caused by the “_id” and “_type” values of certain pages. These values were assigned to pages by Rummager, but Elasticsearch 2.4 was unable to read them as identifiers.

“We hadn’t updated some parts of the Rummager code that depended on those redundant fields, and this is what caused the duplication and missing search errors,” GDS said. “We didn’t catch this during code review, and our automated tests didn’t catch it either. Part of the problem was that documents needed to be indexed twice by Rummager before the problem was visible to users.”
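
GDS has not published the exact mechanism in this account, but the general failure mode it describes can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not Rummager or Elasticsearch code: the in-memory dictionary, the `old_key` and `new_key` functions and the document fields are all hypothetical. It shows why a change to how identifiers are built might only surface once a document is indexed a second time.

```python
# Toy in-memory stand-in for a search index: maps an identifier to a document.
# Every name here is hypothetical; this is not Rummager or Elasticsearch code.

def old_key(doc):
    # Hypothetical "before" scheme: identify a document by its link alone.
    return doc["link"]

def new_key(doc):
    # Hypothetical "after" scheme: identify a document by type plus link.
    return f'{doc["type"]}/{doc["link"]}'

def index_document(index, doc, key_fn):
    # Writing under an existing key overwrites; a new key adds a new entry.
    index[key_fn(doc)] = doc

index = {}
page = {"link": "/example-page", "type": "edition", "title": "Example"}

# First indexing, under the old scheme: one copy, nothing looks wrong.
index_document(index, page, old_key)
copies_after_first = len(index)

# The page is updated and indexed again after the identifier scheme changed.
updated = dict(page, title="Example (updated)")
index_document(index, updated, new_key)

# Both the stale and the updated copy now answer for the same link.
duplicates = [d for d in index.values() if d["link"] == "/example-page"]
```

In this sketch the index holds a single copy after the first write and two copies after the second, which is consistent with GDS's remark that the problem was invisible until a document had been indexed twice.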

Having replaced the errant code and, where necessary, written new code to ensure unnecessary duplicate documents were deleted, GDS said it has learned from the experience.
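
GDS did not describe how its clean-up code works. As a rough sketch of the kind of logic such a fix might involve, the example below keeps only the most recently updated copy of each page. The `link` and `updated_at` fields and the `deduplicate` function are assumptions made for illustration.

```python
# Hypothetical clean-up: keep the newest copy of each page, drop the rest.
# Field names ("link", "updated_at") are assumed for illustration only.

def deduplicate(docs):
    latest = {}
    for doc in docs:
        current = latest.get(doc["link"])
        # Keep this copy if it is the first seen or newer than the one held.
        if current is None or doc["updated_at"] > current["updated_at"]:
            latest[doc["link"]] = doc
    return list(latest.values())

docs = [
    {"link": "/a", "updated_at": 1, "title": "A (stale)"},
    {"link": "/a", "updated_at": 2, "title": "A (current)"},
    {"link": "/b", "updated_at": 1, "title": "B"},
]
cleaned = deduplicate(docs)
titles = sorted(d["title"] for d in cleaned)
```

A real clean-up would also need to decide which copy is authoritative when timestamps tie, which this sketch does not attempt.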

“This incident taught us that you need to understand the core concepts and constraints of your database and always check your assumptions,” it said. “There are multiple steps we are taking to make sure this doesn’t happen again.”

 

Sam Trendall
