The government’s flagship GOV.UK site was affected last month by a global distributed denial of service (DDoS) attack on a third-party company, which also took out Twitter, Spotify and other sites.
The attack, which took place on Friday 21 October, caused widespread issues for sites using domain name system (DNS) services – which browsers use to identify the IP address of a website – provided by the company Dyn.
The outage affected the GOV.UK websites, government blogs and services, as well as some third-party systems that GOV.UK uses, with the sites being out of action for around three hours.
According to a blogpost from the Government Digital Service’s Dafydd Vaughan, the on-call technicians tried to restore the service by removing Dyn from the infrastructure for the domains with the most users – www.gov.uk and www.service.gov.uk.
However, service restoration was delayed because, with a number of internal systems down, the external organisation that runs the domain names could not be sure that the change requests were authorised.
Vaughan said that the incident showed that GDS’ DNS provision was a single point of failure, and that the team had now added a second DNS service to address this.
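To illustrate the single point of failure Vaughan describes, the following is a minimal sketch of how a team might check whether all of a domain's nameservers sit with one provider. The nameserver names are hypothetical and do not come from the GDS blogpost, and grouping by the last two labels of each hostname is a simplification that would not hold for every domain suffix.

```python
from collections import defaultdict


def providers_for(nameservers):
    """Group nameserver hostnames by provider domain.

    Assumption: the provider is identified by the last two labels of the
    hostname. This is a simplification for illustration only.
    """
    groups = defaultdict(list)
    for ns in nameservers:
        labels = ns.rstrip(".").lower().split(".")
        provider = ".".join(labels[-2:])
        groups[provider].append(ns)
    return dict(groups)


def is_single_point_of_failure(nameservers):
    """Return True when every nameserver belongs to the same provider."""
    return len(providers_for(nameservers)) < 2


# Hypothetical nameserver sets, before and after adding a second provider.
before = ["ns1.provider-a.example", "ns2.provider-a.example"]
after = before + ["ns1.provider-b.example"]

print(is_single_point_of_failure(before))
print(is_single_point_of_failure(after))
```

In this sketch the first set is flagged because both nameservers belong to one provider, while the second set is not, which mirrors the change GDS says it has made.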
In addition, he said that a number of the monitoring services GDS uses were affected by the outage, which meant GDS wasn’t alerted properly.
Meanwhile, the usual communications methods GDS uses to let people know about service issues, such as Twitter, were unavailable due to the attack on Dyn.
Vaughan said that GDS was looking into ways to improve its alert processes and other methods of providing people with information on service outages.
A second, related incident, which occurred on Wednesday 26 October, is also discussed in the blogpost.
This saw GOV.UK go down for 25 minutes, with some websites and services unavailable to users.
GDS said this was due to a planned change to the DNS record for the www.gov.uk domain name – requested after the incident the week before in an effort to “restore some resilience”.
However, an engineer at the external organisation made an error when changing the DNS record, which left it pointing to something that didn’t exist. The external organisation repaired the error once it was noticed.
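The blogpost does not say what type of record was changed or what checks were in place, but an error of this kind is the sort of thing a pre-change check can catch. As a rough sketch only, the function below confirms that a proposed target hostname resolves before a record is pointed at it. The function name, the port and the idea of passing in the resolver are assumptions made for illustration.

```python
import socket


def target_resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the proposed target hostname resolves to an address.

    The resolver is a parameter (an assumption of this sketch) so that the
    check can be exercised without touching the network.
    """
    try:
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        # Raised by socket.getaddrinfo when the name cannot be resolved.
        return False
```

A change process could refuse to apply a record whose target fails this check, although a name that resolves is not proof that the service behind it is the intended one.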
Vaughan said that the team had not yet finished reviewing the incident or the way it was handled, but that due to the scale of the issues, it had decided to publish an interim report.
“We’re also working with our colleagues at the National Cyber Security Centre and other parts of government to coordinate our incident management processes and understand the wider government impact,” he said.