Carry on campus – how ONS data science is supporting Brexit negotiations and coronavirus response
Since its launch in 2017, the ONS Data Science Campus has delivered many innovative projects to support the government’s work in areas ranging from Brexit negotiations to the early stages of the coronavirus response. We talk to managing director Tom Smith to find out more.
The Data Science Campus has been part of the Newport headquarters of the Office for National Statistics for the last three years. Established to test the use of innovative analytical techniques on a wide range of – often previously unexplored – data sources, the campus has supported the production of official statistics in major policy areas including transport, the environment, and the economy.
Its ability to respond rapidly with data insights related to urgent government business was even called upon to help inform the initial response to the Covid-19 coronavirus outbreak.
In light of many such demonstrations of its worth, financial backing for the campus is now ring-fenced within the ONS’s core funding; previously, it was supported by the Treasury’s initial upfront commitment in 2017.
Tom Smith, the managing director of the campus, welcomes the recognition, and the stability.
But he stresses the need for his 70-strong team to maintain a separate and distinct identity and, most importantly, a different way of doing things.
“We’ve come a long way,” he says. “I think we’re better embedded, and certainly we have proved the concept. But I wouldn’t say we’re part of the furniture. And, if we ever are, then it’s probably time to shut down.”
According to Smith, the campus thrives on the “tension and challenge” of ensuring its work supports that of the ONS more broadly and of government as a whole – “but without getting sucked into the business as usual” of surveys and statistics.
“You need a space where you can explore new things, where you can try things and fail, cut some of them off, but then come back and introduce some of those innovations back into the wider business as part of its work – but where you can keep being flexible and agile,” he says.
In recent years, many in the business world – particularly digital and data start-ups – have adopted the mantra of “fail fast, fail often”. Government, however, is not known for showing a high tolerance for failure.
“Large parts of government deliver services… where there is very little room for error and failure – and rightly so. So, the role for innovation there is probably less,” Smith says. “In other areas, you’ve got scope that is built in [for] failure and innovation.”
Economy of expression
A good example of where such scope has enabled data scientists to flourish is the campus’s work to provide quicker insights into the performance of the economy.
The ONS publishes quarterly estimates of the UK’s gross domestic product (GDP). But this information is not released until about six weeks after the end of each three-month period and, even then, the data is open to revision for up to a year afterwards.
During its Faster Indicators of UK Economic Activity project, the campus has sought out and used various novel data sources to try to provide more timely insights into the UK’s economic ups and downs.
Data scientists began by undertaking an “exploratory” project examining figures on the movements of cargo ships into and out of UK ports.
Such work is done with an open mind, Smith says, and an understanding that analysing the data at hand may tell you something unexpected, if it tells you anything at all.
“You start out with a data source that looks interesting – you explore it, and see what it can tell you about three possible use cases, for example,” he adds. “It may be that it can’t really tell you anything significant about any of those, but then something else pops up while you’re doing that exploration.
“So, you have to be open to the exploratory and research part of the work and say ‘we can flex on that’. We’re not going to set out a three-year programme in detail with milestones [as to] what each thing will tell us and when. Because, actually, we don’t know at the start.”
The analysis of shipping data grew from these exploratory origins to become first a monthly and then, at the request of the Bank of England, a weekly report on the arrival, berthing, and departure of ships at all UK ports. It has become the ONS’s only regular weekly report and, according to Smith, the first instance of an official statistics organisation publishing information based on “big data sources”, rather than on traditional information-gathering methods such as surveys.
The work has now been transferred to ONS economic statisticians and is providing valuable information for trade officials and monetary-policy professionals.
“It’s an indicator of trade-in-goods activity… [that] appears about a month before the GDP measures are published,” Smith says. “It’s the sort of thing that the Bank of England will take into account in their decisions on interest rates; they look at a basket of indicators, and trade is one of those areas they’re interested in. It’s also the sort of thing that might give you an early warning system if there are likely to be big changes or shifts in trading patterns. For example, what might happen when we leave the EU.”
The Faster Indicators programme as a whole – which also includes monthly data outputs on road traffic patterns and company VAT returns – has moved beyond the walls of the campus to become part of the ONS’s mainstream work, Smith says.
It is one of three examples that best demonstrate how the facility’s freedom to explore novel ideas – and its ability to deliver rapid insights – can have a big impact, he adds.
Mind your language
The development of natural language processing (NLP) tools for potential use across government is the second landmark piece of work cited by Smith.
NLP is a form of artificial intelligence that uses machine learning to enable a computer program to read and identify words, and to derive meaning from linguistic patterns.
Smith’s team used a mix of off-the-shelf products and tools created in house to develop NLP techniques that are being used by policymaking and negotiating teams focused on the UK’s Brexit trade agreements.
Following the country’s formal departure from the European Union on 31 January, the UK has a year to develop and implement a set of independent trade tariffs – the first time it has needed to do so in almost half a century.
During a public consultation that ended on 5 March, the Department for International Trade sought input from businesses and experts across all industries on the role and importance of tariffs in their sector. Given the vastness of the consultation’s remit and the huge level of public interest, the number of respondents is expected to reach well into the hundreds of thousands.
To help analysts pick out predominant themes as quickly as possible from such an enormous data set – almost exclusively comprising free text – NLP technologies will be used to help process responses and guide officials towards key focus areas.
“We’ve been using techniques that can pull out themes, trends, and patterns in responses, and then look for the sentiment on each of those trends and themes, and use that to give intelligence, information and insight to both the analysis teams to dig further, but also the trade negotiating teams, to inform what they’re looking at,” Smith says.
The tools created by the campus can be reused and applied to similar challenges across government. Other existing uses of the NLP system include its application to patent data to support the work of policy teams focused on delivering the Industrial Strategy. The goal is to “identify where the UK has strengths or, perhaps, lags behind”.
Smith says: “We’ve been working with the Intellectual Property Office, using the global patent database of 19 million patents and using the same [NLP] tools and techniques to pull out trends, emerging themes, and areas of interest, and then cross-reference them against which organisations are registering those patents, and in which countries. And that gives you an insight for policymaking around how the Industrial Strategy works to support particular areas of UK industry.”
He adds: “You could, in principle, also look down to regions and look at what’s coming out of high-tech areas like Oxford, Cambridge and ‘Silicon Fen’… and see how that compares with some of the other industrial cities, for example.”
The third exemplar project the campus chief flags is one that demonstrates its ability to help government respond to urgent problems.
As part of the initial response to the outbreak of coronavirus, the Department for International Development asked the campus to explore how travel data could provide an indicator of which countries might require most support in dealing with the disease.
“We were asked to look at what we could do in terms of [examining data on] flights and transport links from Wuhan and other areas of China into particular countries that might have poorer-resourced health services,” Smith says. “If you can identify countries at risk, and then cross-reference that against areas that may not have health-screening facilities in place, then you’ve got a list that DfID may want to step up support for.”
Calling on a range of data sources from government and industry, the campus pulled together flight-schedule data, anonymised itineraries, and information from the automated systems that track planes’ movements.
“Within a matter of hours, on a Friday night, we were able to turn around a set of analyses and maps and estimates to feed into decision-making by DfID and others… as to where the risk spots might be, and then cross-reference that against the known incidences of coronavirus outbreak at the time,” Smith says.
This rapid-response approach is “a long way from the traditional statistics model”, Smith says.
And yet, given that the prime minister’s leading adviser has loudly called for far greater use of data science and high-risk research projects across government, it is an approach that may become more prevalent.
Dominic Cummings may have called for a legion of “weirdos and misfits” to descend on Whitehall with innovative ideas and new ways of doing things, but, according to Smith, “you don’t have to dig very deeply before you find that government has actually got a lot” of the people Cummings is looking for.
The campus, alongside the Government Digital Service, is nearing the end of a cross-government data science audit commissioned by the Treasury. The exercise has identified about 800 civil servants working in the field.
“Most departments have a pretty strong cadre of groups who might describe themselves as data scientists, or have the skill sets of data scientists – even if their job title doesn’t say so,” Smith says. “There are people with PhDs in quantum physics, mathematical modelling, AI, robotics, computational neuroscience – all of those sorts of things abound.
“There is this wealth of skill… but we don’t necessarily talk about all of the examples that are being worked on.”
Around one in 10 of the data scientists identified by the audit are part of the campus team, including 70 based at its Newport HQ and another 12 that form the Data Science and AI for International Development offshoot unit embedded in DfID’s office in East Kilbride.
One of the founding objectives of the campus was to ensure that 500 civil servants were trained in data science skills and techniques. This will be achieved by the target date of April 2021, Smith says, via a mixture of sponsored master’s degrees, professional development courses, graduate training programmes, and recruitment.
The role of the campus will primarily be to support the continued growth of these skills, he adds, rather than acting as a central hub from which data science work is coordinated.
“Government is too big, and this is too important an area, to have a single centre that tries to run everything,” Smith says. “Our role will be as a centre of excellence that can share knowledge and support capacity building, where needed; lots of departments have got great capacity and don’t need much help – but perhaps we can learn a lot from each other. I think supporting [data science] – rather than trying to own it – is a much stronger model for government.”
He adds: “I think for us the next year is about embedding and continuing as is. In the run-up to the Spending Review, we will be asking: where are the areas that we can have the greatest impact?”
Outside of work, Smith sticks with the theme of inculcating computing and analytical expertise.
“At the moment, my main fascination is around how to build capability in maths and computer science in an 11-year-old and a seven-year-old,” he says with a smile. “I’m doing lots of Raspberry Pi programming and sitting down with my two girls and working through their maths tests.”
When PublicTechnology wonders whether, despite Smith’s best efforts, the apples might still fall further from the tree than he would like, he reassures us that his daughters are making good progress.
As it passes its third birthday, the same can surely be said of the Data Science Campus.
Just so long as it doesn’t become part of the furniture.