Q&A: Tom Smith, MD at the Office for National Statistics Data Science Campus
Earlier this year the Office for National Statistics launched its Data Science Campus, promising to deliver ‘data science for public good’. Rebecca Hill spoke to its managing director Tom Smith to find out what that means in practice.
Tom Smith: managing director of the Office for National Statistics' Data Science Campus
This is your first role in government. Why did it appeal to you?
I've always worked in the social mission space - in an academic setting and as the co-founder of a commercial organisation - but this is the first time I've been attracted by a role in government.
I'm a physicist by training, and have spent the last 20 years with two real themes to my work. One is around understanding the world through data - in the early ’90s I was developing measures to understand socio-economic issues like deprivation.
The other is using new techniques, so my PhD was on artificial intelligence and machine learning; essentially designing robot brains using what are now described as neural networks.
The Data Science Campus brings those two areas together - asking what techniques we can use on the massive amount of information that's coursing around different systems to help better understand economy and society and improve services.
What are the main aims of the Data Science Campus?
The first is to help the ONS step up the way it supports decision-makers. It’s about understanding what people use our data for and how we can improve and strengthen our outputs. That's not started with the campus - it's an ongoing transformation at the ONS - but the campus has a real role to play.
The second is helping teams in the ONS develop their data science skills and capacity. For me, data science is a skill set to understand the digital world, bringing in statistics, models and reliability, as well as the ability to work with large datasets, for instance with software development or coding. It also brings in the economics and population migration expertise across the ONS.
The third is showcasing the sort of thing that can be done with data. For instance, I’m interested in the extent to which you can use satellite or mobile phone data to understand the economy in real time - through population movement, say - to add to the ONS’s existing GDP measures.
Could you give me an example of the Data Science Campus' work?
A nice example that covers all the bases is our work with the ONS natural capital accounts team and the Department for Environment, Food and Rural Affairs.
They’ve asked us to explore how we can get an indicator of the amount of greenness in cities - so it feeds into a problem that a government team is interested in and will improve understanding of local environments. And for us, it allows us to use the sorts of data that ONS hasn’t got or hasn’t looked at.
Our starting point was street level imagery from Google Street View. We then ran neural network analysis against those images to classify them, and give an estimate of the coverage. We’ll produce a data source across the UK that you can compare at a granular level.
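As a rough illustration of the last step Smith describes, turning classified images into a comparable local indicator, the sketch below assumes a classifier has already labelled every pixel in each image. The label names, the area codes and the toy masks are hypothetical and are not taken from the campus's actual pipeline; the neural network itself is not shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical class labels that an image classifier might emit per pixel.
GREEN_LABELS = {"tree", "grass", "shrub"}


def green_fraction(label_mask):
    """Return the share of pixels in one image labelled as vegetation."""
    total = sum(len(row) for row in label_mask)
    if total == 0:
        return 0.0
    green = sum(1 for row in label_mask for label in row if label in GREEN_LABELS)
    return green / total


def greenness_by_area(images):
    """Average the per-image green fraction within each area code.

    `images` is an iterable of (area_code, label_mask) pairs.
    """
    by_area = defaultdict(list)
    for area_code, mask in images:
        by_area[area_code].append(green_fraction(mask))
    return {area: mean(values) for area, values in by_area.items()}


# Toy 2x2 masks standing in for classifier output; not real data.
sample = [
    ("area_A", [["tree", "road"], ["grass", "sky"]]),
    ("area_A", [["road", "road"], ["road", "tree"]]),
    ("area_B", [["building", "sky"], ["road", "road"]]),
]
result = greenness_by_area(sample)
```

Averaging per-image fractions is only one possible way to aggregate; a real indicator would also need to account for how densely and evenly each area was photographed.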
Why has the Data Science Campus been set up at the ONS, not across Whitehall?
I don't see them as being different. We work very much in collaboration through the Government Data Science Partnership, which is led by the Government Digital Service, the Government Office for Science and the ONS. We very much have a remit to support good data scientists in other organisations and departments.
But we’re at the ONS because we're an outcome from last year’s Bean review, which said the ONS needs to step up in [the skills] space. That's why one of our three major areas of work is about developing ONS skills.
Why is data science literacy so important to government?
I think it's because we're seeing an increasing recognition of the value you can get from asking questions of data. I think that the slightly generalised view has been that people have, in the past, seen the answers in data. The idea that the ONS publishes a statistic, someone sees it and thinks ‘Aha!’ and either uses it or doesn’t.
Whereas a much more sophisticated, productive and fruitful relationship is to ask questions of that information. That's fundamentally what data science is about. Lots of statisticians, analysts and university academics are already doing this kind of thing - but now the view has shifted into the organisational space; so governments are asking questions of what data science can do.
What was it like setting up the Data Science Campus?
Creating a new group in a large organisation is a very interesting challenge - as I’m sure a lot of people can imagine! There are two things that have worked: the first is that we brought in someone who has been at the ONS for a number of years to be essentially our chief of staff. That gives the team a link back into the organisation, and that person has effectively been our spirit guide to the ONS. That’s been invaluable.
The second is that we’re a recognisable group, with a lot of additional skills in data science, and because we’re working with ONS teams on their challenges, people can see we’re providing value - rather than this thing that’s been bolted on the outside.
Imagine it’s this time next year - where would you like to be?
Our ambition is that we’re seen by the rest of government as a hub of support for data science, with the ONS providing help to departments and agencies.
Secondly, I’d like to see that the campus has helped ONS teams to develop and strengthen their own skills in data science, and that as a result they have improved their own delivery.
Thirdly, I think there’s a role for the ONS to be a leader in international data science support for government - not just a leader in international statistics agencies.
Fourth - and this would be the icing - is that some of our work is going into published work; that it’s feeding into, supporting or strengthening ONS outputs, for instance measures of the economy or the Sustainable Development Goals.
Do you feel a responsibility to ensure that data is used properly, and to instil a sense of realism about what it can and can’t do?
This is a huge challenge. And I think it's one that the ONS as a whole grapples with, and has done for a long time - making sure that the analysis you carry out is robust and reliable; that the outputs are understood, and measure what you're trying to measure; and that you're not overstating the accuracy and reliability. All of those are the same questions.
From our side, we have a role to say, ‘What else?’. To ask if there’s more we can do with this data, if there are other data sources we can use to get a quicker, smaller area breakdown.
And what about public trust in government use of data - how can we get that right?
[In this discussion], I think we need to understand that the government, the ONS and the general public are all keen to make the best use of the data available for better decisions and better services.
There is huge value in the data that’s held by government. The VAT data held by HMRC from companies can be used to give estimates for local turnover by business sector, which is massively interesting if you’re looking at industrial strategies.
So, if we can use data that's held or collected by government departments to better understand local economies, so that investment and support can be given by government or used by businesses to make decisions, we’d be failing in our duties if we didn't do it.
This interview was carried out before purdah.