'If none of our ideas fail, we are not being ambitious enough' - inside government's data science revolution
Tom Smith, managing director of the ONS Data Science Campus, tells PublicTechnology how and why he wants to 'move the needle' for the use of data across the public sector
On the website of the Office for National Statistics’ Data Science Campus, the campus’s managing director Tom Smith is described as a “lifelong data addict”.
“Tom has more than 20 years’ experience using data,” it adds.
Elsewhere on the web, the 138 characters of Smith’s Twitter profile contain no fewer than five instances of the word ‘data’ – not including the one that forms part of his @_datasmith handle. Nor the one that features prominently on the t-shirt he is wearing in his profile picture.
All things considered, Smith is probably as well placed as anyone to answer PublicTechnology’s – rather facile – opening question: what is data science?
“One of the classic definitions is around gaining value from data,” he tells us. “It moves away from publishing data and statistics outputs that you then let sit on a shelf as big, fat reports. It is very much around trying to solve some problems.”
Solving problems is a key part of the remit of the Data Science Campus, which was established at ONS headquarters in Newport 18 months ago. Its brief can, essentially, be split into two tracks: supporting the statistical work of the ONS; and helping the wider public sector overcome challenges.
In the case of the former, Smith says: “We have a remit to be more at the research and prototype end of the spectrum, rather than deployment and production. So, our task is to explore how a new type of data, or new data source, or a new tool could add to the official statistics that ONS produces.”
The ONS is able to get its hands on an increased range of data since the Digital Economy Act [DEA] was passed into law in April 2017 – one month after the campus opened. The new legislation makes provisions to ease the sharing of data between government organisations – something that has given ONS data scientists access to previously untapped mines of useful information from public-sector agencies.
“We wanted to look at… testing new data sources for understanding the economy,” Smith says. “One of those is movement of goods [and] there are loads of potential data sources you could use. One of the areas is shipping – as an island nation, a huge amount of our goods are brought in and out by sea. So, if we have a good understanding of sea routes and shipping movements, that can help us with understanding trade – but also help us understand pressures on ports and capacity impacts, and what is likely to drive delays.”
‘We are part of the ONS – but different, and outward-facing’
The creation of a dedicated facility for data science was, according to Smith, fuelled by increasing cognisance of the power of data among politicians and senior Whitehall officials.
“Government is very aware – right to the top of the shop – that there is huge value in data, and there is huge value in using data better for services,” he says. “There is a great recognition that this is an area of huge importance, and I think (civil service chief executive) John Manzoni’s quote ‘getting data right is the next phase of public-service reform’ underlines how critical this is.”
The establishment of the campus was enabled not just by high-level buy-in, but by developments in recent years in three key technical areas – the first being data itself.
“We have always had data, but what we have now is lots of data that has been labelled, and that you can use to train models, or test models,” Smith says. “And we have lots of data from different sources.”
The second technical factor is a rapid rise in computing’s processing power that means “every analyst in our team has access to terabytes of RAM and hundreds of processors to run analysis on”.
The third aspect picked out by Smith is the increased availability of open-source tools from even the world’s biggest software companies.
“It is still astonishing to me that commercial organisations such as Google publish openly their machine-learning tools, like TensorFlow,” he says. “If you went back 20 years and said ‘Microsoft is going to invest heavily in GitHub’ – essentially a code-sharing and repository tool that is very open – that would have been a huge mind-shift for the time.”
Smith says there was “a lot of debate” about how best to establish a function such as the campus. The decision reached was that it should remain embedded within ONS, while creating and maintaining a distinct brand.
“We are very much part of ONS, so we can work with ONS teams, work with ONS data, feed value back in and help support the ONS… but also we have this cross-government supporting role as well,” Smith says. “It was an explicit choice to… [adopt] separate branding to emphasise that this was part of ONS, but different and outward-facing.”
He adds: “There is a data set called the automatic identification system (AIS) which is a global shipping system which has minute-by-minute GPS traces from every ship above a certain weight – pretty much every cargo ship. That is a hugely interesting source, which we were provided with by the Maritime and Coastguard Agency (MCA). Now DEA is up and running… [they know] there is a permission and a right that they can use to share it with ONS for the purposes of research and statistics.”
The MCA also provided data from its Consolidated European Reporting System information management platform. The overall haul of data sets provided to the campus was, Smith says, “the biggest we have loaded onto our system”, and required a significant investment in resources.
The monthly national trade figures produced by the ONS are published two months in arrears, and are based on data provided by HM Revenue and Customs. While the statistics on shipping movements are not likely to supplant HMRC as the primary source for these economic reports any time soon, data scientists at the campus have concluded that the MCA information – which is provided in real time – could offer a valuable early insight into trade trends.
A 50-page report of the findings from analysing the shipping data was published in June, and the results of this work are an exemplar of the way the campus wants to work, and the impact it can have, Smith says.
“The report, the methods, and the code are all open for reuse by analysts in government and elsewhere,” he says. “It hits all of our bases as [it is about use of] data, focused on an ONS problem, with wider use by the public sector.”
Data scientists at the campus have also been examining monthly VAT returns and aggregated financial data from companies, with the goal of providing more timely indications of GDP trends.
Smith says: “We asked the question: can we produce a rapid headline indicator of GDP movements – not trying to estimate GDP itself, but GDP movements – based on company-level data? And, the short answer is: you can. It is very straightforward to produce, and it is essentially a measure of change in turnover – or even profits – from companies. It is noisy as hell, because it is a real data source. But it is a rapid heads-up.”
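As a rough illustration of the kind of calculation Smith describes, and not the campus's actual method, a headline indicator of this sort could be sketched as the month-on-month percentage change in total turnover reported across companies. The record layout, the function name `turnover_change_indicator`, and the sample figures below are all hypothetical.

```python
from collections import defaultdict


def turnover_change_indicator(returns):
    """Month-on-month % change in total reported turnover.

    `returns` is an iterable of (month, company_id, turnover) tuples,
    where month is a sortable 'YYYY-MM' string. This layout is an
    assumption made for illustration, not a real VAT data format.
    """
    # Sum turnover across all companies for each month.
    totals = defaultdict(float)
    for month, _company_id, turnover in returns:
        totals[month] += turnover

    # Compare each month's total with the previous month's total.
    months = sorted(totals)
    indicator = {}
    for previous, current in zip(months, months[1:]):
        if totals[previous] == 0:
            continue  # no meaningful percentage change from a zero base
        change = totals[current] - totals[previous]
        indicator[current] = 100.0 * change / totals[previous]
    return indicator


# Invented sample data: two companies over three months.
sample = [
    ("2018-01", "A", 100.0), ("2018-01", "B", 200.0),
    ("2018-02", "A", 110.0), ("2018-02", "B", 205.0),
    ("2018-03", "A", 105.0), ("2018-03", "B", 195.0),
]
indicator = turnover_change_indicator(sample)
for month, pct in sorted(indicator.items()):
    print(month, round(pct, 2))
```

A simple total like this moves whenever companies enter or leave the data, which is one reason such a measure would be noisy; a real indicator would probably need to compare a matched set of companies from month to month.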
For the time being, Smith believes that work such as the analysis of the shipping and VAT data “is probably not strong [enough] to publish” as a stand-alone economic indicator.
“But as one of a basket of indicators, they could be very interesting,” he says. “This is an example of how we have linked into the formal, mainstream work of ONS. We are always looking to ask the question: ‘and what else?’. We are not looking to be working with teams on improving stuff they are already doing. But we might be looking… to add things – [such as] new data sets, and new tools.”
The role of the private sector
Earlier this year, the ONS announced its intention to establish an £8m dynamic purchasing system for new sources of aggregate data. The announcement came with an appeal for any public- or private-sector organisations that might hold such information – and are interested in working with the ONS – to get in touch.
The ONS does not pay for data – the £8m value of the DPS relates to the cost of extracting, processing, and formatting data – but Smith says information provided by commercial organisations has a valuable contribution to make in the work of the campus.
“There is a big role for private sector here,” he says. “In many areas, private sector is a data-holder on aspects of the world that we would love to produce statistics on – that businesses, as well as public sector can then reuse. Things like the census obviously provide information that is used by every single part of government – but also huge numbers of businesses as well, when they are deciding to invest in new operations or markets. So, there is a win-win.”
One company already working with the ONS is Barclays. Two analysts from Smith’s team are currently on secondment to the bank and are examining its aggregated data on customers' accounts to ascertain whether this could provide an early indicator of consumer spending trends.
“Our interest and our intention there is to add something to what we know about the economy,” he says.
“In many cases, industry knows about or has data on things that, in the public sector, we don’t know about, or we don’t have as much information on, or as fine-grained or timely information.”
Smith would like to see more engagements with commercial entities, and the campus is currently in discussions with some big retailers about the possibility of obtaining and analysing aggregate data from point-of-sale scanners. This information could potentially provide a much more timely reading on economic trends than the monthly or quarterly reports that those companies may currently be providing.
The mid- to longer-term goal is that some of the new data sources and tools whose use is being pioneered by the campus will become part of the ONS’s core product set or augment its wider research operations.
“The intention is very much that we do have an impact on mainstream ONS work, and mainstream work in government. That is certainly where we are aimed at,” Smith says.
He adds that the innovative nature of the work being undertaken by his team means that some ideas or programmes will, inevitably, fail.
He compares the ethos of the campus to the marketing maxim coined by 19th century US entrepreneur John Wanamaker: “Half the money I spend on advertising is wasted – the trouble is, I don’t know which half.”
“I know that some of [our projects] are going to land – I just don’t know which,” Smith says. “If 50% of them land, I will be very happy. I think I would expect something like a small number of ours to land well and have a big impact, lots of them to have a use outside [of the campus], and some of them to fail – if none of them fail, then we are not being ambitious enough.”
For the wider public sector – where data science may, typically, be less well understood – the campus is there to “help show people and demonstrate what is possible”.
500
Number of qualified government data scientists the campus intends to provide by March 2021
27 March 2017
Date on which the ONS Data Science Campus opened
Up to 70
Expected number of campus employees by the end of FY19
£8m
Estimated value of dynamic purchasing system for new data sources launched by ONS earlier this year
That work includes a training course called ‘The Art of the Possible’, in which campus staff travel to other departments and agencies and host a presentation outlining the potential of data science. The presentation showcases examples of data-science initiatives taking place across the breadth of the civil service.
“The course is aimed primarily at managers to help them understand what their teams could be [doing with data],” Smith says. “But the flipside is that there is a lot of good stuff already going on, and the examples in The Art of the Possible are drawn from every government department. It is not about ONS or the campus going out and saying ‘hey, we have this fantastic stuff’. It is often us saying: ‘look, this stuff is going on right across government’. The campus and ONS are part of that but, actually, there are some excellent examples across the piece.”
In addition to evangelising the potential benefits of using data, the campus will also support and collaborate with organisations – such as DWP or HMRC – that have existing in-house data-science units.
Another key strand of the campus’s mission to support the public sector is its work to train up an additional 500 data scientists across government by March 2021 – including 150 who will obtain qualifications in the current financial year.
The total of 500 will include apprentices, students undertaking a campus-sponsored MSc – either full-time or as part of on-the-job training – existing government analysts being trained in data-science techniques, and graduates of the government’s Data Science Accelerator programme, which is co-run by the ONS.
Providing government with so many newly qualified data experts could prove extremely valuable in a world where large organisations are “competing globally for talent”, Smith believes.
“Every organisation is paying big bucks for data scientists, and government can’t afford to compete with Amazon and Apple and the rest of those [in terms] of salaries. But, we have a great story around the work you can do, and the public good, and the types of analysis and data you can use in your day-to-day job. There’s really strong messages there.”
A broad mix of people
The campus currently has almost 50 employees on the books and plans to grow to a total of as many as 70 by the end of the financial year.
Smith estimates that the organisation’s existing staff are evenly drawn from three areas: the wider civil service; academia; and the commercial sector. Each group brings with it distinct qualities, he says.
Those arriving from the private sector tend to be well accustomed to a “rapid cycle of work that is geared very much towards delivering something of value to users”, while career civil servants arrive with existing relationships with other branches of government, and an understanding of the challenges facing the public sector.
Academics, meanwhile, bring “huge technical skill sets and understanding” of the tools being used at the campus. In return, working at the ONS can provide a rewarding work environment, according to Smith – who worked in research at the University of Oxford from 1994 to 2003, during which time he also completed a PhD at the University of Sussex.
“One of the dangers of machine learning is you pull a tool off the shelf, use it, don’t really understand what it is doing and publish,” he says. “So, you need to have that academic understanding, that in-depth understanding of what is going on, and what your outputs and your systems are doing.”
Smith adds: “What academics want is applied areas they can use their skills on – and that is what we are offering. We have data and tools, and challenges to apply them to, and you can come in and do some very interesting work around social good and public good, and apply your skill set in a way that is perhaps difficult to do within the existing university system.”
When asked whether the campus has achieved as much as he hoped during its first year and a half in existence, Smith singles out its work to grow skills and capability – particularly its collaborations with universities – as one area where his team has “over-delivered”.
The projects the campus has undertaken to identify and utilise new data sets or methods have also gone “extremely well” so far, he says.
Smith adds: “But what I would love to see next for the campus, and our big challenge now, is to land one or two projects that make the move up the pipeline, so that they are deployed in anger in day-to-day work by other parts of government or the public sector, [in] other statistics or operations. We haven’t quite landed that – but we’re close. So, that is the one thing that is missing.”
Of the projects currently ongoing, Smith says he has a good idea of which he believes are most likely to take off and go mainstream – but declines to be drawn further.
For the year ahead, Smith says that delivering the big-impact project he is hoping for is his single biggest ambition for the campus.
“A single exemplar project that had landed right through to being used in day-to-day outputs by ONS or other government agencies would be the huge win that we are pushing for,” he says. “If I had that, I would be happy – and that is ambitious.”
Smith adds: “It is possible, and realistic – but it is challenging. It wouldn’t be a failure if we didn’t – I think it would be a great success if we did. All the other stuff around exploration and building capability and so on is ongoing: it is tough, and challenging, and exciting – but we will make those. The big win would be having something that moves the needle.”
To conclude our discussion, PublicTechnology asks Smith how he would like to see the data-science landscape develop and grow over the coming decade.
“What I would love to see is us being able to harness data-science skills from across commercial, public, and the academic sectors… [for] public good,” he says, before pointing to the examples of social enterprises and charities such as Police:Now and Teach First, both of which work to attract talented people – often graduates – into public-service careers.
“As a data scientist, you are going to have a great career, whatever sector you go into,” Smith adds. “But come work for a year or two years on public sector challenges, help change the world – and then take those experiences onto wherever your career takes you. The opportunity here is vast, and it is not going to go away.”
In other words: we may see a lot more people yet succumbing to a life of data addiction.