Whitehall needs to make better use of machine learning to make sense of the deluge of information generated by greater online participation, the digital strategy manager at the Department of Health has said.
‘Digital avalanche’ of data from online consultation and petitions needs to be tackled with algorithms – Photo credit: Pixabay
“The rise of digital social movements…has led more people to use digital tools to connect with government more quickly and in ever greater numbers,” Laurence Erikson wrote in a blogpost.
He said that, although greater participation should be “good news for government”, for every webchat, online consultation form and petition promoted on social media and over email, “there’s someone on the other end who needs to read and make sense” of the inputs.
Describing this as a “digital avalanche”, Erikson argued that these inputs could be a useful way of drawing on people’s opinions, as long as the civil service has the “methods and the tools to cope”.
One answer, he said, may lie in machine learning algorithms that can be trained to spot patterns in data – and so can be used to analyse unstructured text in large datasets.
Erikson said that DH had been trialling a machine learning tool to help analyse public consultation responses. The team started with a controlled trial on a consultation that had attracted a lot of digital responses, and compared the tool’s output with the manual analysis.
“The findings so far are intriguing,” Erikson said. “We found that the machine learning approach reinforced some of the findings of the manual approach, but also identified new insights from the consultation responses.”
These included identifying issues that were raised consistently across different questions, and gaining a better understanding of what different audience groups had said.
The value of these insights increased as more data was fed in, he added. However, he noted that this still required “lots of human input” and still took time.
Erikson’s comments and his department’s trial of machine learning to analyse consultation responses echo moves made by the Government Digital Service’s GOV.UK team to increase use of machine learning to classify user comments.
Last month, Matt Upson said that the GOV.UK team had automated some of the process for analysing the responses to one of its user surveys, which asks people about their journey through GOV.UK and gets around 3,000 responses a month.
“Until now, a group of volunteers from various GOV.UK teams have manually classified the responses into a number of pre-defined classes,” Upson said. It takes volunteers around 45 minutes to assign classes – such as ‘ok’, ‘check-status’ and ‘compliment’ – to the free text sections for 100 surveys.
This, he noted, would be a “significant investment of time” if the team were to classify all 3,000 each month. He said around 4,000 surveys had been manually classified to date.
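The scale of that workload follows from the figures Upson quoted. As a rough sketch of the arithmetic (the function name is illustrative, and the calculation assumes the rate of 45 minutes per 100 surveys holds across all responses):

```python
def monthly_classification_hours(responses_per_month, minutes_per_batch=45, batch_size=100):
    """Estimate the volunteer hours needed to classify one month of responses by hand."""
    batches = responses_per_month / batch_size
    return batches * minutes_per_batch / 60

hours = monthly_classification_hours(3000)
print(hours)  # 22.5
```

On those assumptions, classifying every response would take about 22.5 volunteer hours a month.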
However, because algorithms need a lot of pre-classified samples to begin with, and there is a lot of variation in the free text people use, “it could take us several years before we have sufficient data at the current rate our volunteers are able to classify them”.
Instead, the team used the machine learning process to identify the largest class – ‘ok’ – by training it to base its assessment on a combination of features in the free text.
“After generating newer, simple features and curating a more objective dataset, our machine learning algorithm was able to identify ‘ok’ responses 88% of the time,” Upson said. “This saves a huge amount of time for our volunteer classifiers, although there is still room for improvement.”
Where surveys that should have been classified as ‘ok’ “slipped through the net”, they were correctly classified by the volunteer coders and fed back into the algorithm to improve its accuracy.
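The approach described here, a classifier that separates ‘ok’ responses from everything else using simple text features and is retrained on volunteer corrections, can be sketched as follows. This is a minimal illustration and not the GOV.UK team’s code: the article does not name the algorithm or the features used, so the perceptron, the three features, the cue words and the sample responses below are all assumptions made for the example.

```python
import re

# Hypothetical cue words; the real feature set is not described in the article.
PROBLEM_WORDS = {"not", "cannot", "unable", "error", "broken"}

def features(text):
    """Turn free text into a small numeric feature vector (bias term first)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [
        1,                                            # bias term
        int(len(words) <= 4),                         # very short response
        int(any(w in PROBLEM_WORDS for w in words)),  # mentions a problem
        int("?" in text),                             # asks a question
    ]

def score(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def train(examples, max_epochs=20):
    """Perceptron training: adjust the weights whenever an example is misclassified."""
    weights = [0, 0, 0, 0]
    for _ in range(max_epochs):
        mistakes = 0
        for text, label_ok in examples:
            x = features(text)
            target = 1 if label_ok else -1
            predicted = 1 if score(weights, x) > 0 else -1
            if predicted != target:
                weights = [w + target * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights

def is_ok(weights, text):
    """True if the model would file this response under 'ok'."""
    return score(weights, features(text)) > 0

# Invented, manually labelled responses: True means the volunteer chose 'ok'.
labelled = [
    ("all fine thanks", True),
    ("found what I needed", True),
    ("could not find the form to renew my passport", False),
    ("where do I check the status of my application?", False),
    ("the page kept showing an error", False),
]
weights = train(labelled)

# A longer 'ok' response that the first model misses and leaves for a volunteer.
missed = "everything worked, I renewed my licence without any trouble"
print(is_ok(weights, missed))    # False

# The volunteer's correction is added to the training data and the model is retrained.
retrained = train(labelled + [(missed, True)])
print(is_ok(retrained, missed))  # True
```

Anything the model does not flag as ‘ok’ stays with the volunteers, which is why a missed ‘ok’ response costs some human time but is not lost, and why each correction can be reused as training data.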
Upson said that it would probably be impossible to entirely automate the process, but that this wasn’t the end-goal because the volunteers come from different teams, which makes sure the feedback is returned directly to the teams working on the relevant parts of GOV.UK.