Unstructured data from health forums is an ‘untapped’ resource for policymakers

Pilot study suggests qualitative data from online forums could improve healthcare professionals’ understanding of patient needs

Data from online forums could be used by providers to reassess their services – Photo credit: Flickr, Till Westermeyer, CC BY-SA 2.0

The wealth of unstructured, qualitative data available in online forums is an untapped resource that healthcare professionals and policymakers should use to their advantage, a report has said.

The think tank Demos and health charity The King’s Fund have published a joint report on how algorithms could be used to help analyse posts about mental health made on publicly available websites or forums.

Data from these forums offers a different perspective on health, the report said, as it gives an insight into the lived experience of those with mental health problems, as well as the advice and support given by their peers and assessments of interactions with health services.

The report said that the data could be analysed at scale using natural language processing, a technique that involves training computers to better process and manipulate human language.

The pilot study used a modified web scraper on more than 1 million posts made between June 2004 and May 2016 on six online forums. The data was pseudonymised and then used to train natural language processing algorithms to understand how people discuss mental health online.
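
The article does not describe how the posts were pseudonymised. As a rough illustration only, one common approach is to replace each username with a keyed hash, so the same author always maps to the same pseudonym without the original name being stored. The function name `pseudonymise`, the key and the sample posts below are all hypothetical and are not taken from the report.

```python
import hashlib
import hmac


def pseudonymise(username: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a username using HMAC-SHA256.

    Illustrative sketch only: the report's actual method is not described.
    """
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    # Keep a short prefix of the hex digest as a readable pseudonym.
    return "user_" + digest.hexdigest()[:12]


# Hypothetical key and posts, invented for this example.
SECRET_KEY = b"example-key-kept-separately-from-the-data"
posts = [
    {"author": "forum_member_a", "text": "Has anyone here tried CBT?"},
    {"author": "forum_member_b", "text": "Yes, I started it last year."},
    {"author": "forum_member_a", "text": "Thanks, that helps."},
]

# Replace each author with a pseudonym; the post text is left unchanged.
pseudonymised_posts = [
    {"author": pseudonymise(p["author"], SECRET_KEY), "text": p["text"]}
    for p in posts
]
```

A step like this only hides the author field. Names or other identifying details written in the free text of a post would remain, and would need separate handling.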

The software was tested on three questions, asking whether it could accurately identify: cries for help, where people wanted guidance from other users; discussions about cognitive behavioural therapy; and cases of co-morbidity, where a mental health problem coincided with long-term physical conditions.

The report said that the software had accuracy rates of around 65% both for identifying cries for help and for identifying posts about CBT, rising to 72% for identifying posts where the person had had CBT. The team also claimed 98% accuracy for the 50 posts they assessed for co-morbidity.
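
To put these figures in context, accuracy in its simplest form is the share of posts for which the software's label matches a human reviewer's label. The sketch below assumes that plain definition; the article does not say exactly how the report calculated its figures, and the `accuracy` function and sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("at least one labelled post is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Invented labels: 50 posts, with the software agreeing with the
# human reviewer on 49 of them.
human_labels = [True] * 50
software_labels = [True] * 49 + [False]

score = accuracy(software_labels, human_labels)
print(f"{score:.0%}")
```

With a sample of 50 posts, each single disagreement moves the score by two percentage points, which is one reason a figure drawn from a small sample should be read with caution.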

According to the authors, there is huge potential for analysis of publicly available data to be used to inform policymaking, for instance by offering health regulators more insight into the performance of providers and giving service providers themselves a better understanding of their users.

Josh Smith, a co-author on the report and researcher at Demos, said the study “highlights the potential for new technology and methodologies to provide a whole new perspective on mental health”.

However, the report also acknowledged the “significant technical, methodological and ethical challenges still to overcome”, including concerns that free text entered in online forums might include identifiable data, which would make it difficult to fully anonymise data.

The report stressed that the approach “is not and never will be a silver bullet”, saying that the data should only be seen as a complementary source of information.

The work received ethical approval from the University of Sussex Ethics Review Panel, but was not considered a clinical study as it did not recruit patients from the NHS and didn’t gather clinical data or make interventions that would affect anyone’s care.

The Department of Health recently revealed that it was looking into the use of algorithms to make sense of unstructured data, with digital strategy manager Laurence Erikson saying that it was using machine learning to help analyse responses to public consultations.

“The findings so far are intriguing,” Erikson said in a blogpost about the work, published in January. “We found that the machine learning approach reinforced some of the findings of the manual approach, but also identified new insights from the consultation responses.”

In February, MPs announced plans to investigate the use of algorithms in decision-making, to look at whether, and how, they can be used in a transparent or accountable way.

