The government-funded project to sequence the genomes of 100,000 people has chosen cloud computing company EMC to store the massive amounts of data it creates.
A single genome generates hundreds of gigabytes of data – Photo credit: Flickr, Shaury Nash
The 100,000 Genomes Project, which is being carried out by the Department of Health body Genomics England, aims to collect and sequence the complete genomes of 70,000 NHS cancer and rare disease patients, and their families.
The aim is to analyse these genetic sequences – which will be de-identified – to help develop new disease diagnostics and create personalised treatments, as well as to encourage innovation in the UK’s bioscience sector.
Sequencing a single genome creates hundreds of gigabytes of data that need to be stored digitally. The project, which has sequenced 13,040 genomes to date, is expected to generate 10 times more data in the next two years.
Genomics England already uses EMC Isilon to store its sequence library, and has today announced it will use an Isilon data lake to securely store all the data collected during the sequencing process so that it can be analysed.
“There are few better examples of the fundamental impact that analysis of data sets can have on society,” said Ross Fraser, the vice-president and managing director of UK and Ireland at EMC.
“Delivering the platform for this large scale analytics in a hybrid cloud model will help accelerate the impact big data analytics could have on the NHS, potentially delivering billions in efficiencies in care delivery and improving patient outcomes immeasurably.”
According to a statement from EMC, the data lake will initially allow 17PB of data to be stored and used in analysis. In addition, EMC will use 24 X-Bricks of its XtremIO all-flash storage arrays to support its virtualised applications, with back-up services provided by EMC’s Data Domain and NetWorker.
“The net result is resilient infrastructure that supports massive scalable data storage with robust analytics,” EMC said.
The company added that one of Genomics England’s “key legacies” will be to create an “ecosystem of Cloud Service Providers providing low cost, elastic compute on demand through G-Cloud, bringing the benefits of scale to smaller research groups”.