
How SSDs and Big Data Are Enhancing Clinical Research

Big data goes hand in hand with clinical research. Clinical trials gather many different types of data, including various types of measurements, genetic profiles, images, multidimensional data points, audio, video and more. These are all mined for trends, correlations and relationships. Removable storage is often used to maintain security of data between data runs.

Because certain data types, such as X-rays, PET scans, MRI images and photos, are large, and continue to grow as imaging technologies improve, storage capacity is extremely important, as are speed, response times and latency. Given both the large quantities of data and the complexity of the analysis, it’s necessary to use the fastest available storage to hold the data, which makes clinical research an ideal use case for solid state drives (SSDs).

The U.S. Food and Drug Administration (FDA) has recently begun investigating artificial intelligence (AI) and big data as opportunities to enhance clinical research, along with the collection of data from wearable devices to improve patient monitoring, according to Health IT Analytics. The same data streaming and data mining techniques that are starting to transform other industries are now being applied to clinical research. Using Internet of Things (IoT) devices, from small, simple sensors to smartwatches with EEG capability, patients can be monitored more effectively without the need for on-site stays. All of these devices can produce massive amounts of data, which must be collected, stored and mined using big data techniques.
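To give a rough sense of scale, the sketch below estimates how much raw data a fleet of wearable sensors might generate in a day. Every parameter value (sample rate, sample size, channel count, number of patients) is a hypothetical assumption chosen for illustration, not a figure from any particular trial or device, and the estimate ignores compression and metadata.

```python
# Back-of-the-envelope estimate of raw data produced by a fleet of wearable sensors.
# All parameter values used below are hypothetical, chosen only for illustration.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_bytes(sample_rate_hz: float, bytes_per_sample: int, channels: int, devices: int) -> float:
    """Raw (uncompressed) bytes generated per day across all devices."""
    per_device_per_second = sample_rate_hz * bytes_per_sample * channels
    return per_device_per_second * SECONDS_PER_DAY * devices


if __name__ == "__main__":
    # Hypothetical trial: 1,000 patients, each wearing a single-channel sensor
    # sampling at 250 Hz with 16-bit (2-byte) samples.
    total = daily_bytes(sample_rate_hz=250, bytes_per_sample=2, channels=1, devices=1000)
    print(f"{total / 1e9:.1f} GB per day")  # 43.2 GB per day
```

Because the total scales linearly with every factor, adding channels, raising the sample rate or enrolling more patients multiplies the volume that must be ingested and stored each day.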

Dealing with large amounts of data requires high speeds and plenty of storage space. High-performance SSDs have been available for several years, but their costs have been high, with higher-capacity drives costing more per gigabyte than smaller ones. This has begun to change in the enterprise with triple-level cell (TLC) drives, which store three bits per cell and so enable higher capacities and better ROI.
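The capacity advantage of TLC comes from storing more bits in each flash cell. The minimal sketch below, which assumes an idealized array and ignores spare area and error-correction overhead, shows that the same number of cells holds 50% more data as TLC (3 bits) than as MLC (2 bits), at the cost of the cell having to distinguish twice as many charge levels. The cell count is a hypothetical value for illustration.

```python
# How the number of bits stored per NAND flash cell affects raw capacity.
# An n-bit cell must reliably distinguish 2**n charge (voltage) levels.

CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}


def voltage_states(bits_per_cell: int) -> int:
    """Number of distinct charge levels a cell must hold to store n bits."""
    return 2 ** bits_per_cell


def capacity_bytes(num_cells: int, bits_per_cell: int) -> int:
    """Raw capacity of an array of cells, ignoring spare area and ECC."""
    return num_cells * bits_per_cell // 8


if __name__ == "__main__":
    cells = 8 * 10**9  # hypothetical array of 8 billion cells
    for name, bits in CELL_TYPES.items():
        print(f"{name}: {voltage_states(bits)} levels, "
              f"{capacity_bytes(cells, bits) / 1e9:.0f} GB raw")
```

The tighter spacing between charge levels is why TLC endurance and over-provisioning get so much attention in enterprise drive design.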

Data in the Details

TLC NAND flash allows Samsung drives to provide high capacities without raising costs, while staying within the performance envelope that data centers require. As more and more trials are combined, using more sensors and imaging systems, the ability to stream data at ever-increasing rates and to store it at higher capacities for a lower price is a major requirement for new systems and nodes.


Many big data systems are built on in-house storage infrastructures, but many more are built using clusters of PCs, whether in-house or in the cloud. Whether the cluster runs containers orchestrated by Kubernetes or Linux nodes running cluster software such as OpenHPC or Globus Toolkit, each node will perform best in big data or clustering applications with an SSD as its internal drive.
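On Linux nodes, one quick way to confirm that a node's drives are solid state is the kernel's sysfs attribute `/sys/block/<dev>/queue/rotational`, which reads `0` for non-rotational devices and `1` for spinning disks. The sketch below is a minimal audit script under that assumption; virtual machines and some RAID or virtual devices may misreport this flag, and loop or RAM devices will also appear in the listing, so treat the result as a hint rather than proof.

```python
# Check whether each block device on a Linux node is reported as non-rotational (SSD).
# Relies on the sysfs attribute /sys/block/<dev>/queue/rotational:
# "0" means non-rotational (solid state), "1" means a spinning disk.
from __future__ import annotations

from pathlib import Path


def non_rotational_devices(sys_block: Path = Path("/sys/block")) -> dict[str, bool]:
    """Map each block device name to True if the kernel reports it as non-rotational."""
    result: dict[str, bool] = {}
    if not sys_block.is_dir():
        return result  # not a Linux system, or sysfs is not mounted
    for dev in sorted(sys_block.iterdir()):
        flag = dev / "queue" / "rotational"
        if flag.is_file():
            result[dev.name] = flag.read_text().strip() == "0"
    return result


if __name__ == "__main__":
    for name, is_ssd in non_rotational_devices().items():
        print(f"{name}: {'SSD' if is_ssd else 'rotational'}")
```

Taking the sysfs root as a parameter keeps the function testable against a mock directory tree, and lets the same check run inside containers where the host's `/sys` is mounted at another path.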

In addition to providing higher transfer rates, SSDs also reduce latency substantially, which can affect clustered applications dramatically. Since nodes in a cluster might all have to wait for data to become available, returning that data as quickly as possible can improve performance in a cluster for all nodes, not just the single node that holds the data.
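The effect described above can be seen in a toy model of a bulk-synchronous cluster, where every node waits at a barrier before the next step begins. The step's duration is set by the slowest node, so cutting storage latency on one node shortens the step for all of them. The model and its timings are hypothetical simplifications; real clusters overlap I/O with computation in more complex ways.

```python
# Toy model of a bulk-synchronous cluster step: every node waits at a barrier,
# so the step finishes only when the slowest node has read its data and computed.
# The timings used below are hypothetical, for illustration only.
from __future__ import annotations


def step_time_ms(compute_ms: list[float], read_latency_ms: list[float]) -> float:
    """Wall-clock time of one synchronized step: the slowest node sets the pace."""
    if len(compute_ms) != len(read_latency_ms):
        raise ValueError("need one compute time and one read latency per node")
    return max(c + r for c, r in zip(compute_ms, read_latency_ms))


if __name__ == "__main__":
    compute = [10.0, 10.0, 10.0, 10.0]  # hypothetical per-node compute time, ms
    slow = [0.1, 0.1, 0.1, 8.0]         # one node waits on slow storage
    fast = [0.1, 0.1, 0.1, 0.1]         # same node after moving to low-latency storage
    print(f"slow storage on one node: {step_time_ms(compute, slow):.1f} ms per step")
    print(f"low latency everywhere:   {step_time_ms(compute, fast):.1f} ms per step")
```

In this example, only one node's storage changes, yet all four nodes finish each step sooner, because none of them has to sit idle at the barrier waiting for the slow read.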

NVMe for Speed and Efficiency

Samsung’s M.2 NVMe drives, such as the 983DCT, are not only highly energy efficient but also several times faster than SATA SSDs, in both transfer rates and access times, especially for random access. This can substantially reduce overall latency in the cluster and improve the responsiveness of applications running on the cluster nodes. In addition, Samsung SSDs offer best-in-class reliability, ensuring that 24/7 enterprise search and analysis tools can run without frequent storage replacements.
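Part of the NVMe speed advantage comes from the interface itself. The sketch below computes theoretical bandwidth ceilings after line-encoding overhead, assuming SATA III (6 Gb/s with 8b/10b encoding) versus an NVMe drive on a PCIe 3.0 x4 link (8 GT/s per lane with 128b/130b encoding). These are interface limits, not measured drive performance; actual throughput depends on the drive and workload.

```python
# Theoretical interface bandwidth ceilings, after line-encoding overhead.
# Assumes SATA III (6 Gb/s, 8b/10b encoding) versus an NVMe drive on a
# PCIe 3.0 x4 link (8 GT/s per lane, 128b/130b encoding). Real drives
# deliver less than these ceilings; the point is the ratio between interfaces.


def usable_bytes_per_s(line_rate_bps: float, payload_bits: int, coded_bits: int, lanes: int = 1) -> float:
    """Usable bytes per second once encoding overhead is removed."""
    return line_rate_bps * payload_bits / coded_bits / 8 * lanes


SATA3 = usable_bytes_per_s(6e9, 8, 10)                 # ~600 MB/s
PCIE3_X4 = usable_bytes_per_s(8e9, 128, 130, lanes=4)  # ~3,938 MB/s

if __name__ == "__main__":
    print(f"SATA III:    {SATA3 / 1e6:,.0f} MB/s")
    print(f"PCIe 3.0 x4: {PCIE3_X4 / 1e6:,.0f} MB/s")
    print(f"ratio:       {PCIE3_X4 / SATA3:.1f}x")
```

Bandwidth is only part of the picture: NVMe also supports far more parallel command queues than the AHCI interface used by SATA drives, which is where much of its random-access advantage comes from.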

Because clinical data collected for either research or health monitoring may be subject to HIPAA and other privacy regulations, it’s critical to ensure that data is encrypted while in use and securely deleted afterwards.

Samsung enterprise-class SSDs are also supported by tools that ensure deleted data cannot be recovered from the drives. Conventional methods of erasing an SSD, such as deleting or overwriting files, may not make data completely unrecoverable, but the Samsung DC toolkit can ensure that hackers cannot recover data from drives once they’re erased. Given the sensitive nature of data collected in clinical trials, this is an extremely useful feature, not only for HIPAA compliance but for compliance with other data privacy laws as well.
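Why doesn't an ordinary overwrite suffice? Flash pages cannot be rewritten in place, so an SSD's controller writes new data to a fresh page and updates a mapping table, leaving the old page as stale data until garbage collection erases it. The toy flash translation layer (FTL) below is a deliberately simplified model of that behavior, not how any real controller is implemented; the class and method names are hypothetical. It illustrates why erasure needs to be handled at the drive level rather than by overwriting files from the host.

```python
# Toy flash translation layer (FTL): a simplified model of why overwriting data
# from the host does not necessarily destroy the old copy on an SSD.
# Real FTLs add garbage collection, wear leveling and over-provisioned spare area.
from __future__ import annotations


class ToyFTL:
    def __init__(self, num_pages: int) -> None:
        self.physical: list[bytes | None] = [None] * num_pages  # raw NAND pages
        self.mapping: dict[int, int] = {}  # logical block address -> physical page
        self.next_free = 0

    def write(self, lba: int, data: bytes) -> None:
        # Pages are never rewritten in place: each write lands on a fresh page and
        # the mapping is updated, so the previously mapped page becomes stale.
        if self.next_free >= len(self.physical):
            raise RuntimeError("no free pages (a real drive would garbage-collect here)")
        self.physical[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def read(self, lba: int) -> bytes | None:
        """What the host sees for a logical block."""
        page = self.mapping.get(lba)
        return None if page is None else self.physical[page]

    def stale_pages(self) -> list[bytes]:
        """Data still physically present but no longer mapped to any logical block."""
        live = set(self.mapping.values())
        return [d for i, d in enumerate(self.physical) if d is not None and i not in live]


if __name__ == "__main__":
    ftl = ToyFTL(num_pages=4)
    ftl.write(0, b"patient record")
    ftl.write(0, b"\x00" * 14)  # host "overwrites" the logical block with zeros
    print(ftl.read(0))          # the host now sees zeros...
    print(ftl.stale_pages())    # ...but the original record is still on the flash
```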

Samsung has an ongoing commitment to supporting innovation in both big data and clinical research, working with vendors, labs and universities to advance the integration of SSDs into systems, whether as cluster nodes in the data center or in the cloud. Advanced NVMe speeds combined with high capacities and excellent reliability can speed up data analysis in the search for new cures.


Written By

Logan Harbaugh

Logan Harbaugh is an IT consultant and reviewer. He has worked in IT for over 20 years, and was a senior contributing editor with InfoWorld Labs as well as a senior technology editor at Information Week Labs. He has written reviews of enterprise IT products including storage, network switches, operating systems, and more for many publications and websites, including Storage Magazine, TechTarget.com, StateTech, Information Week, PC Magazine and Internet.com. He is the author of two books on network troubleshooting.
