By Sue Poremba
Sue Poremba is a freelance writer focusing primarily on security and technology issues and occasionally blogs for Rackspace Hosting.
The term “big data” may be a bit of a misnomer. For some companies, big data is actually huge data. Even small companies now find themselves immersed in massive amounts of data of all sorts, which leads to the problem of storing all of it.
Enter big data cloud storage solutions, which allow companies to store and access media by the terabyte. Storage by the terabyte may have seemed unfathomable just a few years ago, but according to John Griffith of SolidFire, by today’s standards, terabytes of block storage really aren’t all that much.
“Many service providers look at opportunity in hundreds or thousands of terabytes. Between storage hungry database applications and other mission critical systems, terabytes of data can be consumed rather quickly,” Griffith said.
However, Griffith was quick to point out that block storage is not one-size-fits-all in terms of performance. Some applications may require high performance, while others are less sensitive. “Cloud block storage allows you to allocate the quantity of storage required by the application with the performance characteristics needed for a positive user experience. By architecting storage around Quality of Service and leveraging data de-duplication and compression techniques, scale can be used as an advantage,” he said.
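The idea of allocating capacity and performance independently can be sketched in a few lines. This is not SolidFire’s actual API; the names (`QoSPolicy`, `provision`, the IOPS figures) are hypothetical, meant only to illustrate a volume carrying its own Quality of Service settings:

```python
from dataclasses import dataclass

@dataclass
class QoSPolicy:
    min_iops: int    # guaranteed performance floor
    max_iops: int    # sustained ceiling
    burst_iops: int  # short-term burst allowance

@dataclass
class Volume:
    name: str
    size_gb: int
    qos: QoSPolicy

def provision(name: str, size_gb: int, qos: QoSPolicy) -> Volume:
    """Create a block volume whose capacity and performance are set independently."""
    return Volume(name, size_gb, qos)

# A latency-sensitive database gets a high guaranteed floor...
db_vol = provision("orders-db", 500,
                   QoSPolicy(min_iops=5000, max_iops=15000, burst_iops=20000))
# ...while a large but less sensitive archive volume gets a modest one.
archive_vol = provision("cold-archive", 4000,
                        QoSPolicy(min_iops=100, max_iops=500, burst_iops=1000))
```

The point of the separation is that the archive volume can be eight times larger than the database volume while reserving a fraction of its performance, so scale stops being a liability.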
Delivering elastic value to end users depends on easy scalability and management, which is essential when evaluating storage infrastructure software. The software should allow plug-and-play scaling with no manual provisioning or storage balancing. “This will support both expansion and reduction if large users choose to delete big amounts of data. This can only be achieved using object storage since it allows for unlimited scalability since it has no file system,” said Adrian J Herrera, Senior Director of Marketing at Caringo, a developer of object storage solutions.
Yet, as Sean Gallagher pointed out in an Ars Technica article, some approaches fail to scale because of performance constraints.
He wrote: “In the cloud, there can be potentially thousands of active users of data at any moment, and the data being read and written at any given moment reaches into the thousands of terabytes. The problem isn’t simply an issue of disk read and write speeds. With data flows at these volumes, the main problem is storage network throughput; even with the best of switches and storage servers, traditional SAN architectures can become a performance bottleneck for data processing.”
It doesn’t have to be that way, Herrera countered: “Almost all cloud storage providers use object storage. When using object storage the files are connected to the metadata. This allows for an easy interface to large data sets and makes searching very simple.
“Applications specifically built for the cloud will not need to bring the data sets back to a local facility; rather they will access them in the cloud, which will make the data almost immediately accessible and will perform like local storage. There are more and more organizations that are keeping their active datasets on the cloud so access is not a problem since they are already architected for this type of access. Those companies usually will have more than one provider as well, for disaster recovery scenarios.”
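Herrera’s point that object storage ties files to their metadata, with no file system hierarchy in the way, can be sketched as follows. This is a toy model, not any vendor’s API; the class and method names (`ObjectStore`, `put`, `search`) are invented for illustration:

```python
import uuid

class ObjectStore:
    """Toy flat-namespace object store: each object is data plus
    metadata, addressed by ID rather than by a file-system path."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        oid = str(uuid.uuid4())  # flat namespace: no directories to balance
        self._objects[oid] = (data, metadata)
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid][0]

    def search(self, **criteria):
        """Find objects whose metadata matches every given key/value pair."""
        return [oid for oid, (_, md) in self._objects.items()
                if all(md.get(k) == v for k, v in criteria.items())]

store = ObjectStore()
oid = store.put(b"...sensor readings...", source="sensor-42", kind="telemetry")
store.put(b"...log lines...", source="web-01", kind="log")
matches = store.search(kind="telemetry")  # metadata makes search simple
```

Because objects live in a flat namespace keyed by ID, adding or deleting large numbers of them needs no directory rebalancing, which is the scalability property Herrera describes.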
With the amount of acquired data growing exponentially, capturing, processing, cleansing, and storing it all will require storage that is both large and flexible. How long will it be until we think of a terabyte as a normal, or even too small, storage option?