A recent survey by consulting firm NewVantage Partners reveals that the portion of U.S. companies using big data has jumped from 5% to 63% in the past three years. In addition, 70% now say that big data is critically important to their business, up from 21% in 2012.
Big data has revolutionised research methodology by making it possible to analyse entire datasets rather than samples. Whether predicting earthquakes, providing real-time weather alerts or just analysing the best traffic patterns, big data is changing our lives and society. But how will big data transform business results? And what are the burning big data applications for the enterprise?
Big data for big problems
Surveys show the number one challenge CIOs face today is data growth. The amount of digital information created and shared in the world increased ninefold in just five years. Big data was at almost two zettabytes by the end of 2011, and by 2015 it had quadrupled to nearly eight zettabytes.
CIOs are challenged because with more data comes increased cost, complexity and risk. Costs rise across the board as data grows, including CPU, storage, network and data centre expenses. End users suffer as screen response times slow, and IT teams scramble – and fail – to complete critical jobs on time. Data growth reduces system availability and extends outages, since more data requires more time to manage. Governance and compliance concerns grow by the terabyte as well, because more data means more risk.
Leading organisations run their enterprise applications on high-end, multiprocessor servers that provide in-memory database processing on solid state arrays. These systems deliver ultra-high performance for relational database applications, and businesses need them to meet critical objectives. But as the amount of data climbs, these production systems face performance challenges, and the cost of upgrading is steep.
One solution is to run only current data on Tier 1 infrastructure and move the rest to Apache Hadoop. As data ages it becomes less active and less valuable. Recent studies have shown that up to 80% of a typical organisation’s data is inactive. By moving inactive data to commodity platforms, businesses may achieve significant payback. Consider the following cost comparison:
A common data platform
If current data belongs on Tier 1 infrastructure for optimised performance, then less current data belongs on a big data platform, where it no longer weighs on Tier 1 performance. Big data platforms, and in particular Apache Hadoop, are ideal common data platforms (CDPs) for older data as they offer uniform data collection, low-cost data storage and reporting across the enterprise.
Apache Hadoop ingests both structured and unstructured data, leverages low-cost commodity infrastructure, and delivers massive scalability. Using the MapReduce programming model to process large datasets across distributed compute nodes in parallel, Hadoop can process any workload and store data from any source at the lowest possible cost.
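To make the MapReduce programming model concrete, here is a minimal single-process sketch in plain Python. It is not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce steps in parallel across many nodes. The sketch only shows the three logical stages, using word counting as the example job.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big problems"]))
```

Because each call to map_phase touches one record and each call to reduce_phase touches one key, both stages can be spread over independent workers, which is the property that lets the model scale out on commodity hardware.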
Information lifecycle management
Information lifecycle management (ILM) is a best practice for managing data throughout its lifecycle. ILM solutions improve application performance and optimise infrastructure utilisation to reduce costs. ILM also provides a governance framework, based on retention policies, that establishes compliance controls for enterprise data.
ILM classifies data at the time of creation based on security, access control and retention management. Business rules such as “legal hold” enforce proper data governance, and retention policies are tuned to optimise infrastructure utilisation. For instance, policies may be created to run “current” data on Tier 1 infrastructure and move all other “not current” data to low-cost Hadoop.
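A policy of this kind can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a feature of any ILM product: the Record fields, the one-year definition of “current”, and the rule that a legal hold blocks deletion regardless of age.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata assigned to a record when it is created."""
    record_id: str
    created: date
    legal_hold: bool = False


# Assumption: data created within the last year counts as "current".
CURRENT_WINDOW = timedelta(days=365)


def placement(record: Record, today: date) -> str:
    """Return the tier a record should live on under the assumed policy."""
    if today - record.created <= CURRENT_WINDOW:
        return "tier1"
    return "hadoop"


def may_delete(record: Record, today: date, retention: timedelta) -> bool:
    """A record may be deleted only after its retention period has
    passed, and never while it is under legal hold."""
    if record.legal_hold:
        return False
    return today - record.created > retention


if __name__ == "__main__":
    today = date(2024, 1, 1)
    old = Record("invoice-001", date(2015, 6, 1), legal_hold=True)
    print(placement(old, today), may_delete(old, today, timedelta(days=7 * 365)))
```

Keeping placement and deletion as separate rules mirrors the text: one decision optimises infrastructure utilisation, the other protects governance and compliance.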
With as much as 80% of data inactive, the ROI of implementing ILM is compelling. For many organisations, though, ILM simply provides the essential risk and compliance governance framework to manage data throughout its lifecycle.
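A back-of-the-envelope calculation shows why the ROI looks compelling. The sketch below combines the 80% inactive figure with the claim, made in this article's closing section, that Hadoop storage costs 55 times less than Tier 1. It assumes cost scales linearly with data volume and ignores migration and operating overheads, so treat the result as an upper bound on savings rather than a forecast. The function name is hypothetical.

```python
def blended_cost_ratio(inactive_fraction: float, cost_ratio: float) -> float:
    """Storage cost after tiering, relative to keeping everything on Tier 1.

    inactive_fraction: share of data moved to the low-cost platform (0 to 1).
    cost_ratio: how many times cheaper the low-cost platform is per unit.
    """
    if not 0.0 <= inactive_fraction <= 1.0:
        raise ValueError("inactive_fraction must be between 0 and 1")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # Active data stays at full Tier 1 cost; inactive data costs 1/cost_ratio.
    return (1.0 - inactive_fraction) + inactive_fraction / cost_ratio


if __name__ == "__main__":
    ratio = blended_cost_ratio(inactive_fraction=0.80, cost_ratio=55.0)
    print(f"Relative cost after tiering: {ratio:.1%}")
    print(f"Implied saving: {1.0 - ratio:.1%}")
```

Under these assumptions the blended cost works out to 0.20 + 0.80 / 55, or roughly 21% of the all-Tier-1 cost. Most of the remaining cost is the active 20%, which is why the saving is far less sensitive to the exact cost ratio than to the share of data that can be moved.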
Big data applications for the enterprise
Big data establishes a new enterprise blueprint on a petabyte scale, and big data applications are emerging to leverage the opportunity. Enterprise archiving and enterprise data lake are two of the most popular big data applications that have emerged because they reduce infrastructure costs, improve application performance, strengthen data governance and transform business results with advanced business intelligence.
Enterprise data lake and advanced analytics
Apache Hadoop represents a significant opportunity for enterprise data warehouse (EDW) and advanced analytics applications. Data warehouse users continually seek ways to describe data better, and EDW platforms sometimes struggle to deliver more specific views of data. Downstream analytics and NoSQL applications are also challenged by the canonical, top-down data approach delivered by traditional EDW systems.
Enterprise data lake applications store copies of production data “as is”, eliminating the need for heavy extract, transform and load (ETL) processes during data ingestion. Once stored within the Hadoop Distributed File System (HDFS), enterprise data may be more easily distilled by analytics applications and mined for critical insights.
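Storing data “as is” and deferring interpretation to query time is often described as schema-on-read, a term this article does not itself use. The sketch below illustrates the idea under simplifying assumptions: records arrive as JSON lines, a Python list stands in for HDFS, and the function names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def ingest(raw_lines: Iterable[str], store: list[str]) -> None:
    """Append records exactly as received: no parsing, no transformation."""
    store.extend(raw_lines)


def read_with_schema(
    store: Iterable[str], fields: tuple[str, ...]
) -> Iterator[dict[str, Any]]:
    """Parse each record at read time and project only the requested
    fields. A field missing from a record is returned as None."""
    for line in store:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


if __name__ == "__main__":
    lake: list[str] = []
    ingest(
        ['{"id": 1, "region": "EU", "amount": 10}', '{"id": 2, "amount": 5}'],
        lake,
    )
    for row in read_with_schema(lake, ("id", "region")):
        print(row)
```

Because the stored copy is untouched, a different analytics application can later read the same records with a different set of fields without re-ingesting anything.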
Enterprise data lake leverages ILM to establish data governance controls and allow businesses to meet compliance objectives. ILM classifies data before ingestion based on security, retention management and access control policy.
Big data enhances traditional EDW strategies because Apache Hadoop stores and processes structured and unstructured enterprise data in bulk and at a very low cost. Lightweight ETL processes, massively scalable performance, low cost and flexible data-handling make enterprise data lake a powerful and efficient advanced analytics platform.
Enterprise archiving
Organisations continually demand improved performance from their mission-critical online applications, but the cost of ultra-high performance infrastructure is often too high. How high depends on how much data will be processed online using Tier 1 compute nodes with full-flash memory arrays.
Mission-critical enterprise applications perform better when inactive data is moved from production databases onto low-cost, bulk data storage platforms. Enterprise archiving with Apache Hadoop uses ILM retention policies to move older, inactive data from online systems to a nearline HDFS repository for easy access by end users.
When online datasets are reduced, enterprise applications run faster and with higher availability, and dramatic infrastructure savings are possible. Enterprise archiving uses ILM to establish a governance, risk and compliance framework for the data – from creation to deletion – where all data is classified and properly accessible at all times.
In ‘Market Overview for Big Data Archiving’, Forrester Research vice president Noel Yuhanna comments: “With growing data volume, increasing compliance pressure, and the revolution of big data, enterprise architect (EA) professionals should review their strategies, leveraging new technologies and approaches. In the era of big data, archiving is a no brainer investment.”
Gartner reports that data growth is the biggest data centre and hardware infrastructure challenge, and is also “particularly associated with increased costs relative to hardware, software, associated maintenance, administration and services.” As more and more data is processed and stored, system performance deteriorates, costs increase, and compliance objectives become harder to meet.
At the same time, demand has never been higher for improved access to data through data warehouse and enterprise analytics applications. Organisations are seeking competitiveness and new ways to gain value by mining enterprise data.
Solutions for big data are ideal common data platforms for enterprise data management, and big data applications that transform business results are available now. Apache Hadoop stores structured and unstructured data in a single repository, accessible by text search, at a cost 55 times lower than that of Tier 1 infrastructure. With ILM, organisations utilise infrastructure far more efficiently and improve governance, risk and compliance at the same time.
The wait is over. Big data applications for the enterprise have finally arrived.