Tony Baer, Principal Analyst, Software – Information Management
With SAP having focused on building up the HANA platform, it has been late to the game in articulating its Big Data strategy. Over the past few months, several important pieces have fallen into place. SAP announced the extension of Smart Data Access, its federated query technology, from Sybase to the HANA platform, and announced OEM deals with Hadoop platform providers Hortonworks and Intel.
This is still early days for both initiatives – for instance, the Smart Data Access technology, while well-established on Sybase, is still in its first release for HANA and Hadoop. While SAP isn’t alone in promoting federated query, extending it to HANA injects welcome realism into SAP’s data management strategy.
Venturing beyond in-memory
Until recently, SAP’s positioning of HANA emphasized its role as a destination platform for analytics and transaction-processing applications. SAP’s focus on HANA as both an analytic and an OLTP platform addresses latent demand for applications and use cases that will benefit from real-time processing, as outlined in the Ovum report What is Fast Data?
Nonetheless, Ovum believes that few applications merit storing 100% of data in memory. The Ovum research note “Storage tiering is the new black for databases” concluded that there are different use cases for all forms of storage, from disk to SSD Flash and DRAM. While the long-term decline of DRAM prices made the in-memory HANA platform feasible, this form of storage still commands premium pricing compared to disk or solid-state disk (SSD) Flash drives.
Ovum predicts that the trend for analytic applications is aggressive use of data-tiering strategies, where data is stored on the appropriate medium (e.g., DRAM, SSD Flash, disk, tape) based on its utilization, or “temperature.”
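The temperature idea can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the tier names follow the list above, but the access-rate thresholds, the Partition type, and the assign_tier function are hypothetical and do not describe how HANA or any other product decides placement.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered from hottest (most expensive) to coldest.
# Each threshold is an illustrative minimum number of accesses per day,
# not a figure taken from Ovum or SAP.
TIERS = [
    ("DRAM", 1000),
    ("SSD Flash", 100),
    ("disk", 1),
    ("tape", 0),
]


@dataclass
class Partition:
    name: str
    accesses_per_day: float


def assign_tier(partition: Partition) -> str:
    """Return the hottest tier whose minimum access rate the partition meets."""
    for tier, min_accesses in TIERS:
        if partition.accesses_per_day >= min_accesses:
            return tier
    # Fallback for negative or otherwise invalid rates: coldest tier.
    return TIERS[-1][0]


def plan_placement(partitions):
    """Map each partition name to the tier chosen for it."""
    return {p.name: assign_tier(p) for p in partitions}
```

In practice the thresholds would be driven by storage cost and service-level targets rather than fixed constants; the point of the sketch is only that placement follows utilization.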
From the outset, the HANA platform had the capability to page “colder” (less used) data to disk, but that wasn’t prominent in SAP’s early messaging. With extension of Smart Data Access, a federated query utility first developed by Sybase, SAP has embraced data tiering for HANA – a strategy that Ovum concurs with. Smart Data Access allows data in remote systems to appear as virtual tables in HANA, with processing pushed down to the source systems.
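To illustrate what pushing processing down to the source means, here is a small sketch that uses an in-memory SQLite database as a stand-in for a remote system. It is not SAP's API: the VirtualTable class, its methods, and the clickstream table are hypothetical names invented for this example. The sketch compares how many rows cross the wire when the filter and aggregate run at the source versus when they run locally.

```python
import sqlite3


def make_remote_source():
    """Build an in-memory SQLite database standing in for a remote system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clickstream (region TEXT, clicks INTEGER)")
    conn.executemany(
        "INSERT INTO clickstream VALUES (?, ?)",
        [("EMEA", 10), ("APAC", 7), ("EMEA", 5), ("AMER", 3)],
    )
    return conn


class VirtualTable:
    """Hypothetical local handle for a table that lives in a remote source."""

    def __init__(self, source, table):
        self.source = source
        # The table name is trusted in this sketch; real code must validate it.
        self.table = table

    def sum_with_pushdown(self, region):
        # The filter and the aggregate run at the source; one row comes back.
        row = self.source.execute(
            f"SELECT COALESCE(SUM(clicks), 0) FROM {self.table} WHERE region = ?",
            (region,),
        ).fetchone()
        return row[0], 1  # (result, rows transferred)

    def sum_without_pushdown(self, region):
        # Every row is transferred, then filtered and summed locally.
        rows = self.source.execute(
            f"SELECT region, clicks FROM {self.table}"
        ).fetchall()
        total = sum(clicks for r, clicks in rows if r == region)
        return total, len(rows)  # (result, rows transferred)
```

Both methods return the same total, but the pushdown variant moves a single row instead of the whole table, which is the property that makes federated access to large remote stores practical.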
For now, Smart Data Access supports Sybase IQ (the venerable columnar analytic database introduced nearly 20 years ago); Sybase ASE; Teradata; and the latest versions of Hadoop that include Hive 0.12 (the version that has performance improvements from Hortonworks’ Stinger project). On the horizon, SAP plans to add support for targets such as Oracle or Microsoft SQL Server.
SAP is hardly alone in supporting federated data access; for instance, Teradata currently supports such an approach within its relational platforms. However, SAP is distinctive in reaching out to data in Hadoop without physically migrating it to the SQL environment. As discussed in the Ovum report, BI on Hadoop, Part 2 – Making the Connections, the prevalent modes for SQL-Hadoop integration currently range from batch processes to interactive query on Hive metadata (with or without Hive processing); ETL from Hadoop to SQL data warehouse; extract of Hadoop data to SQL as an external table; and physical integration of SQL tables into HDFS (Hadoop Distributed File System) or HBase.
Smart Data Access is not feature-complete. For instance, it does not currently support clustered HANA instances. Additionally, there is no direct support for BLOB/CLOB (large object) data types that may be found on platforms such as Hadoop. That limits Smart Data Access to data that can be found using Hive, while raw data (typically stored in large blocks in HDFS) remains off limits. This is hardly a showstopper, but is an opportunity for SAP to extend its federated query umbrella. SAP’s extension of Smart Data Access introduces a welcome note of realism to its HANA positioning.
Ramping up Hadoop support
In contrast to Oracle, Microsoft, and Teradata – which have chosen a single Hadoop platform OEM partner – SAP has spread its bets with reseller agreements with
- Hortonworks, for a vanilla Apache open source platform; and
- Intel, as a premium provider of a higher-performance Hadoop. The Intel Hadoop distribution aggressively exploits the native instructions of the Xeon chipset for data pre-processing in cache, along with compute-intensive operations such as encryption and graph processing. When it unveiled the distribution, Intel also boasted Terasort performance benchmarks on a platform with all SSD Flash storage.
Of the two, Hortonworks is the more established provider, having entered the Hadoop platform business two years ahead of Intel. Differences aside, OEM strategies play prominently with both Hadoop providers.
Additionally, SAP has certifications for Cloudera and MapR (this is similar to Teradata, which also has an arm’s-length relationship with Cloudera).
Ovum believes that SAP’s selection of two providers is a form of testing the waters, much as the then-EMC Greenplum (now Pivotal) did with an early OEM strategy with MapR for high-performance Hadoop that was phased out in favor of the SQL-on-Hadoop Pivotal HD offering. Nonetheless, the Intel relationship leverages SAP’s joint development work with Intel, where Xeon processors were tuned for the HANA platform.
For now, SAP has no plans to conduct a closer integration of HANA with the Intel Hadoop platform, which has been developed with a high-performance option that includes SSD Flash drives and 10GbE high-speed Ethernet interconnects.
While Ovum believes that disk-based Hadoop platforms will represent the mainstream of that market, developments with open source frameworks such as Spark (for tiering of hot data to in-memory DRAM storage) and Shark (which runs Hive on Spark) create interesting product opportunities for SAP to offer a high-performance converged HANA/Hadoop platform that could seamlessly integrate SAP Business Suite on HANA with Big Data analytics.