Google is creeping up your data stack. In response to Microsoft’s recent Azure DocumentDB announcement, Google has released Cloud BigTable into the wild. Cloud BigTable is a managed NoSQL database service based on a version of BigTable that Google has used internally for more than a decade. Last week Google announced that the database would be made available to the masses, and that it could even be accessed through the Apache HBase API (take that, Hadoop vendors!). It’s a big play in the war for control of the computing workloads running in the cloud.
Sure, Google’s announcement could be viewed as yet another volley in the cloudy game of thrones, but it’s more than that. There are two reasons it’s interesting:
Making stacks from the stack
There are only three companies that have a chance of long-term success in the mass-market cloud infrastructure business. No prizes for guessing the names: Amazon, Microsoft and Google. Amazon is the clear leader. Microsoft is making huge investments that, so far, have put the Redmond-based giant ahead of Google too. The bets are big and the stakes are high. The reality is that most companies are moving to the cloud. The only questions are when, and which infrastructure player they choose to invest with.
Nobody generates their own electricity at home; it’s a utility. Cloud infrastructure should be the same. As profit margins flatten for cloud offerings, the major players are looking elsewhere for big data dollars. That’s what Google’s announcement is all about. The search behemoth wants to gobble up more of the big data stack.
In the beginning, the cloud was just basic physical infrastructure. In recent years, vendors have been adding more and more of what you need to run an application. If you want to run infrastructure on Google, Amazon or Microsoft today, there’s far less you need to build and operate yourself.
So how does this arms race affect our friendly neighbourhood IT decision maker? Right now it’s all good. There are more options, and the fierce competition is forcing down prices. However, buyer beware – many of the services and platforms are far more niche than the providers would have you believe (see below), and at the same time they lock you into the vendor’s technology stack.
Full circle: From research paper to product
Many of the important software innovations of the past decade are based on published papers describing Google’s infrastructure. Hadoop is based on two key pieces of research Google published in 2003 and 2004, on its file system (GFS) and its MapReduce implementation. Other examples of research that spawned popular open source software projects include Chubby (ZooKeeper), Dremel (Drill) and BigTable (HBase and Cassandra).
HBase was initially developed at Powerset, a natural language search company that was later acquired by Microsoft. Facebook built Cassandra to power its Inbox search feature. Both HBase and Cassandra use a data model inspired by BigTable, which is why they are being compared to Google’s new offering.
Fast forward seven years, and the system that inspired these open source projects is now a service you can use. To take advantage of it, you don’t need to build your own version of Google’s software. In fact, you don’t even have to run a product that emulates it: you can use Google’s BigTable itself to power your own applications.
As my friend and former colleague Matt Asay pointed out: “Google has finally given enterprises a clear reason to prefer Google over its cloudy alternatives: The chance to scale and run like Google.”
Are you going to need a BiggerTable?
Organisations that are interested in Google Cloud BigTable have already decided this type of data model is right for their application. The offering competes directly with products from DataStax and with the Hadoop distribution vendors that support HBase. While some advanced customers will choose to manage their own infrastructure, many will be happy to let someone else take care of the details, especially if that someone is Google.
Cloud BigTable is a database with a very narrow set of features. It is a wide column store with a simple key-value query model. Like Cassandra and HBase, Cloud BigTable is limited by:
- A complex data model that presents a steep learning curve to developers, slowing the rate of new application development
- A lack of features such as an expressive query language (it is key-value only), integrated text search, native secondary indexes, aggregations and more. Collectively, these features enable organisations to build more functional applications faster
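To make “a wide column store with a simple key-value query model” concrete, here is a toy in-memory sketch in Python. It is not the Cloud BigTable, HBase or Cassandra client API; the class `ToyWideColumnTable` and its methods are hypothetical. It assumes only the general BigTable-style shape: rows kept sorted by row key, each row holding column families of qualifier/value cells. Looking up a key or scanning a key range is cheap, while filtering on a column value has to walk every row, which is why so much of the modelling effort goes into designing row keys around your queries.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy model: row key -> column family -> column qualifier -> value.
Row = Dict[str, Dict[str, str]]


class ToyWideColumnTable:
    """In-memory illustration of a wide column store's shape; not a real client API."""

    def __init__(self) -> None:
        self._keys: List[str] = []  # row keys kept in lexicographic order
        self._rows: Dict[str, Row] = {}

    def put(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        # Insert or overwrite a single cell, keeping row keys sorted.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key: str) -> Optional[Row]:
        # Point lookup by row key: the primary (and only indexed) access path.
        return self._rows.get(row_key)

    def scan_prefix(self, prefix: str) -> Iterator[Tuple[str, Row]]:
        # Keys sharing a prefix are contiguous in sorted order, so one range scan finds them.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]

    def find_by_value(self, family: str, qualifier: str, value: str) -> List[str]:
        # No secondary index: filtering on a column value touches every row.
        return [k for k in self._keys
                if self._rows[k].get(family, {}).get(qualifier) == value]


if __name__ == "__main__":
    t = ToyWideColumnTable()
    # Hypothetical row-key design: user id first, then date, so one user's events sit together.
    t.put("user#42#2015-05-06", "events", "type", "login")
    t.put("user#42#2015-05-07", "events", "type", "purchase")
    t.put("user#7#2015-05-06", "events", "type", "login")
    print([k for k, _ in t.scan_prefix("user#42#")])
    # → ['user#42#2015-05-06', 'user#42#2015-05-07']
```

Asking “which users logged in?” only works efficiently if that question was baked into the row key up front; otherwise it falls back to `find_by_value`, a full scan. That trade-off is a large part of the learning curve described in the list above.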
Competition conquers complexity
This is a story about cloud infrastructure warfare and, in a way, we all win. In the insanely competitive cloud market the prices are dropping as quickly as the capabilities are expanding. As we’ve seen in the mobile industry over the past decade, incredible competition drives incredible innovation.
It’s clear that the future of databases is primarily in the cloud. MongoDB is designed for cloud deployments and is incredibly popular on AWS, and Google Cloud Platform already offers hosted MongoDB. We also think that a big part of removing complexity is finding software an organisation can standardise on. No one wants to deal with half a dozen databases. They want standards that combine the best parts of the various niche data tools.
To achieve this, the big players are throwing huge money at infrastructure and services. Google, Amazon and Microsoft will continue to search for more areas of big data where they can provide value in the market. Ultimately, this will lower the barriers to entry for new products and services.
Before the year is out, I expect even more vendors will be trying to creep up your big data stack. That’s good for all of us.