Predictive analytics and the cloud, two multipliers of corporate agility, were until recently rarely mentioned in the same sentence. A new service from Amazon is the first of what will probably be many NoSQL-as-a-service offerings.
IT departments are on the front lines of grappling with big data, defined here in terms of its volume, velocity, and variety. In a typical corporation, big data might comprise Web server logs, chatter from social networks, sales data, mobile location data, banking transactions, the content of Web pages, financial market data, scans of government documents, and much more.
A simpler, if cruder, definition of big data is data too big for relational databases to handle; hence the rapidly growing popularity of NoSQL databases such as CouchDB and MongoDB.
The main driver of interest in processing big data is the emergence of predictive analytics software capable of drawing valuable insights and business intelligence from it: identifying patterns, spotting trends, and correlating cause and effect. The ability to see around the next corner, anticipating events instead of merely reacting to them in real time, is becoming a cornerstone of agile business.
Much of the big-data analytics software available today relies on technologies such as Hadoop, an open source implementation of Google's MapReduce model, to do the processing and crunching. Venture capital firms are setting up funds to spur and profit from the growth of big data and analytics.
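To make the "processing and crunching" a little more concrete, here is a minimal, single-process sketch of the MapReduce pattern applied to a toy set of Web server log lines. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce phases in parallel across many machines and handle the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every token in a log line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key, as the framework would
    # do between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases together on a single machine.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["GET /index.html", "GET /about.html", "POST /login"]
    print(word_count(logs))
```

Because each map call looks at one line and each reduce call looks at one key, both phases can be split across machines independently; that property is what lets MapReduce-style systems scale to very large data sets.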
Cloud services are another amplifier of business agility. When someone else manages some combination of the storage, replication, caching, patching, backup, etc., you can concentrate on the problem at hand and get it solved more quickly and cheaply.
Until this week, not all the pieces needed for big-data analytics existed in the cloud in usable form. Amazon, Google, and others offered MapReduce/Hadoop-style processing of data sets, but hosted NoSQL database services were thin on the ground. Only Amazon provided such a database service, SimpleDB, and it comes with significant limitations that render it unsuitable for applications that must scale on the fly and need predictable performance. In addition, SimpleDB's implementation takes the "eventually consistent" approach to data concurrency to an extreme, ruling it out for some classes of problems.
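Why "eventually consistent" rules a database out for some problems is easiest to see in a toy model. The sketch below is a hypothetical in-memory store, not SimpleDB or DynamoDB: writes are acknowledged as soon as a primary copy has them, and replicas catch up only when `replicate()` runs. An eventually consistent read goes to a replica and may return stale or missing data; a strongly consistent read goes to the primary. (DynamoDB's read operations expose a similar choice through a `ConsistentRead` parameter.)

```python
import random
from typing import Dict, List, Optional, Tuple


class ToyReplicatedStore:
    """Hypothetical key-value store: one primary, N replicas, lazy replication."""

    def __init__(self, replicas: int = 2, seed: int = 0) -> None:
        self.primary: Dict[str, str] = {}
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replicas)]
        # Writes acknowledged by the primary but not yet copied to replicas.
        self.pending: List[Tuple[str, str]] = []
        self.rng = random.Random(seed)

    def put(self, key: str, value: str) -> None:
        # The write "succeeds" once the primary has it; replicas lag behind.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        # Propagate queued writes to every replica (the "eventually" part).
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def get(self, key: str, consistent: bool = False) -> Optional[str]:
        if consistent:
            # Strong read: always sees the latest acknowledged write.
            return self.primary.get(key)
        # Eventual read: served by an arbitrary replica, possibly stale.
        return self.rng.choice(self.replicas).get(key)


if __name__ == "__main__":
    store = ToyReplicatedStore()
    store.put("cart:42", "3 items")
    print(store.get("cart:42"))                   # replicas have not caught up
    print(store.get("cart:42", consistent=True))  # primary has the write
    store.replicate()
    print(store.get("cart:42"))                   # replicas now agree
```

An application that must never act on a stale value, such as one debiting an account balance, needs the strong read path; a store that offers only the eventual path cannot serve it.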
This week Amazon introduced DynamoDB, the first NoSQL-as-a-service cloud offering, and certainly not the last. Its provisioned throughput can be scaled over six orders of magnitude, while the application is running, with a simple API call. Amazon has published a press release, the company's CTO has provided background and technical detail, and a developer's perspective is also available.
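As a rough illustration of what "a simple API call" looks like, here is a sketch assuming the present-day AWS SDK for Python (boto3), whose DynamoDB client has an `update_table` method that accepts a `ProvisionedThroughput` dictionary. The wrapper name `scale_table` and the table name `orders` are hypothetical, the sketch assumes the table uses provisioned (not on-demand) capacity, and AWS applies its own limits on how often and how far capacity can be changed.

```python
from typing import Any, Dict


def scale_table(client: Any, table_name: str, read_units: int, write_units: int) -> Dict[str, Any]:
    """Request new provisioned throughput for a DynamoDB table.

    `client` is assumed to be a boto3 DynamoDB client, for example
    boto3.client("dynamodb"). Any object with a compatible `update_table`
    method (such as a test stub) also works.
    """
    # Reject obviously invalid values before making a network call.
    if read_units < 1 or write_units < 1:
        raise ValueError("capacity units must be positive integers")
    # One call changes capacity; the application keeps running meanwhile.
    return client.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    )
```

In use, an application expecting a traffic spike might call `scale_table(boto3.client("dynamodb"), "orders", 1000, 500)` and scale back down afterward. Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials.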
A much-discussed paper presented at OSCON last summer described how to spin up your own NoSQL-as-a-service. That idea is hardly attractive to agile IT shops: it is only a marginal improvement over building your own private cloud and running a NoSQL database on top of it.
The pieces are starting to come together to deal with big data for predictive analytics entirely in the cloud. If you have been hesitating before plunging into the NoSQL pool, check out Amazon's offering. And stay tuned for similar products from other established cloud and database players.