This story was written by Keith Dawson for UBM DeusM’s community Web site Business Agility, sponsored by IBM. It is archived here for informational purposes only because the Business Agility site is no more. This material is Copyright 2012 by UBM DeusM.

Defining Big-Data

One way to tell that a technology is immature is that people don't agree on its definition.

Big-data is everywhere, from corporate data centers to the cloud to government research programs. But what is it? A definition is proving curiously elusive.

Most recent attempts at defining big-data revolve around the "three Vs" (or sometimes "four Vs") -- volume, velocity, variety (and sometimes variability). This characterization was introduced in October of last year in a report, Enterprise Hadoop: The Emerging Core Of Big Data, by James G. Kobielus and others at Forrester Research. IBM (sponsor of Business Agility), among others, refers to this definition of big-data in its product literature.

The three Vs in a nutshell:

- Volume: the sheer quantity of data, beyond what conventional databases comfortably handle.
- Velocity: the speed at which data arrives and must be processed.
- Variety: the range of formats and sources, from structured records to unstructured text, logs, and media.

Last January Edd Dumbill, writing in O'Reilly Radar, fleshed out the descriptions of the three Vs and supplied concrete examples. The article was picked up by Forbes and others and spread more widely. If you haven't read Dumbill's analysis, now would be a good time to have a look.

In March of this year, the inaugural post of ZDNet's "Big on Data" blog, by Andrew J. Brust, concluded this way: "This blog will investigate and explain what Big Data is about, based on the premise that there's no perfect consensus on that definition and that it is, in any case, changing."

According to Network World, data scientist John Rauser tried out a new and simpler definition at a recent big-data conference in Boston hosted by Amazon Web Services. Rauser suggested that big-data is any amount of data that's too big to be handled by one computer. NWW spoke with a number of analysts and data scientists, many of whom did not consider this definition an improvement on ones based on the three Vs. For example, Dan Vesset, program vice president of the business analytics division of the research firm IDC, said: "I'd like to see something that actually talks about data instead of the infrastructure needed to process it."

Yet there is something appealing in the idea of characterizing big-data problems by comparison with the capabilities of current technology -- which will grow as big-data grows ever more voluminous, velocity-laden, and various. Jeff Kelly, a big-data analyst at the Wikibon project, told NWW: "When you're hitting the limits of your technology, that's when data gets big. You know it when you see it."
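Rauser's rule of thumb lends itself to a back-of-the-envelope test. Here is a minimal sketch in Python, assuming illustrative single-machine capacity figures (the 256GB and 24TB limits below are stand-ins, not numbers from Rauser or the article):

    # Rauser's heuristic: data is "big" once it no longer fits on one
    # computer. The capacity figures are assumed examples only.
    ONE_MACHINE_RAM_BYTES = 256 * 1024**3   # assume 256 GB of RAM
    ONE_MACHINE_DISK_BYTES = 24 * 1024**4   # assume 24 TB of local disk

    def is_big_data(dataset_bytes, must_fit_in_ram=False):
        """Return True if the dataset exceeds one machine's capacity."""
        limit = ONE_MACHINE_RAM_BYTES if must_fit_in_ram else ONE_MACHINE_DISK_BYTES
        return dataset_bytes > limit

    print(is_big_data(40 * 1024**4))   # 40 TB archive: True, needs a cluster
    print(is_big_data(100 * 1024**3, must_fit_in_ram=True))  # 100 GB: False

The appeal of this definition, and its weakness, both show up here: the verdict changes as soon as you buy a bigger machine.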

Which brings us to our own David Coursey, Business Agility blogger extraordinaire, who rose to my invitation to offer a definition of big-data. We'll give him the last word.

"Big-data is whatever amount of data you should be working with, but cannot. You could add specific numbers or mention Hadoop, but regardless of that, your 'big-data' may not be mine. What is big-data to the NSA?"