Hadoop and NoSQL: Friends, not frenemies

Jnan Dash
January 7, 2014 —  (Page 1 of 3)
The term Big Data is an all-encompassing phrase that has various subdivisions addressing different needs of the customers. The most common description of Big Data talks about the four V’s: Volume, Velocity, Variety and Veracity.

Volume represents terabytes to exabytes of data, but this is data at rest. Velocity talks about streaming data requiring milliseconds to seconds of response time and is about data in motion. Variety is about data in many forms: structured, unstructured, text, spatial, and multimedia. Finally, veracity means data in doubt arising out of inconsistencies, incompleteness and ambiguities.

Hadoop is the first commercial version of Internet-scale supercomputing, akin to what HPC (high-performance computing) has done for the scientific community. It performs, and is affordable, at scale. No wonder it originated with companies operating at Internet scale, such as Yahoo in the 1990s, and then at Google, Facebook and Twitter.

In the scientific community, HPC was used for meteorology (weather simulation) and for solving engineering equations. Hadoop is used more for discovery and pattern matching. The underlying technology is similar: clustering, parallel processing and distributed file systems. Hadoop addresses the “volume” aspect of Big Data, mostly for offline analytics.

NoSQL products such as MongoDB address the “variety” aspect of Big Data: how to represent different data types efficiently with humongous read/write scalability and high availability for transactional systems operating in real time. The existing RDBMS solutions are inadequate to address this need with their schema rigidity and lack of scale-out solutions at low cost. Therefore, Hadoop and NoSQL are complementary in nature and do not compete at all.

Whether data is in NoSQL or RDBMS databases, Hadoop clusters are required for batch analytics (using its distributed file system and Map/Reduce computing algorithm). Several Hadoop solutions such as Cloudera’s Impala or Hortonworks’ Stinger, are introducing high-performance SQL interfaces for easy query processing.

Hadoop’s low cost and high efficiency has made it very popular. As an example, Sears’ process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata and SAS servers. The new process running on Hadoop can be completed weekly.

Related Search Term(s): Hadoop, NoSQL, Big Data

Pages 1 2 3 

Share this link:


01/08/2014 09:24:50 AM EST

Great piece! Very informative. However, note that only my original 3Vs (volume, velocity, variety) are definitional characteristics of big data. Veracity is an *aspirational* characteristic that is not a measure of magnitude. Therefore it is only confusing to use it in defining big data. It appears certain vendors added a fourth "V" either to avoid crediting Gartner's/my original piece from 12 years ago (ref: or to align them with certain software components they're selling. For more on this and other purported "Vs" see my blog: --Doug Laney, VP Research, Gartner, @doug_laney

United StatesDoug Laney

Doug Cutting: Why Hadoop is still No. 1
How Hadoop become the de facto standard, and what it plans on doing next Read More...

News on Monday  more>>
Android Developer News  more>>
SharePoint Tech Report  more>>
Big Data TechReport  more>>



Download Current Issue

Need Back Issues?

Want to subscribe?