Wide column data store

Records in a wide column data store can hold a dynamic number of columns, and the number of columns may vary from record to record. It is a kind of key-value data store, but two-dimensional: a value is addressed by a row key and a column name.
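
To make the two-dimensional key-value idea concrete, here is a minimal in-memory sketch. It is a toy model rather than the API of any product discussed here; the names `table`, `put`, and `get` are hypothetical.

```python
from collections import defaultdict

# Toy wide-column table: row key -> column name -> value.
# Each row carries only the columns it actually uses.
table = defaultdict(dict)

def put(row_key, column, value):
    """Store one cell; the column is created on demand for that row."""
    table[row_key][column] = value

def get(row_key, column, default=None):
    """Look up one cell by its two coordinates (row key, column name)."""
    return table.get(row_key, {}).get(column, default)

put("user:1", "name", "Ada")
put("user:1", "email", "ada@example.org")
put("user:2", "name", "Linus")             # this row has no email column
put("user:2", "last_login", "2024-01-01")  # and one column user:1 lacks

for row_key in sorted(table):
    print(row_key, sorted(table[row_key]))
```

The two rows end up with different column sets, which is the property that distinguishes this model from a fixed-schema relational table.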
 


Apache HBase

Positive

  1. Fast write
  2. Versioning & TTL
  3. SQL access via Apache Hive, Apache Phoenix, or Apache Drill
  4. Very fast random read
  5. Strong integration with MapReduce & Spark
  6. Native encryption
  7. Strict consistency (CP of CAP)

Negative

  1. High disk, CPU, and network needs
  2. No audit logging of who does what
  3. Weak when a full table scan is needed
  4. Relies on an existing Hadoop cluster (for ZooKeeper & HDFS)

 


Recommended use:
  • Any place where scanning huge, two-dimensional, join-less tables is a requirement.
  • General wide column data store needs.
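
The "Versioning & TTL" item above can be pictured with a small sketch: a cell keeps several timestamped versions of its value, drops the oldest once a limit is exceeded, and hides versions older than a time-to-live. This is a toy model under those assumptions, not HBase client code; `VersionedCell`, `max_versions`, and `ttl_seconds` are hypothetical names.

```python
import time

class VersionedCell:
    """Toy cell that keeps several timestamped versions of a value."""

    def __init__(self, max_versions=3, ttl_seconds=None):
        self.max_versions = max_versions
        self.ttl_seconds = ttl_seconds
        self._versions = []  # list of (timestamp, value), newest first

    def put(self, value, timestamp=None):
        """Add a version, then drop the oldest ones beyond max_versions."""
        ts = time.time() if timestamp is None else timestamp
        self._versions.append((ts, value))
        self._versions.sort(key=lambda tv: tv[0], reverse=True)
        del self._versions[self.max_versions:]

    def get(self, now=None, versions=1):
        """Return up to `versions` values, newest first, skipping expired ones."""
        now = time.time() if now is None else now
        live = [
            (ts, value) for ts, value in self._versions
            if self.ttl_seconds is None or now - ts < self.ttl_seconds
        ]
        return [value for _, value in live[:versions]]

cell = VersionedCell(max_versions=2, ttl_seconds=60)
cell.put("a", timestamp=0)
cell.put("b", timestamp=10)
cell.put("c", timestamp=20)   # "a" is dropped: only two versions are kept
print(cell.get(now=30, versions=2))
print(cell.get(now=75, versions=2))
```

Explicit timestamps are passed here only to keep the example deterministic; with no `timestamp` argument the sketch falls back to the wall clock.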

 

Apache Accumulo

Positive

  1. Fast write
  2. Very fast random read
  3. Strong integration with MapReduce & Spark
  4. Native encryption
  5. Strict consistency (CP of CAP)
  6. Security down to the field level
  7. Server-side programming

Negative

  1. High disk, CPU, and network needs
  2. Relies on an existing Hadoop cluster (for ZooKeeper & HDFS)

 


Recommended use:
  • Any place where HBase or another column store database would fit, but where security and access control are a must.
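
Field-level security can be sketched as a visibility label attached to every cell and checked against the reader's authorizations at scan time. The model below is deliberately simplified and is not the Accumulo API: a visibility is written as a list of alternatives, each a set of labels that must all be held, whereas Accumulo itself uses expression strings such as `doctor|(nurse&ward3)`. The names `is_visible`, `scan`, and the sample data are hypothetical.

```python
def is_visible(visibility, authorizations):
    """Return True if the reader may see a cell.

    visibility: list of alternatives; each alternative is a set of labels
    that must ALL be held. An empty list means the cell is unrestricted.
    """
    if not visibility:
        return True
    auths = set(authorizations)
    return any(set(required) <= auths for required in visibility)

# (row key, column, value, visibility)
cells = [
    ("patient:7", "name", "J. Doe", []),
    ("patient:7", "diagnosis", "flu", [{"doctor"}, {"nurse", "ward3"}]),
    ("patient:7", "billing", "120", [{"finance"}]),
]

def scan(cells, authorizations):
    """Return only the cells the given authorizations are allowed to read."""
    return [
        (row, column, value)
        for row, column, value, visibility in cells
        if is_visible(visibility, authorizations)
    ]

print(scan(cells, {"nurse"}))
print(scan(cells, {"nurse", "ward3"}))
```

Two readers scanning the same row can therefore receive different columns, which is what sets this apart from table-level or column-family-level access control.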

 


Apache Cassandra

Positive

  1. Written in Java, with native Java support
  2. Minimal administration
  3. Decentralized, distributed peer-to-peer architecture
  4. Fault-tolerant with no single point of failure
  5. Built-in data compression
  6. SQL-like query language (CQL)
  7. AP of CAP

Negative

  1. Weak security
  2. Sub-millisecond consistency
  3. Ad-hoc querying
  4. Latency

 


Recommended use:
  • Huge data sets that need fast random reads.
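
The "no single point of failure" property of a peer-to-peer design can be illustrated with a toy hash ring: every key is placed on several nodes, so it remains readable when one of them is down. This is a sketch under simplifying assumptions (one token per node, MD5 as the hash), not Cassandra's actual partitioner or driver API; `Ring`, `replicas`, and `readable_from` are hypothetical names.

```python
import bisect
import hashlib

def token(key):
    """Map a string to a position on the ring (toy hash, chosen for simplicity)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Toy token ring: every node owns one token, no node is special."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next N nodes."""
        start = bisect.bisect_right(self._tokens, token(key))
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def readable_from(self, key, down=()):
        """Replicas that can still serve the key while some nodes are down."""
        return [node for node in self.replicas(key) if node not in down]

ring = Ring(["n1", "n2", "n3", "n4", "n5"], replication_factor=3)
owners = ring.replicas("user:1")
print(owners)
print(ring.readable_from("user:1", down={owners[0]}))
```

With a replication factor of 3, losing one replica still leaves two nodes able to answer, at the price of the replicas possibly disagreeing for a short time, which is the trade-off the AP classification refers to.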