Reference: HBase Intro

Apache HBase

Apache HBase is part of the Apache Hadoop ecosystem and is designed for fast reads and writes with high concurrency and strong consistency. Essentially, HBase is a NoSQL database that runs on top of your Hadoop cluster and provides random, real-time read/write access to your data. Both structured and unstructured data can be stored in HBase, which stores data as key/value pairs in a columnar fashion. Because HBase provides low-latency access to small amounts of data within a large data set through a flexible data model, it is best suited to real-time analytical query needs where the following apply:


Random, real-time read/write access to data

Data stored in collections by key

Variable schema across rows

Key-based access to the data


Although HBase and relational databases share terminology such as tables, rows and columns, these terms are functionally very different in the two systems. Additional technical details are available at hbase.apache.org.
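To make the data model more concrete, the sketch below models an HBase table in plain Python. It is a map ordered by row key, and each row maps `family:qualifier` column names to timestamped versions of a value. This is a minimal illustration under stated assumptions, not the HBase client API. The `ToyHTable` class and its `put`, `get` and `scan` methods are hypothetical names that only loosely mirror HBase's Put, Get and Scan operations, and a simple counter stands in for real cell timestamps. The sketch shows why column families are declared up front while individual columns can vary from row to row. It also shows why reads go through a row key or a range of row keys.

```python
class ToyHTable:
    """Toy in-memory model of an HBase table (hypothetical, for illustration only).

    Layout: row key -> {"family:qualifier" -> {timestamp -> value}}.
    """

    def __init__(self, families):
        self.families = set(families)  # column families are fixed when the table is created
        self.rows = {}                 # row key -> column -> {timestamp: value}
        self._clock = 0                # counter standing in for real cell timestamps

    def put(self, row, column, value):
        """Write one cell; any qualifier is allowed inside a declared family."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._clock += 1
        self.rows.setdefault(row, {}).setdefault(column, {})[self._clock] = value

    def get(self, row):
        """Key-based read: newest version of each cell in the row, columns sorted."""
        cells = self.rows.get(row, {})
        return {col: versions[max(versions)] for col, versions in sorted(cells.items())}

    def scan(self, start, stop):
        """Yield (row, cells) for start <= row < stop, in row-key order."""
        for row in sorted(self.rows):
            if start <= row < stop:
                yield row, self.get(row)


t = ToyHTable(["info"])
t.put("user#001", "info:name", "Ada")
t.put("user#001", "info:email", "ada@example.com")
t.put("user#002", "info:name", "Linus")    # no email column: schema varies per row
t.put("user#001", "info:name", "Ada L.")   # newer version of an existing cell

print(t.get("user#001"))  # {'info:email': 'ada@example.com', 'info:name': 'Ada L.'}
print([row for row, _ in t.scan("user#000", "user#002")])  # ['user#001']
```

The toy keeps every version of a cell but returns only the newest one on a read. Its scan treats the stop key as exclusive, which matches how an HBase scan's stop row behaves by default.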


HBase Limitations

Apache HBase has some well-documented limitations, and developers need to be aware of them when building HBase environments. HBase does not offer all of the features associated with relational database management systems (RDBMS). As a result, applications built against an RDBMS cannot simply be ported to HBase. Moving from an RDBMS to HBase requires a full redesign of the schema, environment and workflows.


Not suited to transactional or relational analysis

Does not support traditional SQL

Not a substitute for large-scale MapReduce

Does not support joins
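Because HBase has no joins, HBase schemas are typically denormalized and designed around the row key. A question that an RDBMS would answer with a join then becomes a single range or prefix scan. The sketch below shows that idea in plain Python. It assumes a hypothetical `orders` table whose row key is `<customer_id>#<order_id>`. The `c:` and `o:` column families are invented for illustration, and so are the helpers `row_key`, `prefix_scan` and `add_order`, none of which belong to any HBase library. The IDs are zero-padded so that lexicographic row-key order agrees with numeric order, because HBase compares row keys as raw bytes.

```python
def row_key(customer_id: int, order_id: int) -> str:
    """Hypothetical composite row key <customer_id>#<order_id>, each zero-padded to 8 digits."""
    return f"{customer_id:08d}#{order_id:08d}"


def prefix_scan(table: dict, prefix: str) -> list:
    """Return (key, row) pairs whose key starts with prefix, in row-key order."""
    return [(key, table[key]) for key in sorted(table) if key.startswith(prefix)]


orders = {}  # hypothetical table: row key -> {"family:qualifier": value}


def add_order(customer_id: int, customer_name: str, order_id: int, total: float) -> None:
    # Denormalize: copy the customer's name into every order row instead of
    # keeping it only in a separate customers table that would need a join.
    orders[row_key(customer_id, order_id)] = {
        "c:name": customer_name,
        "o:total": total,
    }


add_order(42, "Ada", 7, 19.99)
add_order(42, "Ada", 3, 5.00)
add_order(7, "Linus", 1, 12.50)

# "All orders for customer 42": rows sharing the key prefix are adjacent in
# row-key order, so one prefix scan answers the question without a join.
for key, row in prefix_scan(orders, f"{42:08d}#"):
    print(key, row["c:name"], row["o:total"])
```

Copying `customer_name` into every order row costs extra storage and extra update work. In return, reads never touch a second table. Whether that trade is acceptable depends on how often the duplicated data changes.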


Qubole Features

The Qubole HBase offering comes with a variety of features and support practices to help customers succeed with HBase clusters and deployments. Qubole controls only the management and system usage of these technologies. The customer is responsible for using HBase directly from within the customer application. In this respect, the Qubole HBase offering is similar to Amazon RDS, MongoLab or MongoHQ.

Qubole provides options for adding, replacing or removing nodes through the user interface as well as through the API. During these operations, Qubole intelligently moves data around the cluster by orchestrating data compactions and HDFS block transfers before reassigning the HBase RegionServer.

Qubole regularly backs up HBase data to S3, and administrators can configure the backup schedule in the Qubole Cluster Management UI. Administrators may choose to recover all data or only specific tables. This supports deploying test or development clusters as well as one-off analysis. It also lets administrators be deliberate about what they restore when recovering from data issues.

