1. Overview
In this article, we’ll be looking at the HBase database Java Client library. HBase is a distributed database that uses the Hadoop file system for storing data.
We’ll create a Java example client and a table to which we will add some simple records.
2. HBase Data Structure
In HBase, data is grouped into column families. All column members of a column family have the same prefix.
For example, the columns family1:qualifier1 and family1:qualifier2 are both members of the family1 column family. All column family members are stored together on the filesystem.
Inside a column family, each value is stored under a qualifier. We can think of a qualifier as a kind of column name.
Let’s see an example record from HBase:
Family1:{
    'Qualifier1':'row1:cell_data',
    'Qualifier2':'row2:cell_data',
    'Qualifier3':'row3:cell_data'
}
Family2:{
    'Qualifier1':'row1:cell_data',
    'Qualifier2':'row2:cell_data',
    'Qualifier3':'row3:cell_data'
}
We have two column families, each with three qualifiers that hold some cell data. Each row has a row key, which is a unique row identifier. We will use the row key to insert, retrieve, and delete data.
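To make the layout more concrete, here is a minimal sketch that models the row key, column family, and qualifier hierarchy with nested sorted maps. It is only a toy in-memory model under our own assumptions, not the HBase API: the class ToyTable and all of its methods are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model (NOT the HBase API; all names are hypothetical):
// row key -> column family -> qualifier -> value stored as bytes.
final class ToyTable {
    // Rows are kept sorted by key. For ASCII keys, String order matches byte order.
    private final NavigableMap<String, Map<String, Map<String, byte[]>>> rows = new TreeMap<>();

    // Stores a value under rowKey / family:qualifier, creating the levels on demand.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the stored value, or null if the row, family, or qualifier is absent.
    String get(String rowKey, String family, String qualifier) {
        Map<String, Map<String, byte[]>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        Map<String, byte[]> columns = families.get(family);
        if (columns == null) {
            return null;
        }
        byte[] value = columns.get(qualifier);
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    // Removes the whole row with all of its families and qualifiers.
    void deleteRow(String rowKey) {
        rows.remove(rowKey);
    }

    // Returns the smallest row key, or null if the table is empty.
    String firstRowKey() {
        return rows.isEmpty() ? null : rows.firstKey();
    }
}
```

The sketch leaves out timestamps and versions, which real HBase cells also carry; it only shows that a value is addressed by the combination of row key, family, and qualifier.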
3. HBase Client Maven Dependency
Before we connect to HBase, we need to add the hbase-client and hbase dependencies:
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>${hbase.version}</version>
</dependency>
4. HBase Setup
We need to set up HBase so that we can connect to it from the Java client library. Installation is outside the scope of this article, but HBase installation guides are available online.
Next, we need to start an HBase master locally by executing:
hbase master start
5. Connecting to HBase from Java
To connect programmatically from Java to HBase, we need to define an XML configuration file. We started our HBase instance on localhost so we need to enter that into a configuration file:
<configuration>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
</configuration>
Now we need to point an HBase client to that configuration file:
Configuration config = HBaseConfiguration.create();
String path = this.getClass()
  .getClassLoader()
  .getResource("hbase-site.xml")
  .getPath();
config.addResource(new Path(path));
Next, we check whether the connection to HBase was successful. If it fails, a MasterNotRunningException is thrown:
HBaseAdmin.checkHBaseAvailable(config);
6. Creating a Database Structure
Before we start adding data to HBase, we need to create the data structure for inserting rows. We will create one table with two column families:
private TableName table1 = TableName.valueOf("Table1");
private String family1 = "Family1";
private String family2 = "Family2";
First, we need to create a connection to the database and get an Admin object, which we will use to manipulate the database structure:
Connection connection = ConnectionFactory.createConnection(config);
Admin admin = connection.getAdmin();
Then, we can create a table by passing an instance of the HTableDescriptor class to the createTable() method of the admin object:
HTableDescriptor desc = new HTableDescriptor(table1);
desc.addFamily(new HColumnDescriptor(family1));
desc.addFamily(new HColumnDescriptor(family2));
admin.createTable(desc);
7. Adding and Retrieving Elements
With the table created, we can add new data to it by creating a Put object and calling a put() method on the Table object:
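// Get a Table handle from the connection; qualifier1 is the column qualifier as bytes
Table table = connection.getTable(table1);
byte[] qualifier1 = Bytes.toBytes("Qualifier1");

byte[] row1 = Bytes.toBytes("row1");
Put p = new Put(row1);
p.addImmutable(family1.getBytes(), qualifier1, Bytes.toBytes("cell_data"));
table.put(p);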
We can retrieve the previously created row by using the Get class:
Get g = new Get(row1);
Result r = table.get(g);
byte[] value = r.getValue(family1.getBytes(), qualifier1);
Here, row1 is the row identifier, which we can use to retrieve a specific row from the database. When calling:
Bytes.toString(value)
the returned result will be the previously inserted cell_data.
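HBase stores row keys, qualifiers, and values as plain byte arrays, so every read and write involves an encode and decode step like the one above. The following sketch reproduces that round trip with the JDK only, assuming UTF-8 encoding on both sides; ByteRoundTrip and its methods are hypothetical helper names, not part of the HBase client.

```java
import java.nio.charset.StandardCharsets;

// JDK-only sketch of the String <-> byte[] round trip (hypothetical helper, not HBase code).
final class ByteRoundTrip {
    // Encodes text as UTF-8 bytes, the form in which a value would be stored.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into text.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = encode("cell_data");
        System.out.println(stored.length);   // number of bytes stored for the value
        System.out.println(decode(stored));  // the original text again
    }
}
```

Note that String.getBytes() without an argument uses the platform default charset, so naming the charset explicitly on both the write and the read side avoids surprises with non-ASCII data.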
8. Scanning and Filtering
We can scan the table, retrieving all elements stored under a given qualifier, by using a Scan object (note that ResultScanner extends Closeable, so be sure to call close() on it when you’re done):
Scan scan = new Scan();
scan.addColumn(family1.getBytes(), qualifier1);

// try-with-resources closes the scanner once the loop finishes
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        System.out.println("Found row: " + result);
    }
}
This operation prints every row that has a value under qualifier1, along with additional information such as the timestamp:
Found row: keyvalues={Row1/Family1:Qualifier1/1488202127489/Put/vlen=9/seqid=0}
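A scan returns rows in row key order, and HBase sorts row keys lexicographically by their bytes rather than numerically. The sketch below imitates that ordering with the JDK only, which can help when choosing a key format; RowKeyOrderSketch and its methods are hypothetical names, and the comparison is our own illustration rather than code taken from HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration of lexicographic byte ordering of row keys (hypothetical helper).
final class RowKeyOrderSketch {
    // Compares two keys byte by byte, treating each byte as unsigned.
    // If one key is a prefix of the other, the shorter key sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns a copy of the keys in the order a scan would visit them.
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareKeys(
            x.getBytes(StandardCharsets.UTF_8),
            y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // prints [row1, row10, row2]
        System.out.println(sortAsRowKeys(Arrays.asList("row2", "row10", "row1")));
    }
}
```

Because row10 sorts before row2, numeric parts of a key are commonly zero-padded (row02, row10) when numeric order matters.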
We can retrieve specific records by using filters.
First, we create two filters. The filter1 is a PrefixFilter, so the scan returns only rows whose key starts with row1, and filter2 is a QualifierFilter that keeps only columns whose qualifier is greater than or equal to qualifier1:
Filter filter1 = new PrefixFilter(row1);
Filter filter2 = new QualifierFilter(
  CompareOp.GREATER_OR_EQUAL,
  new BinaryComparator(qualifier1));
List<Filter> filters = Arrays.asList(filter1, filter2);
Then we can get a result set from a Scan query:
Scan scan = new Scan();
scan.setFilter(new FilterList(Operator.MUST_PASS_ALL, filters));

try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        System.out.println("Found row: " + result);
    }
}
When creating the FilterList, we passed Operator.MUST_PASS_ALL, which means that all filters must be satisfied. We can choose Operator.MUST_PASS_ONE instead if only one filter needs to be satisfied. The result set will contain only rows that matched the specified filters.
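The difference between the two operators is essentially a logical AND versus a logical OR over the individual filters. Here is a simplified sketch of that idea using plain Java predicates. It is not how HBase evaluates a FilterList internally: FilterListSketch, Cell, combine(), scan(), and the sample data are hypothetical names and values made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Simplified AND/OR model of filter combination (hypothetical, not the HBase implementation).
final class FilterListSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Minimal stand-in for a cell: just a row key and a qualifier.
    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier;
        }
    }

    // MUST_PASS_ALL requires every filter to accept the cell; MUST_PASS_ONE requires at least one.
    static Predicate<Cell> combine(Operator op, List<Predicate<Cell>> filters) {
        if (op == Operator.MUST_PASS_ALL) {
            return cell -> filters.stream().allMatch(f -> f.test(cell));
        }
        return cell -> filters.stream().anyMatch(f -> f.test(cell));
    }

    // Returns the cells accepted by the filter, in input order, as "row/qualifier" strings.
    static List<String> scan(List<Cell> cells, Predicate<Cell> filter) {
        List<String> hits = new ArrayList<>();
        for (Cell cell : cells) {
            if (filter.test(cell)) {
                hits.add(cell.toString());
            }
        }
        return hits;
    }

    // Made-up sample data: two rows with two qualifiers each.
    static List<Cell> sampleCells() {
        return Arrays.asList(
            new Cell("row1", "Qualifier1"), new Cell("row1", "Qualifier3"),
            new Cell("row2", "Qualifier1"), new Cell("row2", "Qualifier3"));
    }

    // A row-prefix check and a "qualifier >= Qualifier2" check.
    static List<Predicate<Cell>> sampleFilters() {
        Predicate<Cell> rowPrefix = c -> c.row.startsWith("row1");
        Predicate<Cell> qualifierAtLeast = c -> c.qualifier.compareTo("Qualifier2") >= 0;
        return Arrays.asList(rowPrefix, qualifierAtLeast);
    }

    public static void main(String[] args) {
        // prints [row1/Qualifier3]
        System.out.println(scan(sampleCells(), combine(Operator.MUST_PASS_ALL, sampleFilters())));
        // prints [row1/Qualifier1, row1/Qualifier3, row2/Qualifier3]
        System.out.println(scan(sampleCells(), combine(Operator.MUST_PASS_ONE, sampleFilters())));
    }
}
```

With MUST_PASS_ALL only the cell that satisfies both conditions survives, while MUST_PASS_ONE also lets through cells that satisfy just one of them.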
9. Deleting Rows
Finally, to delete data, we can use the Delete class:
Delete delete = new Delete(row1);
delete.addColumn(family1.getBytes(), qualifier1);
table.delete(delete);
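We’re deleting the family1:qualifier1 cell of row1. Note that addColumn() removes only the latest version of that cell; a Delete created with just the row key and no column would remove the entire row.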
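Since each HBase cell can hold several timestamped versions, it helps to picture what deleting only the latest version means. The sketch below is a toy model with hypothetical names (VersionedCellSketch and its methods). Real HBase records deletes as tombstone markers, and whether an older version becomes visible again depends on how many versions the column family keeps, so this is only a rough illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell holding timestamped versions (hypothetical, not HBase code).
final class VersionedCellSketch {
    // Versions sorted by timestamp; the highest timestamp is the latest version.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Stores a value at the given timestamp.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns the value with the highest timestamp, or null if the cell is empty.
    String latest() {
        Map.Entry<Long, String> entry = versions.lastEntry();
        return entry == null ? null : entry.getValue();
    }

    // Removes only the newest version, roughly what Delete.addColumn() targets.
    void deleteLatest() {
        versions.pollLastEntry();
    }

    // Removes every version, roughly what Delete.addColumns() targets.
    void deleteAll() {
        versions.clear();
    }
}
```

In this model, calling deleteLatest() on a cell with two versions makes the older value the current one, whereas deleteAll() leaves the cell empty.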
10. Conclusion
In this quick tutorial, we focused on communicating with an HBase database. We saw how to connect to HBase from the Java client library and how to run various basic operations.
The implementation of all these examples and code snippets can be found in the GitHub project; this is a Maven project, so it should be easy to import and run as it is.