Scenario: Exchanging customer data with HBase - 6.3

Talend Open Studio for Big Data Components Reference Guide


In this scenario, a six-component Job writes customer data to an HBase database and then reads it back.

The six components are:

  • tHBaseConnection: creates a connection to your HBase database.

  • tFixedFlowInput: creates the data to be written into your HBase. In a real use case, this component could be replaced by another input component such as tFileInputDelimited.

  • tHBaseOutput: writes the data it receives from the preceding component into your HBase.

  • tHBaseInput: extracts the columns of interest from your HBase.

  • tLogRow: presents the execution result.

  • tHBaseClose: closes the connection to your HBase database.

To replicate this scenario, proceed as the following sections illustrate.

Note

Before you start, your HBase and Zookeeper services must be correctly installed and configured. This scenario explains only how to use Talend components to exchange data with a given HBase.

Dropping and linking the components

To do this, proceed as follows:

  1. Drop tHBaseConnection, tFixedFlowInput, tHBaseOutput, tHBaseInput, tLogRow and tHBaseClose from the Palette onto the Design workspace.

  2. Right-click tHBaseConnection to open its contextual menu and select the Trigger > On Subjob Ok link from this menu to connect this component to tFixedFlowInput.

  3. Do the same to create the On Subjob Ok link from tFixedFlowInput to tHBaseInput, and then from tHBaseInput to tHBaseClose.

  4. Right-click tFixedFlowInput and select the Row > Main link to connect this component to tHBaseOutput.

  5. Do the same to create the Main link from tHBaseInput to tLogRow.

The components used in this scenario are now all placed and linked. Next, configure them one after another.
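
The linking above yields four subjobs chained by On Subjob Ok triggers. As a rough illustration only (this is not the code the Studio generates), the sketch below models that chain in plain Java: each subjob starts only if the previous one finished without error. The class name, the method names and the boolean placeholders standing in for the real work are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

public class SubjobChainSketch {

    // Runs the subjobs in insertion order. A subjob starts only if the
    // previous one returned true, which mimics an On Subjob Ok trigger.
    static List<String> run(Map<String, BooleanSupplier> subjobs) {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : subjobs.entrySet()) {
            executed.add(entry.getKey());
            if (!entry.getValue().getAsBoolean()) {
                break; // failure: the next On Subjob Ok link is not fired
            }
        }
        return executed;
    }

    // The four subjobs of this scenario; the lambdas are placeholders
    // that simply report success.
    static Map<String, BooleanSupplier> scenario() {
        Map<String, BooleanSupplier> subjobs = new LinkedHashMap<>();
        subjobs.put("tHBaseConnection", () -> true);
        subjobs.put("tFixedFlowInput -> tHBaseOutput", () -> true);
        subjobs.put("tHBaseInput -> tLogRow", () -> true);
        subjobs.put("tHBaseClose", () -> true);
        return subjobs;
    }

    public static void main(String[] args) {
        // Prints the subjobs in the order in which they were started.
        System.out.println(run(scenario()));
    }
}
```

In this model, a failure while writing would prevent the read subjob and the close subjob from starting, which is why the order of the trigger links matters.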

Configuring the connection

To configure the connection to your Zookeeper service and thus to the HBase of interest, proceed as follows:

  1. On the Design workspace of your Studio, double-click the tHBaseConnection component to open its Component view.

  2. Select Hortonworks Data Platform 1.0 from the HBase version list.

  3. In the Zookeeper quorum field, type in the name or the URL of the Zookeeper service you are using. In this example, the name of the service in use is hbase.

  4. In the Zookeeper client port field, type in the number of the client listening port. In this example, it is 2181.

  5. If the Zookeeper znode parent location has been defined in the Hadoop cluster you are connecting to, you need to select the Set zookeeper znode parent check box and enter the value of this property in the field that is displayed.
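
For orientation, the three fields above are assumed to correspond to the standard HBase client properties hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort and zookeeper.znode.parent. The sketch below only collects these values in a plain map, so that it runs without any HBase library; the class and method names are hypothetical, and this is not the code the Studio generates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HBaseConnectionSketch {

    // Collects the connection settings under the usual HBase client
    // property names. The znode parent is added only when it is set,
    // mirroring the optional check box in the component.
    static Map<String, String> buildProperties(String quorum, int clientPort, String znodeParent) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hbase.zookeeper.quorum", quorum);
        props.put("hbase.zookeeper.property.clientPort", String.valueOf(clientPort));
        if (znodeParent != null && !znodeParent.isEmpty()) {
            props.put("zookeeper.znode.parent", znodeParent);
        }
        return props;
    }

    public static void main(String[] args) {
        // Values from this scenario: quorum "hbase", client port 2181,
        // no znode parent defined.
        buildProperties("hbase", 2181, null)
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real client these entries would typically be set on an HBase Configuration object rather than kept in a map.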

Configuring the process of writing data into the HBase

To do this, proceed as follows:

  1. On the Design workspace, double-click the tFixedFlowInput component to open its Component view.

  2. In this view, click the three-dot button next to Edit schema to open the schema editor.

  3. Click the plus button three times to add three rows and in the Column column, rename the three rows respectively as: id, name and age.

  4. In the Type column, click each of these rows and from the drop-down list, select the data type of every row. In this scenario, they are Integer for id and age, String for name.

  5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.

  6. In the Mode area, select Use Inline Content (delimited file) to display the fields for editing.

  7. In the Content field, type in the delimited data to be written into the HBase, separated with the semicolon ";". In this example, they are:

    1;Albert;23
    2;Alexandre;24
    3;Alfred-Hubert;22
    4;Andre;40
    5;Didier;28
    6;Anthony;35
    7;Artus;32
    8;Catherine;34
    9;Charles;21
    10;Christophe;36
    11;Christian;67
    12;Danniel;54
    13;Elisabeth;58
    14;Emile;32
    15;Gregory;30

  8. Double-click tHBaseOutput to open its Component view.

    Note

    If this component does not have the same schema as the preceding component, a warning icon appears. In this case, click the Sync columns button to retrieve the schema from the preceding component. Once this is done, the warning icon disappears.

  9. Select the Use an existing connection check box and then select the connection you have configured earlier. In this example, it is tHBaseConnection_1.

  10. In the Table name field, type in the name of the table to be created in the HBase. In this example, it is customer.

  11. In the Action on table field, select the action of interest from the drop-down list. In this scenario, select Drop table if exists and create. This way, if a table named customer exists already in the HBase, it will be disabled and deleted before creating this current table.

  12. Click the Advanced settings tab to open the corresponding view.

  13. In the Family parameters table, add two rows by clicking the plus button, rename them as family1 and family2 respectively and then leave the other columns empty. These two column families will be created in the HBase using the default family performance options.

    Note

    The Family parameters table is available only when the action you have selected in the Action on table field is to create a table in HBase. For further information about this Family parameters table, see tHBaseOutput.

  14. In the Families table of the Basic settings view, enter the family names in the Family name column, each corresponding to the column this family contains. In this example, the id and the age columns belong to family1 and the name column to family2.

    Note

    These column families should already exist in the HBase to be connected to; if not, you need to define them in the Family parameters table of the Advanced settings view for creating them at runtime.
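
To make the data layout concrete, the following sketch parses a few of the semicolon-delimited lines from step 7 using the schema from steps 3 and 4, and labels each value with the column family assigned in step 14. It is an illustration in plain Java, not the component's implementation: the class name, the method name and the "family:column" labels are assumptions made for readability.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CustomerRowSketch {

    // Column-to-family assignment from the Families table in this scenario.
    static final Map<String, String> FAMILY_OF = new LinkedHashMap<>();
    static {
        FAMILY_OF.put("id", "family1");
        FAMILY_OF.put("name", "family2");
        FAMILY_OF.put("age", "family1");
    }

    // Parses one "id;name;age" line into typed values keyed by "family:column".
    static Map<String, Object> toCells(String line) {
        String[] fields = line.split(";");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        Map<String, Object> cells = new LinkedHashMap<>();
        cells.put(FAMILY_OF.get("id") + ":id", Integer.valueOf(fields[0].trim()));
        cells.put(FAMILY_OF.get("name") + ":name", fields[1].trim());
        cells.put(FAMILY_OF.get("age") + ":age", Integer.valueOf(fields[2].trim()));
        return cells;
    }

    public static void main(String[] args) {
        // First three lines of the inline content used in this scenario.
        String content = "1;Albert;23\n2;Alexandre;24\n3;Alfred-Hubert;22";
        for (String line : content.split("\n")) {
            System.out.println(toCells(line));
        }
    }
}
```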

Configuring the process of extracting data from the HBase

To do this, perform the following operations:

  1. Double-click tHBaseInput to open its Component view.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier. In this example, it is tHBaseConnection_1.

  3. Click the three-dot button next to Edit schema to open the schema editor.

  4. Click the plus button three times to add three rows and rename them as id, name and age respectively in the Column column. This means that you extract these three columns from the HBase.

  5. Select the types for each of the three columns. In this example, Integer for id and age, String for name.

  6. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.

  7. In the Table name field, type in the table from which you extract the columns of interest. In this scenario, the table is customer.

  8. In the Mapping table, the Column column has already been filled automatically from the schema you defined, so simply enter the family name for each column in the Column family column. In this example, enter family1 for id and age, and family2 for name.

  9. Double-click tHBaseClose to open its Component view.

  10. In the Component List field, select the connection you need to close. In this example, this connection is tHBaseConnection_1.
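
The role of the Mapping table can be sketched as follows: each schema column is resolved to a family and column pair, and the matching values are returned in schema order. This is a simplified model in plain Java with hypothetical names; the stored row is represented as a map of strings, and the vertical bar separator is chosen here only for display.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MappingSketch {

    // Schema column -> column family, as entered in the Mapping table.
    static Map<String, String> mapping() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("id", "family1");
        m.put("name", "family2");
        m.put("age", "family1");
        return m;
    }

    // Picks the mapped cells from a stored row, in schema order.
    // A cell that is missing from the row is rendered as "null".
    static String project(Map<String, String> storedRow, Map<String, String> mapping) {
        StringJoiner out = new StringJoiner("|");
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            String cellName = entry.getValue() + ":" + entry.getKey();
            out.add(String.valueOf(storedRow.get(cellName)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One stored row, keyed by "family:column"; the key order is irrelevant.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("family1:age", "23");
        row.put("family1:id", "1");
        row.put("family2:name", "Albert");
        System.out.println(project(row, mapping())); // prints 1|Albert|23
    }
}
```

If a column were mapped to the wrong family, the lookup would find no cell for it, which is why the family names entered here must match those used when the data was written.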

Executing the Job

To execute this Job, press F6.

Once the Job has run, the Run view opens automatically and displays the execution result.

The columns of interest have been extracted, and you can process them according to your needs.

To check the customer table that this Job has created, log in to your HBase database.