Partitioning the sample data and writing it to Kudu - 7.2

Kudu

Version
7.2
Language
English (United States)
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio

Procedure

  1. Double-click the tFixedFlowInput component to open its Component view.

  2. Click the [...] button next to Edit schema to open the schema editor.
  3. Click the [+] button to add the schema columns as shown in this image.

    Example

  4. In the Type column, select Integer for the age column.
  5. In the Key column, select the check box for the age column to define this column as the primary key column.
  6. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
  7. In the Mode area, select the Use Inline Content radio button and paste the previously mentioned sample data into the Content field that is displayed.
  8. In the Field separator field, enter a semicolon (;).
  9. Double-click the tKuduOutput component to open its Component view.

  10. Select the Use an existing configuration check box and then, from the Component list drop-down list, select the Kudu configuration you defined in the previous steps.
  11. Click Sync columns to ensure that tKuduOutput has the same schema as tFixedFlowInput.
  12. In the Table field, enter the name of the table you want to create in Kudu.
  13. From the Action on table drop-down list, select Drop table if exists and create.
  14. In Range partitions, add one row by clicking the [+] button and do the following:
    1. In Partition No, enter, without double quotation marks, the number to be used as the ID of the partition to be created. For example, enter 1 to create Partition 1.
    2. In Partition column, select the primary key column to be used for partitioning. In this scenario, select age.
    3. In Lower boundary, enter 20 without double quotation marks because the type of the age data is integer.
    4. In Upper boundary, enter 60, also without double quotation marks.
    This partitioning definition creates the following partition schema:
    RANGE (age) (
        PARTITION 20 <= VALUES < 60
    )
    According to this partition schema, the record on the lower boundary (age 20) is included in the partition and is therefore written to Kudu, whereas the record on the upper boundary (age 60) is excluded and is not written to Kudu.

    In real-world practice, if you need to write all the data to the Kudu table, define additional partitions whose boundaries together cover the full range of the data.

  15. From the Action on data drop-down list, select Insert.
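
The boundary behavior described in step 14 can be illustrated with a minimal Python sketch. It is not Talend or Kudu code: it only mimics the rule that a range partition includes its lower boundary and excludes its upper boundary. The sample rows, the two-column "age;name" layout, and the helper names parse_rows and find_partition are assumptions made for this illustration; the actual sample data and schema are those defined earlier in the scenario.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inline content in "age;name" form (an assumption for this
# sketch; the scenario's real sample data is not reproduced here).
SAMPLE = """20;Anna
35;Ben
60;Cara
"""


def parse_rows(content: str, separator: str = ";") -> List[Dict[str, object]]:
    """Split inline content into rows, using the given field separator."""
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue  # skip empty lines
        age, name = line.split(separator)
        rows.append({"age": int(age), "name": name})
    return rows


def find_partition(age: int, partitions: List[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the partition covering age, or None.

    Each partition is a (lower, upper) pair read as lower <= age < upper,
    so the lower boundary is included and the upper boundary is excluded.
    """
    for index, (lower, upper) in enumerate(partitions):
        if lower <= age < upper:
            return index
    return None


rows = parse_rows(SAMPLE)

# One partition, as in step 14: 20 <= VALUES < 60.
single = [(20, 60)]
written = [r["age"] for r in rows if find_partition(r["age"], single) is not None]
print(written)  # [20, 35]

# Adding a second, hypothetical partition (60 <= VALUES < 100) covers age 60.
wider = [(20, 60), (60, 100)]
written_all = [r["age"] for r in rows if find_partition(r["age"], wider) is not None]
print(written_all)  # [20, 35, 60]
```

With the single partition, the row with age 60 has no partition to land in, which matches the behavior described above. Adding a second partition whose lower boundary is 60 is one way to cover it; the upper boundary of 100 is an arbitrary choice for the sketch.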