Creating an HDFS Job in the Studio - 6.5

HDFS components and Azure Data Lake Store (ADLS)

Procedure

  1. In the Integration perspective, drop the following components from the Palette onto the design workspace: tFixedFlowInput, tHDFSOutput, tHDFSInput, tLogRow, and three tLibraryLoad components.
  2. Connect tFixedFlowInput to tHDFSOutput using a Row > Main link.
  3. Do the same to connect tHDFSInput to tLogRow.
  4. Double-click one of the three tLibraryLoad components to open its Component view.
  5. Click the [...] button to open the Module wizard and select the library to be loaded.

    In this example, load azure-data-lake-store-sdk-2.1.4.jar, one of the libraries the HDFS components require to work with Azure Data Lake Store. You can find this JAR in the Maven repository, for example by searching for Azure Data Lake Store Java Client SDK.

  6. Repeat the previous two steps with the other two tLibraryLoad components to load the remaining libraries.

    In this example, these libraries are hadoop-azure-datalake-2.6.0-cdh5.12.1.jar and jackson-core-2.8.4.jar. The sketch after this procedure illustrates what these three libraries enable at run time.
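
For reference, the following is a minimal sketch, in plain Java, of what a Job like this does at run time once the three libraries are on the classpath: Hadoop's FileSystem API resolves the adl:// scheme through the hadoop-azure-datalake connector and authenticates through the Azure Data Lake Store SDK. This is an illustration, not Talend's generated code. The account name, file path, and OAuth2 credentials below are placeholders, and the fs.adl.* property names assume a Hadoop build with the upstream ADLS Gen1 connector; older CDH builds use dfs.adls.* equivalents.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AdlsRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Service-to-service OAuth2 authentication against Azure AD.
            // All four values are placeholders for this sketch.
            conf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential");
            conf.set("fs.adl.oauth2.refresh.url",
                    "https://login.microsoftonline.com/<tenant-id>/oauth2/token");
            conf.set("fs.adl.oauth2.client.id", "<application-id>");
            conf.set("fs.adl.oauth2.credential", "<client-secret>");

            FileSystem fs = FileSystem.get(
                    URI.create("adl://<account>.azuredatalakestore.net/"), conf);
            Path file = new Path("/user/talend/sample.csv");

            // Roughly what tFixedFlowInput + tHDFSOutput do: write rows to the store.
            try (PrintWriter writer = new PrintWriter(fs.create(file, true))) {
                writer.println("id;name");
                writer.println("1;Talend");
            }

            // Roughly what tHDFSInput + tLogRow do: read the file back and print it.
            try (BufferedReader reader =
                    new BufferedReader(new InputStreamReader(fs.open(file)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }

In the Job itself, you do not write this code: tHDFSOutput and tHDFSInput produce comparable FileSystem calls from the connection settings you enter in their Component views, and the tLibraryLoad components simply make the three JARs available on the Job's classpath.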