Configuring the HDFS components to work with Azure Data Lake Store - 6.5

HDFS components and Azure Data Lake Store (ADLS)

EnrichVersion
6.5
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Open Studio for Big Data
Talend Real-Time Big Data Platform
task
Data Governance > Third-party systems > Cloud storages > Azure components > Azure Data Lake Store components
Data Governance > Third-party systems > File components (Integration) > HDFS components
Data Quality and Preparation > Third-party systems > Cloud storages > Azure components > Azure Data Lake Store components
Data Quality and Preparation > Third-party systems > File components (Integration) > HDFS components
Design and Development > Third-party systems > Cloud storages > Azure components > Azure Data Lake Store components
Design and Development > Third-party systems > File components (Integration) > HDFS components
EnrichPlatform
Talend Studio

Procedure

  1. Double-click tFixedFlowInput to open its Component view and provide sample data to the Job.

    The sample data to be used contains only one row with two columns: id and name.

  2. Click the [...] button next to Edit schema to open the schema editor.
  3. Click the [+] button to add the two columns and rename them to id and name.
  4. Click OK to close the schema editor and validate the schema.
  5. In the Mode area, select Use single table.

    The id and name columns automatically appear in the Value table, and you can enter the values you want, within double quotation marks, in the Value column for these two schema columns.

  6. Double-click tHDFSOutput to open its Component view.
  7. In the Version area, leave the options as they are. These options do not impact your Job.
  8. In the NameNode URI field, enter the location of the NameNode service. For Azure Data Lake Store, this location is the address of your Data Lake Store.

    For example, if your Data Lake Store name is data_lake_store_name, the NameNode URI to be used is adl://data_lake_store_name.azuredatalakestore.net.

  9. In the Advanced settings tab, add the following parameters to the Hadoop properties table, each parameter and its value being put in double quotation marks:

    dfs.adls.oauth2.access.token.provider.type: ClientCredential
    fs.adl.impl: org.apache.hadoop.fs.adl.AdlFileSystem
    fs.AbstractFileSystem.adl.impl: org.apache.hadoop.fs.adl.Adl
    dfs.adls.oauth2.client.id: the application ID you obtained in the previous steps
    dfs.adls.oauth2.credential: the application key you obtained in the previous steps
    dfs.adls.oauth2.refresh.url: the Azure OAuth 2.0 token endpoint you obtained in the previous steps
    dfs.adls.oauth2.access.token.provider: org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider
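
    For reference only, the same settings can be expressed directly against the Hadoop client API. The following is a minimal Java sketch, not part of the Studio procedure, in which <application ID>, <application key> and <OAuth 2.0 token endpoint> are placeholders for the values you obtained in the previous steps:

      import org.apache.hadoop.conf.Configuration;

      public class AdlsConf {
          // Builds a Hadoop Configuration carrying the same ADLS OAuth2 properties
          // that this step adds to the Hadoop properties table.
          public static Configuration adlsConfiguration() {
              Configuration conf = new Configuration();
              conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential");
              conf.set("fs.adl.impl", "org.apache.hadoop.fs.adl.AdlFileSystem");
              conf.set("fs.AbstractFileSystem.adl.impl", "org.apache.hadoop.fs.adl.Adl");
              conf.set("dfs.adls.oauth2.client.id", "<application ID>");             // placeholder
              conf.set("dfs.adls.oauth2.credential", "<application key>");           // placeholder
              conf.set("dfs.adls.oauth2.refresh.url", "<OAuth 2.0 token endpoint>"); // placeholder
              conf.set("dfs.adls.oauth2.access.token.provider",
                      "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider");
              return conf;
          }
      }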

  10. Do the same configuration for tHDFSInput.
  11. If you run your Job on Windows, follow this procedure to add the winutils.exe program to your Job.
  12. Press F6 to run your Job.
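
    Once the Job succeeds, the rows written by tHDFSOutput are stored in your Data Lake Store. As an illustrative sketch only, assuming the hypothetical store name data_lake_store_name from the earlier example, a hypothetical output folder /user/talend/out, and the AdlsConf class sketched above, you could verify the result with the Hadoop FileSystem API:

      import java.net.URI;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class AdlsCheck {
          public static void main(String[] args) throws Exception {
              // Connect through the same adl:// URI used in the NameNode URI field.
              FileSystem fs = FileSystem.get(
                      URI.create("adl://data_lake_store_name.azuredatalakestore.net"),
                      AdlsConf.adlsConfiguration());
              // Print whether the hypothetical output folder exists after the Job run.
              System.out.println(fs.exists(new Path("/user/talend/out")));
          }
      }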