Dropping and linking components - 7.3

Talend Open Studio for Big Data Getting Started Guide

author: Talend Documentation Team
EnrichVersion: 7.3
EnrichProdName: Talend Open Studio for Big Data
task: Design and Development, Installation and Upgrade
EnrichPlatform: Talend Studio
In the Job workspace, you arrange and link the DBFS, Azure and processing components to compose a complete data transformation process.

Before you begin

  • You have launched your Talend Studio and opened the Integration perspective.

  • An empty Job has been created as described in Creating the Job and is open in the workspace.

Procedure

  1. In the Job, enter the name of the component to be used and select this component from the list that appears. In this scenario, the components are two tFileInputDelimited components, a tMap component, two tFileOutputDelimited components, a tDBFSConnection component, a tDBFSGet component and a tAzureStoragePut component.
    • The DBFS components connect to your Databricks file system (DBFS) to download the files about movies and directors.
    • The two tFileInputDelimited components are used to load the movie data and the director data, respectively, from your local file system into the data flow of the current Job.

    • The tMap component is used to transform the input data.

    • The tFileOutputDelimited components write the results into given directories in your local system.

    • The tAzureStoragePut component is used to upload the transformed data to an Azure Blob Storage container.
  2. Double-click the label of one of the tFileInputDelimited components to make this label editable, and then enter movie to change the label of this component.
  3. Do the same to label the other tFileInputDelimited component director.
  4. Right-click tDBFSConnection and, from the contextual menu that is displayed, select Trigger > On Subjob Ok.
  5. Click tDBFSGet to connect tDBFSConnection to tDBFSGet.
  6. Repeat these operations, always using the On Subjob Ok link, to connect tDBFSGet to the tFileInputDelimited component labelled movie, and then to connect that same tFileInputDelimited component to tAzureStoragePut.
  7. Right-click the tFileInputDelimited component that is labelled movie, then from the contextual menu, select Row > Main and click tMap to connect these two components. This is the main link through which the movie data is sent to tMap.
  8. Do the same to connect the director tFileInputDelimited component to tMap using the Row > Main link. Because tMap already has a main input, this second link is the Lookup link, through which the director data is sent to tMap as lookup data.
  9. Do the same to connect the tMap component to one of the tFileOutputDelimited components using the Row > Main link, then in the pop-up wizard, name this link out1 and click OK to validate this change.
  10. Repeat these operations to connect the tMap component to the other tFileOutputDelimited component using the Row > Main link and name it reject.
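
As a quick way to double-check the wiring from steps 4 to 10, the links can be written down as a small table and queried. The following Python sketch is only an illustration of the Job layout, not code that Talend Studio generates or runs. The names tFileOutputDelimited_1 and tFileOutputDelimited_2, the link-type strings and the helper functions are assumptions introduced for this example.

```python
# Illustrative model of the Job layout described in the procedure.
# Not Talend-generated code; component names for the two outputs and
# all helper functions are hypothetical.

# Each entry: (source component, link type, target component, link name or None)
LINKS = [
    ("tDBFSConnection", "OnSubjobOk", "tDBFSGet", None),
    ("tDBFSGet", "OnSubjobOk", "movie", None),
    ("movie", "OnSubjobOk", "tAzureStoragePut", None),
    ("movie", "Main", "tMap", None),
    ("director", "Lookup", "tMap", None),
    ("tMap", "Main", "tFileOutputDelimited_1", "out1"),
    ("tMap", "Main", "tFileOutputDelimited_2", "reject"),
]


def trigger_chain(links, start):
    """Follow On Subjob Ok links from `start` and return the components in order."""
    next_by_source = {src: dst for src, kind, dst, _ in links if kind == "OnSubjobOk"}
    chain = [start]
    while chain[-1] in next_by_source:
        chain.append(next_by_source[chain[-1]])
    return chain


def row_inputs(links, target):
    """Return the row links entering `target`, keyed by link type (Main or Lookup)."""
    return {kind: src for src, kind, dst, _ in links
            if dst == target and kind in ("Main", "Lookup")}


def named_outputs(links, source):
    """Return the named row links leaving `source`, keyed by link name."""
    return {name: dst for src, _, dst, name in links if src == source and name}


if __name__ == "__main__":
    print(" -> ".join(trigger_chain(LINKS, "tDBFSConnection")))
    print(row_inputs(LINKS, "tMap"))
    print(named_outputs(LINKS, "tMap"))
```

Reading the table back this way makes it easy to confirm that tMap has exactly one main input and one lookup input, and that both of its outputs carry the names entered in the pop-up wizard.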

Results

Now the whole Job looks as follows in the workspace: