Creating a Job with multiple paths from a single source to the same target

author
Shicong Hong
EnrichVersion
6.4
6.3
6.2
6.1
6.0
EnrichProdName
Talend Open Studio for Big Data
Talend Data Fabric
Talend Real-Time Big Data Platform
Talend Data Services Platform
Talend Open Studio for ESB
Talend Big Data Platform
Talend Big Data
Talend ESB
Talend Open Studio for MDM
Talend MDM Platform
Talend Open Studio for Data Integration
Talend Data Integration
Talend Data Management Platform
task
Design and Development > Designing Jobs
EnrichPlatform
Talend Studio

Creating a Job with multiple paths from a single source to the same target

Talend Studio does not allow a subjob in which multiple paths lead from a single source to a single target, as shown in the capture below. This document describes two workarounds that produce the same result.

Doubling the input flow

In this Job, the input flow is duplicated so that the two paths no longer start from the same source component, which avoids the loop in the flow.

This Job will use the following components:

  • Two each of the tFileInputDelimited, tReplicate and tMap components to create two input flows.
  • A third tMap component and a tLogRow component to print the result in the console.

Procedure

  1. Duplicate the part of the input flow that is composed of tFileInputDelimited_1 and tReplicate_1. In this Job, tFileInputDelimited_2 and tReplicate_2 are simply copied and pasted from tFileInputDelimited_1 and tReplicate_1, and therefore perform the same processing.
  2. Configure the properties of each component for both input flows. Because the flows are duplicates, both read the same source file and are configured in exactly the same way in their respective Properties views.
    Tip:

    This Job design performs the same operation and gives the same result as the disallowed Job described in the introduction of this document. However, if your input flow is complex, contains numerous components, or processes large sets of data, you may notice a significant decrease in performance, because the Job runs the same input processing twice.
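The cost described in the tip can be sketched in plain Java. This is an illustration only, not code generated by Talend Studio: the class DoubledInputSketch, the readSource method, the sample rows and the "A:" / "B:" prefixes (standing in for whatever tMap_1 and tMap_2 do) are all assumptions made for the example. The counter shows that the source is read once per duplicated flow.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Talend-generated code.
class DoubledInputSketch {
    // Counts how often the source is read (stands in for the cost of the input flow).
    static int sourceReads = 0;

    // Stand-in for tFileInputDelimited: returns the same rows on every call.
    static List<String> readSource() {
        sourceReads++;
        return List.of("1;alice", "2;bob");
    }

    // Both paths end in the same target list, but each path reads the source itself.
    static List<String> run() {
        List<String> target = new ArrayList<>();
        // Path 1: first copy of the input flow, first transformation (assumed).
        for (String row : readSource()) {
            target.add("A:" + row);
        }
        // Path 2: second copy of the input flow, second transformation (assumed).
        for (String row : readSource()) {
            target.add("B:" + row);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println("source reads: " + sourceReads);
    }
}
```

With a small file the second read is negligible; with a large file or a long chain of components before the split, everything in the duplicated part is paid for twice.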

Storing the result of the input flow in a temporary location

In this Job, the results of the input flow are stored in a temporary location, either in a file or in memory (cache). This reduces the processing time when the input flow is complex or processes large sets of data.
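For the file-based option, the idea can be sketched in plain Java as writing the intermediate rows to a temporary file and reading them back later. This is a minimal sketch under assumptions, not what Talend Studio generates: the class TempFileSketch, the method storeAndReload, the file prefix and the UTF-8 encoding are hypothetical choices made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Talend-generated code.
class TempFileSketch {
    // Writes the rows to a temporary file, then reads them back,
    // much as a later subjob would read the stored result.
    static List<String> storeAndReload(List<String> rows) {
        try {
            Path tmp = Files.createTempFile("flow_result_", ".csv");
            try {
                Files.write(tmp, rows, StandardCharsets.UTF_8);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                // Remove the temporary file once the data has been read.
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(storeAndReload(List.of("1;alice", "2;bob")));
    }
}
```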

This Job will use the following components:

  • A tFileInputDelimited, a tReplicate and two tMap components to create two input flows.
  • Two tHashOutput and two tHashInput components to store the results in a temporary location and read them back.
  • A third tMap component and a tLogRow component to print the results in the console.

Procedure

  1. Create two input flows as shown above by adding the tFileInputDelimited, tReplicate, tMap and tHashOutput components to the workspace and connecting them with Row > Main links.
    Note: tHashInput and tHashOutput are components from the Technical family and are hidden by default.

    For more information about how to use these components, see the Context Variables article.

  2. Use either two tFileOutputDelimited components or two tHashOutput components to store the results from tMap_1 and tMap_2.
  3. Then read the data in the next subjob, either from the temporary file using a tFileInputDelimited component or from memory using a tHashInput component. The Job example above caches the result in memory.
  4. Configure both tHashInput components in their respective Properties views to link them with the two tHashOutput components.
    Tip: tHashOutput_1 caches the result from tMap_1 in memory, and tHashOutput_2 caches the result from tMap_2. For the data to be retrieved from memory, tHashInput_1 must be linked with tHashOutput_1, and tHashInput_2 with tHashOutput_2.
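The in-memory variant of this procedure can be sketched in plain Java. This is a rough analogy under assumptions, not the implementation of tHashOutput and tHashInput: the class HashCacheSketch, the helper methods hashOutput and hashInput, the shared map, the sample rows and the "A:" / "B:" prefixes are hypothetical. The sketch shows the two points made above: the source is read only once, and each reader must name the writer it is linked with.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not Talend-generated code.
class HashCacheSketch {
    // In-memory store keyed by the name of the component that wrote the data.
    static final Map<String, List<String>> CACHE = new HashMap<>();
    static int sourceReads = 0;

    // Stand-in for tFileInputDelimited.
    static List<String> readSource() {
        sourceReads++;
        return List.of("1;alice", "2;bob");
    }

    // Stand-in for a tHashOutput component: caches rows under its own name.
    static void hashOutput(String name, List<String> rows) {
        CACHE.put(name, new ArrayList<>(rows));
    }

    // Stand-in for a tHashInput component: reads the rows of the output it is linked with.
    static List<String> hashInput(String linkedOutput) {
        List<String> rows = CACHE.get(linkedOutput);
        if (rows == null) {
            throw new IllegalStateException("No cached data for " + linkedOutput);
        }
        return rows;
    }

    static List<String> run() {
        CACHE.clear();
        // First subjob: read once, replicate into two mappings, cache both results.
        List<String> mapped1 = new ArrayList<>();
        List<String> mapped2 = new ArrayList<>();
        for (String row : readSource()) {
            mapped1.add("A:" + row); // assumed transformation of tMap_1
            mapped2.add("B:" + row); // assumed transformation of tMap_2
        }
        hashOutput("tHashOutput_1", mapped1);
        hashOutput("tHashOutput_2", mapped2);

        // Next subjob: each reader names its linked writer, then the flows are combined.
        List<String> target = new ArrayList<>(hashInput("tHashOutput_1"));
        target.addAll(hashInput("tHashOutput_2"));
        return target;
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println("source reads: " + sourceReads);
    }
}
```

Reading from a name that was never written fails immediately in this sketch, which mirrors why each tHashInput must be linked with the matching tHashOutput.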