This scenario helps you set up and use connectors in a pipeline. Adapt it to your environment and use case.
Before you begin
- If you want to reproduce this scenario, download and extract the file: test-file-to-kafka.zip.
Procedure
- Click Connections > Add connection.
- Add a Test connection, then click Add dataset.
- Select your engine in the Engine list.
Note:
- For advanced data processing, it is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design.
- If no Remote Engine Gen2 has been created from Talend Management Console, or if it exists but appears as unavailable (meaning it is not up and running), you cannot select a Connection type in the list or save the new connection.
- The list of available connection types depends on the engine you have selected.
- Select JSON in the format list and paste the content of the test-file-to-kafka.json file in the Values field.
- Name it (for example, action movies) and save it.
- Do the same to add a connection to a Kafka server:
- Click Connections > Add connection.
- In the panel that opens, give a name to your connection and, if needed, a description. Example: Kafka
- Select the type of connection you want to create. Here, select Kafka.
- Fill in the connection properties to safely access your Kafka server as described in Kafka properties, check the connection and click Add dataset.
- In the Add a new dataset panel, name your dataset, Collette kafka topic for example. In this example, the collette_movies_json topic will be used to publish the data about movies.
- Click Validate to save your dataset.
- Click Add pipeline on the Pipelines page. Your new pipeline opens.
- Give the pipeline a meaningful name. Example: From Test to Kafka - send to Kafka topic
- Click ADD SOURCE and select your source dataset, action movies, in the panel that opens.
- Click and add a Split processor to the pipeline in order to split the records that contain both actor first names and last names. The configuration panel opens.
- Give a meaningful name to the processor. Example: split actor names
- Configure the processor:
- Select Split text in parts in the Function name list as you want to split the values corresponding to name records.
- Select .detail.starring in the Fields to process list as you want to apply this change to the values of these specific records.
- Enter or select 2 in the Parts list as you want to split the values of these specific records into two parts.
- Select Space in the Separator list as first names and last names are separated by a space in these records.
- Click Save to save your configuration.
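The Split configuration above can be sketched in plain Python. The record shape and field names below are illustrative (they mirror the .detail.starring path used in this example); the actual processor logic is internal to Talend:

```python
def split_text_in_parts(value, parts=2, separator=" "):
    """Emulate the Split text in parts function: cut a string into a
    fixed number of parts on a separator (any extra separators are
    kept in the last part)."""
    pieces = value.split(separator, parts - 1)
    # Pad with empty strings when the value has fewer parts than requested.
    pieces += [""] * (parts - len(pieces))
    return pieces

# Hypothetical record shaped like the dataset in this scenario.
record = {"detail": {"starring": "Toni Collette"}}
first, last = split_text_in_parts(record["detail"]["starring"])
record["detail"]["starring_split_1"] = first
record["detail"]["starring_split_2"] = last
print(record["detail"]["starring_split_2"])  # prints Collette
```

With Parts set to 2 and Space as the separator, a value such as "Toni Collette" yields a first-name part and a last-name part, which later steps can address as .detail.starring_split_1 and .detail.starring_split_2.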
- (Optional) Look at the preview of the processor to see the data after the split operation.
- Click and add a Filter processor to the pipeline. The configuration panel opens.
- Give a meaningful name to the processor. Example: filter on movies with actor Collette
- Configure the processor:
- Add a new element and select .detail.starring_split_2 in the Input list, as you want to filter on the last names of the actors listed in the dataset.
- Select None in the Optionally select a function to apply list.
- Select == in the Operator list.
- Enter Collette in the Value field, as you want to filter on data that contains the name Collette.
- Click Save to save your configuration.
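The Filter step amounts to keeping only the records whose .detail.starring_split_2 value equals Collette. A minimal sketch with hypothetical records (titles and names are placeholders, not data from the actual test file):

```python
# Hypothetical records as they might look after the Split step.
movies = [
    {"title": "Movie A", "detail": {"starring_split_2": "Collette"}},
    {"title": "Movie B", "detail": {"starring_split_2": "Smith"}},
    {"title": "Movie C", "detail": {"starring_split_2": "Collette"}},
]

# Keep only the records whose second name part equals "Collette",
# matching the == operator configured in the Filter processor.
filtered = [m for m in movies if m["detail"]["starring_split_2"] == "Collette"]
print([m["title"] for m in filtered])  # prints ['Movie A', 'Movie C']
```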
- (Optional) Look at the preview of the Filter processor to see your data sample after the filtering operation.
- Click the ADD DESTINATION item on the pipeline to open the panel that lets you select the Apache Kafka topic in which your output data will be loaded, Collette Kafka topic.
- In the Configuration tab of the destination, the Round-Robin model is the default Partition Type used when publishing an event, but you can specify a partition key according to your use case.
- On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel that lets you select your run profile.
- Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
Results
Your pipeline is executed: the movie data from your test file is processed, and the output flow is sent to the collette_movies_json topic you defined.
What to do next
Once the data is published, you can consume the content of the topic in another pipeline and use it as a source: