Before you begin
-
You have previously created a connection to the system
storing your source data.
This example uses an Amazon S3 connection.
-
You have previously added the dataset holding your source
data.
Download the file: string-crops.csv. It contains data about
crops harvested in Mali, including crop types, value of production,
harvested areas, etc.
-
You have also created the connection and the related dataset
that will hold the processed data.
This example uses a dataset stored in the same S3 bucket.
Procedure
-
Click Add
pipeline on the Pipelines page. Your new pipeline opens.
-
Give the pipeline a meaningful name.
Example
Process strings about harvested
crops
-
Click ADD SOURCE to open
the panel where you select your source data, in this case data about crops
harvested in Mali in 2005.
-
Select your dataset and click
Select to add it to the pipeline.
Rename it if needed.
-
Add a Strings processor to the pipeline. The
configuration panel opens.
-
Give a meaningful name to the processor.
Example
change crop types to upper
case
-
In the Configuration area:
-
Select Change to upper case in the Function
name list.
-
Select .crop_parent in the Fields to
process list, as you want to change the crop type values to
upper case.
-
Click Save to
save your configuration.
Look at the preview of the processor to compare your data before and after
the operation.
-
Add another Strings processor to the pipeline.
The configuration panel opens.
-
Give a meaningful name to the processor.
Example
match crop IDs with
IDs
-
In the Configuration area:
-
Select Match similar text in the Function
name list.
-
Select .crop in the Fields to
process list.
-
Select Other column in the Use
with list and .id in the
Column list as you want to compare the crop name ID
with the record ID.
-
Enter 0 in the Fuzziness field as you want exact
matches between the two field values.
-
Click Save to
save your configuration.
Look at the preview of the processor to compare your data
before and after the operation. You can see a new column,
crop_matches, in which exact matches have a
true value and IDs that do not match have a
false value.
-
Click ADD DESTINATION and select the dataset that will hold
your processed data.
Rename it if needed.
-
On the top toolbar of Talend Cloud Pipeline Designer,
click the Run button to open the panel allowing you to select
your run profile.
-
Select your run profile in the list (for more information, see Run profiles), then click Run to
run your pipeline.
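For readers who want to reason about the first Strings processor outside the product, the Change to upper case step can be approximated in plain Python. This is only a minimal sketch under stated assumptions, not how Talend Cloud Pipeline Designer implements the function: the helper name upper_case_field and the sample rows are hypothetical, and the field path .crop_parent is assumed to map to a column named crop_parent.

```python
import csv
import io


def upper_case_field(records, field):
    """Return copies of the records with the given field upper-cased.

    Hypothetical helper that mimics the effect of the
    "Change to upper case" function on a single field.
    """
    result = []
    for record in records:
        updated = dict(record)  # copy so the input rows stay untouched
        value = updated.get(field)
        if isinstance(value, str):
            updated[field] = value.upper()
        result.append(updated)
    return result


# Made-up sample rows for illustration only; the real string-crops.csv
# may contain different columns and values.
SAMPLE_CSV = "id,crop,crop_parent\nc1,c1,cereals\nc2,x9,pulses\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
processed = upper_case_field(rows, "crop_parent")

for before, after in zip(rows, processed):
    print(before["crop_parent"], "->", after["crop_parent"])
```

Keeping the input rows unchanged mirrors the before and after comparison offered by the processor preview.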
Results
Your pipeline is executed: the selected strings are processed and
the output flow is sent to the S3 bucket you specified.
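To sanity-check the crop_matches column in the output, the Match similar text step can be imitated offline. The sketch below assumes that fuzziness behaves like a maximum Levenshtein edit distance, which this procedure does not confirm. The helper names edit_distance and add_match_flag and the sample records are hypothetical. With a fuzziness of 0, the comparison reduces to plain equality between the two fields.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def add_match_flag(records, field, other_field, fuzziness=0,
                   output_field="crop_matches"):
    """Return copies of the records with a boolean match column added.

    Assumption: two values match when their edit distance is at most
    `fuzziness`; the actual processor may use a different measure.
    """
    result = []
    for record in records:
        updated = dict(record)
        distance = edit_distance(str(record[field]), str(record[other_field]))
        updated[output_field] = distance <= fuzziness
        result.append(updated)
    return result


# Made-up records for illustration only.
sample = [{"id": "c1", "crop": "c1"}, {"id": "c2", "crop": "x9"}]
flagged = add_match_flag(sample, "crop", "id", fuzziness=0)

for record in flagged:
    print(record["crop"], record["id"], record["crop_matches"])
```

Raising the fuzziness argument in this sketch loosens the comparison, which is why the procedure uses 0 when only exact matches between .crop and .id should be flagged as true.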