Before you begin
-
You have previously created a connection to the system
storing your source data.
Here, a Test connection.
-
You have previously added the dataset holding your source
data.
Download and extract the file: type_converter-datacleansing-taxi.zip. It
contains hierarchical taxi data including pickup time, dropoff time, fare,
etc.
-
You have also created the connection and the related dataset
that will hold the processed data.
Here, a Test dataset.
Procedure
-
Click Add pipeline on the Pipelines page. Your new pipeline opens.
-
Give the pipeline a meaningful name.
Example
Fill empty cells with appropriate value
-
Click ADD SOURCE to open the panel allowing you to select
your source data. Here it is taxi-related data that contains a column with empty
records (.store_and_fwd_flag).
-
Select your dataset and click
Select in order to add it to the pipeline.
Rename it if needed.
-
Add a Data cleansing processor to the pipeline.
The configuration panel opens.
-
Give a meaningful name to the processor.
Example
fill empty cells with N/A value
-
In the Configuration area:
-
Select Fill cells with value in the
Function name list, as you want to replace the empty
records with a fixed value.
-
Select .store_and_fwd_flag in the Fields to
process list, as it corresponds to the field with empty
records.
-
Select Value in the Use with
list and enter N/A in the Value
field, as you want to replace all empty records with the value N/A (not
available).
-
Click Save to
save your configuration.
Look at the preview of the processor to compare your data before and after the
cleansing operation.
-
Click ADD DESTINATION and select the dataset that will hold
your cleansed data.
Rename it if needed.
-
On the top toolbar of Talend Cloud Pipeline Designer,
click the Run button to open the panel allowing you to select
your run profile.
-
Select your run profile in the list (for more information, see Run profiles), then click Run to
run your pipeline.
Results
Your pipeline is being executed. The empty records are replaced with the
fixed value you specified, and the output flow is sent to the target system you
indicated.
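Example
To make the effect of the Fill cells with value function concrete, the following Python sketch mimics the replacement on a few in-memory records. It is an illustration only, not the processor's actual implementation: the fill_empty helper, the sample records and the fare_amount field are hypothetical, and treating None and blank strings as "empty" is an assumption. Only the .store_and_fwd_flag field name and the N/A value come from the procedure above.

```python
def fill_empty(records, field, value):
    """Return copies of the records with empty `field` values replaced by `value`.

    Assumption: a value counts as empty when it is None, missing,
    or a string containing only whitespace.
    """
    filled = []
    for record in records:
        current = record.get(field)  # None when the field is missing
        is_empty = current is None or (
            isinstance(current, str) and current.strip() == ""
        )
        # Build a new dict so the input records are left untouched.
        filled.append({**record, field: value if is_empty else current})
    return filled


# Hypothetical sample rows standing in for the taxi dataset.
trips = [
    {"fare_amount": 12.5, "store_and_fwd_flag": "N"},
    {"fare_amount": 7.0, "store_and_fwd_flag": ""},
    {"fare_amount": 9.3, "store_and_fwd_flag": None},
]

cleansed = fill_empty(trips, "store_and_fwd_flag", "N/A")

# Compare the field before and after, as in the processor preview.
for before, after in zip(trips, cleansed):
    print(repr(before["store_and_fwd_flag"]), "->", repr(after["store_and_fwd_flag"]))
```

Records that already hold a value are passed through unchanged, which matches what you should see when comparing the input and output in the processor preview.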