The lookup feature matches data from the current preparation with its counterpart in a reference dataset, allowing you to add this reference data to your preparation.When using the lookup feature in Talend Cloud Data Preparation, column descriptions previously added in the lookup dataset via Talend Cloud Data Inventory will be displayed in the preparation.
Before you begin
- Google Cloud Storage (Cloud file systems)
- Azure Synapse (Database)
- Google BigQuery (Database)
- Google BigTable (Database)
- Marketo (Business Apps)
- Google Analytics (Business Apps)
- NetSuite (Business Apps)
- Workday (Business Apps)
- Kafka (Messaging)
- RabbitMQ (Messaging)
- Open your preparation.
Click the lookup button to open the lookup panel.
Click Select dataset to select an existing dataset.
Select the dataset you want to use to perform the lookup.
In this case, you need to add the dataset containing the list of two-letter state codes and their matching names. It must have been imported to Talend Cloud Data Preparation beforehand.
- Click Select.
From the Current preparation and Lookup
dataset drop-down list, select the columns matching in your main
preparation and your reference dataset, the state_code and
States Code columns in this example.
In order to perform a lookup, at least one column with matching data must be present in the preparation and dataset that you want to blend.
From the Columns to add drop-down list, select the column
containing the states to add it to the current dataset, State
in this example.
- Choose to apply those changes only on the filtered rows or on all of the rows.
Click Submit. The state name for each matching state code
between main preparation and reference dataset is retrieved.
The lookup feature behaves as follows:
- If a value is common to both your main preparation and the reference dataset, the additional information is retrieved.
- If a value exists in your preparation but not in the reference dataset, the lookup operation will leave an empty cell in the resulting column.
- If a value from the reference dataset does not find a match in the main preparation, the additional information will simply not be retrieved.
The logic behind the lookup operation in Talend Cloud Data Preparation corresponds to a left outer join when speaking in terms of Venn diagrams:
Applied to our example, it means that you have used the common information about the state codes to create a join, and retrieved only the useful information from the second dataset to enrich the first one.
Another behavior of the lookup feature, that was not illustrated in the previous example, is how the possible duplicate and empty values from a reference dataset are handled.
Let's take the following example, where one of the State codes is missing from your preparation, and your reference dataset has a duplicate entry for the NY State code. For the sake of the example, NY has two possible values, New York and Nueva York, but it could very well be the same for both entries.
The above result was achieved by applying the following behavior:
- When a same value exists several times in the reference dataset, only the first occurrence is matched, Nueva York in this case.
- An empty cell in your preparation will be matched with an empty cell.