Running the preparation to update the source dataset

Availability note: Beta
To update the original dataset, you need to send the fixed data from the preparation back to it.

However, because you applied a split function earlier, you must complete a mapping step to reconcile the schema of the preparation with the schema of the destination dataset coming from the database.

After running the preparation, you will be able to see the impact of the preparation on the different quality indicators.

Procedure

  1. Click the Run button on the top right of the screen to open the export options.
  2. Select customers_billing_dataset, which is the source dataset that you want to update, as Destination.
  3. Select Update from the Action drop-down list, so that the wrong records from the database are replaced with the ones from the preparation.
  4. Select the Customer_id column in the Operation keys drop-down list.
  5. Click Next.
  6. Use drag and drop to map the columns of the resulting preparation schema to the columns of the destination dataset schema as follows:
    1. Customer_id with Customer_id
    2. Billing_Country_Split_1 with Billing_Street
    3. Billing_Country_Split_2 with Billing_City
    4. Billing_Country_Split_3 with Billing_State
    5. Billing_Country_Split_4 with Billing_country
    See Mapping the preparation and destination columns for more information on how to map columns.
    Mapping configuration between input and output columns.
  7. Click Next.
  8. Select Standard as run profile, so that the preparation runs on the Cloud Engine for Design.
  9. Click Run.
    The run starts in the background, and you are returned to the preparation screen.
  10. To check the status of the run, click the Run history button on the top right of the screen.
    Run history panel showing metrics and status of the run.
    This screen gives you various information about the current and past runs. For more information, see The run history page.
  11. Once the run has completed successfully, click customers_billing_dataset under the Destination dataset section to go directly back to the detailed view of the updated dataset.
  12. In the Data quality tile, click Select sample type > Refresh head sample in order to retrieve the latest changes made to the content of the database.
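
To make the effect of steps 3, 4, and 6 more concrete, the following is a minimal Python sketch of what a keyed update with a column mapping does conceptually. It is not the product's API or implementation: the function names map_row and update_by_key are hypothetical, the sample values are invented for illustration, and the sketch assumes that prepared rows with no matching Customer_id in the destination are ignored.

```python
# Conceptual sketch only: this is not Talend's API.
# map_row and update_by_key are hypothetical helper names.

# Preparation column -> destination column, as mapped in step 6.
COLUMN_MAPPING = {
    "Customer_id": "Customer_id",
    "Billing_Country_Split_1": "Billing_Street",
    "Billing_Country_Split_2": "Billing_City",
    "Billing_Country_Split_3": "Billing_State",
    "Billing_Country_Split_4": "Billing_country",
}


def map_row(prep_row, mapping=COLUMN_MAPPING):
    """Rename preparation columns to destination columns; drop unmapped ones."""
    return {dest: prep_row[src] for src, dest in mapping.items() if src in prep_row}


def update_by_key(destination, prepared_rows, key="Customer_id"):
    """Overwrite destination rows whose key matches a prepared row.

    Assumption: prepared rows without a matching key are skipped,
    because this models an update rather than an insert.
    Returns the number of destination rows that were updated.
    """
    index = {row[key]: row for row in destination}
    updated = 0
    for prep_row in prepared_rows:
        mapped = map_row(prep_row)
        target = index.get(mapped.get(key))
        if target is not None:
            target.update(mapped)
            updated += 1
    return updated


# Invented sample data, for illustration only.
destination = [
    {"Customer_id": 1, "Billing_Street": "", "Billing_City": "",
     "Billing_State": "", "Billing_country": "12 Main St, Austin, TX, USA"},
]
prepared = [
    {"Customer_id": 1, "Billing_Country_Split_1": "12 Main St",
     "Billing_Country_Split_2": "Austin", "Billing_Country_Split_3": "TX",
     "Billing_Country_Split_4": "USA"},
]
update_by_key(destination, prepared)
```

In this sketch, Customer_id plays the role of the operation key: it identifies which destination record each prepared row replaces, while the mapping decides which destination column receives each split column.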

Results

After refreshing, you can see that the Talend Trust Score™ of the dataset has significantly increased, as indicated by the differential next to the score itself.
Trust Score icon showing a 1.05 points increase.

Using Talend Cloud Data Inventory and Talend Cloud Data Preparation has allowed you to monitor the datasets of your whole organization, use different indicators to identify potential errors, and fix them accordingly, to improve the health of your data.
