
Standardizing data

Standardizing data before you perform matching tasks is an essential step in improving matching accuracy.
Talend provides different ways to standardize data:
  • You can standardize data against indices of synonyms. Each synonym found in the data is converted to its "master" word.

    For more information on available data synonym dictionaries, see the Talend Studio User Guide.

  • You can use address validation components to standardize address data against Experian QAS, Loqate and MelissaData validation tools. The addresses returned by these tools are consistent and variations in address representations are eliminated. As addresses are standardized, matching gets easier.

    For more information on the tQASBatchAddressRow, tLoqateAddressRow and tMelissaDataAddress components, see Address standardization, or the online publication about these components on Talend Help Center (https://help.talend.com).

  • You can use the tStandardizePhoneNumber component to standardize a phone number, based on the formatting convention of the country of origin.

    For more information on phone number standardization, see Phone number standardization, or the online publication about the tStandardizePhoneNumber component on Talend Help Center (https://help.talend.com).

  • You can use other, more generic components, such as tReplace, tReplaceList, tVerifyEmail, tExtractRegexFields or tMap, to transform your data and get more standardized records.
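
To illustrate why standardization helps matching, here is a minimal sketch in plain Python, outside of Talend. It is not how any Talend component is implemented: the SYNONYM_INDEX dictionary, its entries and the standardize function are all hypothetical names chosen for this example. The sketch combines a generic cleanup step (lowercasing, removing punctuation, collapsing whitespace) with a lookup that converts each synonym to its "master" word.

```python
import re

# Hypothetical synonym index (an assumption for this sketch, not a Talend
# dictionary): each variant maps to its "master" word.
SYNONYM_INDEX = {
    "st": "street",
    "str": "street",
    "ave": "avenue",
    "av": "avenue",
    "corp": "corporation",
    "intl": "international",
}


def standardize(value: str) -> str:
    """Return a standardized form of value.

    Lowercases the text, replaces punctuation with spaces, collapses
    repeated whitespace, then maps each token to its master word when the
    token appears in the synonym index.
    """
    cleaned = re.sub(r"[^\w\s]", " ", value.lower())
    tokens = cleaned.split()  # split() also collapses repeated whitespace
    return " ".join(SYNONYM_INDEX.get(token, token) for token in tokens)


record_a = "Intl. Business Machines Corp"
record_b = "international  business machines CORPORATION"

# The raw values differ, so an exact comparison fails.
print(record_a == record_b)  # False

# After standardization both records carry the same value.
print(standardize(record_a) == standardize(record_b))  # True
```

In this sketch the two raw values only match after both have gone through the same standardization step, which is the effect the techniques listed above aim for before a matching task.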
