Standardizing data

Standardizing data before you perform matching tasks is an essential step: it removes superficial variations between records and so improves matching accuracy.
Talend provides several ways to standardize data:
  • You can standardize data against synonym indices. Each synonym is converted to its "master" word.

    For more information on the available data synonym dictionaries, see Data synonym dictionaries.

  • You can use address validation components to standardize address data against the Experian QAS, Loqate and MelissaData validation tools. Because these tools return addresses in a consistent form, variations in address representation are eliminated and matching becomes easier.

    For more information on the tQASBatchAddressRow, tLoqateAddressRow and tMelissaDataAddress components, see Address standardization.

  • You can use the tStandardizePhoneNumber component to standardize a phone number, based on the formatting convention of the country of origin.

    For more information on phone number standardization, see Phone number standardization.

  • You can also use more generic components, such as tReplace, tReplaceList, tVerifyEmail, tExtractRegexFields or tMap, to transform your data and get more standardized records.
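
To illustrate the general idea behind synonym-based and replace-based standardization, here is a minimal sketch in plain Java. It is not Talend component code and does not use any Talend API: the class name StandardizeSketch, the method standardize and the small synonym table are all hypothetical, chosen only for this example. The sketch trims and lowercases a value, removes periods and commas, collapses repeated spaces, and then converts each known synonym to its "master" word.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any Talend library.
final class StandardizeSketch {

    // Assumed synonym index: each variant maps to its "master" word.
    private static final Map<String, String> SYNONYMS = new LinkedHashMap<>();
    static {
        SYNONYMS.put("st", "street");
        SYNONYMS.put("str", "street");
        SYNONYMS.put("ave", "avenue");
        SYNONYMS.put("rd", "road");
    }

    // Returns a standardized form of the input so that equivalent
    // spellings of the same value compare as equal.
    static String standardize(String raw) {
        String cleaned = raw.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[.,]", "")    // drop periods and commas
                .replaceAll("\\s+", " ");  // collapse runs of whitespace

        StringBuilder out = new StringBuilder();
        for (String token : cleaned.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Replace a known synonym with its master word, else keep the token.
            out.append(SYNONYMS.getOrDefault(token, token));
        }
        return out.toString();
    }
}
```

With this sketch, calling standardize on "  12 Main St. " and on "12  MAIN Street" yields the same string, so a later exact or fuzzy comparison no longer has to cope with differences in case, spacing or abbreviation.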
