
Data Profiling and Data Quality

What is Talend Data Quality?

Talend Studio is a comprehensive data quality and data management solution that comprises the following main elements:
  • The Profiling and Data Explorer perspectives, where you can analyze data and browse and query analysis results.
  • The Integration perspective, where you have access to hundreds of components for all data integration needs, including a set of components and routines dedicated to data quality. This enables you to embed data cleansing capabilities in your data transformation and integration processes.

For detailed information about data quality specific components, see Data Quality components.

This feature is not shipped with Talend Studio by default. You need to install it using the Feature Manager. For more information, see Installing features using the Feature Manager.

Core features

Metadata repository

Using Talend Data Quality, you can connect to data sources to analyze their structure (catalogs, schemas, and tables) and store the description of their metadata in the metadata repository. You can then use this metadata to set up metrics and indicators.

For more information, see Creating connections to data sources.
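
As a rough illustration of what analyzing a data source's structure involves, the sketch below reads table and column descriptions from a database and stores them in a dictionary. It is not Talend's implementation: it assumes an in-memory SQLite database, and the `describe_structure` function and the `customers` table are hypothetical names chosen for the example.

```python
import sqlite3


def describe_structure(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite connection."""
    structure = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        structure[table] = [(col[1], col[2]) for col in columns]
    return structure


# Hypothetical data source with a single table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, country TEXT)"
)

metadata = describe_structure(conn)
print(metadata)
```

The resulting dictionary plays the role of a very small metadata store: once the structure is described, metrics can be defined per table or per column without querying the source schema again.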

Another feature of interest is the report database, where you can keep a history of generated reports and share results among team members. For more information, see Managing the report database.

Patterns and indicators

Patterns are sets of strings against which you can define the content, structure, and quality of highly complex data. The Profiling perspective of Talend Studio lists two types of patterns:
  • Regular expressions, which are predefined regular patterns.
  • SQL patterns, which are the patterns you add using LIKE clauses.

    For more information about patterns, see Patterns.
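
To make the difference between the two pattern types concrete, the sketch below checks the same column once with a regular expression and once with a LIKE clause. It is an illustration only, not how Talend Studio evaluates patterns: the email regular expression is deliberately simplistic, and the `contacts` table and sample values are hypothetical.

```python
import re
import sqlite3

# Hypothetical sample column, including an invalid value and a null.
emails = ["ann@example.com", "bob@example", "carol@test.org", None]

# Regular-expression pattern: a deliberately simple email check (assumption).
EMAIL_REGEX = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")
regex_matches = [v for v in emails if v is not None and EMAIL_REGEX.match(v)]

# SQL pattern: the same column checked in the database with a LIKE clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?)", [(v,) for v in emails])
like_matches = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM contacts WHERE email LIKE '%@%.%' ORDER BY rowid"
    )
]

print(regex_matches)
print(like_matches)
```

A LIKE clause supports only the `%` and `_` wildcards, so it is coarser than a regular expression, but it runs inside the database and does not require the values to be fetched first.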

Indicators are the results obtained by applying different patterns. They can represent the results of data matching and various other data-related operations. The Profiling perspective of Talend Studio lists two types of indicators:
  • System indicators, a list of predefined indicators.
  • User-defined indicators, a list of those defined by the user.

    For more information about indicators, see Indicators.
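
The sketch below shows, in a simplified form, the kind of figures an indicator can report for a single column. It is an assumption-laden illustration rather than Talend's own indicator definitions: the `simple_indicators` function, the indicator names, and the five-digit postal code pattern are all hypothetical.

```python
import re
from collections import Counter


def simple_indicators(values, pattern):
    """Compute a few illustrative column indicators for a list of string values."""
    non_null = [v for v in values if v is not None]
    matching = sum(1 for v in non_null if re.fullmatch(pattern, v))
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "pattern_match_count": matching,
        "most_frequent": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }


# Hypothetical postal code column with one malformed value and one null.
zip_codes = ["75001", "75001", "7500A", None, "69002"]
indicators = simple_indicators(zip_codes, r"\d{5}")
print(indicators)
```

Here the pattern-based indicator depends on the pattern supplied, while the row, null, and distinct counts are independent of any pattern.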
