Glossary - 6.2

Talend MDM Platform Getting Started Guide

To understand how Talend Studio works, it helps to know some basic vocabulary.

Generic terms

component

A component is an executable part of a Job or Route used to connect to an external source or perform a specific data integration operation, no matter what data sources you are integrating: databases, applications, flat files, Web services, etc. A component can minimize the amount of hand-coding required to work on data from multiple, heterogeneous sources.

Components are grouped in families according to their usage and displayed in the Palette of the Integration perspective of Talend Studio.

For detailed information about component types and what they can be used for, see the Talend Components Reference Guide.

item

An item is the fundamental technical unit in a project. Items are grouped according to their types: Job Design, Business model, Context, Code, Metadata, etc. One item can include other items. For example, the business models and the Jobs you design are items, and so are the metadata and routines you use inside your Jobs.

Job

A Job is a graphical design, of one or more components connected together, that allows you to set up and run dataflow management processes. It translates business needs into code, routines and programs. Jobs address all of the different sources and targets that you need for data integration processes and all other related processes.

Joblet

A Joblet is a specific component that replaces Job component groups. It factorizes recurrent processing or complex transformation steps to ease the reading of a complex Job. Joblets can be reused in different Jobs or several times in the same Job.

metadata

Metadata is information that describes the characteristics of any data object, such as its name, type, location, author, date created, size, and so on, together with relationships with other data objects that the enterprise has to manage or that an IT tool may generate. Metadata can be created manually or automatically by a system.

project

Projects are structured collections of items and their associated metadata. All of the Jobs and business models you design are organized in Projects.

repository

A repository is the storage location Talend Studio uses to gather data related to all of the technical items that you use either to describe business models or to design Jobs.

Talend Studio can connect to as many local or remote repositories as needed.

Route

A Camel Route is a graphical design, based on the Apache Camel framework, of two or more components connected together that allows you to set up and run routing and mediation rules. A routing rule defines how messages will be moved from one service (or endpoint) to another.

Service

A Service is a graphical design, of several WSDL objects (service, binding, port type and so on) linked together, that allows you to set up and implement Web services. A Service is associated with one or more data service Jobs as the service provider and can be consumed by consumer Jobs.

service Job

A data service Job is a graphical design, of one or more components connected together, that allows you to set up and run data service processes. It translates business needs into code, routines and programs. Data service Jobs address all of the different sources and targets that you need for data integration processes and combine them with Web services.

Note

Data service Jobs will simply be referred to as Jobs in the following documentation.

workspace

A workspace is the directory where you store all your project folders. You need to have one workspace directory per connection (repository connection). Talend Studio enables you to connect to different workspace directories, if you do not want to use the default one.

Terms in Talend MDM

advanced validation rules

Extension to standard XML Schema to provide more advanced validation rules without programming.

annotation

Describes the metadata that the administrator has attached to an entity in the data model.

consumer

Consumes data FROM the MDM Hub. A consumer may also be a provider.

data container

Holds data of one or several business entities. Data containers are typically used to separate master data domains.

data governance

The process of defining the rules that data has to follow within an organization.

data model

Defines the attributes, validation rules, user access rights and relationships of entities mastered by the MDM Hub. The data model is the central component of Talend MDM. A data model maps to a single entity that can be explicitly defined. Any concept can be defined by a data model.

data stewardship

The process of validating master data against the rules (data models) that are set in Talend Studio.

domain

A collection of data models that define a particular concept. For instance, the customer domain may be defined by the organization, account, contact and opportunity data models. A product domain may be defined by a product, product family and price list. Ultimately, the domain is the collection of all entities (data models) that relate to a concept. Talend MDM can model any and many domains within a single hub. It is a generic multi-domain MDM solution.

entity

Describes the actual data, its nature, its structure and its relationships. A data model can have multiple entities.

Event Manager

A service of the MDM Hub responsible for routing events thrown by the MDM Hub to Triggers, evaluating their conditions, executing Processes, and tracing active, completed and failed actions for monitoring purposes.

MDM Hub

Defines a complete Talend MDM implementation. It consists of components for Integration, Quality, Master Data Model, an XML DB interface and operational database, Web Services, Roles Based Access Control, Workflow Engine, the Data Stewardship components and MDM Web Interface. The MDM Hub is configured to meet different business needs.

Process

A Process is executed when the condition specified by the corresponding Trigger is verified. A Process may have several "steps". Each step performs a specific task, such as updating a record in the hub, running a Talend Job or instantiating a workflow.

provider

Feeds data IN to the MDM Hub.

record

An instance of data defined by a data model in the MDM Hub. Two records may be compared and considered similar or a close match, in which case the records may be linked and one may or may not survive.

Roles Based Access Control (RBAC)

Defines rules for accessing tasks or hub data depending on the role of the person, system or function accessing it.

Talend Studio

The administration user interface built on Eclipse. It allows the administrator of the system to manage and maintain the MDM Hub and all associated Data Integration Jobs through a single console.

Trigger

Condition for a Process to be executed, based on events thrown by the MDM Hub. Example of a Trigger condition: Agency created and Agency/Revenue > 100. An event may cause more than one Trigger condition to be true, which results in several Processes being executed. Triggers are used to specify when specific Processes, such as notifications, duplicate checking, record enrichment, propagation to back-end systems or approval workflows, should be executed.
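
The relationship between events, Triggers and Processes can be sketched in a few lines of Python. This is an illustrative model only, not the Talend MDM API: the Trigger class, the layout of the event dictionary and the Process names are all hypothetical. The first condition mirrors the example above (Agency created and Agency/Revenue > 100).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    """Hypothetical stand-in for an MDM Trigger (not the real API)."""
    name: str
    condition: Callable[[dict], bool]  # evaluated against each event
    process: str                       # name of the Process to execute


def route(event, triggers):
    """Return the Processes whose Trigger condition holds for this event."""
    return [t.process for t in triggers if t.condition(event)]


triggers = [
    Trigger(
        "large_agency_created",
        lambda e: e["entity"] == "Agency"
        and e["action"] == "create"
        and e["record"]["Revenue"] > 100,
        "notify_sales",
    ),
    Trigger(
        "agency_created",
        lambda e: e["entity"] == "Agency" and e["action"] == "create",
        "check_duplicates",
    ),
]

# One event satisfies both conditions, so two Processes are selected.
event = {"entity": "Agency", "action": "create", "record": {"Revenue": 250}}
print(route(event, triggers))
```

With a Revenue of 250, both conditions hold and both Processes are returned. With a Revenue of 50, only the second Trigger would match.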

View

A complete or a subset view of a record. A complete view shows all elements or columns in an entity, while a subset view shows some of the elements or columns in an entity. A View may restrict access to attributes of a record depending on who or what is asking for the data.

Terms in Talend Data Quality

advanced statistics

Indicators which determine the most probable and the most frequent values and build frequency tables.

Benford Law Frequency

An indicator based on examining the actual frequency of the digits 1 through 9 as leading digits in numerical data. It is usually used as an indicator of accounting and expense fraud in lists or tables.
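
As a rough illustration of the idea, the sketch below compares the observed share of each leading digit with the share predicted by Benford's law, log10(1 + 1/d). It is a minimal example, not the Studio's implementation; the function names and the sample values are made up for the illustration.

```python
import math
from collections import Counter


def leading_digit(value):
    """First non-zero digit of a number, or None if there is none (zero)."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_table(values):
    """Map each digit 1-9 to (observed frequency, expected Benford frequency)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {
        d: (counts[d] / total if total else 0.0, math.log10(1 + 1 / d))
        for d in range(1, 10)
    }


# Hypothetical expense amounts, used only to exercise the function.
amounts = [123, 187, 19, 240, 31, 1450, 9.5, 0.017]
for digit, (observed, expected) in benford_table(amounts).items():
    print(digit, round(observed, 3), round(expected, 3))
```

A large gap between the observed and expected columns on a sizeable data set is a hint that the figures deserve a closer look, not proof of fraud.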

data profiling

The process of examining the data available in different data sources and collecting statistics and information about this data. Data profiling helps to assess the quality level of the data according to a defined goal.

Data Quality Portal

A web-based platform that shares the results of the analyses and further exploits them.

It provides advanced reporting and allows you to compare current and historical statistics to determine the improvement or degradation of your data.

indicators

Results achieved through the implementation of complex analyses about data matching and other data-related operations.

They fall into two categories: "system indicators" or "user defined indicators".

patterns

Sets of strings against which you can define the content, structure and quality of highly complex data.

They fall into two categories: "regular expressions" or "SQL patterns".

pattern frequency statistics

Indicators which determine the most and least frequent patterns in a data set.
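
One way to picture a pattern frequency indicator is to reduce each value to its "shape" and count the shapes. The sketch below assumes a simple convention (digits become 9, uppercase letters become A, lowercase letters become a, other characters are kept); this convention and the function names are assumptions for the example, not a description of how the Studio computes the indicator.

```python
from collections import Counter


def to_pattern(value):
    """Replace digits with 9 and letters with A/a; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequencies(values):
    """Return (pattern, count) pairs, most frequent first."""
    return Counter(to_pattern(v) for v in values).most_common()


# Hypothetical product codes.
codes = ["AB-1234", "CD-5678", "ef-90", "GH-0001"]
print(pattern_frequencies(codes))
```

Here three codes share the pattern AA-9999 and one has the pattern aa-99, so the rare pattern stands out as a candidate for correction.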

phone number statistics

Indicators which count phone numbers. They return the count for each phone number format. They validate the phone formats using the org.talend.libraries.google.libphonumber library.

regular expressions (regex)

Predefined patterns that you can use to search and manipulate data in databases.

report

A document you can generate on one or more analyses from the Profiling perspective of the Studio to provide the statistics collected by the analyses. You can generate reports in different formats.

simple statistics

Indicators which provide simple statistics on the number of records falling in certain categories including the number of rows, the number of null values, the number of distinct and unique values, the number of duplicates, or the number of blank fields.
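
The sketch below shows how such counts could be derived for a single column. It assumes that "distinct" means the number of different non-null values, "unique" the values that occur exactly once, and "duplicates" the values that occur more than once; these definitions and the function name are assumptions made for the example.

```python
from collections import Counter


def simple_statistics(column):
    """Compute basic counts for a list of values (None stands for null)."""
    non_null = [v for v in column if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(column),
        "nulls": len(column) - len(non_null),
        "blanks": sum(
            1 for v in non_null if isinstance(v, str) and v.strip() == ""
        ),
        "distinct": len(counts),
        "unique": sum(1 for c in counts.values() if c == 1),
        "duplicates": sum(1 for c in counts.values() if c > 1),
    }


# Hypothetical column content.
column = ["a", "b", "a", None, "", "c", "b", "b"]
print(simple_statistics(column))
```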

soundex frequency statistics

Indicators which use the Soundex algorithm built into the DBMS. They index records by sound. This way, records with the same pronunciation (English pronunciation only) are encoded to the same representation so that they can be matched despite minor differences in spelling.
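
To show why differently spelled names end up with the same code, here is a minimal sketch of the classic American Soundex encoding. The indicator itself relies on the Soundex function of the DBMS, and database implementations can differ in details, so treat this only as an illustration of the principle.

```python
# Consonant groups of the classic Soundex encoding.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
print(soundex("Smith"), soundex("Smyth"))
```

Robert and Rupert receive the same code, as do Smith and Smyth, which is what lets records be grouped despite spelling variations.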

SQL patterns

Personalized patterns which you can use in SQL queries. These patterns usually contain the percent sign (%).
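
The following sketch shows the percent sign used as a wildcard in a LIKE clause. It uses Python's built-in sqlite3 module with an in-memory database purely for demonstration; the table, the column and the patterns are hypothetical, and other databases may treat case sensitivity in LIKE differently.

```python
import sqlite3

# In-memory database with a hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("ann@example.com",), ("bob@example.org",), ("not-an-email",)],
)


def count_matching(pattern):
    """Count the rows whose email matches a SQL LIKE pattern."""
    row = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE email LIKE ?", (pattern,)
    ).fetchone()
    return row[0]


print(count_matching("%@%.com"))  # values containing @ and ending in .com
print(count_matching("%@%"))      # values containing @ anywhere
```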

summary statistics

Indicators which perform statistical analyses on numeric data, including the computation of location measures such as the median and the average, and the computation of statistical dispersions such as the interquartile range and the range.
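
These measures can be illustrated with Python's standard statistics module. The sample values and the function name are invented for the example, and the quartiles use the module's "inclusive" method; other tools may use a different quartile convention and return slightly different values.

```python
import statistics


def summary_statistics(values):
    """Location and dispersion measures for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,  # interquartile range
    }


# Hypothetical numeric column.
values = [2, 4, 4, 5, 7, 9, 11]
print(summary_statistics(values))
```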

text statistics

Indicators which analyze the characteristics of textual fields in the columns, including minimum, maximum and average length.
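
A minimal sketch of the length measures, assuming that null values (None here) are ignored; the function name and sample values are hypothetical.

```python
def text_statistics(column):
    """Minimum, maximum and average length of the non-null text values."""
    lengths = [len(v) for v in column if v is not None]
    if not lengths:
        return None  # nothing to measure
    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(lengths),
    }


# Hypothetical city column.
print(text_statistics(["Paris", "Rome", "Amsterdam", None]))
```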