Configuring the components - 7.3

Data extraction

Procedure

  1. Double-click tFileInputDelimited to display its Basic settings view and define the component properties, including the input file name, the number of header rows to skip, and the schema.
    1. Click Edit schema to create the schema.
    2. Click [+] to add a column named Name_Telephone_Address to the schema, and click OK to validate.
  2. Double-click tPatternExtract to display its Basic settings view and define the component properties.
    1. From the Column to check list, select the column whose data you want to check against the defined pattern, Name_Telephone_Address in this example.
    2. From the PROPERTY list, select Repository to check the data against a pattern from the DQ Repository.
    3. Click the [...] button next to the PROPERTY field and select Regex > internet > Email Address from the Pattern Selector.
  3. In the Basic settings view of the tFilterColumns component, click the [...] button next to Edit schema to open the Schema dialog box.
  4. Select the column of interest from the Input schema, and click the right arrow button to copy the column to the output schema. Then, click OK to close the dialog box.
  5. Double-click tFileOutputDelimited to display its Basic settings view and define the component properties.
    1. In the File Name field, specify the path to the file you want to write the output data to.
    2. If needed, define the row and field separators in the corresponding fields. In this example, customers' email addresses are separated by semicolons.
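
For readers who want to see the logic outside Talend Studio, the following Python sketch approximates what the configured Job does: it reads the single Name_Telephone_Address column, extracts the part of each row that matches an email pattern, and joins the results with semicolons. It is only an illustration. The regular expression is a simplified assumption and not the Email Address pattern stored in the DQ Repository, and the function names and sample rows are hypothetical.

```python
import re

# Simplified email pattern (an assumption for illustration only;
# it is NOT the regex shipped in the DQ Repository).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(rows):
    """Return the first email address found in each row; rows without a match are skipped."""
    emails = []
    for row in rows:
        match = EMAIL_PATTERN.search(row)
        if match:
            emails.append(match.group(0))
    return emails


def join_emails(emails, separator=";"):
    """Join the extracted addresses with the chosen separator (semicolon in this example)."""
    return separator.join(emails)


# Hypothetical rows standing in for the Name_Telephone_Address column.
sample_rows = [
    "Jane Doe 555-0100 jane.doe@example.com 12 Main St",
    "John Roe 555-0101 no email on file",
    "Ann Lee 555-0102 ann_lee@example.org 9 Elm Rd",
]

output = join_emails(extract_emails(sample_rows))
print(output)
```

In this sketch, rows that contain no email address are dropped from the output. How the actual components route non-matching rows depends on the Job's connections, which this example does not model.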