Scenario: Connecting to a web service and returning a list of regular expressions - 6.1

Talend Components Reference Guide

This scenario is a three-component Java Job created in Talend Studio.

This scenario:

  • uses the tFindRegexlibExpressions component to connect to a web server and collect all regular expressions that have the word "email" in their description field,

  • uses the tMap component to reorganize the incoming data in the output flow and to concatenate two fields from the incoming data flow into one output column,

  • and finally writes all collected expressions to a CSV file.

This Job can also be generated automatically from the Patterns > Regex node in the DQ Repository tree view. For further information about how to generate a Job that retrieves regular expressions, see the Talend Studio User Guide.

Configuring the tFindRegexlibExpressions component

  1. Drop the following components from the Palette onto the design workspace: tFindRegexlibExpressions, tMap, and tFileOutputDelimited.

  2. Double-click the tFindRegexlibExpressions component to open its Basic settings view and define its properties.

    The schema of this component is read-only and it contains the following fields: Title, Expression, Description, Matches, Non-Matches, Author, Rating and Relative_path.

  3. In the Regexp Substring field, define a regular expression substring you want to use as a filter on the regular expression list.

  4. In the Key Words field, define the key word(s) you want to use as a filter on the regular expression list.

  5. In the Min Rate field, define a regular expression rating you want to use as a filter on the regular expression list.

  6. In the Relative path field, type in the relative path pointing to the folder to be created in the Patterns > Regex node of the DQ Repository tree view for the retrieved patterns. In this example, this folder is email.

    In this scenario, we want tFindRegexlibExpressions to collect all regular expressions on the web server that have the word "email" in their Description field and whose rating is at least 1.

  7. Connect tFindRegexlibExpressions and tMap using a Main row link.
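The combined effect of the Key Words and Min Rate filters in this scenario can be pictured with a minimal Java sketch. It is only an illustration of the filtering rule described above, not the component's actual implementation: the RegexEntry class, the filter method, the sample rows, and the case-insensitive keyword match are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: this is not how the component is implemented.
class RegexFilterSketch {

    // Minimal stand-in for one row of the read-only schema (only the fields used here).
    static final class RegexEntry {
        final String title;
        final String description;
        final int rating;

        RegexEntry(String title, String description, int rating) {
            this.title = title;
            this.description = description;
            this.rating = rating;
        }
    }

    // Keeps entries whose description contains the keyword
    // (case-insensitive, an assumption) and whose rating is at least minRate.
    static List<RegexEntry> filter(List<RegexEntry> entries, String keyword, int minRate) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<RegexEntry> kept = new ArrayList<>();
        for (RegexEntry e : entries) {
            boolean mentionsKeyword = e.description != null
                    && e.description.toLowerCase(Locale.ROOT).contains(needle);
            if (mentionsKeyword && e.rating >= minRate) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not data from the web server.
        List<RegexEntry> sample = Arrays.asList(
                new RegexEntry("Simple email", "Matches a basic email address", 3),
                new RegexEntry("US phone", "Matches US phone numbers", 4),
                new RegexEntry("Unrated email", "Email pattern without votes", 0));
        for (RegexEntry e : filter(sample, "email", 1)) {
            System.out.println(e.title);
        }
    }
}
```

With the sample rows above, only the first entry satisfies both conditions: the second does not mention "email" and the third has a rating below 1.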

Configuring the tMap component

  1. Double-click the tMap component to open the Map Editor, where you reorganize and concatenate the fields as needed.

  2. In the Map Editor, click the plus button in the upper-right corner to open a dialog box where you can give a name to the new output table, regex in this scenario.

    This creates a new link with the same name in the tMap component, which you can use to connect tMap to the next component.

  3. In the lower-right corner of the Map Editor, click the plus button to define the fields in the regex output table.

  4. In the upper half of the Map Editor, drop fields from the input table to fill the fields of the output schema as necessary. For more information regarding data mapping, see the Talend Studio User Guide.

    In this scenario, we want to concatenate the Matches and Non-Matches fields from the incoming data flow into one output column: Purpose. We also want a new column called Path in the output schema. Finally, we do not want any rating-related information in the output schema.

  5. Click Ok to validate and close the Map Editor.

  6. Right-click tMap and select the regex link to connect tMap to tFileOutputDelimited.
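Because the Job is a Java Job, the concatenation behind the Purpose column amounts to a Java string expression over the two input fields. The sketch below shows one way such a concatenation could be written. The purpose method, the " || " separator, the null handling, and the sample values are assumptions for illustration; the exact expression and column identifiers used in the Map Editor are not given in this scenario.

```java
// Hypothetical illustration of the concatenation feeding the Purpose column.
class PurposeConcatSketch {

    // Joins the two fields into one value; null inputs are treated as empty
    // so the result never contains the literal text "null".
    static String purpose(String matches, String nonMatches) {
        String m = matches == null ? "" : matches;
        String n = nonMatches == null ? "" : nonMatches;
        // The " || " separator is an arbitrary choice for this sketch.
        return m + " || " + n;
    }

    public static void main(String[] args) {
        // Made-up sample values for the Matches and Non-Matches fields.
        System.out.println(purpose("john@example.com", "john.example.com"));
    }
}
```

Guarding against null matters in practice because plain Java concatenation of a null reference produces the text "null" in the output column.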

Configuring the output component

  1. Double-click tFileOutputDelimited to display its Basic settings and define its properties.

  2. Click the three-dot button next to the File Name field to browse to the file where you want to write the output data.

  3. Define the row and field separators in the corresponding fields.

  4. Select the Append check box if you want to add the new rows after the existing records in the file.

  5. Select the Include Header check box to include column headers in the output data.

  6. If needed, click Edit schema to view the input and output data flows.
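The settings above (file name, row and field separators, Append, Include Header) can be pictured with a small standalone Java sketch. It is not the code the component generates: the DelimitedWriterSketch class and its write method are hypothetical, values are written without quoting or escaping, and skipping the header when appending to a non-empty file is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the output settings, not generated Job code.
class DelimitedWriterSketch {

    // Writes rows to a delimited text file using the given separators.
    // append = true adds rows after any existing content.
    // The header is skipped when appending to a non-empty file (assumption).
    static void write(Path file, List<String[]> rows, String[] header,
                      String fieldSep, String rowSep,
                      boolean append, boolean includeHeader) throws IOException {
        boolean hasContent = Files.exists(file) && Files.size(file) > 0;
        StringBuilder out = new StringBuilder();
        if (includeHeader && !(append && hasContent)) {
            out.append(String.join(fieldSep, header)).append(rowSep);
        }
        for (String[] row : rows) {
            // No quoting or escaping: values containing the separator are not handled.
            out.append(String.join(fieldSep, row)).append(rowSep);
        }
        byte[] bytes = out.toString().getBytes(StandardCharsets.UTF_8);
        if (append) {
            Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            Files.write(file, bytes, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up file name, columns, and row for the example.
        String[] header = {"Title", "Expression", "Purpose", "Path"};
        List<String[]> rows = Arrays.asList(
                new String[][] {{"Simple email", "\\S+@\\S+", "a@b.c || a.b.c", "email"}});
        write(Paths.get("regex_out.csv"), rows, header, ";", "\n", false, true);
    }
}
```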

Job execution

Save your Job and press F6 to execute it.

tFindRegexlibExpressions connects to the web server and collects all regular expressions that match the request. tMap performs the defined field reorganization and concatenation and passes the output flow to tFileOutputDelimited. The output file contains one row per collected expression, laid out according to the regex output table defined in tMap.

You can later import all collected regular expressions from a well-formatted CSV file into Talend Studio. For more information about importing patterns, see the Talend Studio User Guide.