Finalizing and executing the analysis of a set of columns - Cloud

Talend Cloud API Services Platform Studio User Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Design and Development
EnrichPlatform: Talend Management Console, Talend Studio

Before you execute the column set analysis, define the indicator settings, the data filter and the analysis parameters.

Before you begin

A column set analysis has already been defined in the Profiling perspective of Talend Studio. For further information, see Defining the set of columns to be analyzed and Adding patterns to the analyzed columns.

Procedure

  1. In the Analysis Parameters view:
    • In the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection.

      Set this number according to the resources available on the database, that is, the number of concurrent connections each database can support.

    • From the Execution engine list, select the engine (Java or SQL) that you want to use to execute the analysis.
      • If you select the Java engine, the Store data check box is selected by default and cannot be cleared. Once the analysis is executed, the profiling results are always available locally, and you can drill down into them through the Analysis Results > Data view. For further information, see Filtering data against patterns.

        Executing the analysis with the Java engine uses disk space because all data is retrieved and stored locally. To free up space, you can delete the data stored in the main Talend Studio directory, at Talend-Studio/workspace/project_name/Work_MapDB.

      • If you select the SQL engine, you can use the Store data check box to decide whether to store the analyzed data locally and access it in the Analysis Results > Data view.
        Note: If the data you are analyzing is very large, it is advisable to leave the Store data check box cleared so that the results are not stored at the end of the analysis computation.
  2. Save the analysis and press F6 to execute it.

    The analysis editor switches to the Analysis Results view, where you can read the analysis results in tables and graphics. The graphical result shows simple statistics computed on the full records of the analyzed column set, not on the values within each column separately.

    When you use patterns to match the content of the set of columns, another graphic is displayed to illustrate the match and non-match results against all the patterns used.

  3. In the Simple Statistics table, right-click an indicator result and select View Rows or View Values.
    • If you run the analysis with the Java engine, a list of the analyzed data is opened in the Profiling perspective.
    • If you run the analysis with the SQL engine, a list of the analyzed data is opened in the Data Explorer perspective.
  4. In the All Match table, right-click the result row and select Generate an ETL job to handle rows.
    A ready-to-use Job is generated and opened in the Integration perspective. This Job groups the valid and invalid rows and writes them to two separate files. For further information, see Recuperating matching and non-matching rows.
    Note: The All Match table is available only when you run the analysis with the Java engine.
  5. In the Data view, click Filter Data to filter the valid/invalid data according to the patterns used.
    You can filter data only when you run the analysis with the Java engine. For further information, see Filtering data against patterns.
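
The difference between statistics on full records and statistics on single columns, and the split into valid and invalid rows, can be illustrated with a minimal Java sketch. It is not the code Talend Studio runs or generates. The class ColumnSetSketch, its methods, the sample rows and the two patterns are all hypothetical. It assumes that a duplicate is a distinct record occurring more than once (so that unique plus duplicate equals distinct) and that a row is valid only when every column matches its own pattern.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch only; none of these names belong to Talend Studio.
class ColumnSetSketch {

    // Simple statistics computed on whole records, not on single columns.
    static final class Stats {
        final long rowCount, distinctCount, uniqueCount, duplicateCount;

        Stats(long rowCount, long distinctCount, long uniqueCount, long duplicateCount) {
            this.rowCount = rowCount;
            this.distinctCount = distinctCount;
            this.uniqueCount = uniqueCount;
            this.duplicateCount = duplicateCount;
        }
    }

    static Stats simpleStatistics(List<List<String>> rows) {
        // Count how many times each full record occurs.
        Map<List<String>, Long> frequency = new LinkedHashMap<>();
        for (List<String> row : rows) {
            frequency.merge(row, 1L, Long::sum);
        }
        long distinct = frequency.size();
        long unique = frequency.values().stream().filter(n -> n == 1L).count();
        // Assumption: duplicates are the distinct records seen more than once.
        return new Stats(rows.size(), distinct, unique, distinct - unique);
    }

    // Assumption: a row is valid only if every column matches its own pattern.
    static boolean allMatch(List<String> row, List<Pattern> patterns) {
        for (int i = 0; i < row.size(); i++) {
            String value = row.get(i);
            if (value == null || !patterns.get(i).matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }

    // Key true holds the matching rows, key false the non-matching rows.
    static Map<Boolean, List<List<String>>> splitByMatch(
            List<List<String>> rows, List<Pattern> patterns) {
        return rows.stream()
                .collect(Collectors.partitioningBy(row -> allMatch(row, patterns)));
    }

    // Hypothetical two-column set: first name and email.
    static List<List<String>> sampleRows() {
        return Arrays.asList(
                Arrays.asList("Ann", "ann@example.com"),
                Arrays.asList("Bob", "bob@example.com"),
                Arrays.asList("Ann", "ann@example.com"),
                Arrays.asList("Ann", "not-an-email"));
    }

    // Hypothetical patterns, one per column.
    static List<Pattern> samplePatterns() {
        return Arrays.asList(
                Pattern.compile("[A-Z][a-z]+"),
                Pattern.compile("[^@\\s]+@[^@\\s]+\\.[a-z]+"));
    }

    public static void main(String[] args) {
        Stats s = simpleStatistics(sampleRows());
        System.out.println("rows=" + s.rowCount + " distinct=" + s.distinctCount
                + " unique=" + s.uniqueCount + " duplicate=" + s.duplicateCount);

        Map<Boolean, List<List<String>>> split = splitByMatch(sampleRows(), samplePatterns());
        System.out.println("matching=" + split.get(true));
        System.out.println("non-matching=" + split.get(false));
    }
}
```

In the sample data, the value Ann appears three times in the first column, but only the record (Ann, ann@example.com) is repeated, so the record-level duplicate count is 1. The two lists returned by splitByMatch correspond, in spirit, to the two output files that the generated Job writes.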