How to analyze repository items - 6.2

Talend Real-time Big Data Platform Studio User Guide

Talend Studio provides advanced capabilities for analyzing any given item, such as a Job, in the Repository tree view. The analysis supports two directions of navigation: moving forward from the source component to discover descendant items up to the target component (Impact Analysis), and moving backward from the target component to discover ancestor items back to the source component (Data Lineage). The results of the analysis show where data comes from, how it is transformed, and where it goes, or the same path traced in reverse.
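The two directions of navigation can be pictured as walks over a directed graph of components. The sketch below is only an illustration of that idea, not the algorithm Talend Studio uses: the FLOW dictionary, the component names, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical data flow: each component maps to the components it feeds.
# The component names are illustrative only and do not come from a real Job.
FLOW = {
    "tMysqlInput_1": ["tMap_1"],
    "tMap_1": ["tFilterRow_1", "tLogRow_1"],
    "tFilterRow_1": ["tFileOutputDelimited_1"],
    "tLogRow_1": [],
    "tFileOutputDelimited_1": [],
}


def reverse(graph):
    """Return a copy of the graph with every edge direction flipped."""
    flipped = {node: [] for node in graph}
    for source, targets in graph.items():
        for target in targets:
            flipped[target].append(source)
    return flipped


def reachable(graph, start):
    """Breadth-first walk listing every node reachable from start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


def impact_analysis(graph, source):
    """Forward walk: everything downstream of the source component."""
    return reachable(graph, source)


def data_lineage(graph, target):
    """Backward walk: everything upstream of the target component."""
    return reachable(reverse(graph), target)


print(impact_analysis(FLOW, "tMysqlInput_1"))
print(data_lineage(FLOW, "tFileOutputDelimited_1"))
```

In this toy graph the forward walk visits every component fed by the input, while the backward walk skips tLogRow_1 because that branch never reaches the output file.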

Warning

All items on which you want to execute impact analysis or data lineage must be centralized in the Repository tree view under any of the following nodes: Joblet Designs, Contexts, SQL Templates, Referenced project or Metadata.

Impact analysis

Impact analysis helps to identify all the Jobs that use any of the items centralized in the Repository tree view and that will be impacted by a change in the parameters of a repository item.

Impact analysis also analyzes the data flow in each of the listed Jobs to show all the components and stages the data flow passes through and the transformation done on data from the source component up to the target component.

Talend Studio also allows you to produce detailed documentation in HTML and XML of the results of the impact analysis. For more information, see How to export the results of impact analysis/data lineage to HTML and How to export the results of impact analysis/data lineage to XML.

The example below shows an impact analysis done on a database connection item stored under the Metadata node in the Repository tree view.

To analyze data flow in each of the listed Jobs from the source component up to the target component, complete the following:

  1. In the Repository tree view, expand Metadata and browse to the metadata entry you want to analyze, employees under the DB connection mysql in this example.

  2. Right-click the entry you want to analyze and select Impact Analysis.

    A progress bar shows the progress of the check for all Jobs that use the selected metadata entry. The [Impact Analysis] view appears in the Studio to list all Jobs that use the selected metadata entry. The names of the selected database connection and table schema are displayed in the corresponding fields.

    Note

    You can also open this view by selecting Window > Show View > Talend > Impact Analysis.

  3. Right-click any of the listed Jobs and select:

    Select...          To...
    Open Job           open the corresponding Job in the Studio workspace.
    Expand/Collapse    expand/collapse all the items included in the selected Job.

    Thus, you have an outline of the Jobs that use the selected metadata entry.

  4. From the Column list, select the name of the column for which you want to analyze the data flow from the data source (input component), through various components and stages, to the data destination (output component). The column to be analyzed in this example is called Name.

    Note

    The Last version check box is selected by default. With this option selected, the analysis results show only the latest version of each Job instead of all versions.

  5. Click Analysis....

    A bar appears to indicate the progress of the analysis operation and the analysis results are displayed in the view.

Note

Alternatively, you can directly right-click a particular column in the Repository tree view and select Impact Analysis from the contextual menu to display the analysis results regarding that column in the [Impact Analysis] view.

The impact analysis results trace the components and transformations the data in the source column Name passes through before being written in the output column Name.

Data lineage

Data lineage shows the data flow from the data destination (output component), through various components and stages, to the data source (input component). The data lineage results trace the life cycle of the data flow between different components, including the operations that are performed upon the data.

Talend Studio also allows you to produce detailed documentation in HTML and XML of the results of the data lineage. For more information, see How to export the results of impact analysis/data lineage to HTML and How to export the results of impact analysis/data lineage to XML.

The example below shows the data lineage made on a database connection item stored under the Metadata node in the Repository tree view.

To launch a data lineage on a metadata item, complete the following:

  1. In the Repository tree view, expand Metadata > Db Connection and then expand the database connection you want to analyze, mysql in this example.

  2. Right-click the centralized table schema whose data flow life cycle you want to analyze, employees in this example, and select Impact Analysis from the contextual menu.

    The Impact Analysis view displays the Jobs that use the selected table schema. The names of the selected database connection and table schema are displayed in the corresponding fields.

  3. From the Column list, select the column name for which you want to analyze the data flow from the data destination (output component), through various components and stages, to the data source (input component). The column to be analyzed in this example is called Name.

    You can skip this step by right-clicking the column Name in the Repository tree view and selecting Impact Analysis from the contextual menu.

  4. Click Data Lineage.

    A bar appears to indicate the progress of the analysis operation and the analysis results are displayed in the view.

  5. Right-click a listed Job and select Open Job from the contextual menu.

    The Job opens in the design workspace.

    The data lineage results trace, in reverse order, the components and transformations the data passes through before being written in the output column Name.

How to export the results of impact analysis/data lineage to HTML

Talend Studio allows you to produce detailed documentation in HTML of the results of the impact analysis or data lineage done on the selected repository element. This documentation offers information related to the Jobs that use this repository element, including project and author details, the project description and a preview of the graphical results of the analysis done on the impacted Jobs.

To generate an HTML document of an impact analysis or data lineage with customization, complete the following:

  1. After you analyze a given repository item as outlined in Impact analysis or Data lineage, click the Export to HTML button in the Impact Analysis view.

    The [Generate Documentation] dialog box opens.

  2. Enter the path to where you want to store the generated documentation archive or browse to the desired location and then give a name for this HTML archive.

  3. Select the Custom CSS template to export check box to activate the CSS File field if you need to use your own CSS file to customize the exported HTML files. The destination folder for HTML will contain the HTML file, a CSS file, an XML file and a pictures folder.

  4. Click Finish to validate the operation and close the dialog box.

    An archive file that contains all required files along with the HTML output file is created in the specified path.

  5. Double-click the HTML file in the generated archive to open it in your web browser.

    The figure below illustrates an example of a generated HTML file.

    Note

    You can also set CSS customization as a preference for exporting HTML. To do this, see Documentation preferences (Talend > Documentation).

The archive file gathers all the generated documents, including the HTML file, which describes the project that holds the analyzed Jobs and gives a preview of the graphical results of the analysis.
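If you want to check the contents of the generated archive without opening it by hand, a short script can list its entries. The following is a minimal sketch that assumes the archive is a standard ZIP file, which this guide does not state explicitly; the function name list_archive is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def list_archive(archive_path):
    """Group the entries of a ZIP archive by file extension.

    Assumption: the documentation archive is a regular ZIP file.
    """
    groups = {}
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Entry names inside a ZIP always use forward slashes.
            suffix = PurePosixPath(name).suffix.lower() or "(none)"
            groups.setdefault(suffix, []).append(name)
    return groups
```

Calling list_archive with the path chosen in the [Generate Documentation] dialog box returns a dictionary keyed by extension. If the archive matches the description above, the .html, .css and .xml keys and the image files of the pictures folder should all be present.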

How to export the results of impact analysis/data lineage to XML

Talend Studio also allows you to export the results of the impact analysis or data lineage done on the selected repository element to an XML document. This tree-structured documentation can be processed by automated analytical applications for Job analysis and reporting purposes.
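As a starting point for such automated processing, the sketch below inspects an exported file without assuming anything about its schema: it counts the element names and prints the top levels of the tree. The SAMPLE string and its tag names are placeholders invented for the illustration and do not reflect the actual structure of the file generated by Talend Studio.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Placeholder document: these tag names are NOT the real export schema.
SAMPLE = "<analysis><job name='J1'><component/><component/></job></analysis>"


def count_tags(root):
    """Count how often each element tag occurs in the tree."""
    return Counter(element.tag for element in root.iter())


def outline(element, depth=0, max_depth=2):
    """Return indented lines describing the tree down to max_depth."""
    lines = ["  " * depth + element.tag]
    if depth < max_depth:
        for child in element:
            lines.extend(outline(child, depth + 1, max_depth))
    return lines


def load(path):
    """Parse an exported XML file and return its root element."""
    return ET.parse(path).getroot()


root = ET.fromstring(SAMPLE)
print(count_tags(root))
print("\n".join(outline(root)))
```

To inspect a real export, replace ET.fromstring(SAMPLE) with load and the path entered in the [Generate XML] dialog box. Once the real element names are known from the outline, targeted queries with root.iter or root.findall become possible.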

To generate an XML document of the results of impact analysis or data lineage on the selected repository item, complete the following:

  1. After you analyze a given repository item as outlined in Impact analysis or Data lineage, click the Export to XML button in the Impact Analysis view.

    The [Generate XML] dialog box appears.

  2. Enter the path to where you want to store the generated XML document or browse to the desired location and then give a name for this XML file.

  3. Select the Overwrite existing files without warning check box to suppress the warning message if the specified filename already exists.

  4. Click Finish to validate the operation and close the dialog box.

    An XML file that contains the impact analysis or data lineage information is created in the specified path.

    The figure below illustrates an example of a generated XML file, opened in a text editor.