Filtering and aggregating the input data - 7.3

CombinedSQL

author
Talend Documentation Team
EnrichVersion
Cloud
7.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Real-Time Big Data Platform
task
Data Governance > Third-party systems > ELT components > CombinedSQL components
Data Quality and Preparation > Third-party systems > ELT components > CombinedSQL components
Design and Development > Third-party systems > ELT components > CombinedSQL components
EnrichPlatform
Talend Studio

Procedure

  1. In the design workspace, select tCombinedSQLFilter and click the Component tab to access the configuration panel.
  2. Click the Sync columns button to retrieve the schema from the previous component, or configure the schema manually by selecting Built-in from the Schema list and clicking the [...] button next to Edit schema.
    When you define the data structure for tCombinedSQLFilter, column names automatically appear in the Input column list in the Conditions table.
    In this scenario, the tCombinedSQLFilter component instantiates four columns: id, state, date_of_birth, and salary.
  3. In the Conditions table, set input parameters, operators and expected values in order to only extract the records that fulfill these criteria.
    Click two times on the [+] button under the Conditions table, and in Input column, select state and date_of_birth from the drop-down list.
    In this scenario, the tCombinedSQLFilter component filters the state and date_of_birth columns in the source table to extract the employees who were born after Oct. 19, 1960 and who live in the states Utah, Ohio and Iowa.
  4. For the column state, select IN as operator from the drop-down list, and enter ('Utah','Ohia','Iowa') as value.
  5. For the column date_of_birth, select > as operator from the drop-down list, and enter ('1960-10-19') as value.
  6. Select And in the Logical operator between conditions list to apply the two conditions at the same time. You can also customize the conditions by selecting the Use custom SQL box and editing the conditions in the code box.
  7. In the design workspace, select tCombinedSQLAggregate and click the Component tab to access the configuration panel.
  8. Click on the [...] button.next to Edit schema to enter the following configuration:
    The tCombinedSQLAggregate component instantiates four columns: id, state, date_of_birth, and salary, coming from the previous component.
  9. The Group by table helps you define the data sets to be processed based on a defined column. In this example: State.
    In the Group by table, click the [+] button to add one line.
  10. In the Output column drop-down list, select State. This column will be used to hold the data filtered on State.
  11. The Operations table helps you define the type of aggregation operations to be performed. The Output column list available depends on the schema you want to output (through the tCombinedSQLOutput component). In this scenario, we want to group employees based on the state they live in. Then we want to count the number of employees per state, calculate the average/lowest/highest salaries as well as the oldest/youngest employees for each state.
  12. In the Operations table, click the [+] button to add a line and then click in the Output column list to select the output column that will hold the computed data.
  13. In the Function field, select the relevant operation to be carried out.