Using the JOIN clause - Cloud - 8.0

Talend Data Shaping Language Reference Guide

Version
Cloud
8.0
Language
English
Product
Talend Cloud
Module
Data Shaping Language
Content
Design and Development
Last publication date
2023-11-23

The JOIN clause is used to join two arrays based on a condition.

The JOIN clause should be used after a FROM or an UNNEST clause. The JOIN keyword is followed by an expression returning an array and the ON keyword that introduces the expression to use as the condition:
JOIN expression ON condition_expression 
The condition should be a simple or conditional expression that returns a Boolean. For example:
FROM customers AS c
JOIN orders AS o ON c.custid == o.custid

The JOIN clause can contain the AS and INDEX keywords described in Using the FROM clause.

By default, the JOIN keyword performs an inner join, but you can specify the type of join to use:
  • The INNER JOIN clause only joins items that match the condition in both arrays.
  • The LEFT OUTER JOIN returns all items from the array in the FROM or UNNEST clause and joins the items from the array in the JOIN clause that match the condition.
For example, with the following input:
{
    "customers": [
        {
            "custid": "1234",
            "name": "Jack Smith"
        },
        {
            "custid": "5678",
            "name": "Jane Smith"
        },
        {
            "custid": "9874",
            "name": "John Doe"
        }
    ],
    "orders": [
        {
            "orderid": "abc-6511",
            "custid":"1234"
        },
        {
            "orderid": "def-6872",
            "custid":"5678"
        }
    ]
}
The same query with a different join type returns different results:
  • The INNER JOIN returns only the customers associated with an order:
    FROM customers AS c
    INNER JOIN orders AS o ON c.custid == o.custid
    SELECT {
      name,
      orderid
    }
    [
       {
          "name":"Jack Smith",
          "orderid":"abc-6511"
       },
       {
          "name":"Jane Smith",
          "orderid":"def-6872"
       }
    ]
  • The LEFT OUTER JOIN returns all customers and ignores the orderid element when no order is associated with the customer:
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON c.custid == o.custid
    SELECT {
      name,
      orderid
    }
    [
       {
          "name":"Jack Smith",
          "orderid":"abc-6511"
       },
       {
          "name":"Jane Smith",
          "orderid":"def-6872"
       },
       {
          "name":"John Doe"
       }
    ]
Note: In this example, the elements used in the condition have the same name in both arrays. To avoid issues with relative paths, you need to either define an alias for each array, or use absolute paths to refer to the element. For example:
INNER JOIN orders ON input.customers.custid == input.orders.custid