tCouchbaseInput Standard properties - 7.3

Version: 7.3
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Module: Talend Studio

These properties are used to configure tCouchbaseInput running in the Standard Job framework.

The Standard tCouchbaseInput component belongs to the Databases NoSQL family.

The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.

Basic settings

Bootstrap nodes

Enter the name or IP address of the node that the Couchbase SDK uses to bootstrap its connection to the cluster. Because Couchbase recommends specifying multiple bootstrap nodes, enter the names or IP addresses of several nodes in this field, separated by commas (,).

For further information about Couchbase bootstrapping, see How Couchbase SDKs connect to the cluster.

You can find the node names on the Servers page in your Couchbase Web Console. If you need further information, contact the administrator of your Couchbase cluster or consult your Couchbase documentation.

Note that the Couchbase servers do not support proxies; for this reason, the Couchbase components from Talend do not support proxies either.
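
To make the expected format concrete, the Python sketch below splits a Bootstrap nodes value into individual nodes. The helper name `parse_bootstrap_nodes` and the host names are hypothetical. Trimming the spaces around each entry is an assumption of the sketch itself: this page does not say whether the component accepts spaces after the commas, so entering the list without spaces is the safer choice.

```python
def parse_bootstrap_nodes(field_value: str) -> list[str]:
    """Split a comma-separated Bootstrap nodes value into host names.

    Hypothetical helper for illustration only; it is not part of tCouchbaseInput.
    """
    # Split on commas and trim the spaces around each entry (sketch assumption).
    nodes = [part.strip() for part in field_value.split(",")]
    # Drop empty entries, for example those left by a trailing comma.
    nodes = [node for node in nodes if node]
    if not nodes:
        raise ValueError("at least one bootstrap node is required")
    return nodes


# Hypothetical three-node cluster, mixing host names and an IP address.
print(parse_bootstrap_nodes("cb-node1.example.com,cb-node2.example.com,192.0.2.10"))
# ['cb-node1.example.com', 'cb-node2.example.com', '192.0.2.10']
```

Listing several nodes, as Couchbase recommends, gives the SDK more than one entry point into the cluster if one of the listed nodes is unavailable.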

Username and Password

Provide the authentication credentials to your Couchbase cluster.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

If you are using Couchbase V5.0 or later, enter the value of the Bucket field as the password, because since Couchbase V5.0, no password is associated with a bucket. However, you need to create a user with the appropriate role on Couchbase to access the buckets.

For further information about access control and other important requirements on the Couchbase side, see the Couchbase release notes for your version.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

When using non-JSON documents, define an id column of the String type, then define a content column. The type of this content column should be String for the string documents and byte[] for the binary documents.

When it comes to JSON documents, define the fields that exist in your JSON documents.
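
The Python sketch below restates these schema rules as code. A string or binary document becomes a row with an id column and a content column (String or byte[] in Studio, `str` or `bytes` here), while a JSON document becomes a row holding the fields declared in the schema. The function `document_to_row`, its parameters, and the choice to fill missing JSON fields with `None` are hypothetical; this page does not describe how the component maps documents internally.

```python
import json
from typing import Any, Dict, Sequence, Union


def document_to_row(
    doc_id: str,
    raw: Union[str, bytes],
    document_type: str,
    json_fields: Sequence[str] = (),
) -> Dict[str, Any]:
    """Map one stored document to a schema row (hypothetical illustration)."""
    if document_type == "string":
        # String document: id column (String) plus content column (String).
        if not isinstance(raw, str):
            raise TypeError("string documents map to a str content column")
        return {"id": doc_id, "content": raw}
    if document_type == "binary":
        # Binary document: id column (String) plus content column (byte[] in Studio, bytes here).
        if not isinstance(raw, bytes):
            raise TypeError("binary documents map to a bytes content column")
        return {"id": doc_id, "content": raw}
    if document_type == "json":
        # JSON document: one column per schema field; missing fields become None (sketch assumption).
        body = json.loads(raw)
        return {field: body.get(field) for field in json_fields}
    raise ValueError(f"unknown document type: {document_type!r}")


print(document_to_row("2", "plain text", "string"))
# {'id': '2', 'content': 'plain text'}
print(document_to_row("10", '{"name": "40-Mile Air", "iata": "Q5"}', "json", ["name", "iata"]))
# {'name': '40-Mile Air', 'iata': 'Q5'}
```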

Bucket

Enter, within double quotation marks, the name of the data bucket in the Couchbase database.

Ensure that the credentials you are using have the appropriate rights and permissions to access this bucket.

If you are using Couchbase V5.0 or later, this bucket name is the user name you created in the Security tab of your Couchbase UI.

Document type

Data stored in a Couchbase database can be JSON, string, or binary documents. From this drop-down list, select the type of the documents you need to use with Couchbase.

Note that mixing JSON, binary and string documents in the same bucket is not recommended, as this mixture can make document processing error-prone.

If you need to use N1QL to query string or binary documents, the only possible way is to retrieve each document by its ID. For example, to get the document whose ID is 2, use the following N1QL query:
SELECT meta().id as `_meta_id_` FROM `bucket_name` where meta().id = '2';

Note that the quotation marks around _meta_id_ and bucket_name are backticks (`).
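
The query above combines two kinds of quoting: backticks around identifiers such as the bucket name and the `_meta_id_` alias, and single quotes around the document ID value. The Python sketch below assembles the same statement from a bucket name and an ID to make that split explicit. `build_id_query` is a hypothetical helper, and rejecting backticks and single quotes instead of escaping them is a simplification of this sketch, not an N1QL rule.

```python
def build_id_query(bucket_name: str, doc_id: str) -> str:
    """Build the N1QL lookup-by-ID query for a string or binary document.

    Hypothetical helper for illustration only.
    """
    # Backticks quote identifiers (bucket name, alias); single quotes quote the ID value.
    # This sketch refuses characters it does not escape rather than guessing at escaping rules.
    if "`" in bucket_name:
        raise ValueError("bucket name must not contain a backtick")
    if "'" in doc_id:
        raise ValueError("document ID must not contain a single quote")
    return (
        "SELECT meta().id as `_meta_id_` "
        f"FROM `{bucket_name}` where meta().id = '{doc_id}';"
    )


print(build_id_query("bucket_name", "2"))
# SELECT meta().id as `_meta_id_` FROM `bucket_name` where meta().id = '2';
```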

Query Type
Select the type of queries to be used from the following options:
  • Select All: select all the contents of a given bucket.
  • N1QL: use an N1QL statement to perform fine-tuned queries.
  • N1QL for Analytics: use an N1QL for Analytics statement to perform queries. N1QL for Analytics is a language for querying semistructured data.
  • Document ID: use a document ID to select a document. Enter the ID to be used in the Document ID field that is displayed. Only one document ID is allowed per component.
Note:
  • The N1QL for Analytics option is available only when you have installed the R2021-08 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
  • See N1QL for Analytics Language Reference for information about N1QL for Analytics.
  • See N1QL for Analytics vs. N1QL for Query for a comparison between N1QL for Analytics and N1QL for Query.
Query

Enter an N1QL query statement or an N1QL for Analytics statement to perform complex actions.

Only one statement is allowed. Do not put quotation marks around the statement.

  • When you use a wildcard in your query, such as SELECT *, each row of the returned result is wrapped in an object named after the bucket used in the query. In this situation, define only one column for the result in the schema of this component.

    For example, when you run the following query:
    SELECT * FROM `travel_sample` limit 3
    the returned result is wrapped in a travel_sample object, as follows:
    [
      {
        "travel_sample": {
          "callsign": "MILE-AIR",
          "country": "United States",
          "iata": "Q5",
          "icao": "MLA",
          "id": 10,
          "name": "40-Mile Air",
          "type": "airline"
        }
      },
      {
        "travel_sample": {
          "callsign": "TXW",
          "country": "United States",
          "iata": "TQ",
          "icao": "TXW",
          "id": 10123,
          "name": "Texas Wings",
          "type": "airline"
        }
      },
      {
        "travel_sample": {
          "callsign": "atifly",
          "country": "United States",
          "iata": "A1",
          "icao": "A1F",
          "id": 10226,
          "name": "Atifly",
          "type": "airline"
        }
      }
    ]

    In the schema, define a single column, named travel_sample for example, to store the result, and select String as its type.

  • If you use a query without wildcards, such as
    SELECT callsign, country, iata, icao, id, name, type FROM `travel_sample` limit 3;
    the returned result is not wrapped, as follows:
    [
      {
        "callsign": "MILE-AIR",
        "country": "United States",
        "iata": "Q5",
        "icao": "MLA",
        "id": 10,
        "name": "40-Mile Air",
        "type": "airline"
      },
      {
        "callsign": "TXW",
        "country": "United States",
        "iata": "TQ",
        "icao": "TXW",
        "id": 10123,
        "name": "Texas Wings",
        "type": "airline"
      },
      {
        "callsign": "atifly",
        "country": "United States",
        "iata": "A1",
        "icao": "A1F",
        "id": 10226,
        "name": "Atifly",
        "type": "airline"
      }
    ]

    In this situation, define columns in the component schema that match the structure of the actual business data, here: callsign, country, iata, icao, id, name and type.

Note: This field is available when you select N1QL or N1QL for Analytics from the Query Type drop-down list.
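
The two result shapes in the Query examples differ by one level of nesting. The Python sketch below, using abbreviated rows from those examples, removes the bucket-name wrapper from a SELECT * result and checks that each row then has the same fields as a row from the explicit-column query. `unwrap_select_star` is a hypothetical helper; it only illustrates the structure and is not something this page says the component does.

```python
import json
from typing import Any, Dict, List


def unwrap_select_star(result_json: str, bucket_name: str) -> List[Dict[str, Any]]:
    """Remove the bucket-name wrapper that SELECT * puts around each row.

    Hypothetical helper for illustration; not part of tCouchbaseInput.
    """
    rows = json.loads(result_json)
    # Each SELECT * row has the form {"<bucket_name>": {...document fields...}}.
    return [row[bucket_name] for row in rows]


# Abbreviated rows from the SELECT * example (wrapped in "travel_sample").
wrapped = """[
  {"travel_sample": {"callsign": "MILE-AIR", "iata": "Q5", "id": 10, "type": "airline"}},
  {"travel_sample": {"callsign": "TXW", "iata": "TQ", "id": 10123, "type": "airline"}}
]"""

# The same rows as returned by a query that names its columns (not wrapped).
explicit = """[
  {"callsign": "MILE-AIR", "iata": "Q5", "id": 10, "type": "airline"},
  {"callsign": "TXW", "iata": "TQ", "id": 10123, "type": "airline"}
]"""

print(unwrap_select_star(wrapped, "travel_sample") == json.loads(explicit))
# True
```

Once the wrapper is gone, each row exposes callsign, iata, id and type directly, which is why the explicit-column query maps naturally to one schema column per field.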

Advanced settings

Connect Timeout

Enter, without quotation marks, the timeout interval (in seconds) after which the connection attempt is aborted.

Limit rows

Enter the maximum number of rows to be read. This field is not available when you use an N1QL query.

Create primary index

Select this check box to let the component create a primary index on your bucket, or create a new primary index if one already exists.

By default, this check box is cleared, because creating a primary index can be optional if a secondary index exists.

For more information about primary index, see Create primary index from the official Couchbase documentation.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

Usage

Usage rule

As a start component, tCouchbaseInput reads the documents from the Couchbase database.