tCouchbaseInput Standard properties - 7.3

Couchbase

author: Talend Documentation Team
EnrichVersion: Cloud, 7.3
EnrichProdName: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
task: Data Governance > Third-party systems > Database components > Couchbase components; Data Quality and Preparation > Third-party systems > Database components > Couchbase components; Design and Development > Third-party systems > Database components > Couchbase components
EnrichPlatform: Talend Studio

These properties are used to configure tCouchbaseInput running in the Standard Job framework.

The Standard tCouchbaseInput component belongs to the Databases NoSQL family.

The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.

Basic settings

Bootstrap nodes

Enter the name or IP address of the node to be bootstrapped by the Couchbase SDK. Because Couchbase recommends specifying multiple bootstrap nodes, enter the names or IP addresses of these nodes in this field, separated by commas (,).
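
As a rough illustration of the expected field format, the following Python sketch splits a comma-separated bootstrap string into individual node names. The host names and the helper parse_bootstrap_nodes are hypothetical; the snippet only mirrors the format described above and is not how the component itself parses the field.

```python
from typing import List


def parse_bootstrap_nodes(value: str) -> List[str]:
    """Split a comma-separated bootstrap string into trimmed node names."""
    nodes = [part.strip() for part in value.split(",")]
    # Drop empty entries caused by stray or trailing commas.
    return [node for node in nodes if node]


# Hypothetical host names and IP address, for illustration only.
nodes = parse_bootstrap_nodes("node1.example.com, node2.example.com,192.0.2.10")
print(nodes)
```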

For further information about Couchbase bootstrapping, see How Couchbase SDKs connect to the cluster.

You can find the node names on the Servers page in your Couchbase Web Console. If you need further information, contact the administrator of your Couchbase cluster or consult your Couchbase documentation.

Note that the Couchbase servers do not support proxies; for this reason, the Couchbase components from Talend do not support proxies either.

Username and Password

Provide the authentication credentials to your Couchbase cluster.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

If you are using Couchbase V5.0 or later, enter as the password the same value you put in the Bucket field, because since Couchbase V5.0 no password is associated with a bucket. On the Couchbase side, however, you need to create a user with the appropriate role to access the buckets.

For further information about access control and other important requirements on the Couchbase side, see the Couchbase release notes for your version.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

When using non-JSON documents, define an id column of the String type, then define a content column. The type of this content column should be String for the string documents and byte[] for the binary documents.

When it comes to JSON documents, define the fields that exist in your JSON documents.
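
To make the column choices concrete, here is a small Python sketch of the row shape each document type would produce under the schema rules above. The document IDs and contents are invented for illustration, and Python's str and bytes stand in for the Studio's String and byte[] types.

```python
import json

# String document: an id column plus a content column of String type.
string_row = {"id": "greeting_1", "content": "hello"}

# Binary document: the same two columns, but content is byte[] (bytes here).
binary_row = {"id": "blob_1", "content": b"\x00\x01\x02"}

# JSON document: one column per field that exists in the document.
json_document = '{"name": "40-Mile Air", "country": "United States", "type": "airline"}'
json_row = json.loads(json_document)

for row in (string_row, binary_row, json_row):
    # Print each column name with the Python type of its value.
    print({column: type(value).__name__ for column, value in row.items()})
```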

Bucket

Enter, within double quotation marks, the name of the data bucket in the Couchbase database.

Ensure that the credentials you are using have the appropriate rights and permissions to access this bucket.

If you are using Couchbase V5.0 or later, this bucket name is the user name you created in the Security tab of your Couchbase UI.

Document type

Data stored in a Couchbase database can be JSON, strings or binary. From this drop-down list, select the type of the data you need to use with Couchbase.

Note that mixing JSON, binary and string documents in the same bucket is not recommended, because this mixture can make document processing error-prone.

If you need to use N1QL to query string or binary documents, the only way is to retrieve the document by its document ID. For example, to get the document whose ID is 2, the N1QL query should be
SELECT meta().id as `_meta_id_` FROM `bucket_name` where meta().id = '2';

Note that the quotation marks around _meta_id_ and bucket_name are backticks (`).
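
The quoting in this statement is easy to get wrong, so the following Python sketch assembles the same query from a bucket name and a document ID. The helper build_id_query is hypothetical and not part of Talend or Couchbase; it simply rejects values that contain the quoting characters rather than assuming any escaping rules.

```python
def build_id_query(bucket_name: str, document_id: str) -> str:
    """Build the N1QL statement that fetches one document by its ID."""
    # Refuse characters that would end the quoting early; escaping rules
    # are deliberately left out of this sketch.
    if "`" in bucket_name:
        raise ValueError("bucket name must not contain a backtick")
    if "'" in document_id:
        raise ValueError("document ID must not contain a single quote")
    return (
        "SELECT meta().id as `_meta_id_` "
        f"FROM `{bucket_name}` where meta().id = '{document_id}';"
    )


# Backticks surround the alias and the bucket name; single quotes surround the ID.
print(build_id_query("bucket_name", "2"))
```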

Use N1QL query

Select this check box and, in the Query field that is displayed, enter an N1QL query statement to perform complex actions.

Only one statement is allowed. Do not put quotation marks around the statement.

  • When you use wildcards in your query such as SELECT *, the returned result of this query is wrapped in the bucket name used in this query. In this situation, define only one column for the result in the schema of this component.

    For example, when you run this query
    SELECT * FROM `travel_sample` limit 3
    the returned result is wrapped in the travel_sample bucket name and reads like this:
    [
      {
        "travel_sample": {
          "callsign": "MILE-AIR",
          "country": "United States",
          "iata": "Q5",
          "icao": "MLA",
          "id": 10,
          "name": "40-Mile Air",
          "type": "airline"
        }
      },
      {
        "travel_sample": {
          "callsign": "TXW",
          "country": "United States",
          "iata": "TQ",
          "icao": "TXW",
          "id": 10123,
          "name": "Texas Wings",
          "type": "airline"
        }
      },
      {
        "travel_sample": {
          "callsign": "atifly",
          "country": "United States",
          "iata": "A1",
          "icao": "A1F",
          "id": 10226,
          "name": "Atifly",
          "type": "airline"
        }
      }
    ]

    In the schema, define one single column called, for example, travel_sample to store the result and select String as its type.

  • If you use a query without wildcards, such as
    SELECT callsign, country, iata, icao, id, name, type FROM `travel_sample` limit 3;
    the returned result is not wrapped and reads like this:
    [
      {
        "callsign": "MILE-AIR",
        "country": "United States",
        "iata": "Q5",
        "icao": "MLA",
        "id": 10,
        "name": "40-Mile Air",
        "type": "airline"
      },
      {
        "callsign": "TXW",
        "country": "United States",
        "iata": "TQ",
        "icao": "TXW",
        "id": 10123,
        "name": "Texas Wings",
        "type": "airline"
      },
      {
        "callsign": "atifly",
        "country": "United States",
        "iata": "A1",
        "icao": "A1F",
        "id": 10226,
        "name": "Atifly",
        "type": "airline"
      }
    ]

    In this situation, define in the component schema the columns that represent the structure of the actual business data, such as: callsign, country, iata, icao, id, name and type.
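
The difference between the two result shapes can be sketched in Python as follows. The rows are abbreviated from the travel_sample results shown above, and the unwrap helper is a hypothetical illustration of removing the bucket-name wrapper downstream; it is not something the component does for you.

```python
import json

# Shape returned by SELECT *: each row is wrapped in the bucket name.
wrapped = json.loads("""
[
  {"travel_sample": {"callsign": "MILE-AIR", "id": 10, "name": "40-Mile Air"}},
  {"travel_sample": {"callsign": "TXW", "id": 10123, "name": "Texas Wings"}}
]
""")

# Shape returned when the fields are listed explicitly: no wrapper.
unwrapped = json.loads("""
[
  {"callsign": "MILE-AIR", "id": 10, "name": "40-Mile Air"},
  {"callsign": "TXW", "id": 10123, "name": "Texas Wings"}
]
""")


def unwrap(rows, bucket_name):
    """Strip the bucket-name wrapper from each row of a SELECT * result."""
    return [row[bucket_name] for row in rows]


# Wrapped rows expose a single top-level key, hence the single schema column.
print([list(row.keys()) for row in wrapped])
print(unwrap(wrapped, "travel_sample") == unwrapped)
```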

Advanced settings

Connect Timeout

Enter, without quotation marks, the timeout interval (in seconds) after which the connection attempt is aborted.

Limit rows

Enter the maximum number of rows to be read. This field is not available when you use an N1QL query.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. If the component has a Die on error check box, this variable functions only when that check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

Usage

Usage rule

As a start component, tCouchbaseInput reads the documents from the Couchbase database.