tCosmosDBBulkLoad Standard properties - 7.0

CosmosDB

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Open Studio for Big Data
Talend Real-Time Big Data Platform
task
Data Governance > Third-party systems > Database components > CosmosDB components
Data Quality and Preparation > Third-party systems > Database components > CosmosDB components
Design and Development > Third-party systems > Database components > CosmosDB components
EnrichPlatform
Talend Studio

These properties are used to configure tCosmosDBBulkLoad running in the Standard Job framework.

The Standard tCosmosDBBulkLoad component belongs to the Cloud and the Databases families.

The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word "line" when naming the fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

MongoDB directory

Fill in this field with the MongoDB home directory.

Use replica set address or multiple query routers

Select this check box to show the Server addresses table.

In the Server addresses table, define the sharded MongoDB databases or the MongoDB replica sets you want to connect to.

Server and Port

Enter the IP address and listening port of the database server.

Available when the Use replica set address or multiple query routers check box is not selected.
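One way to picture the Server addresses table is as the host list of a standard MongoDB connection string. The sketch below is illustrative only: the function name, the host names and the replica set name are hypothetical, and it does not claim to show how the component builds its connection internally.

```python
from urllib.parse import urlencode


def build_uri(servers, database, replica_set=None):
    """Build a MongoDB connection string from (host, port) pairs.

    servers     -- list of (host, port) tuples, like rows of a server table
    database    -- name of the database to connect to
    replica_set -- optional replica set name (hypothetical value in the demo)
    """
    if not servers:
        raise ValueError("at least one server address is required")
    # Several hosts are written as a comma-separated seed list.
    hosts = ",".join(f"{host}:{port}" for host, port in servers)
    uri = f"mongodb://{hosts}/{database}"
    if replica_set:
        uri += "?" + urlencode({"replicaSet": replica_set})
    return uri


# Single server: the equivalent of filling in Server and Port only.
single = build_uri([("db1.example.com", 27017)], "sales")

# Several servers: the equivalent of filling in the Server addresses table.
multi = build_uri(
    [("db1.example.com", 27017), ("db2.example.com", 27018)],
    "sales",
    replica_set="rs0",
)
```

With a single entry the result is an ordinary host and port pair; with several entries the hosts form a seed list, which is why the two settings are mutually exclusive.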

Database

Enter the name of the MongoDB database to be connected to.

Collection

Type in the name of the collection to import data to.

Drop collection if exist

Select this check box to remove the collection if it already exists.

Authentication mechanism

Of the mechanisms listed in the Authentication mechanism drop-down list, NEGOTIATE is recommended if you are not using Kerberos, because it automatically selects the authentication mechanism best suited to the MongoDB version you are using.

For details about the other mechanisms in this list, see MongoDB Authentication from the MongoDB documentation.

Set Authentication database

If the username to be used to connect to MongoDB has been created in a specific Authentication database of MongoDB, select this check box to enter the name of this Authentication database in the Authentication database field that is displayed.

For further information about the MongoDB Authentication database, see User Authentication database.

Username and Password

Enter the database user authentication data.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

Available when the Required authentication check box is selected.

If the security system you have selected from the Authentication mechanism drop-down list is Kerberos, you need to fill in the User principal, Realm and KDC server fields instead of the Username and Password fields.

Data file

Type in the full path of the file from which the data will be imported or click the [...] button to browse to the desired data file.

Make sure that the data file is in standard format. For example, the fields in CSV files should be separated with commas.
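As a minimal sketch of what a comma-separated file in standard format looks like, the following uses Python's csv module to produce one. The sample records and the to_csv helper are hypothetical; the point is that a value containing a comma must be quoted so that it is not read as two fields.

```python
import csv
import io

# Hypothetical sample records; the second name contains a comma.
rows = [
    {"id": 1, "name": "Ada Lovelace", "city": "London"},
    {"id": 2, "name": "Hopper, Grace", "city": "New York"},
]


def to_csv(records, fieldnames):
    """Serialize records as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


text = to_csv(rows, ["id", "name", "city"])
lines = text.splitlines()
```

Here the csv module quotes the field with the embedded comma on its own, which keeps every line at exactly three fields.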

File type

Select the proper file type from the list. CSV, TSV and JSON are supported.

The JSON file starts with an array

Select this check box to allow tCosmosDBBulkLoad to read JSON files that start with an array.

This check box appears when the File type you have selected is JSON.
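The difference between the two JSON layouts can be sketched as follows. This is an illustration under assumptions, not the component's own parser: it assumes that a file not starting with an array holds one document per line, and the helper names are hypothetical.

```python
import json


def starts_with_array(text):
    """Return True when the first non-blank character opens a JSON array."""
    return text.lstrip().startswith("[")


def load_documents(text):
    """Return the list of documents from either JSON layout."""
    if starts_with_array(text):
        # The whole file is a single array of documents.
        return json.loads(text)
    # Assumed alternative layout: one JSON document per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


array_form = '[{"sku": "A1", "qty": 5}, {"sku": "B2", "qty": 1}]'
lines_form = '{"sku": "A1", "qty": 5}\n{"sku": "B2", "qty": 1}\n'

docs_from_array = load_documents(array_form)
docs_from_lines = load_documents(lines_form)
```

Both inputs yield the same two documents; only the outer layout of the file differs, which is what the check box tells the component to expect.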

Action on data

Select the action that you want to perform on the data.

  • Insert: Insert the data into the database.

    Note that when inserting data from CSV or TSV files into the MongoDB database, you need to specify fields either by selecting the First line is header check box or defining them in the schema.

  • Upsert: Insert the data if it does not exist, or update it if it does.

    Note that when upserting data into the MongoDB database, you need to specify a list of fields for the query portion of the upsert operation.

Upsert fields

Specify the fields to be used in the query portion of the upsert operation, that is, the fields used to match existing data.

This table is available when you select Upsert from the Action on data list.
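The behavior of Upsert with a list of upsert fields can be illustrated with an in-memory stand-in for a collection. This is a simplified sketch, assuming that a matching document is replaced as a whole by the incoming one; the upsert function and the sample data are hypothetical and no database is involved.

```python
def upsert(collection, incoming, upsert_fields):
    """Replace documents that match on upsert_fields, insert the others.

    collection    -- list of dicts standing in for the existing collection
    incoming      -- list of dicts read from the data file
    upsert_fields -- field names that identify an existing document
    """
    # Index existing documents by the values of their upsert fields.
    index = {
        tuple(doc.get(field) for field in upsert_fields): position
        for position, doc in enumerate(collection)
    }
    for doc in incoming:
        key = tuple(doc.get(field) for field in upsert_fields)
        if key in index:
            collection[index[key]] = dict(doc)   # match found: update
        else:
            index[key] = len(collection)
            collection.append(dict(doc))         # no match: insert
    return collection


existing = [{"sku": "A1", "qty": 5}, {"sku": "B2", "qty": 1}]
new_data = [{"sku": "A1", "qty": 9}, {"sku": "C3", "qty": 4}]

result = upsert(existing, new_data, ["sku"])
```

With sku as the only upsert field, the A1 document is updated and C3 is added, while B2 is left untouched.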

First line is header

Select this check box to use the first line in CSV or TSV files as a header.

This check box is available only when you select CSV or TSV from the File type list.

Ignore blanks

Select this check box to ignore the empty fields in CSV or TSV files.

This check box is available only when you select CSV or TSV from the File type list.
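The combined effect of First line is header and Ignore blanks can be sketched with Python's csv module. This is an illustration of the semantics described above, not the component's implementation; the read_rows helper and its parameter names are hypothetical.

```python
import csv
import io


def read_rows(text, delimiter=",", first_line_is_header=True,
              fieldnames=None, ignore_blanks=False):
    """Turn delimited text into a list of documents (dicts)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if first_line_is_header:
        # Field names come from the first line of the file.
        fieldnames, rows = rows[0], rows[1:]
    elif fieldnames is None:
        # Otherwise they must be supplied, for example from the schema.
        raise ValueError("field names must come from the header or the schema")
    documents = []
    for row in rows:
        doc = dict(zip(fieldnames, row))
        if ignore_blanks:
            # Drop empty fields instead of storing empty values.
            doc = {name: value for name, value in doc.items() if value != ""}
        documents.append(doc)
    return documents


sample = "id,name,city\n1,Ada,\n2,,Paris\n"

kept = read_rows(sample)
dropped = read_rows(sample, ignore_blanks=True)
from_schema = read_rows("1\tAda\n", delimiter="\t",
                        first_line_is_header=False, fieldnames=["id", "name"])
```

Without Ignore blanks, the first document carries an empty city field; with it, the field is omitted altogether. The last call shows a TSV input whose field names are given explicitly rather than read from a header.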

Print log

Select this check box to print logs.

Advanced settings

Additional arguments

Complete this table to use the additional arguments as required.

For example, you can use the argument "--jsonArray" to accept the import of data expressed as multiple MongoDB documents within a single JSON array. For more information about the additional arguments, see the description of the options at http://docs.mongodb.org/manual/reference/program/mongoimport/.
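Since the additional arguments are mongoimport options, it can help to see how such options sit on a mongoimport command line. The sketch below only assembles an argument list and does not run anything. The build_command helper, the paths and the assumption that the tool lives under bin in the MongoDB home directory are all hypothetical; how the component itself maps its settings to the tool is not documented here.

```python
import shlex


def build_command(mongo_home, database, collection, file_type, data_file,
                  drop=False, json_array=False, extra_args=()):
    """Assemble a mongoimport argument list (nothing is executed)."""
    command = [
        f"{mongo_home}/bin/mongoimport",   # assumed location of the tool
        "--db", database,
        "--collection", collection,
        "--type", file_type,               # json, csv or tsv
        "--file", data_file,
    ]
    if drop:
        command.append("--drop")           # remove the collection first
    if json_array:
        command.append("--jsonArray")      # the file is a single JSON array
    command.extend(extra_args)             # user-supplied additional arguments
    return command


command = build_command(
    "/opt/mongodb",                        # hypothetical home directory
    "sales", "orders", "json", "/tmp/orders.json",
    drop=True,
    json_array=True,
    extra_args=["--stopOnError"],
)
printable = shlex.join(command)
```

Keeping the command as a list until the last moment avoids quoting problems with paths that contain spaces; shlex.join is used only to produce a readable string.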

tStatCatcher Statistics

Select this check box to collect the log data at a component level.

Usage

Usage rule

This component can be used together with the tCosmosDBInput component to verify that the data has been imported as expected.

Limitation

The MongoDB client tool needs to be installed on the machine where Jobs using this component are executed.