tSalesforceOutputBulk - 6.3

Talend Open Studio for Big Data Components Reference Guide


The tSalesforceOutputBulk and tSalesforceBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used to feed the Salesforce database. These two steps are fused into a single step in the tSalesforceOutputBulkExec component. The advantage of using two separate steps is that the data can be transformed before it is loaded into the database.

Function

tSalesforceOutputBulk generates files in a format suitable for bulk processing.

Purpose

Prepares the file to be processed by tSalesforceBulkExec for execution in Salesforce.com.
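The bulk file produced in the first step is a plain CSV file whose header row names the Salesforce object fields and whose following rows are the records to load. As a rough illustration of the two-step principle (not the component's actual implementation; the file path and field values are assumptions for the example):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BulkFileSketch {
    public static void main(String[] args) throws IOException {
        // Step 1 (tSalesforceOutputBulk): generate the bulk file.
        // The header row names the Salesforce Account fields; each
        // following row is one record. Path and values are illustrative.
        Path bulkFile = Path.of("accounts_bulk.csv");
        List<String> lines = List.of(
                "Name,ParentId,Phone,Fax",
                "Talend,001example,+33 1 00 00 00 00,+33 1 00 00 00 01"
        );
        Files.write(bulkFile, lines);
        // Step 2 (tSalesforceBulkExec) would then submit this file to
        // the Salesforce Bulk API for processing.
        System.out.println(Files.readAllLines(bulkFile).get(0));
    }
}
```

Because the file is generated first and loaded later, any transformation (such as the tMap step in the scenario below) can be applied before the data reaches Salesforce.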

tSalesforceOutputBulk Properties

Component family

Business/Cloud

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

 

Bulk File Path

Specify the path to the file to be generated.

 

Append

Select this check box to append new data at the end of the file if it already exists, instead of overwriting the existing data.

 

Ignore Null

Select this check box to ignore NULL values.

Advanced settings

Relationship mapping for upsert

Click the [+] button to add lines as needed, and for each line specify the external ID field in the input flow, the lookup relationship field in the upsert module, the lookup module, and the external ID field in the lookup module.

  • Column name of Talend Schema: the name of the external ID field in the input flow. It refers to the fields in the schema of the preceding component. Such columns are intended to match against the external ID fields specified in the External id name column, which are the fields of the lookup module specified in the Module name column.

  • Lookup relationship field name: the name of the lookup relationship field. It refers to the lookup relationship fields of the module specified in the Module Name field in the Basic settings view of the tSalesforceBulkExec component, and it establishes the relationship with the lookup module specified in the Module name column of this table. For details on how to define lookup relationship fields and provide their correct names, see the Salesforce Data Loader application and its documentation on the Salesforce website.

  • Module name: the name of the lookup module.

  • Polymorphic: select this check box only when polymorphic fields are used for relationship mapping. You will get an error if you leave this check box cleared for a polymorphic field, or if you select it for a field that is not polymorphic. For more information about polymorphic fields, search for polymorphic at http://www.salesforce.com/us/developer/docs/api_asynch/.

  • External id name: the name of the external ID field in the lookup module specified in the Module name column.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

This component is most commonly used with the tSalesforceBulkExec component. Used together, they improve performance while inserting or modifying data in Salesforce.com.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.
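For example, the log4j 1.2 logging level can be set through properties entries like the following; the logger name shown is an assumption and should be checked against your Studio's actual log4j configuration:

```
# Assumed log4j 1.2 properties entries; adjust logger names to match
# your Studio's actual log4j configuration.
log4j.rootLogger=INFO, CONSOLE
# Raise verbosity for Talend component classes only (logger name assumed):
log4j.logger.org.talend=DEBUG
```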

Limitation

Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. You can also find out and add all missing JARs easily on the Modules tab in the Integration perspective of your studio. For details, see the article Installing External Modules on Talend Help Center (https://help.talend.com), or see how to configure the Studio in the Talend Installation and Upgrade Guide.

Scenario: Inserting transformed bulk data into Salesforce

This scenario describes a six-component Job that transforms the data in the file SalesforceAccount.txt used in Scenario 2: Gathering erroneous data while inserting data into a Salesforce object, stores the transformed data in a CSV file suitable for bulk processing, loads the transformed data into Salesforce from that CSV file, and displays the Job execution results on the console.

Setting up the Job

  1. Create a new Job and add a tFileInputDelimited component, a tMap component, a tSalesforceOutputBulk component, a tSalesforceBulkExec component and two tLogRow components by typing their names on the design workspace or dropping them from the Palette.

  2. Link the tFileInputDelimited component to the tMap component using a Row > Main connection.

  3. Link the tMap component to the tSalesforceOutputBulk component using a Row > New Output (Main) connection. In the pop-up dialog box, enter the name of the output connection. In this example, it is out.

  4. Link the tSalesforceBulkExec component to the first tLogRow component using a Row > Main connection.

  5. Link the tSalesforceBulkExec component to the second tLogRow component using a Row > Reject connection.

  6. Link the tFileInputDelimited component to the tSalesforceBulkExec component using a Trigger > OnSubjobOk connection.

Configuring the components

Preparing the bulk loading file

  1. Double-click the tFileInputDelimited component to open its Basic settings view.

  2. In the File name/Stream field, browse to or enter the path to the input data file. In this example, it is D:/SalesforceAccount.txt.

    In the Header field, type in 1 to skip the header row at the beginning of the file.

  3. Click the [...] button next to Edit schema and, in the pop-up schema dialog box, define the schema by adding four columns of String type: Name, ParentId, Phone, and Fax.

    Click OK to save the changes and close the dialog box.

  4. Double-click the tMap component to open its map editor and set the transformation.

  5. Select all columns from the row1 input table and drop them onto the out output table.

    Append .toUpperCase() to the expression in the Expression cell of the Name column in the out output table.

    Click OK to validate the transformation and close the map editor.

  6. Double-click the tSalesforceOutputBulk component to open its Basic settings view.

  7. In the Bulk File Path field, browse to or enter the path to the CSV file that will store the transformed data for bulk processing.
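The expression edited in step 5 is plain Java: tMap expressions are Java snippets, so appending .toUpperCase() calls the standard String method on each incoming value. A minimal standalone illustration of what that expression does (the sample value is an assumption for the example):

```java
public class UpperCaseSketch {
    public static void main(String[] args) {
        // What the tMap expression row1.Name.toUpperCase() does per row:
        String name = "Talend Open Studio"; // sample input value
        System.out.println(name.toUpperCase()); // → TALEND OPEN STUDIO
    }
}
```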

Loading data into Salesforce from the file

  1. Double-click the tSalesforceBulkExec component to open its Basic settings view.

  2. In the User Id, Password and Security Key fields, enter the user authentication information required to access Salesforce.

  3. Click the [...] button next to the Module Name field and in the pop-up dialog box, select the object you want to access. In this example, it is Account.

  4. Click the [...] button next to Edit schema and in the pop-up dialog box, remove all columns except Name, ParentId, Phone and Fax.

    Click OK to save the changes and accept the propagation prompted by the pop-up dialog box.

  5. In the Bulk File Path field, browse to or enter the path to the CSV file that stores the transformed data for bulk processing.

  6. Double-click the first tLogRow component to open its Basic settings view.

  7. In the Mode area, select Table (print values in cells of a table) for better readability of the results.

  8. Do the same to configure the second tLogRow component.

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.

    You can check the execution result on the Run console.