Access
Access components
tAccessBulkExec | Offers gains in performance when carrying out Insert operations in an Access database. |
tAccessClose | Closes an active connection to the Access database so as to release occupied resources. |
tAccessCommit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance, using a unique connection. |
tAccessConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tAccessInput | Reads a database and extracts fields based on a query. |
tAccessOutput | Writes, updates, modifies, or deletes entries in a database. |
tAccessOutputBulk | Prepares the file which contains the data used to feed the Access database. |
tAccessOutputBulkExec | Executes an Insert action on the data provided, in an Access database. |
tAccessRollback | Cancels the transaction commit in the connected database to avoid committing part of a transaction involuntarily. |
tAccessRow | Executes the stated SQL query on the specified database. |
Access scenario
Amazon Aurora
Amazon Aurora components
tAmazonAuroraInvalidRows | Checks Amazon Aurora database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). Only MySQL is supported. |
tAmazonAuroraValidRows | Checks Amazon Aurora database rows against Data Quality patterns (regular expression). Only MySQL is supported. |
tAmazonAuroraClose | Closes an active connection to an Amazon Aurora database instance to release the occupied resources. |
tAmazonAuroraCommit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance, using a unique connection. |
tAmazonAuroraConnection | Opens a connection to an Amazon Aurora database instance that can then be reused by other Amazon Aurora components. |
tAmazonAuroraInput | Reads an Amazon Aurora database and extracts fields based on a query. |
tAmazonAuroraOutput | Writes, updates, modifies, or deletes entries in an Amazon Aurora database. |
tAmazonAuroraRollback | Rolls back any changes made in the Amazon Aurora database to prevent partial transaction commit if an error occurs. |
tAmazonAuroraRow | Executes query statements on a specified Amazon Aurora database table. |
Amazon Aurora scenario
Amazon DynamoDB
Amazon DynamoDB components
tDynamoDBConfiguration | Stores connection information and credentials to be reused by other DynamoDB components. |
tDynamoDBLookupInput | Executes a database query with a strictly defined order which must correspond to the schema definition. |
tDynamoDBInput | Retrieves data from an Amazon DynamoDB table and sends them to the component that follows for transformation. |
tDynamoDBOutput | Creates, updates or deletes data in an Amazon DynamoDB table. |
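Under the hood, these components talk to the same DynamoDB API that the AWS SDK exposes. For orientation only, here is a minimal sketch of the equivalent put/get calls in the AWS SDK for Java (v1); the table name "customers" and its key "id" are hypothetical, credentials are assumed to come from the default provider chain, and this is not Talend's generated code.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

public class DynamoDbSketch {
    public static void main(String[] args) {
        // Credentials and region are resolved from the default provider chain.
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        DynamoDB dynamoDB = new DynamoDB(client);

        // "customers" and its hash key "id" are hypothetical names.
        Table table = dynamoDB.getTable("customers");

        // Roughly what tDynamoDBOutput does in insert mode: put one item.
        table.putItem(new Item().withPrimaryKey("id", "C001")
                                .withString("name", "Alice"));

        // Roughly what tDynamoDBInput does: read the item back.
        Item item = table.getItem("id", "C001");
        System.out.println(item.toJSONPretty());
    }
}
```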
Amazon DynamoDB scenario
Amazon EMR
Amazon EMR components
tAmazonEMRListInstances | Lists the details about the instance groups in a cluster on Amazon EMR (Elastic MapReduce). |
tAmazonEMRManage | Launches or terminates a cluster on Amazon EMR (Elastic MapReduce). |
tAmazonEMRResize | Adds or resizes a task instance group in a cluster on Amazon EMR (Elastic MapReduce). |
Amazon EMR scenario
Amazon EMR distribution
Amazon EMR distribution scenario
Amazon MySQL
Amazon MySQL components
tAmazonMysqlClose | Closes an active connection to the database to release the occupied resources. |
tAmazonMysqlCommit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance, using a unique connection. |
tAmazonMysqlConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tAmazonMysqlInput | Reads a database and extracts fields based on a query. |
tAmazonMysqlOutput | Writes, updates, modifies, or deletes entries in a database. |
tAmazonMysqlRollback | Cancels the transaction commit in the connected database to avoid committing part of a transaction involuntarily. |
tAmazonMysqlRow | Executes the stated SQL query on the specified database. |
Amazon Oracle
Amazon Oracle components
tAmazonOracleClose | Closes an active connection to the database to release the occupied resources. |
tAmazonOracleCommit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance, using a unique connection. |
tAmazonOracleConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tAmazonOracleInput | Reads a database and extracts fields based on a query. |
tAmazonOracleOutput | Writes, updates, modifies, or deletes entries in a database. |
tAmazonOracleRollback | Cancels the transaction commit in the connected database to avoid committing part of a transaction involuntarily. |
tAmazonOracleRow | Executes the stated SQL query on the specified database. |
Amazon Redshift
Amazon Redshift components
tRedshiftConfiguration | Reuses the connection configuration to a Redshift database in the same Job. |
tRedshiftLookupInput | Reads a Redshift database and extracts fields based on a query. |
tAmazonRedshiftManage | Manages Amazon Redshift clusters and snapshots. |
tRedshiftBulkExec | Loads data into Amazon Redshift from Amazon S3, Amazon EMR cluster, Amazon DynamoDB, or remote hosts. |
tRedshiftClose | Closes an active connection to the database to release the occupied resources. |
tRedshiftCommit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance. |
tRedshiftConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tRedshiftInput | Reads data from a database and extracts fields based on a query so that you may apply changes to the extracted data. |
tRedshiftOutput | Writes, updates, modifies or deletes the data in a database. |
tRedshiftOutputBulk | Prepares a delimited/CSV file that can be used by tRedshiftBulkExec to feed Amazon Redshift. |
tRedshiftOutputBulkExec | Executes the Insert action on the data provided. |
tRedshiftRollback | Cancels the transaction commit in the Redshift database to avoid committing part of a transaction involuntarily. |
tRedshiftRow | Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. |
tRedshiftUnload | Unloads data on Amazon Redshift to files on Amazon S3. |
Amazon Redshift scenarios
Amazon S3
Amazon S3 components
tS3Configuration | Reuses the connection configuration to S3 in the same Job. The Spark cluster to be used reads this configuration to eventually connect to S3. |
tS3Input | Reads data from a given S3N system (S3 Native Filesystem). |
tS3Output | Writes data into a given S3 filesystem. |
tS3BucketCreate | Creates a bucket on Amazon S3. |
tS3BucketDelete | Deletes an empty bucket from Amazon S3. |
tS3BucketExist | Verifies if the specified bucket exists on Amazon S3. |
tS3BucketList | Lists all the buckets on Amazon S3. |
tS3Close | Shuts down a connection to Amazon S3, thus releasing the network resources. |
tS3Connection | Establishes a connection to Amazon S3 to store and retrieve data. |
tS3Copy | Copies an Amazon S3 object from a source bucket to a destination bucket. |
tS3Delete | Deletes a file from Amazon S3. |
tS3Get | Retrieves a file from Amazon S3. |
tS3List | Lists the files on Amazon S3 based on the bucket/file prefix settings. |
tS3Put | Uploads data onto Amazon S3 from a local file or from cache memory via the streaming mode. |
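These components wrap the standard S3 object API. As a point of reference, here is a minimal sketch in the AWS SDK for Java (v1) of the operations behind tS3BucketExist, tS3BucketCreate, tS3Put, and tS3Get; the bucket name, keys, and local paths are hypothetical, and this is an illustration rather than the components' internal code.

```java
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

public class S3Sketch {
    public static void main(String[] args) {
        // Region and credentials come from the default provider chain.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        String bucket = "my-example-bucket"; // hypothetical bucket name

        // tS3BucketExist / tS3BucketCreate equivalents:
        if (!s3.doesBucketExistV2(bucket)) {
            s3.createBucket(bucket);
        }

        // tS3Put: upload a local file as an object.
        s3.putObject(bucket, "data/in.csv", new File("/tmp/in.csv"));

        // tS3Get: download the object back to a local file.
        s3.getObject(new GetObjectRequest(bucket, "data/in.csv"),
                     new File("/tmp/out.csv"));
    }
}
```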
Amazon S3 scenarios
- Writing and reading data from S3 (Databricks on AWS)
- Writing server-side KMS encrypted data on EMR
- Copying an S3 object from one bucket to another
- Exchange files with Amazon S3
- Listing files with the same prefix from a bucket
- Retrieving data from an S3 object in Studio
- Tagging S3 objects
- Verifying the absence of a bucket, creating it and listing all the S3 buckets
Amazon SQS
Amazon SQS components
tSQSConnection | Opens a connection to Amazon Simple Queue Service that can then be reused by other SQS components. |
tSQSInput | Retrieves up to ten messages at a time from an Amazon SQS (Simple Queue Service) queue. |
tSQSMessageChangeVisibility | Changes the visibility timeout of a specified message in an Amazon SQS (Simple Queue Service) queue. |
tSQSMessageDelete | Deletes a specified message from an Amazon SQS (Simple Queue Service) queue. |
tSQSOutput | Delivers one or more messages to an Amazon SQS (Simple Queue Service) queue. |
tSQSQueueAttributes | Gets attributes for a specified Amazon SQS (Simple Queue Service) queue. |
tSQSQueueCreate | Creates a new Amazon SQS (Simple Queue Service) queue. |
tSQSQueueDelete | Deletes an Amazon SQS (Simple Queue Service) queue. |
tSQSQueueList | Iterates and lists the URL of Amazon SQS (Simple Queue Service) queues in a specified region. |
tSQSQueuePurge | Purges messages in an Amazon SQS (Simple Queue Service) queue. |
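For readers who want to see what these components correspond to at the API level, here is a minimal sketch using the AWS SDK for Java (v1). The queue name is hypothetical, credentials come from the default provider chain, and the ten-message cap mirrors the SQS receive limit noted for tSQSInput; this is an illustration, not Talend's generated code.

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class SqsSketch {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

        // tSQSQueueCreate: create (or look up) a queue; the name is hypothetical.
        String queueUrl = sqs.createQueue("demo-queue").getQueueUrl();

        // tSQSOutput: deliver a message to the queue.
        sqs.sendMessage(queueUrl, "hello from the sketch");

        // tSQSInput: receive up to ten messages in one call (the SQS hard limit).
        ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                .withMaxNumberOfMessages(10);
        for (Message m : sqs.receiveMessage(request).getMessages()) {
            System.out.println(m.getBody());
            // tSQSMessageDelete: remove the message once processed.
            sqs.deleteMessage(queueUrl, m.getReceiptHandle());
        }
    }
}
```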
Amazon SQS scenarios
Apache log
Apache log component
tApacheLogInput | Reads the access-log file for an Apache HTTP server. |
Apache log scenario
Archive/Unarchive
Archive/Unarchive components
tFileArchive | Creates a new zip, gzip, or tar.gz archive file from one or more files or folders. |
tFileUnarchive | Decompresses an archive file for further processing, in one of the following formats: *.tar.gz, *.tgz, *.tar, *.gz and *.zip. |
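Since tFileArchive and tFileUnarchive operate on standard archive formats, their effect can be pictured with the JDK's own java.util.zip. A minimal sketch, assuming a hypothetical /tmp/in.csv, of zipping one file:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSketch {
    public static void main(String[] args) throws Exception {
        // Archive a single file into in.zip; both paths are hypothetical.
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("/tmp/in.zip"));
             FileInputStream fis = new FileInputStream("/tmp/in.csv")) {
            zos.putNextEntry(new ZipEntry("in.csv"));
            byte[] buffer = new byte[8192];
            int len;
            while ((len = fis.read(buffer)) > 0) {
                zos.write(buffer, 0, len);
            }
            zos.closeEntry();
        }
    }
}
```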
Archive/Unarchive scenarios
ARFF
ARFF components
tFileInputARFF | Reads an ARFF file row by row to split the rows into fields and then sends the fields as defined in the schema to the next component. |
tFileOutputARFF | Writes an ARFF file that holds data organized according to the defined schema. |
ARFF scenario
AS400
AS400 components
tAS400Close | Closes an active connection to the database to release the occupied resources. |
tAS400Commit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance, using a unique connection. |
tAS400Connection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tAS400Input | Reads a database and extracts fields based on a query. |
tAS400LastInsertId | Obtains the primary key value of the record that was last inserted in an AS/400 table. |
tAS400Output | Writes, updates, modifies, or deletes entries in a database. |
tAS400Rollback | Cancels the transaction commit in the connected database to avoid committing part of a transaction involuntarily. |
tAS400Row | Executes the stated SQL query on the specified database. |
AS400 scenario
Avro
Avro components
tAvroInput | Extracts records from given Avro format files so that other components can process them. |
tAvroOutput | Receives data flows from the processing component placed ahead of it and writes the data into Avro format files in a given distributed file system. |
tAvroStreamInput | Listens on a given directory, reads data from Avro files once they are created and sends this data to the component that follows. |
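Avro container files like the ones these components read and write can be produced directly with the Apache Avro Java API. A minimal sketch, with a hypothetical two-field schema and output path:

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroSketch {
    public static void main(String[] args) throws Exception {
        // A minimal two-field record schema; the names are hypothetical.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"int\"},"
          + "{\"name\":\"name\",\"type\":\"string\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", 1);
        record.put("name", "Alice");

        // Write one record to an Avro container file.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, new File("/tmp/customers.avro"));
            writer.append(record);
        }
    }
}
```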
Avro scenario
Azure Data Lake Store
Azure Data Lake Store components
tAzureAdlsGen2Input | Retrieves data from an ADLS Gen2 file system of an Azure storage account and passes the data to the subsequent component connected to it through a Row > Main link. |
tAzureAdlsGen2Output | Uploads incoming data to an ADLS Gen2 file system of an Azure storage account in the specified format. |
tAzureFSConfiguration | Provides authentication information for Spark to connect to a given Azure file system. |
Azure Data Lake Store scenarios
Azure Storage Blob
Azure Storage Blob components
tAzureFSConfiguration | Provides authentication information for Spark to connect to a given Azure file system. |
tAzureStorageConnection | Uses authentication and the protocol information to create a connection to the Microsoft Azure Storage system that can then be reused by other Azure Storage components. |
tAzureStorageContainerCreate | Creates a new storage container used to hold Azure blobs (Binary Large Object) for a given Azure storage account. |
tAzureStorageContainerDelete | Automates the removal of a given blob container from the space of a specific storage account. |
tAzureStorageContainerExist | Automates the verification of whether a given blob container exists or not within a storage account. |
tAzureStorageContainerList | Lists all containers in a given Azure storage account. |
tAzureStorageDelete | Deletes blobs from a given container for an Azure storage account according to the specified blob filters. |
tAzureStorageGet | Retrieves blobs from a given container for an Azure storage account according to the specified filters applied on the virtual hierarchy of the blobs and then writes the selected blobs to a local folder. |
tAzureStorageList | Lists blobs in a given container according to the specified blob filters. |
tAzureStoragePut | Uploads local files into a given container for an Azure storage account. |
Azure Storage Blob scenarios
Azure Storage Queue
Azure Storage Queue components
tAzureStorageConnection | Uses authentication and the protocol information to create a connection to the Microsoft Azure Storage system that can then be reused by other Azure Storage components. |
tAzureStorageQueueCreate | Creates a new queue under a given Azure storage account. |
tAzureStorageQueueDelete | Deletes a specified queue permanently under a given Azure storage account. |
tAzureStorageQueueInput | Retrieves one or more messages from the front of an Azure queue. |
tAzureStorageQueueInputLoop | Runs an endless loop to retrieve messages from the front of an Azure queue. |
tAzureStorageQueueList | Returns all queues associated with the given Azure storage account. |
tAzureStorageQueueOutput | Adds messages to the back of an Azure queue. |
tAzureStorageQueuePurge | Purges messages in an Azure queue. |
Azure Storage Table
Azure Storage Table components
tAzureStorageConnection | Uses authentication and the protocol information to create a connection to the Microsoft Azure Storage system that can then be reused by other Azure Storage components. |
tAzureStorageInputTable | Retrieves a set of entities that satisfy the specified filter criteria from an Azure storage table. |
tAzureStorageOutputTable | Performs the defined action on a given Azure storage table and inserts, replaces, merges or deletes entities in the table based on the incoming data from the preceding component. |
Azure Storage Table scenario
Azure Synapse Analytics
Azure Synapse Analytics components
tAzureSynapseBulkExec | Loads data into an Azure Synapse Analytics table from either Azure Blob Storage or Azure Data Lake Storage. |
tAzureSynapseClose | Closes an active connection to an Azure Synapse Analytics database. |
tAzureSynapseCommit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance. |
tAzureSynapseConnection | Opens a connection to an Azure Synapse Analytics database. |
tAzureSynapseInput | Reads data and extracts fields based on a query from an Azure SQL Data Warehouse database. |
tAzureSynapseOutput | Writes, updates, modifies, or deletes entries in an Azure SQL Data Warehouse database. |
tAzureSynapseRollback | Cancels the transaction commit in the connected Azure SQL Data Warehouse database to prevent partial transaction commit if an error occurs. |
tAzureSynapseRow | Executes an SQL query stated on an Azure Synapse Analytics database. |
Bonita
Bonita components
tBonitaDeploy | Deploys a specific Bonita process to a Bonita Runtime. |
tBonitaInstantiateProcess | Starts an instance for a specific process deployed in a Bonita Runtime engine. |
Bonita scenarios
Box
Box components
tBoxConnection | Creates a Box connection that the other Box components can reuse. |
tBoxCopy | Copies or moves a given folder or file from Box. |
tBoxDelete | Removes a given folder or file from Box. |
tBoxGet | Downloads a selected file from a Box account. |
tBoxList | Lists the files stored in a specified directory in Box. |
tBoxPut | Uploads files to a Box account. |
Box scenario
Buffer
Buffer components
tBufferInput | Retrieves data buffered via a tBufferOutput component, for example, to process it in another subJob. |
tBufferOutput | Collects data in a buffer in order to access it later, via a web service for example. |
Buffer scenarios
Business rules
Business rules components
Business rules scenarios
Cassandra
Cassandra components
tCassandraConfiguration | Enables the reuse of the connection configuration to a Cassandra server in the same Job. |
tCassandraLookupInput | Extracts the desired data from a standard or super column family of a Cassandra keyspace so as to apply changes to the data. |
tCassandraBulkExec | Improves performance during Insert operations to a Cassandra column family. |
tCassandraClose | Disconnects a connection to a Cassandra server so as to release occupied resources. |
tCassandraConnection | Enables the reuse of the connection it creates to a Cassandra server. |
tCassandraInput | Extracts the desired data from a standard or super column family of a Cassandra keyspace so as to apply changes to the data. |
tCassandraOutput | Writes data into or deletes data from a column family of a Cassandra keyspace. |
tCassandraOutputBulk | Prepares an SSTable of large size and processes it according to your needs before loading this SSTable into a column family of a Cassandra keyspace. |
tCassandraOutputBulkExec | Improves performance during Insert operations to a column family of a Cassandra keyspace. |
tCassandraRow | Acts on the actual DB structure or on the data, depending on the nature of the query and the database. |
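For orientation, the reads and writes these components perform map onto plain CQL statements. A minimal sketch using the DataStax Java driver (3.x API), with hypothetical contact point, keyspace, and table names; this illustrates the idea and is not the components' internal code.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraSketch {
    public static void main(String[] args) {
        // Contact point, keyspace, and table are hypothetical.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo_keyspace")) {

            // Roughly what tCassandraOutput does in insert mode:
            session.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')");

            // Roughly what tCassandraInput does: extract rows with a CQL query.
            ResultSet rs = session.execute("SELECT id, name FROM customers");
            for (Row row : rs) {
                System.out.println(row.getInt("id") + " " + row.getString("name"));
            }
        }
    }
}
```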
Cassandra scenario
Change Data Capture
Change Data Capture components
tAS400CDC | Addresses data extraction and transportation needs. |
tDB2CDC | Extracts the changes done to the source operational data and makes them available to the target system(s) using database CDC views. |
tInformixCDC | Extracts the source-system data that has changed since the last extraction and transports it to one or more other systems. |
tIngresCDC (deprecated) | Extracts source system data that has changed since the last extraction and transports it to one or more other systems. |
tMSSqlCDC | Extracts the changes made to the source operational data and makes them available to the target system(s) using database CDC views. |
tMysqlCDC | Extracts only the changes made to the source operational data and makes them available to the target system(s) using database CDC views. |
tOracleCDC | Extracts source system data that has changed since the last extraction and transports it to one or more other systems. |
tOracleCDCOutput | Synchronizes data changes in the Oracle XStream CDC mode. |
tPostgresqlCDC | Addresses data extraction and transportation needs: it extracts only the changes made to the source operational data and makes them available to the target system(s) using database CDC views. |
tSybaseCDC | Extracts source system data that has changed since the last extraction and transports it to one or more other systems. |
tTeradataCDC | Extracts source system data that has changed since the last extraction and transports it to one or more other systems using the CDC Trigger mode. |
Change Data Capture scenarios
Chart
Chart components
tBarChart | Generates a bar chart from the input data to ease technical analysis. |
tLineChart | Reads data from an input flow and transforms the data into a line chart in a PNG image file to ease technical analysis. |
Chart scenarios
Cloud
Cloud components
tCloudStart | Starts instances on Amazon EC2 (Amazon Elastic Compute Cloud). |
tCloudStop | Changes the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud). |
CombinedSQL
CombinedSQL components
tCombinedSQLAggregate | Provides a set of metrics based on values or calculations. |
tCombinedSQLFilter | Reorganizes, deletes or adds columns based on the source table and filters the given data source using the defined filter conditions. |
tCombinedSQLInput | Extracts fields from a database table based on its schema definition. |
tCombinedSQLOutput | Inserts records from the incoming flow to an existing database table. |
CombinedSQL scenario
Context
Context components
tContextDump | Copies the context setup of the current Job to a flat file, a database table, etc., which can then be used by tContextLoad. |
tContextLoad | Loads a context from a flow. |
Context scenario
CosmosDB
CosmosDB components
tCosmosDBSQLAPIInput | Retrieves data from a Cosmos database collection through SQL API. |
tCosmosDBSQLAPIOutput | Inserts, updates, upserts or deletes documents in a Cosmos database collection based on the incoming flow from the preceding component through SQL API. |
tCosmosDBBulkLoad | Imports data files in different formats (CSV, TSV or JSON) into the specified Cosmos database so that the data can be further processed. |
tCosmosDBConnection | Creates a connection to a CosmosDB database that can then be reused by other components. |
tCosmosDBInput | Retrieves certain documents from a Cosmos database collection by supplying a query document containing the fields the desired documents should match. |
tCosmosDBOutput | Inserts, updates, upserts or deletes documents in a Cosmos database collection based on the incoming flow from the preceding component in the Job. |
tCosmosDBRow | Executes commands on the Cosmos database. |
Couchbase
Couchbase components
tCouchbaseDCPInput | Queries the documents from the Couchbase database, under the Database Change Protocol (DCP), a streaming protocol. |
tCouchbaseDCPOutput | Upserts documents in the Couchbase database based on the incoming flat data from preceding components, under the Database Change Protocol (DCP), a streaming protocol. |
tCouchbaseInput | Queries the documents from the Couchbase database. |
tCouchbaseOutput | Upserts documents in the Couchbase database based on the incoming flat data from preceding components. |
Couchbase scenario
CyberArk
CyberArk component
tCyberarkInput | Retrieves the content of a secret object (usually, a password) stored in a CyberArk vault at runtime. The retrieved content is stored in the after variable SECRET, which can be referenced by any subsequent components in the Job. The content can also be passed to the subsequent component in a column named secret through a Row > Main connection. |
CyberArk scenario
Data mapping
Data mapping components
tHConvertFile | Uses Talend Data Mapper structures to perform a conversion from one representation to another, as a Spark Batch execution. |
tHMap | Executes transformations (called maps) between different sources and destinations by harnessing the capabilities of Talend Data Mapper, available in the Mapping perspective. |
tHMapFile | Runs a Talend Data Mapper map where input and output structures may differ, as a Spark batch execution. |
tHMapInput | Runs a Talend Data Mapper map where input and output structures may differ, as a Spark batch execution, and sends the data for use by a downstream component. |
tHMapRecord | Runs a Talend Data Mapper map where input and output structures may differ, as a Spark streaming execution. |
Data mapping scenarios
- Connecting tHMapRecord to multiple outputs
- Generating the Output Using tHMap with Multiple Schema Inputs
- Generating the Output using tHMap with Multiple Payload Inputs
- Handling errors
- Transforming data in a Spark environment
- Transforming from a Data Integration schema to a complex content schema
- Using Talend Data Integration metadata
- Using Talend Data Mapper metadata
Data Preparation
Data Preparation components
tDataprepRun | Applies a preparation made using Talend Data Preparation in a standard Data Integration Job. |
tDatasetInput | Creates a flow with data from a Talend Data Preparation dataset. |
tDatasetOutput | Creates a dataset in Talend Data Preparation. |
Data Preparation scenarios
- Applying a preparation to a data sample in an Apache Spark Batch Job
- Applying a preparation to a data sample in an Apache Spark Streaming Job
- Creating a dataset from a Job
- Dynamically selecting a preparation at runtime according to the input
- Preparing data from a database in a Talend Job
- Promoting a Job leveraging a preparation across environments
Data Quality
Address standardization
Address standardization components
tAddressRowCloud | Verifies and formats international addresses in the Cloud by using online services. |
tBatchAddressRowCloud | Uses batch processing to parse address data and get formatted addresses quickly, accurately and without installing any software. |
Address standardization scenarios
Continuous matching
Continuous matching components
tMatchIndex | Indexes a clean and deduplicated data set in ElasticSearch for continuous matching purposes. |
tMatchIndexPredict | Compares a new data set with a lookup data set stored in ElasticSearch, using tMatchIndex. tMatchIndexPredict outputs unique records and suspect duplicates in separate files. |
Continuous matching scenarios
Data extraction
Data extraction components
tExtractRegexFields | Extracts data and generates multiple columns from a formatted string using regex matching. |
tPatternExtract | Outputs all data that match a given pattern. You can then implement any required operation on the extracted data. |
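The column-splitting that tExtractRegexFields performs is plain regex group capture. A minimal sketch with java.util.regex, assuming a hypothetical name:age input format; each capturing group becomes one generated column.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractSketch {
    public static void main(String[] args) {
        // Split "name:age" strings into two fields; the format is hypothetical.
        Pattern pattern = Pattern.compile("(\\w+):(\\d+)");
        Matcher matcher = pattern.matcher("alice:34");
        if (matcher.matches()) {
            String name = matcher.group(1);               // first generated column
            int age = Integer.parseInt(matcher.group(2)); // second generated column
            System.out.println(name + " is " + age);
        }
    }
}
```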
Data extraction scenarios
Data matching
Data matching components
tMatchGroup | Creates groups of similar data records in any source data including large volumes of data by using one or several match rules. |
tRecordMatching | Ensures the data quality of any source data against a reference data source. |
Data matching scenarios
- Grouping output data in separate flows according to the minimal distance computed in each record
- Matching customer data through multiple passes
- Matching data through multiple passes using Map/Reduce components
- Matching entries using the Q-grams and Levenshtein algorithms
- Using a custom matching algorithm to match entries
- Using survivorship functions to merge two records and create a master record
Data privacy
Data privacy components
tDataDecrypt | Decrypts data encrypted with the tDataEncrypt component. |
tDataEncrypt | Protects data by transforming it into unreadable cipher text. |
tDataMasking | Hides original data with random characters or figures to protect the actual data while having a functional substitute for occasions when it is not advisable to show sensitive real data. |
tDataShuffling | Shuffles the data from an input table to protect the actual data while having a functional data set. Data will remain usable for purposes such as testing and training. |
tDataUnmasking | Unmasks data masked with the tDataMasking component to retrieve the original data. |
tDuplicateRow | Creates duplicates with meaningful data for data quality functional testing purposes. |
tPatternMasking | Masks data that follows a specific pattern and can transform the original data in a consistent manner, if needed. |
tPatternUnmasking | Unmasks data masked with the tPatternMasking component to retrieve the original data. |
Data privacy scenarios
- Altering data values to restrict the use of actual sensitive data
- Encrypting and decrypting back sensitive data
- Generating duplicate data from an input flow
- Masking Australian phone numbers
- Masking Medicare beneficiary identifiers
- Shuffling data values to restrict the use of actual sensitive data
- Unmasking Australian phone numbers
Deduplication
Deduplication components
tRuleSurvivorship | Creates the single representation of an entity according to business rules and can create a master copy of data for Master Data Management. |
tSurviveFields | Centralizes data from various and heterogeneous sources to create a master copy of data for MDM. |
tUniqRow | Ensures data quality of input or output flow in a Job. |
Deduplication scenarios
- Converting the Standard Job to a Spark Batch Job
- Creating a clean data set from the suspect pairs labeled by tMatchPredict and the unique rows computed by tMatchPairing
- Deduplicating entries based on dynamic schema
- Deduplicating entries using Map/Reduce components
- Merging the content of several rows using different columns as rank values
- Modifying the rule file manually to code the conditions you want to use to create a survivor
- Selecting the best-of-breed data from a group of duplicates to create a survivor
- Deduplicating entries
Email validation
Email validation component
tVerifyEmail | Verifies if email addresses comply with specific rules and corrects addresses that do not match the rules by using the content from specific columns. |
Email validation scenario
Formatting
Formatting component
tChangeFileEncoding | Transforms the character encoding of a given file and generates a new file with the transformed character encoding. |
Formatting scenario
Fuzzy matching
Fuzzy matching components
tBlockedFuzzyJoin | Helps ensure the data quality of any source data against a reference data source. |
tFuzzyJoin | Joins two tables by doing a fuzzy match on several columns, comparing columns from the main flow with reference columns from the lookup flow and outputting the main flow data and the rejected data. |
tFuzzyMatch | Compares a column from the main flow with a reference column from the lookup flow and outputs the main flow data displaying the distance. |
tFuzzyUniqRow | Compares columns in the input flow by using a defined matching method and collects the encountered duplicates. |
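Several of these components offer Levenshtein edit distance as a matching method. For reference, the classic dynamic-programming formulation looks like the following sketch (an illustration of the algorithm, not the components' internal implementation): the distance counts the insertions, deletions, and substitutions needed to turn one string into the other.

```java
public class LevenshteinSketch {
    // Classic dynamic-programming edit distance between s and t.
    static int levenshtein(String s, String t) {
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i;
        for (int j = 0; j <= t.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[s.length()][t.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Jon", "John"));    // 1: one insertion
        System.out.println(levenshtein("Smith", "Smyth")); // 1: one substitution
    }
}
```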
Fuzzy matching scenarios
- Checking the Levenshtein distance of 0 in first names
- Checking the Levenshtein distance of 1 or 2 in first names
- Checking the Metaphone distance in first names
- Comparing four columns using different matching methods and collecting encountered duplicates
- Doing a fuzzy match on two columns and outputting the main and rejected data
- Doing a fuzzy match on two columns and outputting the match, possible match and non match values
Google address standardization
Google address standardization components
tGoogleAddressRow | Converts human-readable addresses into geographic coordinates and other geographic information. |
tGoogleGeocoder | Converts human-readable addresses into geographic coordinates. |
tGoogleMapLookup | Obtains detailed geographic information using geographic coordinates and address information. |
Google address standardization scenarios
Identification
Identification components
tGenKey | Generates a functional key from the input columns, by applying different types of algorithms on each column and grouping the computed results in one key, then outputs this key with the input columns. |
tAddCRCRow | Provides a unique ID which helps improve the quality of processed data. CRC stands for Cyclical Redundancy Checking. |
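One way to picture a CRC-based row ID is to compute a CRC over a row's concatenated column values with the JDK's own java.util.zip.CRC32. A minimal sketch, with a hypothetical row; this illustrates the checksum idea rather than tAddCRCRow's exact key derivation.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcSketch {
    public static void main(String[] args) {
        // Concatenated column values of a hypothetical record.
        String row = "alice;34;paris";
        CRC32 crc = new CRC32();
        crc.update(row.getBytes(StandardCharsets.UTF_8));
        System.out.println("crc=" + crc.getValue()); // checksum usable as a row ID
    }
}
```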
Identification scenarios
Loqate address standardization
Loqate address standardization component
tLoqateAddressRow | Parses, verifies, cleanses, standardizes, transliterates, and formats international addresses. |
Loqate address standardization scenario
Matching with machine learning
Matching with machine learning components
tMatchModel | Generates the matching model that is used by the tMatchPredict component to automatically predict the labels for the suspect pairs and groups records which match the label(s) set in the component properties. |
tMatchPairing | Enables you to compute pairs of suspect duplicates from any source data including large volumes in the context of machine learning on Spark. |
tMatchPredict | Labels suspect records automatically and groups suspect records which match the label(s) set in the component properties. |
Matching with machine learning scenarios
Melissa Data address standardization
Melissa Data address standardization components
tMelissaDataAddress | Verifies if an address is properly formatted and corrects any formatting or spelling errors in each row. |
tPersonator | Ensures the quality of a US or Canadian contact database by checking, verifying, moving and appending contact data. |
Melissa Data address standardization scenarios
Microsoft SQL Server validation
Microsoft SQL Server validation components
tMSSqlInvalidRows | Extracts DB rows that match a given data quality business rule. You can then implement any required correction. |
tMSSqlValidRows | Extracts DB rows that match a given data quality business rule. |
MySQL validation
MySQL validation components
tMySQLInvalidRows | Checks MySQL database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). |
tMySQLValidRows | Checks MySQL database rows against Data Quality patterns (regular expression). |
MySQL validation scenarios
Name standardization
Name standardization component
tFirstnameMatch | Matches first names against a reference index in order to standardize data. |
Name standardization scenario
Oracle validation
Oracle validation components
tOracleInvalidRows | Checks Oracle database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). |
tOracleValidRows | Checks Oracle database rows against Data Quality patterns (regular expression). |
Pattern validation
Pattern validation components
tFindRegexlibExpressions | Returns a dataset holding information about all of the regular expressions that match the request sent to the web server. |
tLastRegexlibExpressions | Returns a dataset holding information about the N most recent regular expressions added to the library and that match the query at http://regexlib.com. |
tMultiPatternCheck | Checks all existing data in multiple columns against a given Java regular expression. |
tPatternCheck | Gives two output flows: Matching Data and Non-Matching Data. The first collects all data that match a given pattern, and the second collects all data that do not match a given pattern. You can then implement any required corrections. |
Pattern validation scenarios
Phone number standardization
Phone number standardization component
tStandardizePhoneNumber | Standardizes phone numbers according to given formats. |
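Google's libphonenumber is a widely used library for exactly this kind of normalization; whether tStandardizePhoneNumber relies on it internally is not asserted here. A minimal sketch of parsing a free-form US number and emitting standard formats:

```java
import com.google.i18n.phonenumbers.PhoneNumberUtil;
import com.google.i18n.phonenumbers.PhoneNumberUtil.PhoneNumberFormat;
import com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;

public class PhoneSketch {
    public static void main(String[] args) throws Exception {
        PhoneNumberUtil util = PhoneNumberUtil.getInstance();
        // Parse a free-form US number (hypothetical) and print standard formats.
        PhoneNumber number = util.parse("(415) 555-2671", "US");
        System.out.println(util.format(number, PhoneNumberFormat.E164));          // +14155552671
        System.out.println(util.format(number, PhoneNumberFormat.INTERNATIONAL)); // +1 415-555-2671
    }
}
```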
Phone number standardization scenario
PostgreSQL validation
PostgreSQL validation components
tPostgresqlInvalidRows | Extracts DB rows that do not match a given data quality pattern. You can then implement any required correction. |
tPostgresqlValidRows | Extracts DB rows that match a given data quality pattern. |
QAS address standardization
QAS address standardization components
tQASAddressIncomplete (deprecated) | Gives two output flows: Incomplete and Reject. |
tQASAddressRow | Corrects any formatting or spelling errors and gives the verification status for each row. |
tQASAddressUnknown (deprecated) | Gives one output flow: Unknown, which collects all addresses that do not match deliverable results in the QuickAddress data. |
tQASAddressVerified (deprecated) | Gives three output flows: Verified, Interaction required, and Reject. |
tQASBatchAddressRow | Corrects any formatting or spelling errors, adds missing data and gives the verification status for each row. |
QAS address standardization scenarios
Reporting
Reporting components
tDqReportRun | Launches the analyses listed in a report and saves the results in the data quality data mart. |
tThresholdViolationAlert | Alerts to any threshold violations regarding the thresholds set on indicators in different quality analyses created in the Studio. |
Reporting scenarios
Sampling
Sampling component
tReservoirSampling | Extracts a random data sample from a big data set. |
Sampling scenario
Standardization
Standardization components
tStandardizeRow | Normalizes the incoming data in a separate XML or JSON data flow to separate or standardize the rule-compliant data from the non-compliant data. |
tIntervalMatch | Returns a value based on a Join relation. |
tReplaceList | Cleanses all files before further processing. |
Standardization scenarios
- Extracting exact match by using Index rules
- Normalizing data using rules of basic types
- Standardizing addresses from unstructured data
- Using two parsing levels to extract information from unstructured data
- Identifying server locations based on their IP addresses
- Replacing state names with their two-letter codes
Synonym index
Synonym index components
tSynonymOutput | Creates a Lucene index and feeds it with entries and the related synonyms it receives. |
tSynonymSearch | Searches a given index for the reference entries matching the data you input. |
Synonym index scenarios
Text standardization
Text standardization components
tJapaneseNumberNormalize | Normalizes Japanese numbers (kansūji) to regular Arabic numbers. |
tJapaneseTokenize | Splits Japanese text into tokens. |
tJapaneseTransliterate | Converts textual data in Japanese to kana and Latin scripts. |
tStem | Enables you to standardize data in columns before matching this data. |
tTransliterate | Converts strings from many languages of the world to a standard set of characters (Universal Coded Character Set, UCS). |
Text standardization scenarios
Uniserv
Uniserv components
tUniservBTGeneric (deprecated) | Executes a process created with the Uniserv product DQ Batch Suite. |
tUniservRTConvertName (deprecated) | Analyzes the name elements in an address. |
tUniservRTMailBulk (deprecated) | Creates the index pool for duplicate search. |
tUniservRTMailOutput (deprecated) | Synchronizes the index pool that is used for duplicate search. |
tUniservRTMailSearch (deprecated) | Searches for duplicate values based on a given input record and adds additional data to each record. |
tUniservRTPost (deprecated) | Improves address quality, which is extremely important for CRM and e-business as it is directly related to postage and advertising costs. |
Uniserv scenarios
- Adding contacts to the mailRetrieval index pool
- Analyzing a person's name and assigning a salutation
- Checking and correcting the postal code, city and street
- Checking and correcting the postal code, city and street, as well as rejecting the unfeasible
- Creating an index pool
- Execution of a Job in the Data Quality Service Hub Studio
Validation (Integration)
Validation (Integration) component
tSchemaComplianceCheck | Validates all input rows against a reference schema to ensure the data quality of the source data. |
Validation (Integration) scenario
Data Stewardship
Data Stewardship components
tDataStewardshipTaskDelete | Connects to Talend Data Stewardship and deletes the data stored in campaigns in the form of tasks. |
tDataStewardshipTaskInput | Connects to Talend Data Stewardship and retrieves the data stored in campaigns in the form of tasks. |
tDataStewardshipTaskOutput | Connects to Talend Data Stewardship and loads data into campaigns in the form of tasks. The tasks must have the same schema defined in the campaign. |
Data Stewardship scenarios
- Assigning tasks dynamically in Talend Data Stewardship
- Deleting tasks from Talend Data Stewardship
- Populating campaigns dynamically using campaign IDs
- Populating tasks into the same campaign on different Talend Data Stewardship instances
- Retrieving tasks from Talend Data Stewardship
- Writing tasks in a Merging campaign
- Writing tasks in Talend Data Stewardship campaigns
Database utility
Database utility component
tCreateTable | Creates a table for a specific type of database. |
Database utility scenario
Databricks
Databricks components
tDBFSConnection | Connects to a given DBFS (Databricks Filesystem) system so that the other DBFS components can reuse the connection it creates to communicate with this DBFS. |
tDBFSGet | Copies files from a given DBFS (Databricks Filesystem) system, pastes them in a user-defined directory and, if need be, renames them. |
tDBFSPut | Connects to a given DBFS (Databricks Filesystem) system, copies files from a user-defined directory, pastes them in this system and, if need be, renames these files. |
Databricks scenarios
DB Generic
DB Generic components
tDBCDC | Extracts only the changes made to the source operational data and makes them available to the target system(s) using database CDC views. |
tDBCDCOutput | Synchronizes data changes in a database of the selected database type in CDC mode. |
tDBInvalidRows | Checks database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). |
tDBValidRows | Checks database rows against Data Quality patterns (regular expression). |
tDBBulkExec | Offers gains in performance while executing Insert operations on a database. |
tDBClose | Closes an active connection to a database to release the occupied resources. |
tDBColumnList | Iterates on all columns of a given database table and lists column names. |
tDBCommit | Validates the data processed through the Job into the connected database. |
tDBConnection | Opens a connection to a database to be reused in the subsequent subJob or subJobs. |
tDBInput | Extracts data from a database. |
tDBLastInsertId | Obtains the primary key value of the record that was last inserted in a database table by a user. |
tDBOutput | Writes, updates, modifies, or deletes entries in a database. |
tDBOutputBulk | Writes a file with columns based on the defined delimiter and the standards of the selected database type. |
tDBOutputBulkExec | Executes the Insert action in a database. |
tDBRollback | Cancels the transaction commit in a connected database to avoid committing part of a transaction involuntarily. |
tDBRow | Executes the stated SQL query on a database. |
tDBSCD | Reflects and tracks changes in a dedicated database SCD table. |
tDBSCDELT | Reflects and tracks changes in a dedicated SCD table through SQL queries. |
tDBSP | Calls a database stored procedure. |
tDBTableList | Lists the names of specified database tables using a SELECT statement based on a WHERE clause. |
tParseRecordSet | Parses a recordset rather than individual records from a table. |
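These generic components wrap what plain JDBC does with a Connection, PreparedStatement, and ResultSet. A minimal sketch, with hypothetical URL, credentials, and table, and assuming the matching JDBC driver is on the classpath; the comments map each step to the component it loosely corresponds to.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcSketch {
    public static void main(String[] args) throws Exception {
        // URL, credentials, and table are hypothetical.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "secret")) {
            conn.setAutoCommit(false); // tDBConnection with manual commit

            // tDBOutput in insert mode:
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO customers (id, name) VALUES (?, ?)")) {
                ps.setInt(1, 1);
                ps.setString(2, "Alice");
                ps.executeUpdate();
            }
            conn.commit(); // tDBCommit

            // tDBInput: extract fields based on a query.
            try (PreparedStatement ps = conn.prepareStatement("SELECT id, name FROM customers");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        } // closing the connection is what tDBClose does
    }
}
```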
DB2
DB2 components
tDB2BulkExec | Executes the Insert action on the provided data, providing performance gains during Insert operations on a DB2 database. |
tDB2Close | Closes an active connection to the database to release the occupied resources. |
tDB2Commit | Commits a global transaction in one go, instead of committing every row or batch, and thus provides a gain in performance. |
tDB2Connection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tDB2Input | Executes a DB query with a strictly defined order which must correspond to the schema definition. Then tDB2Input passes on the field list to the next component via a Row > Main link. |
tDB2Output | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tDB2Rollback | Avoids committing part of a transaction involuntarily. |
tDB2Row | Acts on the actual DB structure or on the data (although without handling data) depending on the nature of the query and the database. The SQLBuilder tool helps you easily write your SQL statements. |
tDB2SP | Offers a convenient way to call database stored procedures. |
DBFS
DBFS components
tDBFSConnection | Connects to a given DBFS (Databricks Filesystem) system so that the other DBFS components can reuse the connection it creates to communicate with this DBFS. |
tDBFSGet | Copies files from a given DBFS (Databricks Filesystem) system, pastes them in a user-defined directory and, if need be, renames them. |
tDBFSPut | Connects to a given DBFS (Databricks Filesystem) system, copies files from a user-defined directory, pastes them in this system and, if need be, renames these files. |
Defining Context Groups
Defining Context Groups scenarios
Delimited
Delimited components
tFileStreamInputDelimited | Reads data continuously, row by row, to split it into fields, then sends fields defined in its schema to the next Job component, via a Row > Main link. |
tFileInputDelimited | Reads a delimited file row by row to split the rows into fields and then sends the fields as defined in the schema to the next component. |
tFileOutputDelimited | Outputs the input data to a delimited file according to the defined schema. |
tPivotToColumnsDelimited | Fine-tunes the selection of data to output. |
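Reading a delimited file row by row and splitting rows into fields, as tFileInputDelimited does, reduces to a read-split loop in plain Java. A minimal sketch, assuming a hypothetical two-column semicolon-delimited file:

```java
import java.io.BufferedReader;
import java.io.FileReader;

public class DelimitedSketch {
    public static void main(String[] args) throws Exception {
        // Path, delimiter, and the two-column layout are hypothetical.
        try (BufferedReader reader = new BufferedReader(new FileReader("/tmp/in.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(";", -1); // -1 keeps trailing empty fields
                System.out.println("name=" + fields[0] + ", age=" + fields[1]);
            }
        }
    }
}
```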
Delimited scenarios
Delta Lake
Delta Lake components
tDeltaLakeClose | Closes an active DeltaLake connection to release the occupied resources. |
tDeltaLakeConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tDeltaLakeInput | Extracts the latest version or a given snapshot of records from the Delta Lake layer of your Data Lake system and sends the data to the next component for further processing. |
tDeltaLakeOutput | Writes records in the Delta Lake layer of your Data Lake system in the Parquet format. |
tDeltaLakeRow | Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder tool to easily write your SQL statements. |
Delta Lake scenario
DotNET
DotNET components
tDotNETInstantiate | Invokes the constructor of a .NET object that is intended for later reuse. |
tDotNETRow | Facilitates data transformation by using custom or built-in .NET classes. |
DotNET scenarios
Dropbox
Dropbox components
tDropboxConnection | Creates a Dropbox connection to a given account that the other Dropbox components can reuse. |
tDropboxDelete | Removes a given folder or file from Dropbox. |
tDropboxGet | Downloads a selected file from a Dropbox account to a specified local directory. |
tDropboxList | Lists the files stored in a specified directory on Dropbox. |
tDropboxPut | Uploads data to Dropbox from either a local file or a given data flow. |
Dropbox scenario
Dynamic Schema
Dynamic Schema component
tSetDynamicSchema | Sets a dynamic schema that can be reused by components in the subsequent subJob or subJobs to retrieve data from unknown columns. |
Dynamic Schema scenarios
ElasticSearch
ElasticSearch components
tElasticSearchConfiguration | Enables the reuse of the connection configuration to ElasticSearch in the same Job. |
tElasticSearchInput | Reads documents from a given Elasticsearch system based on a user-defined query. |
tElasticSearchLookupInput | Executes an Elasticsearch query with a strictly defined order which must correspond to the schema definition. |
tElasticSearchOutput | Writes datasets into a given Elasticsearch system. |
ELT Greenplum
ELT Greenplum components
tELTGreenplumInput | Adds as many Input tables as required for the most complicated Insert statement. |
tELTGreenplumMap | Uses the tables provided as input to feed the parameter in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases. |
tELTGreenplumOutput | Executes SQL Insert, Update and Delete statements on the Greenplum database. |
ELT Greenplum scenarios
ELT Hive
ELT Hive components
tELTHiveInput | Replicates the schema of the input Hive table, which the tELTHiveMap component that follows will use. |
tELTHiveMap | Graphically builds the Hive QL statement in order to transform data. |
tELTHiveOutput | Works alongside tELTHiveMap to write data into the Hive table. |
ELT Hive scenarios
ELT JDBC
ELT JDBC components
tELTInput | Adds as many Input tables as required for the SQL statement to be executed. |
tELTMap | Uses the tables provided as input to feed the parameter in the built SQL statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases. |
tELTOutput | Carries out the action on the table specified and inserts the data according to the output schema defined in the ELT Mapper. |
ELT JDBC scenarios
ELT MSSql
ELT MSSql components
tELTMSSqlInput | Adds as many Input tables as required for the most complicated Insert statement. |
tELTMSSqlMap | Uses the tables provided as input to feed the parameter in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases. |
tELTMSSqlOutput | Executes SQL Insert, Update and Delete statements on the MSSql database. |
ELT MSSql scenarios
ELT MySQL
ELT MySQL components
tELTMysqlInput | Adds as many Input tables as required for the most complicated Insert statement. |
tELTMysqlMap | Uses the tables provided as input to feed the parameter in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases. |
tELTMysqlOutput | Executes SQL Insert, Update and Delete statements on the MySQL database. |
ELT MySQL scenarios
ELT Netezza
ELT Netezza components
tELTNetezzaInput | Adds as many Input tables as required for the most complicated Insert statement. |
tELTNetezzaMap | Uses the tables provided as input to feed the parameter in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases. |
tELTNetezzaOutput | Performs the action (insert, update or delete) on data in the specified Netezza table through the SQL statement generated by the tELTNetezzaMap component. |
ELT Netezza scenarios
ELT Oracle
ELT Oracle components
tELTOracleInput | Provides the Oracle table schema that will be used by the tELTOracleMap component to generate the SQL SELECT statement. |
tELTOracleMap | Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTOracleInput components. |
tELTOracleOutput | Performs the action (insert, update, delete, or merge) on data in the specified Oracle table through the SQL statement generated by the tELTOracleMap component. |
ELT Oracle scenarios
- Aggregating Snowflake data using context variables as table and connection names
- Aggregating table columns and filtering
- Mapping data using a simple implicit join
- Mapping data using a subquery
- Mapping data using an Alias table
- Updating Oracle database entries
- Managing data using the Oracle MERGE function
ELT PostgreSQL
ELT PostgreSQL components
tELTPostgresqlInput | Provides the Postgresql table schema that will be used by the tELTPostgresqlMap component to generate the SQL SELECT statement. |
tELTPostgresqlMap | Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTPostgresqlInput components. |
tELTPostgresqlOutput | Performs the action (insert, update or delete) on data in the specified Postgresql table through the SQL statement generated by the tELTPostgresqlMap component. |
ELT PostgreSQL scenarios
ELT Sybase
ELT Sybase components
tELTSybaseInput | Provides the Sybase table schema that will be used by the tELTSybaseMap component to generate the SQL SELECT statement. |
tELTSybaseMap | Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTSybaseInput components. |
tELTSybaseOutput | Performs the action (insert, update or delete) on data in the specified Sybase table through the SQL statement generated by the tELTSybaseMap component. |
ELT Sybase scenarios
ELT Teradata
ELT Teradata components
tELTTeradataInput | Provides the Teradata table schema that will be used by the tELTTeradataMap component to generate the SQL SELECT statement. |
tELTTeradataMap | Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTTeradataInput components. |
tELTTeradataOutput | Performs the action (insert, update or delete) on data in the specified Teradata table through the SQL statement generated by the tELTTeradataMap component. |
ELT Teradata scenarios
ELT Vertica
ELT Vertica components
tELTVerticaInput | Provides the Vertica table schema that will be used by the tELTVerticaMap component to generate the SQL SELECT statement. |
tELTVerticaMap | Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTVerticaInput components. |
tELTVerticaOutput | Performs the action (insert, update or delete) on data in the specified Vertica table through the SQL statement generated by the tELTVerticaMap component. |
ELT Vertica scenarios
ESB REST
ESB REST components
tRESTClient | Interacts with RESTful Web service providers by sending HTTP and HTTPS requests using CXF (JAX-RS) and getting the corresponding responses. |
tRESTRequest | Receives GET/POST/PUT/PATCH/DELETE requests from the clients on the server end. |
tRESTResponse | Returns a specific HTTP status code to the client end as a response to the HTTP and/or HTTPS requests. |
ESB REST scenarios
- Building a JSON document with tXMLMap to call a REST service
- Getting user information by interacting with a RESTful service
- Using a REST service to accept HTTP POST requests
- Using a REST service to accept HTTP POST requests and send responses
- Using a REST service to accept HTTP POST requests in an HTML form
- Updating user information by interacting with a RESTful service
- Using URI Query parameters to explore the data of a database
- Using a REST service to accept HTTP GET requests and send responses
- Using context variables in REST endpoint URLs in Data Services
ESB SOAP
ESB SOAP components
tESBConsumer | Calls the defined method from the invoked Web service and returns the class as defined, based on the given parameters. |
tESBProviderFault | Serves a Talend Job cycle result as a Fault message of the Web service in case of a request response communication style. |
tESBProviderRequest | Wraps a Talend Job as a web service. |
tESBProviderResponse | Serves a Talend Job cycle result as a response message. |
ESB SOAP scenarios
Exasol
Exasol components
tEXABulkExec | Quickly imports data into an Exasol database table using the IMPORT command provided by the Exasol database. |
tEXAClose | Closes an active connection to an Exasol database instance to release the occupied resources. |
tEXACommit | Validates the data processed through the Job into the connected Exasol database. |
tEXAConnection | Opens a connection to an Exasol database instance that can then be reused by other Exasol components. |
tEXAInput | Retrieves data from an Exasol database based on a query with a strictly defined order which corresponds to the schema definition, and passes the data to the next component. |
tEXAOutput | Writes, updates, modifies or deletes data in an Exasol database by executing the action defined on the table and/or on the data in the table, based on the flow incoming from the preceding component. |
tEXARollback | Cancels the transaction commit in the connected Exasol database. |
tEXARow | Executes SQL queries on an Exasol database. |
Exasol scenario
Excel
Excel components
tFileInputExcel | Reads an Excel file row by row to split the rows into fields and then sends the fields as defined in the schema to the next component. |
tFileOutputExcel | Writes an MS Excel file with separated data values according to a defined schema. |
Excel scenario
EXist
EXist components
tEXistConnection (deprecated) | Opens a connection to an eXist database so that a transaction can be carried out. |
tEXistDelete (deprecated) | Deletes specified resources from a remote eXist database. |
tEXistGet (deprecated) | Retrieves selected resources from a remote eXist database to a defined local directory. |
tEXistList (deprecated) | Lists the resources stored on a remote eXist database. |
tEXistPut (deprecated) | Uploads specified files from a defined local directory to a remote eXist database. |
tEXistXQuery (deprecated) | Queries XML files located on remote databases using local files containing XPath queries and outputs the results to an XML file stored locally. |
tEXistXUpdate (deprecated) | Processes XML file records and updates the existing records on the database server. |
EXist scenario
Firebird
Firebird components
tFirebirdClose | Closes an active connection to a Firebird database to release the occupied resources. |
tFirebirdCommit | Commits a global transaction instead of doing so on every row or every batch, thus providing a gain in performance. |
tFirebirdConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tFirebirdInput | Executes a database query on a Firebird database with a strictly defined order which must correspond to the schema definition then passes on the field list to the next component via a Main row link. |
tFirebirdOutput | Executes the action defined on the table in a Firebird database and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tFirebirdRollback | Cancels the transaction committed in the connected Firebird database. |
tFirebirdRow | Executes the stated SQL query on the specified Firebird database. |
Flume
Flume components
tFlumeInput | Acts as an interface to integrate Flume with the Spark Streaming Job developed in the Studio, to continuously read data from a given Flume agent. |
tFlumeOutput | Acts as an interface to integrate Flume with the Spark Streaming Job developed in the Studio, to continuously send data to a given Flume agent. |
FTP
FTP components
tFTPClose | Closes an active FTP connection to release the occupied resources. |
tFTPConnection | Opens an FTP connection to transfer files in a single transaction. |
tFTPDelete | Deletes files or folders in a specified directory on an FTP server. |
tFTPFileExist | Checks if a file or a directory exists on an FTP server. |
tFTPFileList | Lists all files and folders directly under a specified directory based on a filemask pattern. |
tFTPFileProperties | Retrieves the properties of a specified file on an FTP server. |
tFTPGet | Downloads files to a local directory from an FTP directory. |
tFTPPut | Uploads files from a local directory to an FTP directory. |
tFTPRename | Renames files in an FTP directory. |
tFTPTruncate | Truncates files in an FTP directory. |
FTP scenarios
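The tFTPConnection / tFTPGet / tFTPClose sequence maps onto the classic connect, transfer, disconnect pattern. A minimal sketch of that pattern with the Apache Commons Net FTPClient follows; the host, credentials, and file paths are placeholder assumptions, not values taken from any scenario.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.commons.net.ftp.FTPClient;

public class FtpGetSketch {
    public static void main(String[] args) throws Exception {
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com");          // tFTPConnection: open the session
        ftp.login("user", "secret");
        try (OutputStream out = new FileOutputStream("/tmp/report.csv")) {
            ftp.retrieveFile("/remote/report.csv", out); // tFTPGet: download to a local directory
        }
        ftp.logout();
        ftp.disconnect();                        // tFTPClose: release the occupied resources
    }
}
```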
FullRow
FullRow components
tFileStreamInputFullRow | Reads data in a newly-created file row by row and sends each entire row within one single field to the next Job component, via a Row > Main link. |
tFileInputFullRow | Reads a file row by row and sends complete rows of data as defined in the schema to the next component via a Row link. |
FullRow scenario
Global variable
Global variable components
tGlobalVarLoad | Sets variables using the incoming data so that the data can be dynamically reused by other subJobs. |
tSetGlobalVar | Facilitates the process of defining global variables. |
Global variable scenarios
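Both components revolve around the Job-level globalMap. A minimal, self-contained sketch of that mechanism follows; in a generated Talend Job the map is supplied by the runtime, so the HashMap stand-in and the "maxId" variable name are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarSketch {
    public static void main(String[] args) {
        // Stand-in for the globalMap that Talend provides to every generated Job.
        Map<String, Object> globalMap = new HashMap<>();

        globalMap.put("maxId", 42);                       // what tSetGlobalVar does with its key/value pairs
        Integer maxId = (Integer) globalMap.get("maxId"); // how a later subJob reads the variable back
        System.out.println("maxId = " + maxId);
    }
}
```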
Google BigQuery
Google BigQuery components
tBigQueryConfiguration | Provides the connection configuration to Google BigQuery and Google Cloud Storage for a Spark Job. |
tBigQueryBulkExec | Transfers given data to Google BigQuery. |
tBigQueryInput | Performs the queries supported by Google BigQuery. |
tBigQueryOutput | Transfers the data provided by its preceding component to Google BigQuery. |
tBigQueryOutputBulk | Creates a .txt or .csv file for the data of large size so that you can process it according to your needs before transferring it to Google BigQuery. |
tBigQuerySQLRow | Connects to Google BigQuery and performs queries to select data from tables row by row or create or delete tables in Google BigQuery. |
Google BigQuery scenarios
Google Dataproc
Google Dataproc component
tGoogleDataprocManage | Creates or deletes a Dataproc cluster in the Global region on Google Cloud Platform. |
Google Drive
Google Drive components
tGoogleDriveConnection | Opens a Google Drive connection that can be reused by other Google Drive components. |
tGoogleDriveCopy | Creates a copy of a file/folder in Google Drive. |
tGoogleDriveCreate | Creates a new folder in Google Drive. |
tGoogleDriveDelete | Deletes a file/folder in Google Drive. |
tGoogleDriveGet | Gets a file's content and downloads the file to a local directory. |
tGoogleDriveList | Lists files, folders, or both in a specified Google Drive folder in the domain, covering both My Drive and all shared drives. |
tGoogleDrivePut | Uploads data from a data flow or a local file to Google Drive. |
Google Drive scenario
Google PubSub
Google PubSub components
tPubSubInput | Connects to the Google Cloud PubSub service that transmits messages to the components that run transformations over these messages. |
tPubSubInputAvro | Connects to Google Cloud Pub/Sub to receive messages in the Avro format for the components that run transformations over these messages. |
tPubSubOutput | Receives messages serialized into byte arrays by its preceding component and issues these messages into a given PubSub service. |
GPG
GPG component
tGPGDecrypt | Calls the gpg -d command to decrypt a GnuPG-encrypted file and saves the decrypted file in the specified directory. |
GPG scenario
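Since tGPGDecrypt works by shelling out to gpg -d, the call can be pictured as a simple process invocation. The sketch below shows that idea with ProcessBuilder; the file paths are placeholders, and real usage would also handle passphrase input.

```java
import java.io.IOException;

public class GpgDecryptSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Equivalent of "gpg --output <decrypted file> --decrypt <encrypted file>".
        Process p = new ProcessBuilder(
                "gpg", "--batch",
                "--output", "/tmp/report.txt",
                "--decrypt", "/tmp/report.txt.gpg")
            .inheritIO()
            .start();
        System.exit(p.waitFor()); // a non-zero exit code signals a decryption failure
    }
}
```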
Greenplum
Greenplum components
tGreenplumBulkExec | Improves performance when loading data in a Greenplum database. |
tGreenplumClose | Closes a connection to the Greenplum database. |
tGreenplumCommit | Commits a global transaction in one go instead of repeating the operation for every row or every batch and thus provides gain in performance. |
tGreenplumConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tGreenplumGPLoad | Bulk loads data into a Greenplum table either from an existing data file, an input flow, or directly from a data flow in streaming mode through a named-pipe. |
tGreenplumInput | Reads a database and extracts fields based on a query. |
tGreenplumOutput | Executes the action defined on the table and/or on the data of a table, according to the input flow from the previous component. |
tGreenplumOutputBulk | Prepares the file to be used as parameter in the INSERT query to feed the Greenplum database. |
tGreenplumOutputBulkExec | Provides performance gains during Insert operations to a Greenplum database. |
tGreenplumRollback | Avoids committing part of a transaction involuntarily. |
tGreenplumRow | Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. |
Groovy
Groovy components
tGroovy | Broadens the functionality of the Job using the Groovy language, a simplified Java syntax. |
tGroovyFile | Broadens the functionality of Jobs using the Groovy language, a simplified Java syntax. |
Groovy scenario
GS
GS components
tGoogleCloudConfiguration | Provides the connection configuration to Google Cloud Platform for a Spark Job. |
tGSConfiguration | Provides the connection configuration to Google Cloud Storage for a Spark Job. |
tGSBucketCreate | Creates a new bucket which you can use to organize data and control access to data in Google Cloud Storage. |
tGSBucketDelete | Deletes an empty bucket in Google Cloud Storage so as to release occupied resources. |
tGSBucketExist | Checks the existence of a bucket in Google Cloud Storage so as to make further operations. |
tGSBucketList | Retrieves a list of buckets from all projects or one specific project in Google Cloud Storage. |
tGSClose | Closes an active connection to Google Cloud Storage in order to release the occupied resources. |
tGSConnection | Provides the authentication information for making requests to the Google Cloud Storage system and enables the reuse of the connection it creates to Google Cloud Storage. |
tGSCopy | Copies or moves objects within a bucket or between buckets in Google Cloud Storage. |
tGSDelete | Deletes the objects which match the specified criteria in Google Cloud Storage so as to release the occupied resources. |
tGSGet | Retrieves objects which match the specified criteria from Google Cloud Storage and outputs them to a local directory. |
tGSList | Retrieves a list of objects from Google Cloud Storage one by one. |
tGSPut | Uploads files from a local directory to Google Cloud Storage so that you can manage them with Google Cloud Storage. |
GS scenario
HBase
HBase components
tHBaseConfiguration | Enables the reuse of the connection configuration to HBase in the same Job. |
tHBaseLookupInput | Provides lookup data to the main flow of a streaming Job. |
tHBaseClose | Closes an HBase connection you have established in your Job. |
tHBaseConnection | Establishes an HBase connection to be reused by other HBase components in your Job. |
tHBaseInput | Reads data from a given HBase database and extracts columns of selection. |
tHBaseOutput | Writes columns of data into a given HBase database. |
HBase scenario
HCatalog
HCatalog components
tHCatalogInput | Reads data from an HCatalog managed Hive database and sends data to the component that follows. |
tHCatalogLoad | Reads data directly from HDFS and writes this data into an established HCatalog managed table. |
tHCatalogOperation | Prepares the HCatalog managed database/table/partition to be processed. |
tHCatalogOutput | Receives data from its incoming flow and writes this data into an HCatalog managed table. |
HCatalog scenario
HDFS
HDFS components
tHDFSConfiguration | Enables the reuse of the connection configuration to HDFS in the same Job. |
tHDFSCompare | Compares two files in HDFS and based on the read-only schema, generates a row flow that presents the comparison information. |
tHDFSConnection | Connects to a given HDFS so that the other Hadoop components can reuse the connection it creates to communicate with this HDFS. |
tHDFSCopy | Copies a source file or folder into a target directory in HDFS and removes this source if required. |
tHDFSDelete | Deletes a file located on a given Hadoop distributed file system (HDFS). |
tHDFSExist | Checks whether a file exists in a specific directory in HDFS. |
tHDFSGet | Copies files from the Hadoop Distributed File System (HDFS), pastes them in a user-defined directory and, if need be, renames them. |
tHDFSInput | Extracts the data in an HDFS file for other components to process it. |
tHDFSList | Retrieves a list of files or folders based on a filemask pattern and iterates on each item. |
tHDFSOutput | Writes data flows it receives into a given Hadoop distributed file system (HDFS). |
tHDFSOutputRaw | Transfers data of different formats such as hierarchical data in the form of a single column into a given HDFS file system. |
tHDFSProperties | Creates a single row flow that displays the properties of a file processed in HDFS. |
tHDFSPut | Connects to Hadoop distributed file system to load large-scale files into it with optimized performance. |
tHDFSRename | Renames the selected files or specified directory on HDFS. |
tHDFSRowCount | Reads a file in HDFS row by row in order to determine the number of rows this file contains. |
HDFS scenarios
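The get/put pair here wraps the standard Hadoop FileSystem API. A minimal sketch of the tHDFSGet side follows; the namenode URL and file paths are placeholder assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsGetSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // tHDFSConnection: where the cluster lives
        FileSystem fs = FileSystem.get(conf);

        // tHDFSGet: copy a file from HDFS into a user-defined local directory.
        fs.copyToLocalFile(new Path("/data/report.csv"), new Path("/tmp/report.csv"));
        fs.close();
    }
}
```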
Hive
Hive components
tHiveClose | Closes connection to a Hive database. |
tHiveConfiguration | Enables the reuse of the connection configuration to Hive in the same Job. |
tHiveConnection | Establishes a Hive connection to be reused by other Hive components in your Job. |
tHiveCreateTable | Creates Hive tables that fit a wide range of Hive data formats. |
tHiveInput | Extracts data from Hive and sends the data to the component that follows. |
tHiveLoad | Writes data of different formats into a given Hive table or exports data from a Hive table to a directory. |
tHiveOutput | Connects to a given Hive database and writes the data it receives into a given Hive table or a directory in HDFS. |
tHiveRow | Acts on the actual DB structure or on the data without handling data itself, depending on the nature of the query and the database. |
tHiveWarehouseConfiguration | Enables the reuse of the Hive Warehouse Connector connection configuration to Hive in the same Job. |
tHiveWarehouseInput | Extracts data from Hive and sends the data to the component that follows using Hive Warehouse Connector. |
tHiveWarehouseOutput | Connects to a given Hive database and writes the received data into a given Hive table or a directory in HDFS using Hive Warehouse Connector. |
Hive scenarios
HSQLDB
HSQLDB components
tHSQLDbInput | Executes a DB query with a strictly defined order which must correspond to the schema definition and then it passes on the field list to the next component via a Main row link. |
tHSQLDbOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tHSQLDbRow | Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. |
HTTP
HTTP component
tHttpRequest | Sends an HTTP request to the server and outputs the response information locally. |
HTTP scenarios
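For orientation, the round trip tHttpRequest performs (send a request, read the response body) looks roughly like the following JDK-only sketch; the URL is a placeholder, and the component itself offers more options such as POST bodies and writing the response to a file.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpRequestSketch {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("https://example.com/api/status").openConnection();
        conn.setRequestMethod("GET");

        // Read the response body line by line, as tHttpRequest does before
        // writing the response information locally.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```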
Impala
Impala components
tImpalaClose | Closes connection to an Impala database. |
tImpalaConnection | Establishes an Impala connection to be reused by other Impala components in your Job. |
tImpalaCreateTable | Creates Impala tables that fit a wide range of Impala data formats. |
tImpalaInput | Executes the select queries to extract the corresponding data and sends the data to the component that follows. |
tImpalaLoad | Writes data of different formats into a given Impala table or exports data from an Impala table to a directory. |
tImpalaOutput | Executes the action defined on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tImpalaRow | Acts on the actual DB structure or on the data (although without handling data). |
Informix
Informix components
tInformixBulkExec | Executes Insert operations in Informix databases. |
tInformixClose | Closes connection to Informix databases. |
tInformixCommit | Makes a global commit just once instead of committing every row or batch of rows separately. |
tInformixConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tInformixInput | Reads a database and extracts fields based on a query. |
tInformixOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tInformixOutputBulk | Prepares the file to be used as a parameter in the INSERT query used to feed Informix databases. |
tInformixOutputBulkExec | Carries out Insert operations in Informix databases using the data provided. |
tInformixRollback | Prevents involuntary transaction commits by canceling transactions in connected databases. |
tInformixRow | Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder tool that helps you write your SQL statements easily. |
tInformixSP | Centralizes and calls multiple and complex queries in a database. |
Ingres
Ingres components
tIngresBulkExec (deprecated) | Inserts data in bulk to a table in the Ingres DBMS for performance gain. |
tIngresClose (deprecated) | Closes the transaction committed in the connected Ingres database. |
tIngresCommit (deprecated) | Commits in one go, using a unique connection, a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tIngresConnection (deprecated) | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tIngresInput (deprecated) | Reads an Ingres database and extracts fields based on a query. |
tIngresOutput (deprecated) | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tIngresOutputBulk (deprecated) | Prepares the file whose data is inserted in bulk to the Ingres DBMS for performance gain. |
tIngresOutputBulkExec (deprecated) | Inserts data in bulk to a table in the Ingres DBMS for performance gain. |
tIngresRollback (deprecated) | Avoids committing part of a transaction involuntarily by canceling the transaction committed in the connected database. |
tIngresRow (deprecated) | Acts on the actual DB structure or on the data (although without handling data) using the SQLBuilder tool to write your SQL statements easily. |
Ingres scenario
Interbase
Interbase components
tInterbaseClose (deprecated) | Closes the transaction committed in the connected Interbase database. |
tInterbaseCommit (deprecated) | Commits in one go a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tInterbaseConnection (deprecated) | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tInterbaseInput (deprecated) | Reads an Interbase database and extracts fields based on a query. |
tInterbaseOutput (deprecated) | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tInterbaseRollback (deprecated) | Avoids to commit part of a transaction involuntarily by canceling the transaction committed in the connected Interbase database. |
tInterbaseRow (deprecated) | Acts on the actual database structure or on the data (although without handling data) using the SQLBuilder tool to write your SQL statements easily. |
Internet (Integration)
Internet (Integration) component
tFileFetch | Retrieves a file through the given protocol (HTTP, HTTPS, FTP, or SMB). |
Internet (Integration) scenarios
Jasper
Jasper components
tJasperOutput | Creates a report in rich formats using Jaspersoft's iReport. |
tJasperOutputExec | Creates a report in rich formats using Jaspersoft's iReport and offers a performance gain as it functions as a combination of an input component and a tJasperOutput component. |
Jasper scenario
Java custom code for Map Reduce
Java custom code for Map Reduce component
tJavaMR | Provides an editor that enables you to enter personalized MapReduce code in order to integrate it into a Talend program. |
Java custom code for Map Reduce scenario
Java custom code for Storm
Java custom code for Storm component
tJavaStorm (deprecated) | Provides a Java code editor that lets you enter the custom Storm code you want to use in the Storm topology you are designing. |
Java custom code for Storm scenario
Java custom code
Java custom code components
tJava | Extends the functionalities of a Talend Job using custom Java commands. |
tJavaFlex | Provides a Java code editor that lets you enter personalized code in order to integrate it into a Talend program. |
tJavaRow | Provides a code editor that lets you enter the Java code to be applied to each row of the flow. |
Java custom code scenarios
- Using tJavaFlex to display file content based on a dynamic schema
- Using tJavaRow to handle file content based on a dynamic schema
- Checking the format of an e-mail address
- Generating data flow
- Printing out a variable content
- Processing rows of data with tJavaFlex
- Redirecting the standard output to a file for the entire Job
- Transforming data line by line using tJavaRow
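To make the per-row idea behind tJavaRow concrete, here is a self-contained sketch. In a real Job, Talend generates the input_row and output_row objects from the component schemas; the Row class and its "name" column below are stand-ins.

```java
public class JavaRowSketch {
    // Stand-in for the row classes Talend generates from the schema.
    static class Row {
        int id;
        String name;
    }

    public static void main(String[] args) {
        Row input_row = new Row();
        input_row.id = 1;
        input_row.name = "alice";

        Row output_row = new Row();
        // The kind of code typically typed into tJavaRow: applied to each row of the flow.
        output_row.id = input_row.id;
        output_row.name = input_row.name.toUpperCase();

        System.out.println(output_row.id + " " + output_row.name);
    }
}
```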
JavaDB
JavaDB components
tJavaDBInput | Reads a database and extracts fields based on a query. |
tJavaDBOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tJavaDBRow | Acts on the actual database structure or on the data (although without handling data) using the SQLBuilder tool to write your SQL statements easily. |
JBoss ESB
JBoss ESB components
tJBossESBInput | Retrieves a message from a JBossESB server to process it as a flow that can be used in a Talend Job. |
tJBossESBOutput | Transforms the data used in a Talend Job into a JBossESB message. |
JDBC
JDBC components
tJDBCConfiguration | Stores connection information and credentials to be reused by other JDBC components. |
tJDBCLookupInput | Reads a database and extracts fields based on a query. |
tJDBCClose | Closes an active JDBC connection to release the occupied resources. |
tJDBCColumnList | Lists all column names of a given JDBC table. |
tJDBCCommit | Commits in one go a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tJDBCConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tJDBCInput | Reads any database using a JDBC API connection and extracts fields based on a query. |
tJDBCOutput | Executes the action defined on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tJDBCRollback | Avoids committing part of a transaction accidentally by canceling the transaction committed in the connected database. |
tJDBCRow | Acts on the actual DB structure or on the data (although without handling data) using the SQLBuilder tool to write your SQL statements easily. |
tJDBCSP | Centralizes multiple or complex queries in a database in order to call them easily. |
tJDBCTableList | Lists the names of a given set of JDBC tables using a select statement based on a Where clause. |
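The tJDBCConnection / tJDBCInput / tJDBCClose trio follows the standard JDBC pattern: open one connection, run a query, iterate the result set, release the resources. A minimal sketch follows; the JDBC URL, credentials, and the customers table are placeholder assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcInputSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/demo", "user", "secret"); // tJDBCConnection
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM customers")) { // tJDBCInput
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " " + rs.getString("name"));
            }
        } // try-with-resources closes everything, which is what tJDBCClose does
    }
}
```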
JIRA
JIRA components
tJIRAInput | Retrieves the issue information based on a JQL query or retrieves the project information based on a specified project ID from JIRA. |
tJIRAOutput | Inserts, updates, or deletes the issue or project information in JIRA. |
JIRA scenarios
JMS
JMS components
tJMSInput | Creates an interface between a Java application and a Message-Oriented middleware system. |
tJMSOutput | Creates an interface between a Java application and a Message-Oriented middleware system. |
JMS scenario
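Both components sit on the standard JMS API. The sketch below shows the sending side, with ActiveMQ picked purely as an example provider; the broker URL and queue name are placeholder assumptions.

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsOutputSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = new ActiveMQConnectionFactory("tcp://localhost:61616")
                .createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // What tJMSOutput does conceptually: hand a message to the middleware queue.
        MessageProducer producer =
                session.createProducer(session.createQueue("demo.queue"));
        producer.send(session.createTextMessage("hello"));
        conn.close();
    }
}
```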
JSON
JSON components
tFileStreamInputJSON | Extracts JSON data from a file, then transfers the data to, for instance, a file or a database table. |
tFileInputJSON | Extracts JSON data from a file and transfers the data to a file, a database table, etc. |
tFileOutputJSON | Receives data and rewrites it in a JSON structured data block in an output file. |
JSON scenarios
Kafka
Kafka components
tKafkaInputAvro | Transmits Avro-formatted messages you need to process to its following component in the Job you are designing. |
tKafkaCommit | Saves the current state of the tKafkaInput to which it is connected. |
tKafkaConnection | Opens a reusable Kafka connection. |
tKafkaCreateTopic | Creates a Kafka topic that the other Kafka components can use. |
tKafkaInput | Transmits messages you need to process to the components that follow in the Job you are designing. |
tKafkaOutput | Publishes messages into a Kafka system. |
Kafka scenarios
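As a point of reference for tKafkaOutput, publishing string messages to a topic with the plain Kafka client API looks like the sketch below; the broker address and topic name are placeholder assumptions.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaOutputSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // What tKafkaOutput does conceptually: publish a message into a Kafka topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo_topic", "hello"));
        }
    }
}
```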
Kerberos
Kerberos component
tSetKerberosConfiguration | Sets the relevant information for Kerberos authentication. |
Keystore
Keystore component
tSetKeystore | Sets the authentication data type, choosing between PKCS 12 and JKS. |
Keystore scenario
Kinesis
Kinesis components
tKinesisInput | Acts as a consumer of an Amazon Kinesis stream to pull messages from this Kinesis stream. |
tKinesisInputAvro | Acts as a consumer of an Amazon Kinesis stream to pull Avro-formatted messages from this Kinesis stream. |
tKinesisOutput | Acts as a data producer to put data to an Amazon Kinesis stream for real-time ingestion. |
Kinesis scenario
Kudu
Kudu components
tKuduConfiguration | Enables the reuse of the connection configuration to Cloudera Kudu in the same Job. |
tKuduInput | Retrieves data from a Cloudera Kudu table and sends them to the component that follows for transformation. |
tKuduOutput | Creates, updates or deletes data in a Cloudera Kudu table. |
Kudu scenario
LDAP
LDAP components
tLDAPAttributesInput | Analyzes each object found via the LDAP query and lists a collection of attributes associated with the object. |
tLDAPClose | Closes the connection to the LDAP Directory server so as to release the occupied resources. |
tLDAPConnection | Creates a connection to an LDAP Directory server. |
tLDAPInput | Executes an LDAP query based on the given filter and corresponding to the schema definition. Then it passes on the field list to the next component via a Row > Main link. |
tLDAPOutput | Executes an LDAP query based on the given filter and corresponding to the schema definition. Then it passes on the field list to the next component via a Row > Main link. |
tLDAPRenameEntry | Renames one or more entries in a specific LDAP directory. |
LDAP scenarios
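The filter-driven query behind tLDAPInput can be pictured with the JNDI API that ships with the JDK, as in the sketch below; the server URL, base DN, and filter are placeholder assumptions.

```java
import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class LdapInputSketch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:389");
        InitialDirContext ctx = new InitialDirContext(env); // tLDAPConnection

        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // tLDAPInput: run a query based on the given filter and iterate the results.
        NamingEnumeration<SearchResult> results =
                ctx.search("dc=example,dc=com", "(objectClass=person)", controls);
        while (results.hasMore()) {
            System.out.println(results.next().getNameInNamespace());
        }
        ctx.close(); // tLDAPClose: release the occupied resources
    }
}
```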
LDIF
LDIF components
tFileInputLDIF | Reads an LDIF file row by row to split each row up into fields and sends the fields as defined in the schema to the next component using a Row connection. |
tFileOutputLDIF | Writes or modifies an LDIF file with data separated in respective entries based on the schema defined, or else deletes content from an LDIF file. |
LDIF scenario
Library import
Library import component
tLibraryLoad | Loads usable Java libraries into a Job. |
Library import scenario
Logs and errors (Integration)
Logs and errors (Integration) components
tAssert | Generates a boolean evaluation of the Job execution status and provides the Job status messages to tAssertCatcher. |
tAssertCatcher | Generates a data flow consolidating the status information of a Job execution and transfers the data into defined output files. |
tChronometerStart | Operates as a chronometer device that starts calculating the processing time of one or more subJobs in the main Job, or that starts calculating the processing time of part of your subJob. |
tChronometerStop | Operates as a chronometer device that stops calculating the processing time of one or more subJobs in the main Job, or that stops calculating the processing time of part of your subJob. tChronometerStop displays the total execution time. |
tDie | Triggers the tLogCatcher component for exhaustive log before killing the Job. |
tFlowMeter | Counts the number of rows processed in the defined flow, so this number can be caught by the tFlowMeterCatcher component for logging purposes. |
tFlowMeterCatcher | Operates as a log function triggered by the use of a tFlowMeter component in the Job. |
tLogCatcher | Operates as a log function triggered by one of the three: Java exception, tDie or tWarn, to collect and transfer log data. |
tLogRow | Displays data or results in the Run console to monitor data processed. |
tStatCatcher | Gathers the Job processing metadata at the Job level and at the component level and transfers the log data to the subsequent component for display or storage. |
tWarn | Triggers a warning often caught by the tLogCatcher component for exhaustive log. |
Logs and errors (Integration) scenarios
- Catching flow metrics from a Job
- Catching messages triggered by a tWarn component
- Catching the message triggered by a tDie component
- Displaying the statistics log of Job execution
- Measuring the processing time of a subJob and part of a subJob
- Setting up the assertive condition for a Job execution
- Viewing product orders status (on a daily basis) against a benchmark number
Machine Learning
Machine Learning components
tALSModel | Generates a user-product ranking matrix, based on given user-product interaction data. |
tClassify | Predicts which class an element belongs to, based on the classifier model generated by a model training component. |
tClassifySVM | Predicts which class an element belongs to, based on the classifier model generated by tSVMModel. |
tDecisionTreeModel | Analyzes feature vectors usually prepared and provided by tModelEncoder to generate a classifier model that is used by tPredict to classify given elements. |
tGradientBoostedTreeModel | Analyzes feature vectors usually prepared and provided by tModelEncoder to generate a classifier model that is used by tPredict to classify given elements. |
tKMeansModel | Analyzes incoming datasets based on applying the K-Means algorithm. |
tKMeansStrModel | Analyzes incoming datasets in near real-time, based on applying the K-Means algorithm. |
tLinearRegressionModel | Builds a linear regression model using a training dataset. |
tLogisticRegressionModel | Analyzes feature vectors usually pre-processed by tModelEncoder to generate a classifier model that is used by tPredict to classify given elements. |
tMahoutClustering (deprecated) | Groups unlabeled numerical data into clusters that can reveal interesting patterns or help identify abnormal data items in the data set. |
tModelEncoder | Performs featurization operations to transform data into the format expected by the model training components such as tLogisticRegressionModel or tKMeansModel. |
tNaiveBayesModel | Generates a classifier model that is used by tPredict to classify given elements. |
tPredict | Predicts the situation of an element. |
tPredictCluster | Predicts the cluster of an element. |
tRandomForestModel | Analyzes feature vectors. |
tRecommend | Recommends products to users known to this model, based on the user-product recommender model generated by tALSModel. |
tSVMModel | Generates an SVM-based classifier model that can be used by tPredict to classify given elements. |
Machine Learning scenarios
Mail components
tFileInputMail | Reads the standard key data of a given MIME or MSG email file. |
tSendMail | Notifies recipients about a particular state of a Job or possible errors. |
Mail scenarios
MapRDB
MapRDB components
tMapRDBConfiguration | Stores connection information and credentials to be reused by other MapRDB components. |
tMapRDBLookupInput | Provides lookup data to the main flow of a streaming Job. |
tMapRDBClose | Closes a MapRDB connection you have established in the same Job. |
tMapRDBConnection | Establishes a MapRDB connection to be reused by other MapRDB components in the same Job. |
tMapRDBInput | Reads data from a given MapRDB database and extracts columns of selection. |
tMapRDBOutput | Writes columns of data into a given MapRDB database. |
tMapROjaiInput | Reads documents from a MapR-DB database to load the data in a given Job. |
tMapROjaiOutput | Inserts, replaces or deletes documents in a MapR-DB database to be used as document database, based on the incoming flow from the preceding component in the Job. |
MapRDB scenario
MapRStreams
MapRStreams components
tMapRStreamsInputAvro | Transmits messages in the Avro format to the Job that runs transformations over these messages. Only MapR V5.2 onwards is supported by this component. |
tMapRStreamsCommit | Connects to a given tMapRStreamsInput to perform a consumer offset commit. |
tMapRStreamsConnection | Opens a reusable connection to a given MapR Streams cluster so that the other MapR Streams components can reuse this connection. |
tMapRStreamsCreateStream | Creates a MapR Streams stream or topic that the other MapR Streams components can use. |
tMapRStreamsInput | Transmits messages to the Job that runs transformations over these messages. Only MapR V5.2 onwards is supported by this component. |
tMapRStreamsOutput | Publishes messages into a MapR Streams system. Only MapR V5.2 onwards is supported by this component. |
Marketo
Marketo components
tMarketoBulkExec | Imports leads or custom objects into Marketo from a local file in the REST API mode. |
tMarketoCampaign | Retrieves campaign records, activity and campaign changes related data from Marketo. |
tMarketoConnection | Opens a connection to Marketo that can then be reused by other Marketo components. |
tMarketoInput | Retrieves lead records, activity history, lead changes, and custom object related data from Marketo. |
tMarketoListOperation | Adds/removes one or more leads to/from a list in Marketo. Also, it helps you verify the existence of one or more leads in a list in Marketo. |
tMarketoOutput | Writes lead records or custom object records from the incoming data flow into Marketo. |
Marketo scenarios
MarkLogic
MarkLogic components
tMarkLogicBulkLoad | Imports local files into a MarkLogic server database in bulk mode using the MarkLogic Content Pump (MLCP) tool. |
tMarkLogicClose | Closes an active connection to a MarkLogic database to release the occupied resources. |
tMarkLogicConnection | Opens a connection to a MarkLogic database that can then be reused by other MarkLogic components. |
tMarkLogicInput | Searches document content in a MarkLogic database based on a string query. |
tMarkLogicOutput | Creates, updates or deletes document content in a MarkLogic database. |
MaxDB
MaxDB components
tMaxDBInput | Reads a database and extracts fields based on a query. |
tMaxDBOutput | Writes, updates, makes changes or suppresses entries in a database. |
tMaxDBRow | Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. |
MDM (Master Data Management)
MDM connection and transaction
MDM connection and transaction components
tMDMClose | Terminates an open MDM server connection after the execution of the preceding subJob. |
tMDMCommit | Commits all changes to the database made within the scope of a transaction in MDM. |
tMDMConnection | Opens an MDM server connection for convenient reuse in the current Job or transaction. |
tMDMRollback | Rolls back any changes made in the database rather than definitively committing them, for example to prevent partial commits if an error occurs. |
MDM data processing
MDM data processing components
tMDMBulkLoad | Uses bulk mode to write XML structured master data into the MDM server. |
tMDMDelete | Deletes master data records from specific entities in the MDM Hub. |
tMDMInput | Reads data in an MDM Hub and thus makes it possible to process this data. |
tMDMOutput | Writes data into or removes data from the MDM server. |
tMDMRestInput | Reads data through the REST API from the MDM Hub for further processing. |
tMDMSP | Offers a convenient way to centralize multiple or complex queries in an MDM Hub and calls the stored procedure easily. |
tMDMViewSearch | Retrieves the MDM records from an MDM hub by applying filtering criteria you have created in a specific view and outputs the results in XML structure. |
MDM data processing scenarios
- Deleting master data from an MDM Hub
- Executing a stored procedure using tMDMSP
- Loading records into a business entity
- Reading data from an MDM hub through the REST API
- Reading master data from an MDM hub
- Reading staging data from MDM
- Removing master data partially from the MDM hub
- Retrieving records from an MDM hub via an existing view
- Writing master data in an MDM hub
- Writing staging data into MDM
MDM event processing
MDM event processing components
tMDMReceive | Decodes a context parameter holding MDM XML data and transforms it into a flat schema. |
tMDMRouteRecord | Helps Event Manager to identify the changes you have made on your data so that correlative actions can be triggered. |
tMDMTriggerInput | Reads the XML message (Document type) sent by MDM and passes the information to the component that follows. |
tMDMTriggerOutput | Receives an XML flow (Document type) from the preceding component in the Job. |
MDM event processing scenarios
MemSQL
MemSQL components
tMemSQLClose (deprecated) | Closes the transaction committed in the MemSQL database. |
tMemSQLConnection (deprecated) | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tMemSQLInput (deprecated) | Executes a DB query with a strictly defined order which must correspond to the schema definition. |
tMemSQLOutput (deprecated) | Reads data incoming from the preceding component in the Job and executes the action defined on a given MemSQL table and/or on the data contained in the table. |
tMemSQLRow (deprecated) | Acts on the actual database structure or on the data (although without handling data). |
MemSQL scenario
Microsoft CRM
Microsoft CRM components
tMicrosoftCrmInput | Extracts data from a Microsoft Dynamics CRM or a Microsoft Dynamics 365 CRM database based on conditions set on specific columns. |
tMicrosoftCrmOutput | Writes data into a Microsoft Dynamics CRM database or a Microsoft Dynamics 365 CRM database. |
Microsoft CRM scenario
Microsoft MQ
Microsoft MQ components
tMicrosoftMQInput | Retrieves the first message in a given Microsoft message queue (String type only). |
tMicrosoftMQOutput | Writes a defined column of given inflow data to a Microsoft message queue (String type only). |
Microsoft MQ scenario
MOM
MOM components
tMomCommit | Commits data on the MQ Server. |
tMomConnection | Opens a connection to the MQ Server for communication. |
tMomInput | Fetches a message from a queue on a Message-Oriented Middleware (MOM) system and passes it on to the next component. |
tMomMessageIdList | Fetches a message ID list from a queue on a Message-Oriented Middleware system and passes it to the next component. |
tMomOutput | Adds a message to a Message-Oriented Middleware system queue in order for it to be fetched asynchronously. |
tMomRollback | Cancels the transaction committed in the MQ Server. |
MOM scenarios
Mondrian
Mondrian component
tMondrianInput (deprecated) | Executes a multi-dimensional expression (MDX) query corresponding to the dataset structure and schema definition. |
Mondrian scenario
MongoDB
MongoDB components
tMongoDBConfiguration | Stores connection information and credentials to be reused by other MongoDB components. |
tMongoDBLookupInput | Executes a database query with a strictly defined order which must correspond to the schema definition. |
tMongoDBBulkLoad | Imports data files in different formats (CSV, TSV or JSON) into the specified MongoDB database so that the data can be further processed. |
tMongoDBClose | Closes a connection to the MongoDB database. |
tMongoDBConnection | Creates a connection to a MongoDB database and reuses that connection in other components. |
tMongoDBGridFSDelete | Automates the delete action over specific files in MongoDB GridFS. |
tMongoDBGridFSGet | Connects to a MongoDB GridFS system to copy files from it. |
tMongoDBGridFSList | Retrieves a list of files based on a query. |
tMongoDBGridFSProperties | Obtains information about the properties of given files selected based on a query. |
tMongoDBGridFSPut | Connects to a MongoDB GridFS system to load files into it. |
tMongoDBInput | Retrieves records from a collection in the MongoDB database and transfers them to the following component for display or storage. |
tMongoDBOutput | Executes the action defined on the collection in the MongoDB database. |
tMongoDBRow | Executes the commands and functions of the MongoDB database. |
MongoDB scenarios
- Reading and writing data in MongoDB using a Spark Streaming Job
- Writing and reading data from MongoDB using a Spark Batch Job
- Creating a collection and writing data to it
- Importing data into MongoDB database
- Managing files using MongoDB GridFS
- Retrieving data from a collection by advanced queries
- Upserting records in a collection
- Using MongoDB functions to create a collection and write data to it
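For orientation, the connect-then-write pattern of tMongoDBConnection and tMongoDBOutput corresponds to the following sketch with the official MongoDB Java driver; the connection string, database, and collection names are placeholder assumptions.

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

public class MongoOutputSketch {
    public static void main(String[] args) {
        // tMongoDBConnection: create a client that other operations reuse.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("demo").getCollection("customers");
            // tMongoDBOutput: execute the defined action (here, an insert) on the collection.
            coll.insertOne(new Document("name", "Alice").append("age", 30));
        }
    }
}
```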
MQTT
MQTT components
tMQTTInput | Acts as a consumer of an MQTT topic to stream messages from this topic. |
tMQTTOutput | Acts as a publisher to an MQTT topic to stream messages to this topic in real time. |
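A publisher of the kind tMQTTOutput represents can be sketched with the Eclipse Paho client as follows; the broker URL, client ID, and topic are placeholder assumptions.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class MqttOutputSketch {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://localhost:1883", "demo-client");
        client.connect();
        // tMQTTOutput: stream a message to the topic in real time.
        client.publish("demo/topic", new MqttMessage("hello".getBytes()));
        client.disconnect();
    }
}
```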
MS Delimited
MS Delimited components
tFileInputMSDelimited | Reads the data structures (schemas) of a multi-structured delimited file and sends the fields as defined in the different schemas to the next components using Row connections. |
tFileOutputMSDelimited | Creates a complex multi-structured delimited file, using data structures (schemas) coming from several incoming Row flows. |
MS Delimited scenario
MS Positional
MS Positional components
tFileInputMSPositional | Reads the data structures (schemas) of a multi-structured positional file and sends the fields as defined in the different schemas to the next components using Row connections. |
tFileOutputMSPositional | Creates a complex multi-structured file, using data structures (schemas) coming from several incoming Row flows. |
MS Positional scenario
MS XML connectors
MS XML connectors components
tFileInputMSXML | Reads the data structures (schemas) of a multi-structured XML file and sends the fields as defined in the different schemas to the next components using Row connections. |
tFileOutputMSXML | Creates a complex multi-structured XML file, using data structures (schemas) coming from several incoming Row flows. |
MS XML connectors scenario
MSSql
MSSql components
tMSSqlBulkExec | Offers gains in performance while executing the Insert operations to a Microsoft SQL Server database. |
tMSSqlClose | Closes a transaction in the MSSql database. |
tMSSqlColumnList | Lists all column names of a given MSSql table. |
tMSSqlCommit | Commits in one go, using a unique connection, a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tMSSqlConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tMSSqlInput | Executes a DB query with a strictly defined order which must correspond to the schema definition. |
tMSSqlLastInsertId | Retrieves the last primary keys added by a user to an MSSql table. |
tMSSqlOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tMSSqlOutputBulk | Prepares the file to be used as parameter in the INSERT query to feed the MSSql database. |
tMSSqlOutputBulkExec | Offers gains in performance during Insert operations to a Microsoft SQL Server database. |
tMSSqlRollback | Cancels the transaction commit in the MSSql database and thus avoids committing part of a transaction involuntarily. |
tMSSqlRow | Acts on the actual DB structure or on the data (although without handling data). |
tMSSqlSP | Offers a convenient way to centralize multiple or complex queries in a database and calls them easily. |
tMSSqlTableList | Lists the names of a given set of MSSql tables using a select statement based on a Where clause. |
MSSql scenarios
MySQL
MySQL components
tMysqlConfiguration | Stores connection information and credentials to be reused by other MySQL components. |
tMySQLInvalidRows | Checks MySQL database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). |
tMysqlLookupInput | Reads a MySQL database and extracts fields based on a query. |
tMySQLValidRows | Checks MySQL database rows against Data Quality patterns (regular expression). |
tMysqlBulkExec | Offers gains in performance while executing the Insert operations on a MySQL or Aurora database. |
tMysqlClose | Closes the transaction committed in a Mysql database. |
tMysqlColumnList | Iterates on all columns of a given Mysql table and lists column names. |
tMysqlCommit | Commits in one go, using a unique connection, a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tMysqlConnection | Opens a connection to the specified MySQL database for reuse in the subsequent subJob or subJobs. |
tMysqlInput | Executes a DB query with a strictly defined order which must correspond to the schema definition. |
tMysqlLastInsertId | Obtains the primary key value of the record that was last inserted in a Mysql table by a user. |
tMysqlOutput | Writes, updates, makes changes or suppresses entries in a database. |
tMysqlOutputBulk | Writes a file with columns based on the defined delimiter and the MySQL or Aurora standards. |
tMysqlOutputBulkExec | Executes the Insert action in the specified MySQL or Aurora database. |
tMysqlRollback | Cancels the transaction commit in the connected MySQL database to avoid committing part of a transaction involuntarily. |
tMysqlRow | Executes the stated SQL query on the specified MySQL database. |
tMysqlSP | Calls a MySQL database stored procedure. |
tMysqlTableList | Lists the names of a given set of Mysql tables using a select statement based on a Where clause. |
MySQL scenarios
- Checking customer table against a given DQ rule to select customer records
- Controlling the data definition language via tMysqlOutput when creating a table
- Reading email addresses from a DB table and retrieving specific data
- Updating a database table using tMysqlOutput in a Big Data Streaming Job
- Writing dynamic columns from a source file to a database
- Combining two flows for selective output
- Getting the ID for the last inserted record with tMysqlLastInsertId
- Inserting a column and altering data using tMysqlOutput
- Inserting data in bulk in MySQL database
- Inserting data in mother/daughter tables
- Inserting transformed data in MySQL database
- Iterating on DB tables and deleting their content using a user-defined SQL template
- Iterating on a DB table and listing its column names
- Removing and regenerating a MySQL table index
- Retrieving data in error with a Reject link
- Sharing a database connection between a parent Job and child Job
- Updating data using tMysqlOutput
- Using PreparedStatement objects to query data
- Using tMysqlSP to find a State Label using a stored procedure
- Writing columns from a MySQL database to an output file using tMysqlInput
NamedPipe
NamedPipe components
tNamedPipeClose | Closes a named-pipe at the end of a process. |
tNamedPipeOpen | Opens a named-pipe for writing data into it. |
tNamedPipeOutput | Writes data into an existing open named-pipe. |
NamedPipe scenario
Natural Language Processing
Natural Language Processing components
tCompareColumns | Compares two columns to design useful features for generating a classification model. |
tNLPModel | Uses an input in CoNLL format and automatically generates token-level features to create a model for classification tasks like Named Entity Recognition (NER). |
tNLPPredict | Uses a classifier model generated by tNLPModel to predict and label the input text. |
tNLPPreprocessing | Prepares a text sample and divides it into tokens, which can be words, numbers or punctuation marks. |
Natural Language Processing scenarios
Neo4j
Neo4j components
tNeo4jv4Close | Closes a connection to a Neo4j version 4.x database. |
tNeo4jv4Connection | Establishes a connection to a Neo4j version 4.x database for later use. |
tNeo4jv4Input | Reads data from Neo4j version 4.x and sends data in the output flow. |
tNeo4jv4Output | Receives data from the preceding component and writes the data into a Neo4j version 4.x database. |
tNeo4jv4Row | Executes the stated Cypher query on the specified Neo4j version 4.x database. |
tNeo4jBatchOutput | Receives data from the preceding component and writes the data into a local Neo4j database. |
tNeo4jBatchOutputRelationship | Receives data from the preceding component and writes relationships in bulk into a local Neo4j database. |
tNeo4jBatchSchema | Defines the schema of a local Neo4j database. |
tNeo4jClose | Closes an active connection to a Neo4j database in embedded mode. |
tNeo4jConnection | Opens a connection to a Neo4j database to be reused by other Neo4j components. |
tNeo4jImportTool | Uses Neo4j Import Tool to create a Neo4j database and import large amounts of data in bulk from CSV files to this database. |
tNeo4jInput | Reads data from Neo4j and sends data in the output flow. |
tNeo4jOutput | Receives data from the preceding component and writes the data into Neo4j. |
tNeo4jOutputRelationship | Receives data from the preceding component and writes relationships into Neo4j. |
tNeo4jRow | Executes the stated Cypher query on the specified Neo4j database. |
Neo4j scenarios
- Creating nodes with a label using a Cypher query
- Importing data from a CSV file to Neo4j and creating relationships using a single Cypher query
- Importing data from a CSV file to Neo4j using a Cypher query
- Writing information of actors and movies to Neo4j with hierarchical relationship using Neo4j Batch components
- Writing data to a Neo4j database and reading specific data from it
- Writing family information to Neo4j and creating relationships
- Writing information of actors and movies to Neo4j with hierarchical relationship
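The Cypher-over-Bolt execution that tNeo4jv4Row performs corresponds roughly to the sketch below, written against the Neo4j 4.x Java driver; the URI, credentials, and Cypher statement are placeholder assumptions.

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class Neo4jRowSketch {
    public static void main(String[] args) {
        // tNeo4jv4Connection: open a driver/session that later statements reuse.
        try (Driver driver = GraphDatabase.driver(
                     "bolt://localhost:7687", AuthTokens.basic("neo4j", "secret"));
             Session session = driver.session()) {
            // tNeo4jv4Row: execute the stated Cypher query on the database.
            session.run("CREATE (:Person {name: 'Alice'})");
        }
    }
}
```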
Netezza
Netezza components
tNetezzaBulkExec | Offers gains in performance while carrying out the Insert operations to a Netezza database. |
tNetezzaClose | Closes the transaction committed in the connected Netezza database. |
tNetezzaCommit | Validates the data processed through the Job into the connected Netezza database. |
tNetezzaConnection | Opens a connection to a Netezza database to be reused in the subsequent subJob or subJobs. |
tNetezzaInput | Reads a Netezza database and extracts fields based on a query. |
tNetezzaNzLoad | Inserts data into a Netezza database table using Netezza's nzload utility. |
tNetezzaOutput | Writes, updates, makes changes or suppresses entries in a Netezza database. |
tNetezzaRollback | Cancels the transaction committed in the connected Netezza database to avoid committing part of a transaction involuntarily. |
tNetezzaRow | Executes the stated SQL query on the specified Netezza database. |
Netsuite
Netsuite components
tNetSuiteV2019Connection | Creates a connection to a NetSuite SOAP server by leveraging NetSuite v2019 features so that other NetSuite V2019 components in the Job can reuse the connection. |
tNetSuiteV2019Input | Invokes the NetSuite SOAP service and retrieves data according to the conditions you specify by leveraging NetSuite v2019 features. |
tNetSuiteV2019Output | Invokes the NetSuite SOAP service and inserts, updates, or removes data on the NetSuite SOAP server by leveraging NetSuite v2019 features. |
tNetsuiteConnection (deprecated) | Creates a connection to the NetSuite SOAP server so that other NetSuite components in the Job can reuse the connection. |
tNetsuiteInput (deprecated) | Invokes the NetSuite SOAP service and retrieves data according to the conditions you specify. |
tNetsuiteOutput (deprecated) | Invokes the NetSuite SOAP service and inserts, updates, or removes data on the NetSuite SOAP server. |
Netsuite scenarios
Openbravo ERP
Openbravo ERP components
tOpenbravoERPInput (deprecated) | Extracts data from an OpenbravoERP database according to the conditions defined in specific columns. |
tOpenbravoERPOutput (deprecated) | Writes data in an OpenbravoERP database. |
Oracle
Oracle components
tOracleConfiguration | Stores connection information and credentials to be reused by other Oracle components. |
tOracleInvalidRows | Checks Oracle database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). |
tOracleLookupInput | Reads a database and extracts fields based on a query. |
tOracleValidRows | Checks Oracle database rows against Data Quality patterns (regular expression). |
tOracleBulkExec | Offers gains in performance during operations performed on data of an Oracle database. |
tOracleClose | Closes the transaction committed in the connected Oracle database. |
tOracleCommit | Validates the data processed through the Job into the connected Oracle database. |
tOracleConnection | Opens a connection to the specified Oracle database for reuse in the subsequent subJob or subJobs. |
tOracleInput | Reads an Oracle database and extracts fields based on a query. |
tOracleOutput | Writes, updates, makes changes or suppresses entries in an Oracle database. |
tOracleOutputBulk | Writes a file with columns based on the defined delimiter and the Oracle standards. |
tOracleOutputBulkExec | Executes the Insert action in the specified Oracle database. |
tOracleRollback | Cancels the transaction commit in the connected Oracle database to avoid committing part of a transaction involuntarily. |
tOracleRow | Executes the stated SQL query on the specified Oracle database. |
tOracleSP | Calls an Oracle database stored procedure. |
tOracleTableList | Lists the names of specified Oracle tables using a SELECT statement based on a WHERE clause. |
Oracle scenarios
ORC
ORC components
tFileInputORC | Extracts records from a given ORC format file and sends the data to the next component for further processing. |
tFileOutputORC | Receives records from the processing component placed ahead of it and writes the records into ORC format files. |
Orchestration (Integration)
Orchestration (Integration) components
tCollector | Feeds the parallel execution processes with the threads generated by tPartitioner. |
tDepartitioner | Assembles the outputs of the parallel execution processes so that tRecollector can capture those outputs. |
tParallelize | Manages complex Job systems. It executes several subJobs simultaneously and synchronizes the execution of a subJob with other subJobs within the main Job. |
tPartitioner | Partitions the input data before tCollector can transfer them to the parallel execution processes. |
tRecollector | Outputs the results of the parallel executions, depending on a given tDepartitioner. |
tFlowToIterate | Reads data line by line from the input flow and stores the data entries in iterative global variables. |
tForeach | Creates a loop on a list for an iterate link. |
tInfiniteLoop | Executes a task or a Job automatically, based on a loop. |
tIterateToFlow | Transforms non-processable data into a processable flow. |
tLoop | Executes a task or a Job automatically, based on a loop. |
tPostjob | Triggers a task required after the execution of a Job. |
tPrejob | Triggers a task required for the execution of a Job. |
tReplicate | Duplicates the incoming schema into two identical output flows. |
tRunJob | Manages complex Job systems which need to execute one Job after another. |
tSleep | Identifies possible bottlenecks using a time break in the Job for testing or tracking purposes. |
tUnite | Centralizes data from various and heterogeneous sources. |
tWaitForFile | Iterates on a directory and triggers the next component when the defined condition is met. |
tWaitForSocket | Triggers a Job based on a defined condition. |
tWaitForSqlData | Iterates on a given connection for insertion or deletion of rows and triggers a subJob when a condition linked to SQL data presence is met. |
Orchestration (Integration) scenarios
- Parallelizing/synchronizing subJobs execution
- Sorting the customer data of large size in parallel
- Calling a Job and passing the parameter needed to the called Job
- Executing a Job multiple times using a loop
- Handling files before and after the execution of a data Job
- Iterating on a list and retrieving the values
- Iterating on files and merging the content
- Passing a value from a parent Job to a child Job
- Propagating the buffered output data from the child Job to the parent Job
- Replicating a flow and sorting two identical flows respectively
- Running a list of child Jobs dynamically
- Transforming a list of files as data flow
- Transforming data flow to a list
- Waiting for a file to be created and continuing the iteration loop after a message is triggered
- Waiting for a file to be created and stopping the iteration loop after a message is triggered
- Waiting for insertion of rows in a table
Palo
Palo components
tPaloCheckElements (deprecated) | Checks whether the elements of an incoming data flow exist in a given cube. |
tPaloClose (deprecated) | Closes an active connection to a Palo Server. |
tPaloConnection (deprecated) | Opens a connection to a Palo Server and allows other components involved in a process to share the connection for the duration of the process. |
tPaloCube (deprecated) | Performs operations on a given Palo cube. |
tPaloCubeList (deprecated) | Retrieves a list of cube details from the given Palo database. |
tPaloDatabase (deprecated) | Manages the databases inside a Palo server. |
tPaloDatabaseList (deprecated) | Lists database names, database types, number of cubes, number of dimensions, database status and database ID from a given Palo server. |
tPaloDimension (deprecated) | Manages Palo dimensions, even elements inside a database. |
tPaloDimensionList (deprecated) | Retrieves a list of dimension details from the given Palo database. |
tPaloInputMulti (deprecated) | Retrieves the stored or calculated values in combination with the element records out of a cube. |
tPaloOutput (deprecated) | Takes the input stream and writes it to a given Palo cube. |
tPaloOutputMulti (deprecated) | Takes the input stream and writes it to a given Palo cube. |
tPaloRule (deprecated) | Manages rules in a given cube. |
tPaloRuleList (deprecated) | Lists all rules, formulas, comments, activation statuses, and external IDs from a given cube. |
Palo scenarios
- Creating a cube in an existing database
- Creating a database
- Creating a dimension with elements
- Creating a rule in a given cube
- Rejecting inflow data when the elements to be written do not exist in a given cube
- Retrieving detailed cube information from a given database
- Retrieving detailed database information from a given Palo server
- Retrieving detailed dimension information from a given database
- Retrieving detailed rule information from a given cube
- Retrieving dimension elements from a given cube
- Writing data into a given cube
ParAccel
ParAccel components
tParAccelBulkExec (deprecated) | Improves performance when loading data in a ParAccel database. |
tParAccelClose (deprecated) | Closes a transaction. |
tParAccelCommit (deprecated) | Commits in one go a global transaction, using a unique connection, instead of doing that on every row or every batch and thus provides gain in performance. |
tParAccelConnection (deprecated) | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tParAccelInput (deprecated) | Reads a database and extracts fields based on a query. |
tParAccelOutput (deprecated) | Executes the action defined on the table and/or on the data of a table, according to the input flow from the previous component. |
tParAccelOutputBulk (deprecated) | Prepares the file to be used as parameter in the INSERT query to feed the ParAccel database. |
tParAccelOutputBulkExec (deprecated) | Improves performance when loading data in a ParAccel database. |
tParAccelRollback (deprecated) | Avoids committing part of a transaction involuntarily. |
tParAccelRow (deprecated) | Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. The SQLBuilder tool helps you write your SQL statements easily. |
Parquet
Parquet components
tFileInputParquet | Extracts records from a given Parquet format file and sends the data to the next component for further processing. |
tFileOutputParquet | Receives records from the processing component placed ahead of it and writes the records into Parquet format files. |
tFileStreamInputParquet | Extracts records from a given Parquet format file for other components to process the records. |
Petals
Petals components
tPetalsInput (deprecated) | Passes Petals' data to a Talend Job. |
tPetalsOutput (deprecated) | Transfers the data in a Talend Job to Petals ESB. |
POP
POP component
tPOP | Fetches one or more email messages from a server using the POP3 or IMAP protocol. |
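tPOP is configured in the Studio rather than coded, but a compiled Talend Job ultimately runs Java, so its retrieval step maps onto the standard JavaMail API. A minimal sketch, assuming a hypothetical POP3-over-SSL server and credentials:

```java
import java.util.Properties;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Store;

public class PopFetch {
    public static void main(String[] args) throws Exception {
        Session session = Session.getInstance(new Properties());
        // "pop3s" selects POP3 over SSL; tPOP can also use IMAP ("imaps").
        Store store = session.getStore("pop3s");
        store.connect("pop.example.com", "user@example.com", "secret"); // hypothetical server and credentials
        Folder inbox = store.getFolder("INBOX");
        inbox.open(Folder.READ_ONLY);
        for (Message message : inbox.getMessages()) {
            System.out.println(message.getSubject()); // one fetched message per row
        }
        inbox.close(false); // false: do not expunge messages on close
        store.close();
    }
}
```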
POP scenario
Positional
Positional components
tFileStreamInputPositional | Listens on a given directory for new files, reads data from them row by row and extracts fields based on a specific pattern. |
tFileInputPositional | Reads a positional file row by row, splits each row into fields based on a given pattern, and then sends the fields as defined in the schema to the next component. |
tFileOutputPositional | Writes a file row by row according to the length and the format of the fields or columns in a row. |
Positional scenarios
PostgresPlus
PostgresPlus components
tPostgresPlusBulkExec | Improves performance during Insert operations to a PostgresPlus database. |
tPostgresPlusClose | Closes the transaction committed in the connected PostgresPlus database. |
tPostgresPlusCommit | Commits in one go a global transaction, using a unique connection, instead of doing that on every row or every batch and thus improves performance. |
tPostgresPlusConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tPostgresPlusInput | Executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link. |
tPostgresPlusOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job. |
tPostgresPlusOutputBulk | Prepares the file to be used as a parameter in the INSERT query to feed the PostgresPlus database. |
tPostgresPlusOutputBulkExec | Improves performance during Insert operations to a PostgresPlus database. |
tPostgresPlusRollback | Avoids committing part of a transaction involuntarily. |
tPostgresPlusRow | Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. The SQLBuilder tool helps you easily write your SQL statements. |
PostgreSQL
PostgreSQL components
tPostgresqlInvalidRows | Extracts DB rows that do not match a given data quality pattern, so that you can then implement any required correction. |
tPostgresqlValidRows | Extracts DB rows that match a given data quality pattern. |
tPostgresqlBulkExec | Improves performance while carrying out the Insert operations to a Postgresql database. |
tPostgresqlClose | Closes the transaction committed in the connected Postgresql database. |
tPostgresqlCommit | Commits in one go a global transaction, using a unique connection, instead of doing that on every row or every batch and thus improves performance. |
tPostgresqlConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs (see the JDBC sketch after this table). |
tPostgresqlInput | Executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link. |
tPostgresqlOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job. |
tPostgresqlOutputBulk | Prepares the file to be used as a parameter in the INSERT query to feed the Postgresql database. |
tPostgresqlOutputBulkExec | Improves performance during Insert operations to a Postgresql database. |
tPostgresqlRollback | Avoids committing part of a transaction involuntarily. |
tPostgresqlRow | Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. The SQLBuilder tool helps you easily write your SQL statements. |
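The connection, input, and output components above map onto plain JDBC calls, which is what the generated Job code executes. A minimal sketch, assuming a hypothetical demo database and customers table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PostgresqlFlow {
    public static void main(String[] args) throws Exception {
        // tPostgresqlConnection: one shared connection for the subsequent subJobs.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/demo", "user", "secret")) { // hypothetical database
            conn.setAutoCommit(false); // leave the commit/rollback decision to the end of the flow

            // tPostgresqlInput: a query whose column order matches the schema definition.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name FROM customers")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }

            // tPostgresqlOutput: an action (here, Insert) on the data in the table.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                ps.setInt(1, 42);
                ps.setString(2, "Ada");
                ps.executeUpdate();
            }

            conn.commit(); // tPostgresqlCommit: one global commit instead of one per row or batch
        }
    }
}
```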
Processing (Integration)
Processing (Integration) components
tCacheIn | Offers faster access to the persistent data. |
tCacheOut | Persists the input RDDs depending on the specific storage level you define in order to offer faster access to these datasets later. |
tExtractDynamicFields | Parses a Dynamic column to create standard output columns. |
tExtractEDIField | Reads the EDI structured data from an EDIFACT message file, generates an XML according to the EDIFACT family and the EDIFACT type, extracts data by parsing the generated XML using the XPath queries manually defined or coming from the Repository wizard, and finally sends the data to the next component via a Row connection. |
tExtractRegexFields | Extracts data and generates multiple columns from a formatted string using regex matching. |
tSample | Returns a sample subset of the data being processed. |
tSqlRow | Performs SQL queries over input datasets. |
tTop | Sorts data and outputs a given number of rows, counted from the first row of the sorted data. |
tTopBy | Groups and sorts data, and outputs a given number of rows, counted from the first row of each group. |
tWindow | Applies a given Spark window on the incoming RDDs and sends the window-based RDDs to its following component. |
tWriteAvroFields | Transforms the incoming data into Avro files. |
tWriteDelimitedFields | Converts records into byte arrays in delimited format. |
tWriteDynamicFields | Creates a dynamic schema from input columns in the component. |
tWritePositionalFields | Converts records into byte arrays in positional format. |
tWriteXMLFields | Converts records into byte arrays in XML format. |
tAggregateRow | Receives a flow and aggregates it based on one or more columns (see the aggregation sketch after this table). |
tAggregateSortedRow | Aggregates the sorted input data for output columns based on a set of operations. Each output column is configured with as many rows as required, the operations to be carried out, and the input column from which the data will be taken, for better data aggregation. |
tConvertType | Converts one Talend java type to another automatically, and thus avoids compilation errors. |
tDenormalize | Denormalizes the input flow based on one column. |
tDenormalizeSortedRow | Synthesizes sorted input flow to save memory. |
tExternalSortRow | Sorts input data based on one or several columns, by sort type and order, using an external sort application. |
tExtractDelimitedFields | Generates multiple columns from a delimited string column. |
tExtractJSONFields | Extracts the desired data from JSON fields based on the JSONPath or XPath query. |
tExtractPositionalFields | Extracts data and generates multiple columns from a formatted string using positional fields. |
tExtractXMLField | Reads the XML structured data from an XML field and sends the data as defined in the schema to the following component. |
tFilterColumns | Homogenizes schemas either by ordering the columns, removing unwanted columns or adding new columns. |
tFilterRow | Filters input rows by setting one or more conditions on the selected columns. |
tJoin | Performs inner or outer joins between the main data flow and the lookup flow. |
tNormalize | Normalizes the input flow following the SQL standard to help improve data quality and thus ease data updates. |
tPartition | Allows you to visually define how an input dataset is partitioned. |
tReplace | Performs find-and-replace operations on the input flow to cleanse data before further processing. |
tReplicate | Duplicates the incoming schema into two identical output flows. |
tSampleRow | Selects rows according to a list of single lines and/or a list of groups of lines. |
tSortRow | Sorts input data based on one or several columns, helping to create metrics and classification tables. |
tSplitRow | Splits one input row into several output rows. |
tUniqRow | Ensures data quality by removing duplicate entries from the input or output flow in a Job. |
tUnite | Merges data from various and heterogeneous sources into a single output flow. |
tWriteJSONField | Transforms the incoming data into JSON fields and transfers them to a file, a database table, etc. |
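To make the aggregation step concrete, here is a minimal Java-streams sketch of what tAggregateRow is configured to do (group by one or more columns, then apply operations such as sum); the Sale record and its values are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AggregateRowSketch {
    record Sale(String region, int amount) {} // hypothetical input schema

    public static void main(String[] args) {
        List<Sale> flow = List.of(
                new Sale("EMEA", 120), new Sale("EMEA", 80), new Sale("APAC", 200));

        // Group by the "region" column and sum "amount", as tAggregateRow does
        // when configured with one group-by column and a sum operation.
        Map<String, Integer> totals = flow.stream()
                .collect(Collectors.groupingBy(Sale::region,
                        Collectors.summingInt(Sale::amount)));

        totals.forEach((region, total) -> System.out.println(region + " -> " + total));
    }
}
```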
Processing (Integration) scenarios
- Aggregating values based on dynamic schema
- Converting java types using Map/Reduce components
- Creating a dynamic column and extracting its content
- Deduplicating entries based on dynamic schema
- Deduplicating entries using Map/Reduce components
- Extracting data from an EDIFACT message
- Extracting name, domain and TLD from e-mail addresses
- Extracting the contents of a dynamic column via tJavaRow
- Matching input data against a reference file based on a dynamic column
- Normalizing data using Map/Reduce components
- Performing download analysis using a Spark Batch Job
- Replacing values and filtering columns using Map/Reduce components
- Sorting entries based on dynamic schema
- Aggregating values and sorting data
- Cleaning up and filtering a CSV file
- Collecting data from your favorite online social network
- Converting java types
- Deduplicating entries
- Denormalizing on multiple columns
- Denormalizing on one column
- Doing an exact match on two columns and outputting the main and rejected data
- Extracting XML data from a field in a database table
- Extracting a delimited string column of a database table
- Extracting correct and erroneous data from an XML field in a delimited file
- Filtering a list of names through different logical operations
- Filtering a list of names using simple conditions
- Filtering rows and groups of rows
- Iterating on files and merging their content
- Normalizing data
- Regrouping sorted rows
- Replicating a flow and sorting two identical flows respectively
- Retrieving error messages while extracting data from JSON fields
- Sorting and aggregating the input data
- Sorting entries
- Splitting one row into two rows
- Writing flat data into JSON fields
Properties
Properties components
tFileInputProperties | Reads a text file row by row and separates the fields according to the model key = value. |
tFileOutputProperties | Writes a configuration file, of the type .ini or .properties, containing text data organized according to the model key = value. |
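Both components follow the key = value model implemented by java.util.Properties. A minimal round-trip sketch, with hypothetical file names and keys:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class PropertiesRoundTrip {
    public static void main(String[] args) throws IOException {
        // Reading key = value rows, the model tFileInputProperties parses.
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("config.properties")) { // hypothetical file
            props.load(in);
        }
        System.out.println(props.getProperty("db.host", "localhost")); // hypothetical key

        // Writing them back, as tFileOutputProperties does for a .properties file.
        props.setProperty("db.port", "5432");
        try (FileOutputStream out = new FileOutputStream("config.out.properties")) {
            props.store(out, "written by the sketch");
        }
    }
}
```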
Properties scenario
Proxy
Proxy component
tSetProxy | Sets the relevant information for proxy setup. |
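As an assumption rather than a documented implementation detail, this kind of proxy setup corresponds to the standard JVM proxy system properties, which later HTTP/HTTPS connections in the same JVM pick up. A minimal sketch with a hypothetical proxy host:

```java
public class ProxySetup {
    public static void main(String[] args) {
        // Standard JVM proxy properties read by the java.net stack.
        System.setProperty("http.proxyHost", "proxy.example.com"); // hypothetical proxy
        System.setProperty("http.proxyPort", "8080");
        System.setProperty("https.proxyHost", "proxy.example.com");
        System.setProperty("https.proxyPort", "8080");
        // Hosts that must bypass the proxy.
        System.setProperty("http.nonProxyHosts", "localhost|*.internal.example.com");
    }
}
```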
RabbitMQ
RabbitMQ components
tRabbitMQClose | Closes a connection to a message queue. |
tRabbitMQConnection | Establishes a connection to a message queue for later use. |
tRabbitMQInput | Reads messages from a message queue and passes the messages in the output flow. |
tRabbitMQOutput | Receives data from the preceding component as messages and adds the messages to queues in the specified way. |
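These components map onto the RabbitMQ Java client (amqp-client). A minimal publish-and-read sketch, assuming a hypothetical local broker and queue name:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.GetResponse;

public class RabbitRoundTrip {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // hypothetical broker

        // tRabbitMQConnection / tRabbitMQClose: open once, release at the end.
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            channel.queueDeclare("demo", false, false, false, null); // hypothetical queue

            // tRabbitMQOutput: add an incoming row to the queue as a message.
            channel.basicPublish("", "demo", null, "hello".getBytes("UTF-8"));

            // tRabbitMQInput: read a message back into the output flow.
            GetResponse response = channel.basicGet("demo", true);
            if (response != null) {
                System.out.println(new String(response.getBody(), "UTF-8"));
            }
        }
    }
}
```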
Raw
Raw components
tFileInputRaw | Reads all data in a raw file and sends it to a single output column for subsequent processing by another component. |
tFileOutputRaw | Writes the data coming from another component, received in the form of a single input column, into a file. |
Regex
Regex components
tFileStreamInputRegex | Listens on a given directory for new files, then reads data from these files, row by row, in order to split the data into fields using regular expressions. |
tFileInputRegex | Reads a file row by row, splits each row into fields using regular expressions, and sends the fields as defined in the schema to the next component. |
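A minimal sketch of the splitting model both components use, one capturing group per output field; the file name and row layout are hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldSplit {
    public static void main(String[] args) throws Exception {
        // One capturing group per schema column, e.g. rows shaped like "Ada;36;ada@example.com".
        Pattern rowPattern = Pattern.compile("^(\\w+);(\\d+);(\\S+)$");
        for (String line : Files.readAllLines(Paths.get("people.txt"))) { // hypothetical file
            Matcher m = rowPattern.matcher(line);
            if (m.matches()) {
                System.out.printf("name=%s age=%s email=%s%n",
                        m.group(1), m.group(2), m.group(3));
            }
        }
    }
}
```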
Regex scenario
REST
REST component
tREST | Serves as a REST Web service client. |
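A minimal sketch of the kind of HTTP call tREST wraps, using the JDK's HttpURLConnection against a hypothetical endpoint:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestGet {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://api.example.com/items/1"); // hypothetical endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");

        // Read the response body, which the component passes downstream.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```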
REST scenario
Riak
Riak components
tRiakBucketList (deprecated) | Retrieves a list of buckets from a Riak cluster and iterates on it. |
tRiakClose (deprecated) | Closes an active connection to a Riak cluster so as to release occupied resources. |
tRiakConnection (deprecated) | Opens a connection to a Riak cluster that can be reused by the other Riak components. |
tRiakInput (deprecated) | Extracts the desired data from a bucket in a Riak node so as to store or apply changes to the data. |
tRiakKeyList (deprecated) | Retrieves a list of keys and iterates on it within a Riak bucket for analysis or development purposes. |
tRiakOutput (deprecated) | Receives data from the preceding component and writes data into or deletes data from a bucket in a Riak cluster. |
Riak scenario
Route
Route components
tRouteFault | Sends messages from a Data Integration Job to a Mediation Route and marks the messages as faults. |
tRouteInput | Accepts messages in a Data Integration Job from a Mediation Route. |
tRouteOutput | Sends messages from a Data Integration Job to a Mediation Route. |
Route scenarios
RSS
RSS components
tRSSInput | Reads RSS-Feeds using URLs. |
tRSSOutput | Creates and writes XML files that hold RSS or Atom feeds. |
RSS scenarios
Salesforce
Salesforce components
tSalesforceBulkExec | Bulk-loads data in a given file into a Salesforce object. |
tSalesforceConnection | Opens a connection to Salesforce. |
tSalesforceEinsteinBulkExec | Loads data into Salesforce Analytics Cloud from a local file. |
tSalesforceEinsteinOutputBulkExec | Gains in performance during data operations to the Salesforce Analytics Cloud. |
tSalesforceGetDeleted | Collects data deleted during a specific period of time from a Salesforce object. |
tSalesforceGetServerTimestamp | Retrieves the current date of the Salesforce server presented in a timestamp format. |
tSalesforceGetUpdated | Collects data updated during a specific period of time from a Salesforce object. |
tSalesforceInput | Retrieves data from a Salesforce object based on a query. |
tSalesforceOutput | Inserts, updates, upserts, or deletes data in a Salesforce object. |
tSalesforceOutputBulk | Generates the file to be processed by the tSalesforceBulkExec component for bulk processing. |
tSalesforceOutputBulkExec | Bulk-loads data in a given file into a Salesforce object. |
Salesforce scenarios
SAP
SAP components
tELTSAPInput | Provides the SAP table schema that will be used by the tELTSAPMap component to generate the SQL SELECT statement. |
tELTSAPMap | Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTSAPInput components. |
tSAPADSOInput | Retrieves data of an active ADSO (Advanced Data Store Object) from an SAP BW system on an SAP HANA database. |
tSAPBapi | Extracts data from or loads data to an SAP server using multiple input/output parameters or the document type parameter. |
tSAPBWInput | Executes an SQL query with a strictly defined order which must correspond to your schema definition. |
tSAPCommit | Commits a global transaction in one go, using a unique connection, instead of doing that on every row or every batch and thus provides gain in performance. |
tSAPConnection | Commits the data of a whole Job to the SAP system in one go, as a single transaction. |
tSAPDataSourceOutput | Writes Data Source objects into an SAP BW Data Source system. |
tSAPDataSourceReceiver | Retrieves data requests stored on Talend SAP RFC server and related to a specific Data Source system. |
tSAPDSOInput | Retrieves DSO data from an SAP BW system. |
tSAPDSOOutput | Creates or updates DSO data in an SAP BW table. |
tSAPHanaBulkExec | Improves performance while carrying out the Insert operations to an SAP HANA database. |
tSAPHanaInvalidRows | Checks SAP HANA database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). |
tSAPHanaUnload | Offloads massive data from the SAP HANA database to a third-party system. |
tSAPHanaValidRows | Checks SAP HANA database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule). |
tSAPIDocInput (deprecated) | Extracts an IDoc data set that is used for asynchronous transactions between SAP systems or between an SAP system and another application. |
tSAPIDocOutput | Uploads an IDoc data set in XML format to an SAP system. |
tSAPIDocReceiver | Extracts data from SAP IDocs stored on an SAP server. |
tSAPInfoCubeInput | Retrieves InfoCube data from an SAP BW system. |
tSAPInfoObjectInput | Retrieves InfoObject data from an SAP BW system. |
tSAPInfoObjectOutput | Writes InfoObject data into an SAP BW system. |
tSAPODPInput | Extracts business data from the ERP part of SAP (SAP Business application, SAP on HANA, SAP R/3, and S/4HANA) through ODP (Operational Data Provisioning). |
tSAPRollback | Cancels the transaction commit in the connected SAP system. |
tSAPTableInput | Reads data from an SAP table on an SAP server. |
tSAPHanaClose | Closes a connection to an SAP HANA database. |
tSAPHanaCommit | Commits in one go, using a unique connection, a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tSAPHanaConnection | Establishes an SAP HANA connection to be reused by other SAP HANA components in your Job. |
tSAPHanaInput | Executes a database query with a defined command which must correspond to the schema definition. |
tSAPHanaOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tSAPHanaRollback | Avoids committing part of a transaction involuntarily. |
tSAPHanaRow | Acts on the actual database structure or on the data (although without handling data). |
SAP scenarios
- Connecting to a given SAP R/3 system to listen for the creation of IDoc files (deprecated)
- Consuming Data Source objects using SSL Transport
- Consuming IDocs for processing by tHMap
- Exporting data using tSAPHanaUnload
- Extracting data using tSAPInfoCubeInput
- Reading data from SAP BW database
- Retrieving ADSO data from SAP BW
- Retrieving data from SAP through ODP
- Retrieving data from an SAP system by calling a BAPI function using document type parameters
- Retrieving data from an SAP system by calling a BAPI function using multiple input/output parameters
- Aggregating and filtering data in multiple SAP tables
SCD
SCD components
tDB2SCD | Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table. |
tGreenplumSCD | Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table. |
tInformixSCD | Tracks and shows changes which have been made to dedicated Informix SCD tables. |
tIngresSCD (deprecated) | Reflects and tracks changes in a dedicated Ingres SCD table. |
tMSSqlSCD | Tracks and reflects changes in a dedicated SCD table in a Microsoft SQL Server or Azure SQL database. |
tMysqlSCD | Reflects and tracks changes in a dedicated MySQL SCD table. |
tNetezzaSCD | Reflects and tracks changes in a dedicated Netezza SCD table. |
tOracleSCD | Reflects and tracks changes in a dedicated Oracle SCD table. |
tParAccelSCD (deprecated) | Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table. |
tPostgresPlusSCD | Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table. |
tPostgresqlSCD | Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table. |
tSybaseSCD | Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table. |
tTeradataSCD | Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table. |
tVerticaSCD | Tracks and reflects data changes in a dedicated Vertica SCD table. |
SCD scenario
SCDELT
SCDELT components
tDB2SCDELT | Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated DB2 SCD table. |
tJDBCSCDELT | Tracks data changes in a source database table using the SCD (Slowly Changing Dimensions) Type 1 method and/or Type 2 method and writes both the current and historical data into a specified SCD dimension table (see the SQL sketch after this table). |
tMysqlSCDELT | Reflects and tracks changes in a dedicated MySQL SCD table through SQL queries. |
tOracleSCDELT | Reflects and tracks changes in a dedicated Oracle SCD table through SQL queries. |
tPostgresPlusSCDELT | Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated PostgresPlus SCD table. |
tPostgresqlSCDELT | Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Postgresql SCD table. |
tSybaseSCDELT | Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Sybase SCD table. |
tTeradataSCDELT | Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Teradata SCD table. |
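The ELT variants push the SCD logic down to the database as SQL. A minimal JDBC sketch of the Type 2 method (close the currently active row, then insert the new version so both history and current state stay queryable); the dim_customer table and its columns are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ScdType2Sketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/dw", "user", "secret")) { // hypothetical warehouse
            conn.setAutoCommit(false);

            // Type 2, step 1: close the currently active version of the record.
            try (PreparedStatement close = conn.prepareStatement(
                    "UPDATE dim_customer SET end_date = CURRENT_DATE, active = FALSE "
                  + "WHERE customer_id = ? AND active = TRUE")) {
                close.setInt(1, 42);
                close.executeUpdate();
            }

            // Type 2, step 2: insert the new version as the current row.
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO dim_customer (customer_id, city, start_date, end_date, active) "
                  + "VALUES (?, ?, CURRENT_DATE, NULL, TRUE)")) {
                insert.setInt(1, 42);
                insert.setString(2, "Berlin");
                insert.executeUpdate();
            }

            conn.commit(); // both steps succeed or fail together
        }
    }
}
```

A Type 1 change would instead be a single UPDATE that overwrites the attribute in place, keeping no history.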
SCDELT scenarios
SCP
SCP components
tSCPClose | Closes an SCP connection. |
tSCPConnection | Opens an SCP connection to transfer files in one transaction. |
tSCPDelete | Removes a file from the defined SCP server. |
tSCPFileExists | Verifies the existence of a file on the defined SCP server. |
tSCPFileList | Lists files from the defined SCP server. |
tSCPGet | Copies files from the defined SCP server. |
tSCPPut | Copies files to the defined SCP server. |
tSCPRename | Renames file(s) on the defined SCP server. |
tSCPTruncate | Removes data from file(s) on the defined SCP server via an SCP connection. |
SCP scenario
ServiceNow
ServiceNow components
tServiceNowConnection | Opens a connection to a ServiceNow instance that can then be reused by other ServiceNow components. |
tServiceNowInput | Accesses ServiceNow and retrieves data from it. |
tServiceNowOutput | Performs the defined action on the data on ServiceNow. |
SingleStore
SingleStore components
tSingleStoreBulkExec | Loads data from a file into a table of a database connected through JDBC API. |
tSingleStoreClose | Closes an active SingleStore connection to release the occupied resources. |
tSingleStoreCommit | Commits in one go a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tSingleStoreConnection | Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs. |
tSingleStoreInput | Reads any database using a JDBC API connection and extracts fields based on a query. |
tSingleStoreOutput | Executes the action defined on the data contained in the table, based on the flow incoming from the preceding component in the Job. |
tSingleStoreOutputBulk | Prepares the bulk file to be used as a parameter to feed the database connected. |
tSingleStoreOutputBulkExec | Provides performance gain when loading data from a file into a table of a database connected through JDBC API. |
tSingleStoreRollback | Avoids committing part of a transaction accidentally by canceling the transaction committed in the connected database. |
tSingleStoreRow | Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder tool to easily write your SQL statements. |
tSingleStoreSP | Centralizes multiple or complex queries in a database in order to call them easily. |
Snowflake
Snowflake components
tSnowflakeConfiguration | Stores connection information and credentials to be reused by other Snowflake components in the Apache Spark Batch framework. |
tSnowflakeBulkExec | Loads data from files in a folder into a Snowflake table. The folder can be in an internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3) bucket, or an Azure container (see the COPY sketch after this table). |
tSnowflakeClose | Closes an active Snowflake connection to release the occupied resources. |
tSnowflakeCommit | Commits in one go a global transaction, using a unique connection, instead of doing that on every row or every batch, and thus provides gain in performance. |
tSnowflakeConnection | Opens a connection to Snowflake that can then be reused by other Snowflake components. |
tSnowflakeInput | Reads data from a Snowflake table into the data flow of your Job based on an SQL query. |
tSnowflakeOutput | Uses the data incoming from its preceding component to insert, update, upsert or delete data in a Snowflake table. |
tSnowflakeOutputBulk | Writes incoming data to files generated in a folder. The folder can be in an internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3) bucket, or an Azure container. |
tSnowflakeOutputBulkExec | Writes incoming data to files generated in a folder and then loads the data into a Snowflake database table. The folder can be in an internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3) bucket, or an Azure container. |
tSnowflakeRollback | Cancels the transaction commit in the Snowflake database to avoid committing part of a transaction involuntarily. |
tSnowflakeRow | Executes the SQL command stated onto a specified Snowflake database. |
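The bulk components revolve around staging files and Snowflake's COPY command. A minimal JDBC sketch, with a hypothetical account, local file, stage path, and SALES table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SnowflakeCopySketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:snowflake://myaccount.snowflakecomputing.com/?db=DEMO&schema=PUBLIC",
                "user", "secret"); // hypothetical account and credentials
             Statement st = conn.createStatement()) {
            // Stage the local file in the user's internal stage (@~), which is
            // the kind of folder tSnowflakeOutputBulk generates files into.
            st.execute("PUT file:///tmp/sales.csv @~/sales_stage"); // hypothetical file and stage path

            // Bulk-load the staged files into the table, the tSnowflakeBulkExec step.
            st.execute("COPY INTO SALES FROM @~/sales_stage "
                     + "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)");
        }
    }
}
```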
Snowflake scenarios
- Aggregating Snowflake data using context variables as table and connection names
- Loading data using the COPY command
- Loading data in a Snowflake table using a custom stage path
- Querying data in a cloud file through a materialized view and a Snowflake external table
- Writing data into and reading data from a Snowflake table
SOAP
SOAP component
tSOAP | Calls a method via a Web service in order to retrieve the values of the parameters defined in the component editor. |
SOAP scenarios
Socket
Socket components
tSocketInput | Opens the socket port and listens for the incoming data. |
tSocketOutput | Sends out the data from the incoming flow to a listening socket port. |
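A minimal sketch of the listening/sending pair these two components form, using plain java.net sockets on a hypothetical port:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketPair {
    public static void main(String[] args) throws Exception {
        // tSocketInput side: open the port and listen for incoming data.
        try (ServerSocket server = new ServerSocket(3333)) { // hypothetical port
            new Thread(() -> {
                // tSocketOutput side: send a row to the listening port.
                try (Socket client = new Socket("localhost", 3333);
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println("42;Ada"); // hypothetical delimited row
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }).start();

            try (Socket accepted = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(accepted.getInputStream()))) {
                System.out.println("received: " + in.readLine());
            }
        }
    }
}
```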
Socket scenario
Splunk
Splunk component
tSplunkEventCollector | Sends the event data to Splunk through Splunk HTTP Event Collector. |
SQLite
SQLite components
tSQLiteClose | Closes a transaction committed in the connected DB. |
tSQLiteCommit | Commits in one go, using a unique connection, a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tSQLiteConnection | Opens a connection to the database for a current transaction. |
tSQLiteInput | Executes a DB query with a defined command which must correspond to the schema definition. It passes on rows to the next component via a Main row link. |
tSQLiteOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job. |
tSQLiteRollback | Cancels the transaction committed in the SQLite database. |
tSQLiteRow | Executes the defined query onto the specified database and uses the parameters bound with the column. |
SQLite scenarios
SQLTemplate
SQLTemplate components
tSQLTemplate | Executes the common database actions or customized SQL statement templates, for example to drop/create a table. |
tSQLTemplateAggregate | Provides a set of metrics based on values or calculations. |
tSQLTemplateCommit | Commits a global action in one go using a single connection, instead of doing so for every row or every batch of rows separately. This provides a gain in performance. |
tSQLTemplateFilterColumns | Homogenizes schemas by reorganizing, deleting or adding new columns. |
tSQLTemplateFilterRows | Sets row filters for any given data source, based on a WHERE clause. |
tSQLTemplateMerge | Merges data into a database table directly on the DBMS by creating and executing a MERGE statement. |
tSQLTemplateRollback | Cancels the transaction committed in the SQLTemplate database. |
SQLTemplate scenarios
Sqoop
Sqoop components
tSqoopExport | Defines the arguments required by Sqoop for transferring data to a RDBMS. |
tSqoopImport | Defines the arguments required by Sqoop for writing the data of your interest into HDFS. |
tSqoopImportAllTables | Defines the arguments required by Sqoop for writing all of the tables of a database into HDFS. |
tSqoopMerge | Performs an incremental import that updates an older dataset with newer records. The file types of the newer and the older datasets must be the same. |
Sqoop scenarios
SVNLog
SVNLog component
tSVNLogInput | Retrieves the information of a specified revision or range of revisions from an SVN repository. |
SVNLog scenario
Sybase
Sybase components
tSybaseBulkExec | Gains in performance during Insert operations to a Sybase database. |
tSybaseClose | Closes a transaction committed in the connected database. |
tSybaseCommit | Commits in one go, using a unique connection, a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |
tSybaseConnection | Opens a connection to the database for a current transaction. |
tSybaseInput | Executes a DB query with a strictly defined order which must correspond to the schema definition. |
tSybaseIQBulkExec | Loads data into a Sybase database table from a flat file or other database table. |
tSybaseIQOutputBulkExec | Gains in performance during Insert operations to a Sybase IQ database. |
tSybaseOutput | Executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job. |
tSybaseOutputBulk | Prepares the file to be used as a parameter in the INSERT query to feed the Sybase database. |
tSybaseOutputBulkExec | Gains in performance during Insert operations to a Sybase database. |
tSybaseRollback | Cancels the transaction committed in the Sybase database. |
tSybaseRow | Acts on the actual DB structure or on the data (although without handling data). |
tSybaseSP | Calls a Sybase database stored procedure. |
Sybase scenario
System
System components
tRunJob | Manages complex Job systems which need to execute one Job after another. |
tSetEnv | Adds variables temporarily to the system environment during the execution of a Job. |
tSSH | Establishes a connection to a remote server and returns sensitive information securely. |
tSystem | Calls other system processing commands from within a larger Job. |
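A minimal sketch of the tSystem behaviour, launching an external command and capturing its output through ProcessBuilder; the command itself is hypothetical:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SystemCommand {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("ls", "-l", "/tmp"); // hypothetical command and arguments
        pb.redirectErrorStream(true); // merge stderr into stdout

        Process process = pb.start();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line); // the command output a Job can consume
            }
        }
        System.out.println("exit code: " + process.waitFor());
    }
}
```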
System scenarios
- Calling a Job and passing the parameter needed to the called Job
- Displaying remote system information via SSH
- Echoing 'Hello World!'
- Executing an external command with multiple parameters using tSystem
- Modifying a variable during a Job execution
- Passing a value from a parent Job to a child Job
- Propagating the buffered output data from the child Job to the parent Job
- Running a list of child Jobs dynamically
Tachyon
Tachyon component
tTachyonConfiguration | Defines a connection to the Tachyon storage system and enables the reuse of the configuration in the same Job. |
tAddLocationFromIP
tAddLocationFromIP component
tAddLocationFromIP | Replaces IP addresses with geographical locations. |
tAddLocationFromIP scenario
Talend Cloud
Talend Cloud components
tJobFailure | Throws an exception and displays a message when an error occurs. |
tJobLog | Collects and shows exception data during the execution of the Job in Talend Studio or the task in Talend Cloud Management Console. |
tJobReject | Receives data rejected after task processing. |
tChangeFileEncoding
tChangeFileEncoding component
tChangeFileEncoding | Transforms the character encoding of a given file and generates a new file with the transformed character encoding. |
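A minimal sketch of the transcoding step, reading with the source charset and writing a new file with the target one; the file names and charsets are hypothetical:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChangeFileEncoding {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get("input-latin1.txt"), Charset.forName("ISO-8859-1")); // hypothetical source file
             BufferedWriter out = Files.newBufferedWriter(
                Paths.get("output-utf8.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);   // characters decoded from the source encoding...
                out.newLine();     // ...are re-encoded on the way out
            }
        }
    }
}
```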
tChangeFileEncoding scenario
tCreateTemporaryFile
tCreateTemporaryFile component
tCreateTemporaryFile | Creates a temporary file in a specified directory. This component allows you to either keep the temporary file or delete it after the Job execution. |
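A minimal sketch of the component's two options, creating a temporary file in a specified directory and optionally removing it when the JVM exits; the directory and name pattern are hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TemporaryFile {
    public static void main(String[] args) throws Exception {
        // Create a temporary file in a given directory.
        Path tmp = Files.createTempFile(Paths.get("/tmp"), "talend_", ".dat"); // hypothetical directory and pattern
        System.out.println("created: " + tmp);

        // "Delete after Job execution" roughly corresponds to removing the
        // file when the JVM exits; omit this call to keep the file instead.
        tmp.toFile().deleteOnExit();
    }
}
```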
tCreateTemporaryFile scenario
Technical
Technical components
tBoundedStreamInput | Provides a data stream for the component to be tested and is suitable for use in a test case only. |
tCollectAndCheck | Shows and validates the result of a component test. |
tHashInput | Reads from the cache memory data loaded by tHashOutput to offer high-speed data feed, facilitating transactions involving a large amount of data. |
tHashOutput | Loads data to the cache memory to offer high-speed access, facilitating transactions involving a large amount of data. |
Technical scenarios
Teradata
Teradata components
tTeradataConfiguration | Defines a connection to Teradata and enables the reuse of the connection configuration in the same Job. |
tTeradataLookupInput | Executes a database query with a strictly defined order which must correspond to the schema definition. |
tTeradataClose | Closes the transaction committed in the connected DB. |
tTeradataCommit | Commits in one go, using a unique connection, a global transaction instead of doing that on every row or every batch and thus provides gain in performance. |