Data generator properties

Properties to configure to automatically generate test data as a dataset.

Data generator is unidirectional and can only be used as a source dataset in your pipeline.

Data generator connection

Select Data generator connection in the list and configure the connection.

Select your engine from the list and set the main and advanced settings.

After configuring the connection, give it a display name (mandatory) and a description (optional).

Dataset configuration
Property		Configuration
Dataset name		Enter a display name for the dataset. This name will be used as a unique identifier of the dataset in all Talend Cloud apps.
Connection		Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only.
Type		Select the type of dataset you want to create. Batch: records are generated once, and the pipeline that uses this dataset is a batch pipeline. Streaming: records are generated every N milliseconds, and the pipeline that uses this dataset is a streaming pipeline; you can define the polling interval in milliseconds in the Polling configuration field of the source dataset.

Main settings
Property	Configuration
Rows	Enter the number of records you want to generate.
Fields	Define the fields and the kind of data to generate. Predefined types help you generate specific data. For each field, set the following. Name: enter the name of the field you want to generate, for example `firstname`. Type: select in the list the data type of the field, for example `First Name`; depending on the type selected, additional fields can be displayed to configure your data. Blank %: enter or select the percentage of empty values you want to generate, for example `5` so that five percent of the generated first names are empty.
Random within list	If you select this type, you can manually add elements with custom values and weights; each generated value is picked at random from these elements according to their weights. For example, you can generate a field named `hair_color` with three elements: `brown` with `0.4` weight (40% of the generated values), `red` with `0.4` weight (40% of the generated values) and `blond` with `0.2` weight (20% of the generated values).
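
The way the Random within list weights and the Blank % setting interact can be pictured with a minimal Python sketch. It is only an illustration, not the connector's implementation: the function name `generate_field` is hypothetical, and applying the blank percentage before the weighted pick is an assumption.

```python
import random

def generate_field(rng, values, weights, blank_percent=0):
    """Return one generated value, or an empty string for a blank field.

    Hypothetical helper: the order of the two steps is an assumption.
    """
    # Blank %: with probability blank_percent / 100, emit an empty field.
    if rng.random() * 100 < blank_percent:
        return ""
    # Random within list: pick one value according to the configured weights.
    return rng.choices(values, weights=weights, k=1)[0]

rng = random.Random(123456)
values = ["brown", "red", "blond"]
weights = [0.4, 0.4, 0.2]
sample = [generate_field(rng, values, weights, blank_percent=5) for _ in range(10_000)]

# Print the observed share of each value, including blanks.
for value in values + [""]:
    share = sample.count(value) / len(sample)
    print(f"{value or '(blank)'}: {share:.1%}")
```

In this sketch the weights describe the split among the non-blank values, so with 5% blanks `brown` ends up close to 38% of all generated fields rather than 40%.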

Advanced settings
Property	Configuration
Use seed	Enable this option if you want to use a specific seed to initialize the random number generator. Using the same seed lets you reproduce the same results from one run to the next. Example: `123456`
Enable custom locales	Enable this option to select in the list a custom language and country code. The default is en-us. You can select multiple locales: in that case, each record is created using one of the selected locales, chosen at random. Locales change the values of some types (for example, address records vary according to the selected locale).
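
As a rough illustration of these two options, the following Python sketch shows why a fixed seed gives repeatable output and how a locale could be picked at random per record. The function `generate_records` and the record layout are hypothetical; they do not reflect how the connector builds its records.

```python
import random

def generate_records(seed, rows, locales=("en-us",)):
    """Generate `rows` illustrative records from a seeded generator."""
    # A generator initialized with a fixed seed produces a repeatable sequence.
    rng = random.Random(seed)
    records = []
    for _ in range(rows):
        # With several locales selected, one is chosen at random per record.
        locale = rng.choice(locales)
        records.append({"locale": locale, "id": rng.randint(0, 999)})
    return records

first_run = generate_records(123456, rows=5, locales=("en-us", "fr-fr"))
second_run = generate_records(123456, rows=5, locales=("en-us", "fr-fr"))

print(first_run == second_run)  # True: same seed and settings, same records
```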

Fields to configure in the source dataset of your pipeline.

Property		Configuration
Random rows number		Enable this option if you want to generate a random number of rows between a minimum (Minimum rows number) and a maximum (Maximum rows number) that you define.
Polling configuration (only if you created the dataset with the Streaming type)		On the Main tab, set the time between two consecutive sets of generated records in the Min poll interval field. On the Advanced tab, set the number of records generated in each set in the Max poll records field. The default is 1.
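
The source dataset options above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the connector's code: the names `row_count` and `stream_batches` are hypothetical, the minimum and maximum are assumed to be inclusive, and each set is assumed to contain exactly the Max poll records number of records.

```python
import itertools
import random
import time

def row_count(rng, rows, random_rows=False, min_rows=0, max_rows=0):
    """Return the number of rows to generate for one run."""
    if random_rows:
        # Random rows number: pick a count between the two bounds (assumed inclusive).
        return rng.randint(min_rows, max_rows)
    return rows

def stream_batches(make_record, max_poll_records=1, min_poll_interval_ms=1000):
    """Yield sets of records indefinitely, pausing between consecutive sets."""
    while True:
        yield [make_record() for _ in range(max_poll_records)]
        # Wait for the polling interval before producing the next set.
        time.sleep(min_poll_interval_ms / 1000)

rng = random.Random(123456)
fixed = row_count(rng, rows=100)
varying = row_count(rng, rows=100, random_rows=True, min_rows=10, max_rows=20)
print(fixed, 10 <= varying <= 20)  # 100 True

# Take three sets of two records each, 10 ms apart.
counter = itertools.count(1)
stream = stream_batches(lambda: {"id": next(counter)},
                        max_poll_records=2, min_poll_interval_ms=10)
batches = list(itertools.islice(stream, 3))
print(batches)
```

A batch dataset corresponds to a single call that produces `row_count` records, whereas the streaming sketch keeps producing sets until the consumer stops reading.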
