
Data generator properties

Properties to configure to automatically generate test data as a dataset.

Data generator is unidirectional and can only be used as a source dataset in your pipeline.

Data generator connection

Property Configuration
Selection Select or enter Data generator.
Configuration
Engine Select your engine in the list.
Description Enter a display name (mandatory) and a description (optional) for the connection.

Data generator dataset

Property Configuration
Dataset name Enter a display name for the dataset. This name will be used as a unique identifier of the dataset in all Talend Cloud apps.
Connection Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only.
Type Select the type of dataset you want to create:
  • Batch if you want to generate records once. The pipeline that uses this dataset will be a batch pipeline.
  • Streaming if you want to generate records every N milliseconds. The pipeline that uses this dataset will be a streaming pipeline, and you can define the polling interval in milliseconds in the Polling configuration field of the source dataset.
Main Rows Enter the number of records you want to generate.
Fields Define the fields and the nature of the data to be generated. You can use predefined types in order to help you generate specific data:
  • Name: Enter the name of the field you want to generate.
    Example:
    firstname 
  • Type: Select in the list the data type you want your field to have. Depending on the type selected, additional fields can be displayed to configure your data.
    Example:
    First Name
  • Blank %: Enter or select the percentage of empty fields you want to generate.

    Example: 5, so that five percent of the generated first names are empty fields.
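The effect of the Blank % setting can be pictured with a minimal Python sketch. It is an illustration only, not the connector's implementation; the `generate_field` helper and the sample first names are hypothetical.

```python
import random

def generate_field(values, blank_percent, rng):
    """Return a random value, or an empty string about blank_percent % of the time."""
    if rng.random() * 100 < blank_percent:
        return ""  # empty field
    return rng.choice(values)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
first_names = ["Alice", "Bob", "Chloe", "David"]  # hypothetical sample values

# Blank % = 5: roughly five percent of the generated first names are empty.
records = [generate_field(first_names, 5, rng) for _ in range(10_000)]
blank_share = records.count("") / len(records)
print(f"share of empty fields: {blank_share:.3f}")
```

With a large number of rows, the share of empty fields should settle close to the configured percentage.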

Random within list If you select this type, you can manually add elements with custom values and weights. Each generated value is picked at random from these elements according to their weights.

For example, you can generate a field named hair_color with three elements: brown with 0.4 weight (40% of the generated values), red with 0.4 weight (40% of the generated values) and blond with 0.2 weight (20% of the generated values).
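The hair_color example can be reproduced outside the product with Python's standard `random.choices`, which performs a weighted random pick. This is a sketch of the idea under that assumption, not the connector's own code.

```python
import random
from collections import Counter

# Elements and weights taken from the hair_color example above.
elements = ["brown", "red", "blond"]
weights = [0.4, 0.4, 0.2]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hair_colors = rng.choices(elements, weights=weights, k=10_000)

# Observed share of each element; these should be close to the weights.
counts = Counter(hair_colors)
shares = {color: counts[color] / len(hair_colors) for color in elements}
for color in elements:
    print(f"{color}: {shares[color]:.2f}")
```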

Advanced Use seed Enable this option if you want to use a specific seed to initialize the random number generator.

Reusing the same seed lets you reproduce the same generated results.

Example: 123456
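The role of a seed can be illustrated with Python's standard `random` module. The `generate_ages` helper is hypothetical and only demonstrates the general principle; the values it produces are unrelated to what the connector generates for the same seed.

```python
import random

def generate_ages(seed, count):
    """Generate `count` pseudo-random ages from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(18, 90) for _ in range(count)]

first_run = generate_ages(123456, 5)
second_run = generate_ages(123456, 5)

# The same seed yields the same sequence of values.
print(first_run == second_run)  # True
```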

Enable custom locales Enable this option to enter or select in the list a custom language and country code. By default, it is en-us.

You can select multiple locales. In that case, records are created using the selected locales at random, and the values of some types change according to the locale (for example, address records vary with the locale selected).
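As a rough sketch of what selecting several locales means, the Python snippet below picks one of the selected locales at random for each record and draws a locale-specific value. The `CITIES` table and the `generate_city` helper are hypothetical stand-ins; the connector's actual locale data and selection logic are not documented here.

```python
import random

# Hypothetical per-locale sample data, for illustration only.
CITIES = {
    "en-us": ["Boston", "Denver", "Seattle"],
    "fr-fr": ["Lyon", "Nantes", "Lille"],
}

def generate_city(selected_locales, rng):
    """Pick a locale at random, then a value that belongs to that locale."""
    locale = rng.choice(selected_locales)
    return {"locale": locale, "city": rng.choice(CITIES[locale])}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
rows = [generate_city(["en-us", "fr-fr"], rng) for _ in range(6)]
for row in rows:
    print(row)
```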

Data generator source dataset configuration

Fields to configure in the source dataset of your pipeline.

Property Configuration
Random rows number Enable this option if you want to generate a random number of rows between a minimum (Minimum rows number) and a maximum (Maximum rows number) that you define.
Polling configuration (only if you created the dataset with the Streaming type)
  • On the Main tab, set the time between every generation of a set of records in the Min poll interval field.
  • On the Advanced tab, set the number of records generated for every set in the Max poll records field. By default, it is 1.
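The two polling settings can be pictured as a loop that emits one set of records, waits for the poll interval, then emits the next set. The Python sketch below is a conceptual model only: `stream_records` and its parameters are hypothetical names that mirror the Min poll interval and Max poll records fields, and it assumes each set contains exactly the configured number of records.

```python
import itertools
import random
import time

def stream_records(make_record, poll_interval_ms, max_poll_records=1, sleep=time.sleep):
    """Yield one set of records per poll, pausing poll_interval_ms between sets."""
    while True:
        yield [make_record() for _ in range(max_poll_records)]
        sleep(poll_interval_ms / 1000)  # the interval is given in milliseconds

rng = random.Random(7)  # fixed seed so the sketch is repeatable

def make_record():
    # Hypothetical record with a single numeric field.
    return {"value": rng.randint(0, 99)}

# Take four sets of three records, generated 10 ms apart.
stream = stream_records(make_record, poll_interval_ms=10, max_poll_records=3)
batches = list(itertools.islice(stream, 4))
for batch in batches:
    print(batch)
```

Leaving `max_poll_records` at its default of 1 matches the documented default, where each set contains a single record.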
