Processing strings to get the revenue corresponding to small taxi rides - Cloud

Talend Cloud Pipeline Designer Processors Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Design and Development > Designing Pipelines
EnrichPlatform: Talend Pipeline Designer

Before you begin

  • You have previously created a connection to the system storing your source data.

  • You have previously added the dataset holding your source data.

    Here, hierarchical taxi data including pickup time, dropoff time, fare, etc. (download the type_converter-taxi.json file from the Downloads tab in the left panel of this page).

  • You have also created the connection and the related dataset that will hold the processed data.

    Here, a file stored on HDFS.

Procedure

  1. Click ADD PIPELINE on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Convert Small Taxi Rides
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here taxi-related data.

    Warning: The Type Converter processor cannot process fields nested in sub-records. To convert such fields, first use a Field Selector processor to reorganize the records and move these fields to the top level of the schema.
  4. Select your dataset and click SELECT in order to add it to the pipeline.
    Rename it if needed.
  5. Click the + icon on the pipeline and add a Field Selector processor. The configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    reorganize records
  7. In the SELECTORS area:
    1. Enter .pickup.pickup_datetime in the Input list and pickup_time in the Output list, as you want to select the pickup_datetime field of the pickup record, rename it, and move it to the top level of the schema.
    2. Add a NEW ELEMENT and enter .dropoff.dropoff_datetime in the Input list and dropoff_time in the Output list, as you want to select the dropoff_datetime field of the dropoff record, rename it, and move it to the top level of the schema.
    3. Add a NEW ELEMENT and enter .payment.fare_amount in the Input list and fare in the Output list, as you want to select the fare_amount field of the payment record, rename it, and move it to the top level of the schema.

  8. Click SAVE to save your configuration.
  9. Click the + icon on the pipeline and add a Type Converter processor. The configuration panel opens.
  10. Give a meaningful name to the processor.

    Example

    convert rides and fares
  11. In the CONVERTERS area:
    1. Select .pickup_time in the Field path list, DateTime in the Output type list, and type in yyyy-MM-dd HH:mm:ss in the Format field, as you want to convert the String field holding the pickup time to a DateTime field.
    2. Add a NEW ELEMENT, and select .dropoff_time in the Field path list, DateTime in the Output type list, and type in yyyy-MM-dd HH:mm:ss in the Format field, as you want to convert the String field holding the dropoff time to a DateTime field.
    3. Add a NEW ELEMENT, and select .fare in the Field path list and Double in the Output type list, as you want to convert the String field holding the fare to a Double field.
      Tip: You can apply several conversions to the same field. For example, you can convert a String field that contains a date into a Long field, and then convert this generated Long field into a DateTime field.

  12. Click SAVE to save your configuration.
  13. Click the + icon after the Type Converter processor on the pipeline and add a Filter processor.
  14. Give a meaningful name to the processor.

    Example

    filter on short rides
  15. In the Filter area:
    1. Type in .{.dropoff_time - .pickup_time < 660000} in the Input list, as you want to keep the rides that lasted less than 11 minutes (660000 milliseconds).
    2. Select COUNT in the Optionally select a function to apply list, > in the Operator list, and type in 0 in the Value list, as you want to count these short rides.
  16. Click SAVE to save your configuration.
  17. (Optional) Look at the preview of the Filter processor to see your data after the filtering operation.

  18. Click the ADD DESTINATION item at the bottom of the pipeline to open the panel allowing you to select the dataset that will hold your data (HDFS), and give it a meaningful name, for example short rides data.
  19. On the top toolbar of Talend Cloud Pipeline Designer, select your run profile in the list (for more information, see Run profiles).
  20. Click the run icon to run your pipeline.
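
To sanity-check what the pipeline should produce, the Field Selector, Type Converter and Filter steps can be approximated locally in Python. This is a minimal sketch, not Talend code, and it rests on several assumptions: each record is a Python dict whose nesting matches the paths used above (.pickup.pickup_datetime, .dropoff.dropoff_datetime, .payment.fare_amount), the date values are strings following the yyyy-MM-dd HH:mm:ss pattern (written %Y-%m-%d %H:%M:%S in Python), and ride durations are compared in milliseconds, as the 660000 threshold suggests. The function names and sample records are hypothetical, and the COUNT function of the Filter processor is not modeled.

```python
from datetime import datetime

# Hypothetical mapping mirroring the SELECTORS of the Field Selector step:
# nested input path (as a tuple of keys) -> top-level output field name.
SELECTORS = {
    ("pickup", "pickup_datetime"): "pickup_time",
    ("dropoff", "dropoff_datetime"): "dropoff_time",
    ("payment", "fare_amount"): "fare",
}

# Python equivalent of the yyyy-MM-dd HH:mm:ss pattern typed in the Format field.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# 11 minutes in milliseconds (assumed unit of the Filter expression).
SHORT_RIDE_MS = 11 * 60 * 1000  # 660000


def select_fields(record: dict) -> dict:
    """Move nested fields to the top level under new names (Field Selector)."""
    flat = {}
    for path, output_name in SELECTORS.items():
        value = record
        for key in path:
            value = value[key]
        flat[output_name] = value
    return flat


def convert_types(flat: dict) -> dict:
    """Convert the String fields to DateTime and Double equivalents (Type Converter)."""
    return {
        "pickup_time": datetime.strptime(flat["pickup_time"], DATETIME_FORMAT),
        "dropoff_time": datetime.strptime(flat["dropoff_time"], DATETIME_FORMAT),
        "fare": float(flat["fare"]),
    }


def is_short_ride(converted: dict) -> bool:
    """Return True for rides that lasted strictly less than 11 minutes (Filter)."""
    duration = converted["dropoff_time"] - converted["pickup_time"]
    return duration.total_seconds() * 1000 < SHORT_RIDE_MS


def process(records: list) -> list:
    """Apply the three steps in pipeline order and return the short rides."""
    converted = (convert_types(select_fields(r)) for r in records)
    return [c for c in converted if is_short_ride(c)]


if __name__ == "__main__":
    # Hypothetical records; the real type_converter-taxi.json file may hold more fields.
    sample = [
        {
            "pickup": {"pickup_datetime": "2015-01-01 10:00:00"},
            "dropoff": {"dropoff_datetime": "2015-01-01 10:05:00"},
            "payment": {"fare_amount": "6.5"},
        },
        {
            "pickup": {"pickup_datetime": "2015-01-01 11:00:00"},
            "dropoff": {"dropoff_datetime": "2015-01-01 11:30:00"},
            "payment": {"fare_amount": "24.0"},
        },
    ]
    for ride in process(sample):
        print(ride["pickup_time"], ride["fare"])
```

Running the script prints only the first hypothetical ride, since the second one lasts 30 minutes.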

Results

Your pipeline is being executed: the records are reorganized, their field types are converted, the short rides are filtered, and the output flow is sent to the target system you have indicated.
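
Once the filtered records are available, the revenue corresponding to the short rides can be obtained by counting them and summing their fare field. The following is a minimal local sketch, assuming the records have already been read into Python dicts whose fare value is numeric (as produced by the Double conversion); summarize is a hypothetical helper, not a Talend Cloud Pipeline Designer feature.

```python
import math


def summarize(short_rides: list) -> tuple:
    """Return (number of short rides, total fare rounded to two decimals).

    Hypothetical helper: assumes each record is a dict with a numeric "fare" key.
    """
    fares = [float(ride["fare"]) for ride in short_rides]
    # math.fsum limits floating-point error when adding many Double values.
    total = round(math.fsum(fares), 2)
    return len(fares), total


if __name__ == "__main__":
    # Hypothetical records as they might look after the Type Converter step.
    rides = [{"fare": 6.5}, {"fare": 7.0}, {"fare": 5.25}]
    count, revenue = summarize(rides)
    print(f"{count} short rides, revenue {revenue}")
```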