Processing strings to get the revenue corresponding to small taxi rides - Cloud

Talend Cloud Pipeline Designer Processors Guide

Author: Talend Documentation Team
Version: Cloud
Product: Talend Cloud
Task: Design and Development > Designing Pipelines
Platform: Talend Pipeline Designer

Before you begin

  • You have previously created a connection to the system storing your source data.

  • You have previously added the dataset holding your source data.

    In this example, the dataset contains hierarchical taxi data, including pickup time, dropoff time and fare. You can download the type_converter-taxi.json file from the Downloads tab in the left panel of this page.

  • You have also created the connection and the related dataset that will hold the processed data.

    Here, a file stored on HDFS.

Procedure

  1. Click ADD PIPELINE on the PIPELINES page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.
    Convert Small Taxi Rides
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here the hierarchical taxi data.
    Warning: The Type Converter processor cannot process sub-records. To convert fields nested in sub-records, first use a Field Selector processor to reorganize the records and move these fields to the top level of the schema.
  4. Select your dataset and click SELECT DATASET in order to add it to the pipeline.
    Rename it if needed.
  5. Click and add a Field Selector processor to the pipeline. The configuration panel opens.
  6. Give a meaningful name to the processor.
    reorganize records
  7. In the SELECTORS area:
    1. Enter pickup_time in the Field name list and .pickup.pickup_datetime in the Path list, as you want to select the pickup_datetime field nested in the pickup record, rename it and move it to the top level of the schema.
    2. Add a NEW ELEMENT and enter dropoff_time in the Field name list and .dropoff.dropoff_datetime in the Path list, as you want to select the dropoff_datetime field nested in the dropoff record, rename it and move it to the top level of the schema.
    3. Add a NEW ELEMENT and enter fare in the Field name list and .payment.fare_amount in the Path list, as you want to select the fare_amount field nested in the payment record, rename it and move it to the top level of the schema.
  8. Click SAVE to save your configuration.
  9. Click and add a Type Converter processor to the pipeline. The configuration panel opens.
  10. Give a meaningful name to the processor.
    convert rides and fares
  11. In the CONVERTERS area:
    1. Select .pickup_time in the Field name list and DateTime in the Output type list, then type in yyyy-MM-dd HH:mm:ss in the Format field, as you want to convert the String type field holding pickup time information to a DateTime type field.
    2. Add a NEW ELEMENT, select .dropoff_time in the Field name list and DateTime in the Output type list, then type in yyyy-MM-dd HH:mm:ss in the Format field, as you want to convert the String type field holding dropoff time information to a DateTime type field.
    3. Add a NEW ELEMENT, and select .fare in the Field name list and Double in the Output type list, as you want to convert the String type field holding fare information to a Double type field.
      Tip: You can apply several conversions to the same field. For example, you can convert a String type field that contains a date to a Long type field, and then convert the resulting Long type field to a DateTime type field.
  12. Click SAVE to save your configuration.
  13. Click after the Type Converter processor on the pipeline and add a Filter processor.
  14. Give a meaningful name to the processor.
    filter on short rides
  15. In the Filter area:
    1. Type in .{.dropoff_time - .pickup_time < 660000} in the Field path list, as you want to keep the rides that lasted less than 11 minutes (660000 milliseconds).
    2. Select COUNT in the Apply a function first list and > in the Operator list, then type in 0 in the Value list, as you want to keep only the records for which at least one such short ride is counted.
  16. Click SAVE to save your configuration.
  17. (Optional) Click the preview icon after the Filter processor to preview your data after the filtering operation.
  18. Click the ADD DESTINATION item at the bottom of the pipeline to open the panel allowing you to select the dataset that will hold your data (HDFS), and give it a meaningful name, short rides data for example.
  19. On the top toolbar of Talend Cloud Pipeline Designer, select your Run Profile in the list (for more information, see Execution profiles).
  20. Click the run icon to run your pipeline.
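
To make the effect of the Field Selector and Type Converter steps more concrete, here is a minimal Python sketch of the same two transformations applied to a single record. It is an illustration only, not how Talend Cloud Pipeline Designer works internally. The three paths and the new field names come from the procedure above; the sample values, the helper names (select_fields, to_epoch_millis, convert_types), the UTC time zone and the representation of DateTime values as epoch milliseconds are assumptions. The Python format %Y-%m-%d %H:%M:%S corresponds to the yyyy-MM-dd HH:mm:ss pattern entered in the Format field.

```python
from datetime import datetime, timezone

# Hypothetical sample record: only the three paths are taken from the procedure,
# the values are invented for illustration.
record = {
    "pickup": {"pickup_datetime": "2019-01-01 10:00:00"},
    "dropoff": {"dropoff_datetime": "2019-01-01 10:08:30"},
    "payment": {"fare_amount": "7.5"},
}

# New top-level field name -> path, as entered in the SELECTORS area.
SELECTORS = {
    "pickup_time": ".pickup.pickup_datetime",
    "dropoff_time": ".dropoff.dropoff_datetime",
    "fare": ".payment.fare_amount",
}


def select_fields(rec, selectors):
    """Move the nested fields to the top level under their new names."""
    flat = {}
    for name, path in selectors.items():
        value = rec
        for key in path.strip(".").split("."):
            value = value[key]
        flat[name] = value
    return flat


def to_epoch_millis(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a date string and return epoch milliseconds (assumes UTC)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def convert_types(flat):
    """Convert the two time strings to timestamps and the fare to a float."""
    return {
        "pickup_time": to_epoch_millis(flat["pickup_time"]),
        "dropoff_time": to_epoch_millis(flat["dropoff_time"]),
        "fare": float(flat["fare"]),
    }


flat = select_fields(record, SELECTORS)
converted = convert_types(flat)
print(flat)
print(converted)
```

Once the fields sit at the top level and carry numeric types, the difference between dropoff_time and pickup_time can be compared with a threshold, which is what the Filter step relies on.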

Results

Your pipeline is being executed: the selected fields are converted to the specified types, the records are filtered to keep the short rides, and the output flow is sent to the target system you have indicated.
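
As a rough check of the filtering logic, the sketch below applies the same threshold in plain Python and sums the fares of the short rides. It assumes that the converted time fields hold epoch milliseconds, so that 11 minutes equals 11 × 60 × 1000 = 660000. The sample rides and the names SHORT_RIDE_LIMIT_MS, is_short_ride and short_ride_revenue are hypothetical and only serve the illustration.

```python
# 11 minutes expressed in milliseconds (assumes time fields are epoch millis).
SHORT_RIDE_LIMIT_MS = 11 * 60 * 1000  # 660000

# Hypothetical converted records, with invented values.
rides = [
    {"pickup_time": 0, "dropoff_time": 300_000, "fare": 6.0},   # 5 minutes
    {"pickup_time": 0, "dropoff_time": 900_000, "fare": 14.5},  # 15 minutes
    {"pickup_time": 0, "dropoff_time": 600_000, "fare": 9.25},  # 10 minutes
]


def is_short_ride(ride):
    """Return True when the ride lasted less than 11 minutes."""
    return ride["dropoff_time"] - ride["pickup_time"] < SHORT_RIDE_LIMIT_MS


# Keep only the short rides, then add up their fares.
short_rides = [ride for ride in rides if is_short_ride(ride)]
short_ride_revenue = sum(ride["fare"] for ride in short_rides)

print(len(short_rides), short_ride_revenue)
```

With these sample values, the 5-minute and 10-minute rides are kept and the 15-minute ride is dropped.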