Processing leads on Amazon S3 and loading them into MySQL - Cloud

Talend Cloud Apps Connectors Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Inventory, Talend Data Preparation, Talend Pipeline Designer
Content: Administration and Monitoring > Managing connections; Design and Development > Designing Pipelines
Last publication date: 2024-03-21

This scenario helps you set up and use connectors in a pipeline. Adapt it to your environment and use case.

Before you begin

Procedure

  1. Click Connections > Add connection.
  2. In the panel that opens, select the type of connection you want to create.

    Example

    S3
  3. Select your engine in the Engine list.
    Note:
    • For advanced data processing, use the Remote Engine Gen2 rather than the Cloud Engine for Design.
    • If no Remote Engine Gen2 has been created from Talend Management Console, or if one exists but appears as unavailable (meaning it is not up and running), you cannot select a Connection type in the list or save the new connection.
    • The list of available connection types depends on the engine you have selected.
  4. Select the type of connection you want to create.
    Here, select S3 connection.
  5. Fill in the connection properties to access your S3 account as described in Amazon S3 properties, check the connection and click Add dataset.
  6. In the Add a new dataset panel, name your dataset lead generation campaign.
  7. Select S3 in the connection list.
  8. Click Autodetect or manually fill in the required properties to access the file located in your S3 bucket (CSV format, space field delimiter, no header), then click View sample to preview your dataset sample.
  9. Click Validate to save your dataset.
  10. Repeat these steps to add the MySQL connection and the MySQL table dataset that will be used as the destination in your pipeline. Fill in the connection properties as described in MySQL properties.
  11. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  12. Give the pipeline a meaningful name.

    Example

    From S3 to MySQL - Process leads
  13. Click ADD SOURCE and, in the panel that opens, select your source dataset, lead generation campaign.
  14. Add a Field selector processor to the pipeline to select specific fields and give them meaningful names. The configuration panel opens.
  15. Give a meaningful name to the processor.

    Example

    select countries and revenues
  16. In the Simple view of the Configuration tab, click the icon to open the Select fields window:
    1. Select .field2 and click the icon to rename it country, as you want to select the fields corresponding to customer countries.
    2. Select .field7 and click the icon to rename it revenue, as you want to select the fields corresponding to customer revenues.
  17. Click Save to save your configuration.
  18. Add a Filter processor to the pipeline to keep only the records of customers who provided their revenue during the marketing campaign. The configuration panel opens.
  19. Give a meaningful name to the processor.

    Example

    remove empty revenues
  20. In the Filters area:
    1. Select .revenue in the Input list, as you want to process customer revenues.
    2. Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
    3. Select != in the Operator list and type N/A in the Value field, as you want to keep only the customers who provided their revenue.
  21. Add a Type Converter processor to the pipeline to convert the revenue field, which is currently in string format. The configuration panel opens.
  22. Give a meaningful name to the processor.

    Example

    convert revenue formats
  23. In the Converters area, select .revenue in the Field path list and Double in the Output type list, as you want to convert the String type field holding revenue information to a Double type field.
  24. Click Save to save your configuration.
  25. Add an Aggregate processor to the pipeline. The configuration panel opens.
  26. Give a meaningful name to the processor.

    Example

    count average revenue by country
  27. In the Group by area, select the field you want to use for your aggregation set, here .country.
  28. In the Operations area:
    1. Select .revenue in the Field path list and Average in the Operation list.
    2. In Output field name, name the generated field, for example average_revenue.
  29. Click Save to save your configuration.
  30. (Optional) Look at the Aggregate processor to preview the data calculated by the aggregation: the average revenue per country.
  31. Click the ADD DESTINATION item on the pipeline to open the panel where you select the dataset that will hold your output data (MySQL).
  32. Give the destination a meaningful name, for example load in MySQL table.
  33. Click Save to save your configuration.
  34. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  35. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
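
To make the processor chain above easier to reason about, the following is a minimal Python sketch of the same logic outside Talend: select and rename two fields, drop records whose revenue is N/A, convert the revenue to a floating-point number, then average it per country. It is an illustration only, not how Talend Cloud Pipeline Designer executes pipelines. The sample records and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; the values are invented for illustration.
SAMPLE = [
    {"field2": "France", "field7": "1200.50"},
    {"field2": "France", "field7": "N/A"},
    {"field2": "Spain", "field7": "800"},
    {"field2": "France", "field7": "799.50"},
]


def select_fields(records):
    """Field selector: keep field2 and field7, renamed country and revenue."""
    return [{"country": r["field2"], "revenue": r["field7"]} for r in records]


def remove_empty_revenues(records):
    """Filter: keep only records whose revenue is not the string 'N/A'."""
    return [r for r in records if r["revenue"] != "N/A"]


def convert_revenue(records):
    """Type converter: turn the revenue string into a float (Double)."""
    return [{**r, "revenue": float(r["revenue"])} for r in records]


def average_revenue_by_country(records):
    """Aggregate: group by country and average the revenue."""
    groups = defaultdict(list)
    for r in records:
        groups[r["country"]].append(r["revenue"])
    return [
        {"country": country, "average_revenue": sum(values) / len(values)}
        for country, values in groups.items()
    ]


def run_pipeline(records):
    """Apply the four steps in the same order as the pipeline."""
    selected = select_fields(records)
    filtered = remove_empty_revenues(selected)
    converted = convert_revenue(filtered)
    return average_revenue_by_country(converted)


result = run_pipeline(SAMPLE)
for row in result:
    print(row)
```

The filter runs before the type conversion for the same reason as in the pipeline: the N/A values cannot be converted to a number, so they have to be removed first.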

Results

Your pipeline is being executed. The lead information that was stored on S3 is cleaned, the revenues are aggregated per country, and the output flow is sent to the MySQL target table you have defined.
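
If you want to inspect the source file outside Talend, for example to confirm which columns hold the country and the revenue, the dataset format used in this scenario (CSV, space field delimiter, no header) can be parsed with the Python standard library. This is a sketch under assumptions: the sample content is invented, the read_leads helper is hypothetical, and the zero-based field0, field1, ... naming is assumed here only to mirror the .field2 and .field7 names used in the procedure.

```python
import csv
import io

# Invented sample content: space-delimited, no header row.
# A value that contains a space must be quoted.
SAMPLE_TEXT = (
    'a1 Jane France x x x x 1200.50\n'
    'a2 Raj "United Kingdom" x x x x N/A\n'
)


def read_leads(text_stream):
    """Parse a headerless, space-delimited CSV stream into dictionaries.

    Columns are named field0, field1, ... (an assumed naming scheme).
    Blank lines are skipped.
    """
    reader = csv.reader(text_stream, delimiter=" ")
    return [
        {f"field{i}": value for i, value in enumerate(row)}
        for row in reader
        if row
    ]


records = read_leads(io.StringIO(SAMPLE_TEXT))
for record in records:
    print(record["field2"], record["field7"])
```

To read a real file instead of the in-memory sample, pass a file object opened with newline="" to read_leads, as the csv module documentation recommends.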