Loading the input data and removing duplicates - 7.2

  1. Double-click tPigLoad to open its Basic settings view.
  2. Click the [...] button next to Edit schema to open the Schema dialog box.
  3. Click the [+] button to add three columns according to the data structure of the input file: Name (string), Country (string) and Age (integer), and then click OK to save the setting and close the dialog box.
  4. Click Local in the Mode area.
  5. Fill in the Input file URI field with the full path to the input file.
  6. Select PigStorage from the Load function list, and leave the rest of the settings as they are.
  7. Double-click tPigDistinct to open its Basic settings view, and click Sync columns to make sure that the input schema structure is correctly propagated from the preceding component.
    This component will remove any duplicates from the data flow.
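
To make the effect of these two components concrete, here is a minimal Python sketch of the same logic outside the Studio: it parses delimited records into the Name, Country and Age columns and then drops rows that are identical in every column. It is an illustration only, not the code the Job generates. The function names, the sample data and the `;` field separator are assumptions; use whatever separator your input file actually has. The sketch keeps rows in first-seen order for readability, whereas a distinct operation in Pig does not guarantee any particular output order.

```python
import csv
import io


def load_rows(text, delimiter=";"):
    """Parse delimited text into (name, country, age) tuples.

    The delimiter is an assumption for this sketch; match it to the input file.
    """
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip empty lines
        name, country, age = record
        rows.append((name, country, int(age)))
    return rows


def distinct(rows):
    """Drop rows that repeat an earlier row in every column."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical sample data: the third record duplicates the first exactly,
# while the fourth differs in Age and is therefore kept.
sample = "Anna;France;30\nBob;Spain;25\nAnna;France;30\nAnna;France;31\n"
result = distinct(load_rows(sample))
```

Note that a row counts as a duplicate only when all three columns match, so two people with the same name and country but different ages both remain in the flow.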