Preparing the Hive tables - 7.3

ELT Hive

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-21

Procedure

  1. Create the Hive table you want to write data to. In this scenario, this table is named agg_result, and you can create it using the following statement in tHiveRow:

       create table agg_result
         (id int, name string, address string, sum1 string, postal string,
          state string, capital string, mostpopulouscity string)
       partitioned by (type string)
       row format delimited fields terminated by ';'
       location '/user/ychen/hive/table/agg_result'

    In this statement, '/user/ychen/hive/table/agg_result' is the HDFS directory used in this scenario to store the created table. You need to replace it with the directory you want to use in your environment.
    For further information about tHiveRow, see tHiveRow.
  2. Create the two input Hive tables whose columns you want to join and aggregate into the output Hive table, agg_result. The statements to be used are:

       create table customer
         (id int, name string, address string, idState int, id2 int,
          regTime string, registerTime string, sum1 string, sum2 string)
       row format delimited fields terminated by ';'
       location '/user/ychen/hive/table/customer'

    and

       create table state_city
         (id int, postal string, state string, capital int, mostpopulouscity string)
       row format delimited fields terminated by ';'
       location '/user/ychen/hive/table/state_city'

    You can confirm that the three tables have been created as expected using the verification sketch shown after this procedure.
  3. Use tHiveRow to load data into the two input tables, customer and state_city. The statements to be used are:

       "LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"

    and

       "LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"

    A quick check of the loaded rows is shown after this procedure.
    The two files, customer.csv and State_City.csv, are local files created for this scenario. You need to create your own files to provide data to the input Hive tables. The data schema of each file must be identical to that of its corresponding table; a hypothetical sample line is shown after this procedure.
    You can use tRowGenerator and tFileOutputDelimited to create these two files easily. For further information about these two components, see tRowGenerator and tFileOutputDelimited.

    For further information about the Hive query language, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual.
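
To confirm that the three tables have been created as intended, you can run standard Hive statements such as the following in tHiveRow. DESCRIBE FORMATTED is part of the Hive query language; for agg_result, its output includes the columns, the type partition column and the HDFS location declared above:

    DESCRIBE FORMATTED agg_result
    DESCRIBE customer
    DESCRIBE state_city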
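
The field delimiter of each input file must match the one declared in the CREATE TABLE statements, that is, a semicolon. As a purely hypothetical illustration, a line of customer.csv matching the nine columns of the customer table could look like the following; the values are placeholders, not data from this scenario:

    1;Ashley;30 Rockefeller Plaza;36;101;2010-03-19;2010-03-19;1200;1500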
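
Once the LOAD DATA statements have run, a simple way to check that the rows have arrived in the input tables is to query a few of them with statements such as:

    SELECT * FROM customer LIMIT 5
    SELECT * FROM state_city LIMIT 5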