Preparing the Hive tables - 6.5

ELT Hive

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance > Third-party systems > ELT components > ELT Hive components
Data Quality and Preparation > Third-party systems > ELT components > ELT Hive components
Design and Development > Third-party systems > ELT components > ELT Hive components
EnrichPlatform
Talend Studio

Procedure

  1. Create the Hive table you want to write data in. In this scenario, this table is named agg_result, and you can create it using the following statement in tHiveRow:
    create table agg_result (id int, name string, address string, sum1 string, postal string, state string, capital string, mostpopulouscity string) partitioned by (type string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/agg_result'
    In this statement, '/user/ychen/hive/table/agg_result' is the directory used in this scenario to store this created table in HDFS. You need to replace it with the directory you want to use in your environment.
    For further information about tHiveRow, see tHiveRow.
  2. Create the two input Hive tables containing the columns you want to join and aggregate into the output Hive table, agg_result. The statements to be used are:
    create table customer (id int, name string, address string, idState int, id2 int, regTime string, registerTime string, sum1 string, sum2 string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/customer'
    and
    create table state_city (id int, postal string, state string, capital int, mostpopulouscity string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/state_city'
  3. Use tHiveRow to load data into the two input tables, customer and state_city. The statements to be used are:
    "LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"
    and
    "LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"
    The two files, customer.csv and State_City.csv, are local files created for this scenario. You need to create your own files to provide data to the input Hive tables. The schema of each file must match that of its corresponding table.
    You can use tRowGenerator and tFileOutputDelimited to create these two files easily. For further information about these two components, see tRowGenerator and tFileOutputDelimited.
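    As an alternative to generating the files in the Studio, the two semicolon-delimited input files can also be sketched in plain Python. The file names, column order, and delimiter below come from the table statements and LOAD DATA statements above; the sample row values themselves are invented purely for illustration:

    ```python
    import csv

    # Sample rows matching the customer table schema:
    # id, name, address, idState, id2, regTime, registerTime, sum1, sum2
    customers = [
        [1, "Ashley", "30 Main St", 1, 10, "2010-01-01", "2010-01-02", "100", "200"],
        [2, "Brian", "12 Oak Ave", 2, 20, "2011-03-15", "2011-03-16", "150", "250"],
    ]

    # Sample rows matching the state_city table schema:
    # id, postal, state, capital, mostpopulouscity
    states = [
        [1, "AL", "Alabama", 1, "Birmingham"],
        [2, "AK", "Alaska", 2, "Anchorage"],
    ]

    # The tables were created with "row format delimited fields terminated by ';'",
    # so the files must use a semicolon as the field delimiter.
    with open("customer.csv", "w", newline="") as f:
        csv.writer(f, delimiter=";").writerows(customers)

    with open("State_City.csv", "w", newline="") as f:
        csv.writer(f, delimiter=";").writerows(states)
    ```

    Adjust the output paths to match the INPATH values used in your LOAD DATA statements.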

    For further information about the Hive query language, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual.