Available in:
Open Studio for Big Data
Open Studio for Data Integration
Open Studio for ESB
About this task
The generated schema displays the columns selected from the XML file and lets you
define the schema further.
Procedure
-
If needed, rename the metadata in the Name field
(metadata, by default), add a
Comment, and make further modifications, for example:
-
Redefine the columns by editing the relevant fields.
-
Add or delete a column using the add and delete
buttons.
-
Change the order of the columns using the move up and move down
buttons.
Warning: Avoid using any Java
reserved keyword as a schema column name.
Make sure the data type in the Type column is correctly defined.
Below are the commonly used
Talend
data types:
-
Object: a generic
Talend
data type that allows
processing data without regard to its content. For example, a data
file that is not otherwise supported can be processed with a tFileInputRaw
component by specifying that it has the Object data type.
-
List: a space-separated list of
primitive type elements in an XML Schema definition, defined using
the xsd:list element.
-
Dynamic: a data type that can be set
for a single column at the end of a schema to allow processing
fields as VARCHAR(100) columns, named either ‘Column<X>’ or,
if the input includes a header, after the column names that appear in
the header. For more information, see Dynamic schema.
-
Document: a data type that allows
processing an entire XML document without regard to its content.
-
If the XML file on which the schema is based has changed, click the
Guess button to regenerate the schema. Note that the Guess feature
does not retain any customizations you have made to the schema.
-
Click Finish. The new file connection, along with its schema,
appears under the File XML node in the
Repository tree view.
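The warning in the first step about Java reserved keywords can be checked before you enter column names in the wizard. The following is a minimal sketch, assuming that column names end up as Java identifiers in the code generated for a Job (which would explain why reserved keywords cause problems). It relies on the standard JDK method javax.lang.model.SourceVersion.isKeyword; the class name ColumnNameCheck and the sample column names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import javax.lang.model.SourceVersion;

public class ColumnNameCheck {

    // True if the name is a Java keyword or one of the literals true, false, or null,
    // as defined by the latest source version known to the running JDK.
    static boolean isReservedKeyword(String columnName) {
        return SourceVersion.isKeyword(columnName);
    }

    public static void main(String[] args) {
        // Hypothetical column names taken from an XML file.
        List<String> columns = Arrays.asList("id", "class", "customerName", "default");
        for (String column : columns) {
            if (isReservedKeyword(column)) {
                // Flag names that should be renamed in the schema before clicking Finish.
                System.out.println("Rename column: " + column);
            }
        }
    }
}
```

With these sample names, the program flags class and default. SourceVersion.isKeyword does not report restricted identifiers such as var, so treat it as a first-pass check rather than a complete validation.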
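To make the List data type description more concrete, the following sketch shows how the value of an element declared with xsd:list breaks down into its items. It is plain Java illustrating the XML Schema list format, not Talend's internal handling of the List type; the class name XsdListValue and the sample value are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class XsdListValue {

    // Splits the lexical value of an xsd:list element into its items.
    // XML Schema collapses whitespace in list values, so leading, trailing,
    // and repeated whitespace must not produce empty items.
    static List<String> items(String lexicalValue) {
        String trimmed = lexicalValue.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Hypothetical element content, for example <sizes>  8 10   12 </sizes>
        // where sizes is declared as an xsd:list of integers.
        List<String> sizes = items("  8 10   12 ");
        System.out.println(sizes); // prints [8, 10, 12]
    }
}
```

Each item is returned as a string; converting the items to the list's primitive item type (for example, with Integer.parseInt) is left to the caller.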
Results
You can now drag and drop the file connection, or any of its schemas, from the
Repository tree view onto the design workspace as a new
tFileInputXML or tExtractXMLField
component, or onto an existing component to reuse the metadata. For more information
about using centralized metadata in a Job, see Using centralized metadata in a Job and Setting a repository schema in a Job.
To modify an existing file connection, right-click it in the
Repository tree view and select Edit file
xml to open the file metadata setup wizard.
To add a new schema to an existing file connection, right-click the connection in the
Repository tree view and select Retrieve
Schema from the contextual menu.
To edit an existing file schema, right-click the schema in the
Repository tree view and select Edit
Schema from the contextual menu.