Here are the main concepts of Talend Cloud Data Inventory that you will come across while following this scenario:
- Connection: Connections are environments or systems where datasets are stored, such as databases, file systems, and distributed systems or platforms. The connection information for these systems only needs to be set up once, since connections are reusable.
- Dataset: Datasets are collections of data. They can be database tables, file names, topics (Kafka), file paths (HDFS), etc. You can also create test datasets that you enter manually and store in a test connection, or import local files as datasets. Several datasets can be connected to the same system (one-to-many connectivity) and are stored in reusable connections.
- Sample: Your data is displayed as a sample, retrieved from the dataset metadata.
- Semantic type: The semantic type of a column or record corresponds to the type of data that can be found in it, such as names, zip codes, phone numbers, coordinates, etc. All Talend Cloud applications benefit from semantic awareness: when you look at your sample data, it is automatically categorized using the default semantic types or the ones that you have created yourself.
- Talend Trust Score™: A global quality indicator that aggregates several metrics into a single score ranging from 0 to 5.
- Tag: You can apply tags to your datasets like you would a sticky note, to freely add any text as metadata to your Talend Cloud Data Inventory objects.
- Custom attributes: Custom attributes can be applied to your datasets. They allow you to add metadata information following a set of predefined rules and can also be used to help you search and sort your datasets.
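
To make the relationship between these concepts more concrete, here is a minimal sketch of how connections, datasets, tags, and custom attributes relate to one another. It is only an illustration: the class names, fields, and the `datasets_for` helper are hypothetical and are not part of any Talend API.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical model of a reusable connection to a storage system."""
    name: str
    system: str  # for example "database", "HDFS", or "Kafka"


@dataclass
class Dataset:
    """Hypothetical model of a dataset stored in a connection."""
    name: str
    connection: Connection
    tags: set[str] = field(default_factory=set)  # free text labels
    custom_attributes: dict[str, str] = field(default_factory=dict)  # predefined keys


def datasets_for(connection: Connection, datasets: list[Dataset]) -> list[Dataset]:
    """Return the datasets that are stored in the given connection."""
    return [d for d in datasets if d.connection is connection]


# One connection is set up once and reused by several datasets (one-to-many).
crm = Connection(name="crm_db", system="database")
customers = Dataset("customers", crm, tags={"to review"})
orders = Dataset("orders", crm, custom_attributes={"owner": "sales"})

names = [d.name for d in datasets_for(crm, [customers, orders])]
```

In this sketch, both datasets point to the same `Connection` object, which mirrors the idea that connection information is entered once and shared by every dataset stored in that system.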