If you need to import numerous datasets from the same source, instead of manually creating them one by one in Talend Cloud Data Inventory, you can create a crawler to retrieve a full list of assets in a single operation.
Crawling a connection allows you to retrieve data at a large scale and enrich your inventory more efficiently. After selecting a connection, you will be able to import all of its content, or part of it via a quick search and filter, and select which users will have access to the newly created datasets.
Crawling a connection for multiple datasets comes with the following prerequisites and limitations:
- The Dataset administrator or Dataset manager role, or at least the Crawling - Add permission, has been assigned to you in Talend Cloud Management Console.
- You are using the Remote Engine 2022-02 or later.
- You can only crawl data from a JDBC connection, and only one crawler can be created per connection at a time.
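Conceptually, the crawler's first job is to enumerate the tables exposed by the connection so that each one can become a dataset. Talend does this over JDBC; the sketch below uses Python's built-in sqlite3 module purely as a stand-in source to illustrate the idea of listing a connection's tables:

```python
import sqlite3

# Stand-in for a JDBC source: an in-memory SQLite database with a few tables.
# (Talend crawls over JDBC; sqlite3 is used here only as an analogy.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders    (id INTEGER, customer_id INTEGER);
    CREATE TABLE invoices  (id INTEGER, amount REAL);
""")

# Enumerate every table exposed by the connection, the way a crawler
# retrieves the full list of assets in a single operation.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customers', 'invoices', 'orders']
```

With a real JDBC source, the equivalent step is querying the database metadata for its table list before selecting which tables to import.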
To start creating a crawler for a connection, you can either:
- Hover over your connection in the connection list, click the Crawl connection icon, and then the Add crawler button.
- Click your connection in the connection list, select the Crawler tab of the drawer panel, and click Add crawler.

The crawler configuration window opens.
- Select your preferred crawling mode.
- Select the tables to import from your data source and click Next.
You now need to define which users will be able to access the datasets that will be created, and with which rights.
To add users to the list of people who can access the datasets, you can either:
- Hover over a user or group, click the + icon, and assign the rights you want to give with the drop-down list in the right column.
- Select a user or group, click Add as, and assign the rights you want to give with the drop-down list.

You can select multiple groups or users at once using Ctrl + Click or Shift + Click.

Important: You need to select at least one owner for the datasets in order to proceed. For more information on sharing and roles, see Sharing a dataset.
- Click Next to reach the last configuration step.
- Enter a Name for your crawler, Snowflake crawler in this case, and optionally a Description of the use case and scope of the crawler.
An asynchronous process is launched in the background to crawl the selected datasets from the connection. You are taken back to the connection list, with the Crawler tab of the right drawer panel open, where you can monitor the progress of the dataset creation, as well as the sample availability.

Note: When all the samples have been fetched, the data quality and Talend Trust Score™ of every crawled dataset are fully computed and visible in the dataset list and each dataset overview. If you want to start working on one of the crawled datasets before its sample is available, you can manually retrieve one by clicking Refresh sample in the dataset sample view.
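The background crawl behaves like any asynchronous job: you check its progress periodically until every dataset has been created. The sketch below illustrates that polling pattern; get_crawler_status is a hypothetical stand-in, not a Talend API:

```python
import time
from itertools import count

_calls = count(1)

def get_crawler_status():
    """Hypothetical stand-in for querying the crawler's progress;
    each call here simply reports one more dataset as created."""
    done = min(next(_calls), 3)
    return {"datasets_created": done, "datasets_total": 3}

# Poll the asynchronous crawl until every selected table has become a dataset.
status = get_crawler_status()
while status["datasets_created"] < status["datasets_total"]:
    time.sleep(0.1)  # in practice, a much longer polling interval
    status = get_crawler_status()
print(f"{status['datasets_created']}/{status['datasets_total']} datasets created")
```

In Talend Cloud Data Inventory itself, this monitoring happens visually in the Crawler tab rather than through code.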
You cannot edit a crawler configuration after it has started running. To crawl the connection again, for example with a different table selection or sharing parameters, delete the crawler and create a new one.
You can use a crawler name as a facet in the dataset search to see all the datasets linked to a given crawler.
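Faceting by crawler name amounts to filtering the dataset list on the crawler that created each entry. A toy sketch with hypothetical inventory records:

```python
# Hypothetical inventory records; the 'crawler' field mimics the search facet.
datasets = [
    {"name": "customers", "crawler": "Snowflake crawler"},
    {"name": "orders", "crawler": "Snowflake crawler"},
    {"name": "manual_upload", "crawler": None},
]

def by_crawler(records, crawler_name):
    """Return the names of the datasets created by the given crawler."""
    return [d["name"] for d in records if d["crawler"] == crawler_name]

print(by_crawler(datasets, "Snowflake crawler"))  # ['customers', 'orders']
```

Datasets created manually (no crawler) are simply excluded from the faceted result.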